We have tried to be careful in the book to use “dataset” rather than “sample” when talking about our example data. We think this helps remind us that both statistical paradigms, hypothesis testing and confidence intervals, assume random sampling. Each is brilliant for its purposes where that assumption applies, but using them for data that may be nearer a census than a traditional sample, and which is never really a random sample, carries risks of bias and limits on generalisability.
Where data really are sampled from a population, the sample frame tells us how the sample was created. For example, if, to make things manageable, only a random subset of clients’ data are analysed, then the sample frame might be “a 10% sample of the entire set of clients’ data were selected using a set of random numbers”. In research it might be that referential data were developed by, e.g., “contacting individuals selected at random from the electoral register”. The first is probably a genuinely unbiased sample frame; the second, while good, won’t be: not everyone is on the electoral register, and almost certainly a non-random subset of the people contacted will refuse the survey.
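The first sample frame above, a 10% sample selected using random numbers, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the client IDs, the function name `draw_simple_random_sample` and the seed value are all hypothetical, and a real service would draw from its own client records.

```python
import random


def draw_simple_random_sample(client_ids, fraction=0.10, seed=12345):
    """Return a simple random sample (without replacement) of client IDs."""
    # A fixed seed (an arbitrary, hypothetical value here) makes the
    # selection reproducible, so the sample frame can be documented.
    rng = random.Random(seed)
    n_to_draw = round(len(client_ids) * fraction)
    # rng.sample gives every client the same chance of selection.
    return sorted(rng.sample(client_ids, n_to_draw))


# Hypothetical population: 1,000 clients identified by the IDs 1 to 1000.
all_client_ids = list(range(1, 1001))
sampled_ids = draw_simple_random_sample(all_client_ids)
print(len(sampled_ids))
```

Recording the seed and the sampling fraction alongside the analysis is one way of writing the sample frame down so that someone else could recreate exactly the same subset.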
Try also …
Sample
Population
Estimation
Confidence intervals
Generalisability