This is a key idea within the Null Hypothesis Significance Testing (NHST) paradigm. In that paradigm statistical tests look at the fit between the results found in a single experiment (or survey) and a null population model. If the probability of finding results as impressive, or even more impressive, given that null model is less than some predetermined criterion (almost always, by convention, .05, or one in twenty) then the null model is rejected and the findings are deemed “statistically significant”. Statistical power is the probability that we would find statistical significance given a specified non-null population model. Typically researchers might want an 80% probability of getting a statistically significant finding if the population really does differ from the null model by at least a certain amount.
Details #
That’s all very abstract. Let’s make it concrete: say we are interested in whether there is a difference in the mean improvement on a change measure across a therapy depending on whether the client has paid employment or not. The null hypothesis is that there is no difference in the improvement between employed and unemployed clients. The researcher, therapists and managers might feel that, in the wider population of potential clients, a difference of, say, ten points or more in improvement would be worth knowing about and might change how therapy is delivered in a service. They would then want to know how many clients they would need to have seen through therapy to have, say, an 80% likelihood of a statistically significant difference when they compare improvement by employment status. That 80% is the statistical power they want. The answer will depend on many things but particularly on the size of the effect they want to detect, here a ten point difference, and on how many clients they have seen through therapy with known employment status and known improvement. For any given effect size, the larger the number of clients the greater the power. By plugging a few more population parameters into the model (the distributions of the improvements, the standard deviations if assuming Gaussian distributions, and the proportions of employed to unemployed clients) a statistician can tell you how many clients you need in your study to get 80% power.
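The calculation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method any particular statistician or package would necessarily use: it assumes Gaussian improvement scores, a common standard deviation of 20 points in both groups (a made-up value, the entry gives none), a two-sided test at the .05 criterion, and a normal (z) approximation rather than an exact t-test calculation. The function names `n_per_group` and `power_two_groups` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian distribution


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate clients needed in EACH group (equal group sizes)
    to detect `difference` with the requested power, two-sided test."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # criterion for significance
    z_power = Z.inv_cdf(power)          # quantile for the wanted power
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


def power_two_groups(difference, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.
    n1 and n2 may differ, e.g. more employed than unemployed clients."""
    se = sd * sqrt(1 / n1 + 1 / n2)     # standard error of the difference
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    shift = difference / se
    # Probability of landing in either rejection region
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)


# Ten point difference, ASSUMED standard deviation of 20 points
print(n_per_group(10, 20))               # 63 per group under these assumptions
print(power_two_groups(10, 20, 63, 63))  # just over 0.80
print(power_two_groups(10, 20, 20, 20))  # far lower with only 20 per group
print(power_two_groups(10, 20, 100, 26)) # same total of 126, unequal split
```

With these assumed values the sketch suggests about 63 clients per group. An exact calculation based on the t distribution would give a slightly larger number, and an unequal split between employed and unemployed clients gives less power than an equal split with the same total.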
In the full NHST paradigm a study that is intended to give a definitive p-value (i.e. a verdict on statistical significance) should not really happen without this power being worked out in advance and, often, it will be argued that the study should not be done if the power is less than 80% (or some higher threshold). The argument is that an underpowered study is too likely to fail to detect a real population effect and that, if its results are not statistically significant, it is too likely that this will be misinterpreted as having shown that there is no population effect.
To some extent, if everyone understood the NHST paradigm properly, and in particular understood that non-significant findings from small studies prove very little, we wouldn’t need to think like that. However, historically our fields have hugely overvalued and misunderstood the NHST paradigm, and it would be good if, when using the paradigm, we took statistical power, and the inevitable limitations of the paradigm, much more seriously than we do.
Try also #
Confidence intervals (CIs)
Estimate/estimation
Gaussian distribution
Null Hypothesis Significance Testing (NHST)
p-values
Statistical significance
Chapters #
Chapter 5 in the OMbook.
Online resources #
None currently
Dates #
First created 1.xii.24.