Classical Test Theory (CTT)

This is one of the two main statistical/mathematical approaches to quantitative psychometrics of the last sixty or so years (as of 2024!). The other is Item Response Theory (IRT). Both are actually mixes of methods and approaches, and CTT has probably been the more unified and dominant of the two.

Details #

At its simplest, CTT is a model of measurement in which the measure is a continuous variable and, across a set of measurements (and, in theory, across an infinite population of measurements), the total variance is made up of “true variance” and “error (variance)”. Similarly, any individual measurement is made up of two parts: the true score and the error. These proportions are unknowable for individual scores and can only be estimated across a dataset (sample) of scores. The proportion of the total variance that is true variance is the reliability of the measurement process; the proportion that is error is its unreliability.

This model gives us some very simple equations.

Unknowable for any individual measurement:

$$ \text{observed score} = \text{true score} + \text{measurement error} $$

Model of the population variance:

$$ \text{Var}_{observed} = \text{Var}_{true} + \text{Var}_{error} $$
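
To make this concrete, here is a minimal simulation sketch in Python (my illustration, not code from the book; the variances of 7 and 3, the mean and the sample size are arbitrary choices). It generates scores under the model and checks that the variances add up, with the true proportion of the total giving the reliability.

```python
# Minimal sketch of the CTT decomposition: observed = true + error.
# All the numbers here are arbitrary, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                                          # a large "population"

true = rng.normal(loc=50, scale=np.sqrt(7), size=n)  # true scores, Var = 7
error = rng.normal(loc=0, scale=np.sqrt(3), size=n)  # pure error, Var = 3
observed = true + error                              # the CTT model

print(observed.var())                # ~10: Var_true + Var_error
print(true.var() / observed.var())   # ~0.7: proportion of true variance,
                                     # i.e. the reliability
```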

If one imposes some further but very traditional quantitative statistical assumptions, this model, which isn’t much more than a sensible truism, becomes quite a powerful way to estimate reliability and validity. The fundamental additional assumptions are these:

  • Your dataset is a sample, obtained by random sampling from an infinite population. (A pretty common and fundamental assumption, creating the sampling model.)
  • Any one observation is independent of any other. (Often overlooked but necessary to add to the above to make the model easy to turn into maths.)
  • That the distributions of the true and error components are Gaussian in shape. (There are alternatives but the Gaussian model makes for maths that didn’t need powerful computers, and there are some reasons to argue that it’s not daft.)
  • That the error variance is orthogonal to, i.e. completely uncorrelated with, the true variance.

With this model and some simple designs, e.g. test-retest measurement on two or more occasions, or measures made up by adding scores from multiple items, we get powerful ways to estimate validity and reliability: test-retest reliability/stability estimation, internal reliability/consistency, inter-rater reliability/agreement and factor analyses.
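
As a hedged sketch of two of those designs (again my illustration with arbitrary values, not code from the book): simulate a test-retest design and a five-item measure, then recover the reliability from the between-occasion correlation and from Cronbach’s alpha.

```python
# Two simple CTT designs, simulated with arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true = rng.normal(0, np.sqrt(7), n)          # Var_true = 7
occ1 = true + rng.normal(0, np.sqrt(3), n)   # occasion 1: Var_error = 3
occ2 = true + rng.normal(0, np.sqrt(3), n)   # occasion 2: fresh, independent error

# Test-retest design: the correlation across occasions estimates reliability.
print(np.corrcoef(occ1, occ2)[0, 1])         # ~0.7 = 7 / (7 + 3)

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array; the classic alpha formula."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Multi-item design: each item is the true score plus its own independent error.
items = true[:, None] + rng.normal(0, np.sqrt(3), (n, 5))
print(cronbach_alpha(items))                 # ~0.92: internal consistency of the sum
```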

At first the stipulation that error and true score must be orthogonal may seem perfectionist and implausible, but it’s really a tautology: a completely necessary part of the model if we want to talk about validity and reliability, if we want to separate things into signal and noise. If error is partly correlated with true score then it is partly true score! For it to be pure error, i.e. random contamination of the measurement, it has to be uncorrelated with the true score.
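
Spelling that out as an added step (not in the original text): if the two components could correlate, the variance model above would need a covariance term,

$$ \text{Var}_{observed} = \text{Var}_{true} + \text{Var}_{error} + 2\,\text{Cov}(\text{true}, \text{error}) $$

and the clean two-part decomposition only holds when that covariance is zero, i.e. when error is orthogonal to the true score.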

The assumption of Gaussian distributions, and some tricky things we can get into with the variances when we have multi-item measurement, are more of a challenge than the orthogonality stipulation. However, particularly since the arrival of powerful and affordable computing and software to handle these issues, ways to handle deviations from Gaussian models and these variance problems have emerged.
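
As one hedged example of the kind of computationally intensive workaround meant here (my sketch, not a method named in this entry): a simple bootstrap gives a confidence interval for a test-retest reliability estimate without leaning on Gaussian distributional theory.

```python
# Bootstrap sketch (illustrative values): resample respondents to get a
# confidence interval for a test-retest correlation with non-Gaussian data.
import numpy as np

rng = np.random.default_rng(7)
n = 500
true = rng.exponential(scale=2.0, size=n)    # deliberately non-Gaussian true scores
occ1 = true + rng.normal(0, 1, n)
occ2 = true + rng.normal(0, 1, n)

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)              # resample cases with replacement
    boot.append(np.corrcoef(occ1[idx], occ2[idx])[0, 1])

print(np.percentile(boot, [2.5, 97.5]))      # percentile 95% CI for reliability
```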

Try also #

Cronbach’s alpha
Gaussian (“Normal”) distribution
Heteroscedasticity
Independence of observations
Inter-rater agreement/reliability
Internal reliability/consistency
Item Response Theory (IRT)
Reliability
Test-retest reliability/stability
Validity

Chapters #

These ideas run through chapters 2, 3 and 4 and, to some extent, through the whole book.

Online resources #

Not yet.

Dates #

First created 18.viii.24.
