Test-retest reliability

This is one of the fundamental ways of exploring the reliability of a measure. The idea is simple: if what you are measuring doesn’t change, then any change in your measurements of it over a test-retest interval is error, and that error tells you about the reliability of whatever measurement you are using.

Details #

There are a few things to remember about this:

  1. it’s a brilliantly simple idea: clear and obviously sound,
  2. it’s about multiple things, not just one, each measured on the two occasions,
  3. of course, there are complexities for our measures!

#2: It’s about measurements of multiple people or things, not just one, because reliability is about variance: it’s the ratio of the true variance to the total measured variance. To get variance in our measured values we need measurements of multiple things (each measured at least twice). We also want multiple things because we want to know that our measuring system is reliable across a range of values. To give a trivial example: test-retest reliability is clearly a good and simple way to test “bathroom” weighing scales. So I get a number of people of various weights, ideally covering most of the range of weights I will need my scales to handle. I weigh each of them twice, making sure they don’t eat, drink or excrete between the weighings. Now I have a set of paired values per person: the first weight the scales showed and the second. The model is that their true weights didn’t change, so any change between the two weighings is down to the unreliability of the scales. So our true score variance is the observed variance minus the error variance, and we estimate the error variance from the changes: in the simplest model, where the errors at the two weighings are independent and equally variable, it is half the variance of the changes because each change contains two errors. That is why we needed multiple people: to get variance. However, there is another benefit of having multiple observations: we will find out if the scales only work up to a certain weight, because all the heavier people will get the same measured weight.

#3: Of course there are complexities in our field. The biggest is that very few things we want to measure can really be regarded as staying the same in most people, even over quite short periods of time. That complicates things quite a lot for us. Two other lesser but still important complications for us are individual differences in change (some of us change more than others on many variables) and the “test-retest artefact”: the fact that for some things, particularly mental health and well-being (MH/WB) measures, non-help-seeking samples very often show a statistically significant drop in mean score on retesting.

More on these complexities in upcoming Rblog entries. A shiny app that will give you a pretty comprehensive exploration of any test-retest data you may have is also nearing completion.

Try also #

Correlation
Cyclothymic personality
Individual differences in temporal stability
Intra-class correlation coefficient
Lability (of mood and other issues)
Psychometrics
Reliability
Temporal stability
Test-retest effect/artefact/artifact

Chapters #

TrT is a subheading in Chapter 3.

Online resources #

None yet, working on it!

Dates #

First created 2.vi.24, links improved 30.viii.24.
