Test-retest effect/artefact/artifact

I think we should really say “test-retest effect”, and I often put scare quotes around “artefact” … but enough of that! These are terms for the common finding that when a self-report questionnaire measure of difficulties or distress is given to a collection of people not seeking help (a “non-clinical sample”), their mean score often drops, i.e. improves, if they are asked to repeat the measure, say, a week or so later.
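To make the idea concrete, here is a minimal simulation sketch in Python. All the numbers are entirely made up for illustration (they are not estimates from any real measure): retest scores are drawn with a small downward shift relative to first-administration scores, producing the apparent “improvement” in the group mean.

```python
import random

random.seed(1)

# Hypothetical illustration of a test-retest effect in a non-clinical sample.
# The downward shift of 0.075 (on an arbitrary score scale) and all other
# values are invented purely to show the pattern, not drawn from real data.
n = 500
time1 = [random.gauss(1.0, 0.5) for _ in range(n)]       # first administration
time2 = [s + random.gauss(-0.075, 0.25) for s in time1]  # retest ~a week later

mean1 = sum(time1) / n
mean2 = sum(time2) / n
print(f"Mean at first administration: {mean1:.3f}")
print(f"Mean at retest:               {mean2:.3f}")
print(f"Mean drop on retest:          {mean1 - mean2:.3f}")
```

No one is treated in this simulation, yet the mean falls on retest: that is the pattern the terms above describe, and the point at issue is whether such a drop should be dismissed as an “artefact” or studied as an effect in its own right.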

Details #

A classic and to my mind important paper on the issue is Durham, McGrath, Burlingame, Schaalje, Lambert & Davies (2002). The effects of repeated administrations on self-report and parent-report scales. Journal of Psychoeducational Assessment, 20, 240–257. https://doi.org/10.1177/073428290202000302

That paper concludes:

A retest artifact, related to the frequency of administration, has been identified with the YOQ, particularly when it is administered on a weekly basis, and partial support for a retest effect was found for the OQ. Data collection involving four or more waves produced a reliable decrease in parent report symptomatology in the nonclinical sample. The question of an outcome measure’s susceptibility to the retest artifact is one that challenges the reliability of an instrument for use in establishing accurate and effective standards for mental health care. Factors that may explain a retest artifact must be further investigated so that participant change scores are accurately interpreted. The context of both the retest artifact and the nature of the measure being used (i.e., increased sensitivity to change) must be taken into account to determine exactly how much of an effect the artifact presents. This research has uncovered the impact of the retest artifact and provided estimates of the retest effect to assist interpretation in clinical outcome assessment.

Durham et al. 2002, p.255

My own experience, with the CORE measures but also with others, including the “OQ” of that study (now the OQ-45), is that the shift varies a bit across measures and definitely with the test-retest interval, and that it is small compared to the typical mean shift seen when people seeking help receive psychosocial interventions. What intrigues me is that, despite a clear statement of the need for more research into the effect from the very influential authors of that study, I have seen no major explorations of it since. I think that’s because it is a bit of an embarrassment to the therapy research and psychometric communities: we are expected to have nice simple answers and to be able to call this an “artefact” (or “artifact”!), a contaminant, something we can perhaps dismiss or ignore. But what justifies calling it an artefact? It’s an effect: not a hugely powerful one, but a pretty replicable one. It’s something we don’t really understand, and maybe it would be useful to our fields to understand it better, but we already know enough to see that doing so might come up with some complicated findings, and we don’t like that, so we largely ignore it!

Try also #

Psychometrics
Reliability
Test-retest reliability

Chapters #

Not specifically mentioned in the OMbook.

Online resources #

None likely I think.

Dates #

First created 30.viii.24.
