Validity

This is one of the most basic ideas in psychometrics: a quality of a measure. It means pretty much the same as the lay use of the word: that the information you get from the measure is valid, i.e. accurate and unbiased. For a thorough introduction to these ideas you really want to read Chapter 3 of the book.

Details #

It really makes sense to pair the idea with that of reliability. Technically, reliability is freedom from random error and validity is freedom from systematic error. In archery, reliability is tight clustering of your arrows; validity is your arrows centring on the gold (the “custard” in UK archery). If you are a good archer setting up, you shoot six arrows and they will be tightly clustered: not much affected by random error. However, if you had not made the right adjustment for a crosswind, they might all cluster to the left of the custard: systematic error. A good archer shoots some “sighters” then adjusts the sight, or adjusts mentally if the bow doesn’t have a sight. For a good archer, the second set of six arrows will have the same good clustering but will now cluster around the custard: s/he still has good reliability and has now removed the invalidity by adjusting for the crosswind.
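
A minimal sketch of that distinction in Python, with invented numbers (the offset and scatter are purely illustrative, not from the book): the uncorrected crosswind is the systematic error, and the spread of the arrows is the random error.

```python
import numpy as np

rng = np.random.default_rng(42)
true_centre = 0.0            # the "custard": the true value we want to hit
crosswind_offset = -2.0      # systematic error: uncorrected crosswind (invented)

# Six arrows before adjusting the sight: tight scatter, but biased left
before = true_centre + crosswind_offset + rng.normal(0, 0.3, size=6)
# Six arrows after adjusting: same tight scatter, bias removed
after = true_centre + rng.normal(0, 0.3, size=6)

print(f"before: mean = {before.mean():+.2f} (bias), sd = {before.std(ddof=1):.2f}")
print(f"after:  mean = {after.mean():+.2f} (no bias), sd = {after.std(ddof=1):.2f}")
```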

This makes very good sense for blood tests and other laboratory assays: they have a low level of random error, and regular testing with standard samples allows them to be checked for drift (invalidity). We do apply the same model to typical therapy change data, and though the basic idea of considering random error (unreliability) and systematic error (invalidity/bias) is sound, the idea that self-report questionnaire data are a “soft” form of laboratory assay is dangerous … but that’s back to Chapter 3. Here I’ll just give a quick overview of validity for self-report questionnaire measures and rating scales. Each of the headings below has a further entry going into more detail, but it’s best to have read through this before diving into them. (And even better to have read Chapter 3, if not the whole book!)

Face validity #

Unless you are trying to create an opaque measure (an approach that has almost completely dropped out of fashion), face validity is the starting point: do the items look sensible? Do they address what you want to measure?

Content validity #

The more rigorous extension of face validity: does the measure appear to address all facets of what you want to measure, without including items, or parts of items, that slip away from the target or that might introduce bias? For example, some items might introduce “social desirability response bias”: they may ask something, or ask it in a way, that makes it likely that some respondents will feel ashamed to answer with their first instinct and will modify their answer to one they feel is more “socially desirable”. That introduces systematic error, i.e. bias, and hence compromises validity.

Convergent validity #

If there is already some measure of what you want to measure, then convergent validity is shown by scores on your new measure correlating positively with scores on that existing measure.
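
As a minimal sketch with simulated scores (the latent construct and noise levels are invented for illustration), convergent validity amounts to a clearly positive correlation between the two sets of scores:

```python
import numpy as np

rng = np.random.default_rng(1)
true_distress = rng.normal(0, 1, 200)               # latent construct (simulated)
existing = true_distress + rng.normal(0, 0.5, 200)  # established measure
new = true_distress + rng.normal(0, 0.5, 200)       # the new measure

r = np.corrcoef(existing, new)[0, 1]
print(f"convergent r = {r:.2f}")  # expect a strong positive correlation
```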

Discriminant/divergent validity #

If you feel pretty sure that what you want to measure is not related to something else, then divergent validity is showing that there is no correlation between measures of the two things across good samples. It shows that the second variable is not systematically biasing, and so invalidating, scores on the thing you want to measure.
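
A matching sketch with simulated data: here the new measure and the supposedly unrelated trait genuinely share nothing, so the correlation should be close to zero.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
new_measure = rng.normal(0, 1, 200)  # scores on the new measure (simulated)
unrelated = rng.normal(0, 1, 200)    # a trait assumed to be independent

r, p = pearsonr(new_measure, unrelated)
print(f"divergent r = {r:.2f}, p = {p:.2f}")  # expect r near zero
```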

Construct validity #

This is probably the most disputed label in measure validity, and to some extent it wraps up all the earlier aspects within a quantitative framework: it is evidence that scores on the measure behave as they should alongside measures of other things. Typically it is expected of measures with subscales and it is tested by factor analysis. Good construct validity is then the finding that items on one subscale correlate with each other more strongly than they correlate with items from other subscales. It is seeing whether the correlations between the items support the content validity and show convergent and divergent validity.
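
A minimal sketch of that item-correlation logic, with simulated scores for two hypothetical three-item subscales (the latent factors and loadings are invented): within-subscale correlations should clearly exceed between-subscale correlations.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
anxiety = rng.normal(0, 1, n)   # latent factor 1 (simulated)
low_mood = rng.normal(0, 1, n)  # latent factor 2, uncorrelated here

# Three items loading on each factor, plus item-specific noise
items_a = np.stack([anxiety + rng.normal(0, 0.7, n) for _ in range(3)])
items_b = np.stack([low_mood + rng.normal(0, 0.7, n) for _ in range(3)])

corr = np.corrcoef(np.vstack([items_a, items_b]))           # 6 x 6 item matrix
within = corr[:3, :3][np.triu_indices(3, k=1)].mean()       # among subscale A items
between = corr[:3, 3:].mean()                               # A items vs B items
print(f"mean within-subscale r  = {within:.2f}")   # expect clearly higher
print(f"mean between-subscale r = {between:.2f}")  # expect near zero here
```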

Predictive validity #

This is in some ways the strongest test of the validity of a measure: do earlier scores on it predict something? For example, if it is believed that early insecure attachment predisposes to later help-seeking for psychological distress, then showing a stronger-than-chance correlation between ratings of attachment insecurity in childhood and adult help-seeking is predictive validity.
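
A minimal sketch of that example with wholly invented numbers: the association between a continuous earlier rating and a later binary outcome can be summarised with a point-biserial correlation.

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(4)
insecurity = rng.normal(0, 1, 500)  # childhood attachment ratings (simulated)

# Probability of adult help-seeking rises with insecurity (invented logistic link)
p_seek = 1 / (1 + np.exp(-(-1.0 + 0.8 * insecurity)))
help_seeking = rng.binomial(1, p_seek)  # 1 = sought help as an adult

r, p = pointbiserialr(help_seeking, insecurity)
print(f"point-biserial r = {r:.2f}, p = {p:.3g}")  # expect stronger than chance
```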

Caveats #

This is a fair summary of classical psychological, psychometric thinking about questionnaires and rating scales, and these are useful ideas if used thoughtfully, not as if they confirmed that our measures are “validated” without real reflection on how that oversimplifies things. But “clients with different problems are different and questionnaires are not blood tests”, as part of the title we gave to our paper (Paz et al., 2020) puts it. Three major points to watch are:

  • The field of psychological change in therapies is hugely complex and very few things we want to measure are neatly segregated. To give one very simple example: is an association of scores with gender evidence that the instrument is gender biased, or is it confirming that gender may relate to distress and problems?
  • People differ in how they think about measures: completing a measure can be done extremely quickly and casually, but even then the client will be parsing the items and perhaps thinking about what is wanted. Some people may interpret things very differently from others, and no instrument can cover all the issues that any client could have, nor every strength or resource in them that may help them change.
  • People change differently and almost all of classical psychometrics and thinking about validity is based on a model that assumes this is not true.

Try also #

Bias
Construct validity
Content validity
Convergent validity
Correlation
Discriminant/divergent validity
Face validity
Gender
Predictive validity
Reliability

Chapters #

As noted, this forms much of Chapter 3 but the questions run through the whole book and surface particularly in Chapters 9 and 10.

Reference #

Paz, C., Adana-Díaz, L., & Evans, C. (2020). Clients with different problems are different and questionnaires are not blood tests: A template analysis of psychiatric and psychotherapy clients’ experiences of the CORE‐OM. Counselling and Psychotherapy Research, 20(2), 274–283. https://doi.org/10.1002/capr.12290. (Open access.)

Dates #

Created 15.xi.21, updated links 7.x.24.
