
Missing values, “missingness”

In outcome/change measurement, missing values are common and have to be considered carefully. Reports of data should always give the rates of missing data.
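Here is a minimal sketch of computing those rates. It assumes, as our own illustration rather than any format from this entry, that the data are held as a list of rows (one per participant) with `None` marking a missing item; `missing_rates` is a hypothetical name and the data are invented. It reports three rates that can differ a lot: per item, across all cells, and the proportion of participants with any missing item.

```python
from typing import List, Optional

def missing_rates(data: List[List[Optional[float]]]) -> dict:
    """Missingness rates for a participants-by-items table (None = missing)."""
    n_rows = len(data)
    n_items = len(data[0])
    # Proportion of participants missing each item
    per_item = [sum(row[j] is None for row in data) / n_rows for j in range(n_items)]
    # Proportion of all cells that are missing
    n_missing = sum(v is None for row in data for v in row)
    overall = n_missing / (n_rows * n_items)
    # Proportion of participants with at least one missing item
    any_missing = sum(any(v is None for v in row) for row in data) / n_rows
    return {"per_item": per_item, "overall": overall, "participants_with_any": any_missing}

# Invented data: four participants, three items
data = [
    [3, 4, None],
    [2, None, None],
    [5, 5, 4],
    [1, 2, 3],
]
print(missing_rates(data))
```

For these invented participants the third item is missing for half of them, a quarter of all cells are missing, and half the participants have at least one gap, so it matters which rate a report gives.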

Details

Occasionally there may be structurally missing values: values that are impossible for a particular subgroup or subpopulation. For example, pregnancy is an important binary variable (as is how long someone has been pregnant), but no one without a uterus can be pregnant, so pregnancy would generally be a variable with a “structural missing” cell for males.
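One way to stop structural missing values inflating reported missingness rates, sketched under our own assumptions rather than any convention from this entry, is to code them with a separate marker (the hypothetical `STRUCTURAL` sentinel below) so they can be left out of the denominator, while `None` marks a value that should have been recorded but was not. The function name and data are invented for illustration.

```python
from typing import List, Optional

# Hypothetical coding: None = should have been recorded but wasn't;
# STRUCTURAL = the variable cannot apply to this person.
STRUCTURAL = "structural"

def missing_rate_excluding_structural(values: List[object]) -> Optional[float]:
    """Proportion missing among cases for which the variable is possible."""
    applicable = [v for v in values if v != STRUCTURAL]
    if not applicable:
        return None  # the variable applies to nobody here
    return sum(v is None for v in applicable) / len(applicable)

# Invented "pregnant" data: 1 = yes, 0 = no
pregnant = [1, 0, None, STRUCTURAL, STRUCTURAL, 0]
print(missing_rate_excluding_structural(pregnant))
```

Here one of the four applicable cases is missing, a rate of 0.25; counting the two structural cells as ordinary missing data would instead give 3/6 = 0.5 and overstate the problem.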

Beyond the obvious and vital reporting of rates of “missingness”, there are ways of exploring the likely impact of missingness on the findings. The simplest is probably to substitute the maximum possible score for all missing data, then to substitute the minimum possible score, and to report the resulting statistics. For example, if there are missing data on an item that can score between 1 and 5, you can report the observed mean, then the mean if all the missing values are replaced by 1 (which cannot be higher than the observed mean) and the mean if all the missing values are replaced by 5 (which cannot be lower). This is a very crude way of showing the impact of missingness.
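A minimal sketch of this crude best/worst-case check for a single item, assuming missing responses are stored as `None`; `bounded_means` is a hypothetical name and the scores are invented for illustration.

```python
from typing import List, Optional, Tuple

def bounded_means(scores: List[Optional[float]], min_score: float,
                  max_score: float) -> Tuple[float, float, float]:
    """Observed mean, then the means with every missing value set to the
    minimum and to the maximum possible score."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores")
    n_missing = len(scores) - len(observed)
    total = sum(observed)
    observed_mean = total / len(observed)
    # Means over everyone, with missing values replaced by each extreme
    low = (total + n_missing * min_score) / len(scores)
    high = (total + n_missing * max_score) / len(scores)
    return observed_mean, low, high

# Invented responses to one item scored 1 to 5; None = missing
item = [4, 5, None, 3, None, 4]
print(bounded_means(item, min_score=1, max_score=5))
```

With these made-up scores the observed mean is 4.0, while replacing the two missing values by 1 or by 5 gives 3.0 and about 4.33, which shows readers how far the result could move.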

Unless missingness is “completely at random” (MCAR: Missing Completely At Random), it is always likely to bias our findings, and even if the data are MCAR it will make them less reliable because fewer data contribute to every estimate. There are methods to impute values to replace missing ones that are more sophisticated than just trying replacement by the possible maximum and minimum. For example, for missing values on items of a multi-item measure we could substitute the mean of the items the respondent did complete (“pro-rating”) or, probably less defensibly for our work, substitute the item mean across all respondents who did complete that item. All such methods are liable to introduce bias, but that bias may be less than the bias caused by simply ignoring participants with missing data. We recommend caution about very sophisticated methods such as “MICE” (Multiple Imputation by Chained Equations): they sound impressive, but they are only necessarily less biased than simply reporting statistics for the non-missing data if missingness really is “at random” in the technical sense (MAR: Missing At Random, i.e. predictable from the observed data) and the imputation models are right … and that is generally implausible in therapy data.
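The two simple imputation methods described above can be sketched as follows, assuming a list-of-rows layout with `None` for missing items. `prorate` and `item_mean_substitute` are hypothetical helper names, not functions from any package, the data are invented, and this is not an implementation of MICE.

```python
from typing import List, Optional

Row = List[Optional[float]]

def prorate(row: Row) -> Row:
    """Pro-rating: fill a respondent's missing items with the mean of the
    items that same respondent completed."""
    done = [v for v in row if v is not None]
    if not done:
        return list(row)  # nothing to pro-rate from, leave as missing
    person_mean = sum(done) / len(done)
    return [person_mean if v is None else v for v in row]

def item_mean_substitute(data: List[Row]) -> List[Row]:
    """Fill each missing value with that item's mean across the respondents
    who completed it."""
    item_means = []
    for j in range(len(data[0])):
        done = [row[j] for row in data if row[j] is not None]
        item_means.append(sum(done) / len(done) if done else None)
    return [[item_means[j] if v is None else v for j, v in enumerate(row)]
            for row in data]

# Invented responses: three respondents, four items; None = missing
data = [
    [4, 4, None, 1],
    [1, 2, 2, 1],
    [3, None, 5, 4],
]
prorated = [prorate(row) for row in data]
item_filled = item_mean_substitute(data)
print([sum(row) for row in prorated])      # totals after pro-rating
print([sum(row) for row in item_filled])   # totals after item-mean substitution
```

On these invented data the methods disagree: the first respondent's total is 12 after pro-rating but 12.5 after item-mean substitution, and the third respondent's is 16 versus 15, so the choice of method can change individual scores.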

Try also …

Bias
Missing at random (MAR)
Missing completely at random (MCAR)
MICE
Pro-rating
Variance: introduction
Variance: computation and bias

Chapters

Nothing here yet!

Online applications

At some point: online forms to describe missingness in data and some analyses of its impacts.

Dates

Created on or before 13/6/21, last updated 12/9/24.
