Missing at Random (MAR)

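To me this is a truly terrible name. It describes the situation in which values are missing from a dataset not completely at random, but in a way that depends only on other variables that were observed and not on the missing values themselves. Where there is MAR missingness the missing data are not randomly scattered throughout the dataset, but the non-random nature of the missingness need not bias the analysis of the data, provided those observed variables are taken into account.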

The example given in Wikipedia is:

An example is that males are less likely to fill in a depression survey but this has nothing to do with their level of depression, after accounting for maleness.

https://en.wikipedia.org/wiki/Missing_data#Missing_at_random

As I understand it, this supposes that you have a dataset with two variables, gender and a depression score, but that fewer men than women are participating in the survey. To me this means that, taking the words literally, the missingness is not random, not “missing at random”, but systematically related to binary gender. However, as it is presumed that the reluctance to participate in the study is not related to depression level, the missingness is not going to bias the estimate of the relationship between gender and depression.
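A minimal simulation sketch of this example may help. Everything in it is a made-up assumption for illustration: the score distributions (men mean 10, women mean 12, SD 4), the response rates (40% of men, 80% of women) and the function names `simulate_survey` and `mean_score`. The key point is that the chance of responding depends on gender only, not on the depression score.

```python
import random
import statistics


def simulate_survey(n=20000, p_respond_male=0.4, p_respond_female=0.8, seed=1):
    """Return (is_male, score, responded) tuples for n hypothetical people."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        is_male = rng.random() < 0.5
        # Assumed score distributions: men mean 10, women mean 12, SD 4.
        score = rng.gauss(10.0 if is_male else 12.0, 4.0)
        # MAR: the chance of responding depends on gender only, not on score.
        p_respond = p_respond_male if is_male else p_respond_female
        responded = rng.random() < p_respond
        people.append((is_male, score, responded))
    return people


def mean_score(people, is_male=None, responders_only=False):
    """Mean score, optionally restricted to one gender and/or to responders."""
    scores = [
        score
        for male, score, responded in people
        if (is_male is None or male == is_male)
        and (responded or not responders_only)
    ]
    return statistics.fmean(scores)


people = simulate_survey()
for label, is_male in (("men", True), ("women", False), ("everyone", None)):
    full = mean_score(people, is_male)
    seen = mean_score(people, is_male, responders_only=True)
    print(f"{label:9s} full mean {full:6.2f}   responders only {seen:6.2f}")
```

With these assumed numbers the responder means within each gender should sit close to the full means, so the gender difference survives the missingness. The pooled “everyone” mean among responders should drift towards the women's mean, because women are over-represented among those who responded.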

Details

Rather confusingly (to me at least) the Wikipedia entry continues:

Depending on the analysis method, these data can still induce parameter bias in analyses due to the contingent emptiness of cells (male, very high depression may have zero entries). However, if the parameter is estimated with Full Information Maximum Likelihood, MAR will provide asymptotically unbiased estimates. [citation needed]

https://en.wikipedia.org/wiki/Missing_data#Missing_at_random

(I like the “citation needed” there: indeed!)

I think what this is saying, without explaining it fully, is that because there will be a lower proportion of men in the dataset than in the population, and as very high depression is rare in the population, there may be no very high depression scores among the men in the dataset, and this may be sufficient to pull the dataset mean depression score for the men down from their population mean. I also think that the very technical “if the parameter is estimated with Full Information Maximum Likelihood, MAR will provide asymptotically unbiased estimates” is saying that Full Information Maximum Likelihood, one way of handling missingness when trying to minimise its effect on estimates of population parameters (here the mean score by gender), will remove the bias that would come from the under-representation of men in the data.
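The “contingent emptiness of cells” part can be put into rough numbers. This sketch does not implement Full Information Maximum Likelihood. It only shows how likely it is that the (male, very high depression) cell ends up empty. It assumes that responders are independent of one another and that a hypothetical 2% of men score “very high”; the function name `prob_empty_cell` is also just for illustration.

```python
def prob_empty_cell(p_very_high, n_responders):
    """Chance that none of n independent responders falls in the very high group."""
    return (1.0 - p_very_high) ** n_responders


# Hypothetical: 2% of men score "very high"; try a few numbers of male responders.
for n_men in (25, 50, 100, 200):
    print(n_men, round(prob_empty_cell(0.02, n_men), 3))
```

The fewer men who respond, the more likely it is that the cell has no entries at all, which is the situation the Wikipedia passage is warning about.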

The catch is that we can’t know that missingness is not related to depression scores: MAR is an assumption, an aspiration, a hope!

Try also …

Missing Completely at Random (MCAR)
Missing values
MICE (Multiple Imputation by Chained Equations)

Chapters

Not mentioned in the OMbook.

Online resources

I’m unlikely to create any, I think: the issues are too complex and too general to make it easy.

Dates

Created 12/9/24.
