Mixture models

This is a pretty cutting-edge bit of statistics but the idea is very important for our field. The idea is very simple: most other statistical models assume that the one model fits, subject to error and the impacts of unmeasured variables, consistently for all members of the population. In a mixture model this assumption is abandoned and the new assumption is that within the population there are two or more subpopulations within which different models apply.

Let us take a not unreasonable example. Let’s say we are interested in whether improvement on some change measure is partly affected by ethnicity congruence between therapist and client, with each classified as a member of the dominant ethnic group or not a member of that group. In a single population model this might be approached as a 2×2 analysis of variance (ANOVA) looking to see if the data suggest a statistically significant interaction effect of therapist and client ethnic group on change score. The model assumes, for the maths to work, that the same relationship between ethnicity congruence and change applies for all clients and that differences between clients are only down to random variation (“error” … in the rather specific sense of misfit to the model), or perhaps to other measured variables, for example age or number of sessions attended, if these are entered into the ANOVA model.

In a mixture model this assumption is replaced by one in which there are two or more subpopulations. The simplest mixture model here is one in which there are two subpopulations: one in which there is a systematic relationship between change and ethnic congruence and another in which there is no such relationship, so that ethnic congruence is irrelevant to change. If this model is a more accurate representation of the real client population then a single population model will risk missing that there is a systematic effect of ethnic congruence for the one subpopulation, because its effect in the total sample may be swamped by the presence of the other subpopulation.
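The swamping can be illustrated with a small simulation. This is only a sketch under invented assumptions: the sample size, the share of clients for whom congruence matters and the size of the effect are all hypothetical numbers chosen for illustration, and the simple congruent versus non-congruent contrast stands in for what the interaction term in the 2×2 layout captures.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# All three values below are hypothetical, chosen only for illustration.
N_CLIENTS = 20000        # sample size
PROP_RESPONSIVE = 0.3    # share of clients for whom congruence matters
CONGRUENCE_EFFECT = 1.0  # effect of congruence, in SD units of the change score

clients = []
for _ in range(N_CLIENTS):
    therapist_dominant = random.random() < 0.5
    client_dominant = random.random() < 0.5
    congruent = therapist_dominant == client_dominant
    responsive = random.random() < PROP_RESPONSIVE
    change = random.gauss(0.0, 1.0)  # Gaussian "error" around zero change
    if responsive and congruent:
        change += CONGRUENCE_EFFECT  # effect exists only in one subpopulation
    clients.append((congruent, responsive, change))


def congruence_contrast(rows):
    """Mean change for congruent dyads minus mean change for the rest."""
    congruent_changes = [chg for cong, _, chg in rows if cong]
    other_changes = [chg for cong, _, chg in rows if not cong]
    return statistics.fmean(congruent_changes) - statistics.fmean(other_changes)


pooled_effect = congruence_contrast(clients)
responsive_effect = congruence_contrast([r for r in clients if r[1]])
unresponsive_effect = congruence_contrast([r for r in clients if not r[1]])

print(f"whole sample:              {pooled_effect:.2f}")
print(f"responsive subpopulation:  {responsive_effect:.2f}")
print(f"unresponsive subpopulation: {unresponsive_effect:.2f}")
```

With these settings the whole-sample contrast should come out near PROP_RESPONSIVE × CONGRUENCE_EFFECT, well below the effect in the subpopulation where it actually operates. In real data, of course, we do not get to see who belongs to which subpopulation, which is exactly why a mixture model is needed.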

Details and implications

That’s almost enough to say for our purposes, as once one goes past those basics things get very complicated. How one fits a mixture model to a dataset is what the mathematicians rather lovably call “non-trivial”: it involves the maths and computational practicalities of estimation, of trying to find the best fit, but even to get there it involves a lot of specifying the model. So the model above would start, like the single population ANOVA model, with assumptions of Gaussian distributions for both the subpopulation with a non-zero interaction effect and the one with no interaction effect. The assumption that each subpopulation is homogeneous (the same assumption as in the single population model) is a given, as is the assumption that there are no other subpopulations. The task of the fitting process is then to estimate the parameters of the effects in each subpopulation, including the non-zero interaction in the one subpopulation, and to estimate what proportion of the total population belongs to each subpopulation.
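One way to write that specification down, as a simplified sketch, is as a weighted sum of two Gaussian densities. The notation is my assumption rather than anything standard to this glossary, and the main effects that a full 2×2 ANOVA would include are left out to keep it short.

```latex
% Assumed notation (introduced here for illustration only):
%   y_i    : change score for client i
%   c_i    : 1 if therapist and client are ethnically congruent, 0 otherwise
%   \pi    : proportion of the population in the subpopulation where congruence matters
%   \delta : the non-zero congruence effect in that subpopulation
f(y_i \mid c_i) = \pi \, \mathcal{N}\!\left(y_i \mid \mu_1 + \delta c_i,\ \sigma_1^2\right)
               + (1 - \pi) \, \mathcal{N}\!\left(y_i \mid \mu_2,\ \sigma_2^2\right)
```

Fitting means estimating all of the means, the variances, the effect and the proportion at once. The ordinary single population model is the special case in which the proportion is fixed at one.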

Estimating these models, and the potentially more complex ones, is a challenge even with modern statistical software and hardware power, and expertise in these challenges is still rare. One simple and fairly obvious issue is that, for the estimation to work, you need large datasets and a mixture model that is both simple and correct.
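To give a feel for what estimation involves, here is a minimal sketch of one common approach, the expectation-maximisation (EM) algorithm, applied to a deliberately easier problem than the ANOVA example: change scores drawn from just two Gaussian subpopulations with no predictors. The function name, the starting values and the simulated data are all assumptions made for illustration; this is not how any particular statistics package does it.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=150):
    """Fit a two-component Gaussian mixture by EM.

    Returns a list of (proportion, mean, sd) tuples sorted by mean.
    """
    ordered = sorted(values)
    n = len(values)
    # Crude starting values: quartiles as means, overall SD, equal shares.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    sds = [statistics.pstdev(values)] * 2
    props = [0.5, 0.5]
    for _ in range(n_iter):
        # E step: probability that each value belongs to component 0.
        resp = []
        for v in values:
            w0 = props[0] * normal_pdf(v, means[0], sds[0])
            w1 = props[1] * normal_pdf(v, means[1], sds[1])
            resp.append(w0 / (w0 + w1))
        # M step: re-estimate proportion, mean and SD from the weighted data.
        weights = [resp, [1.0 - r for r in resp]]
        for k in (0, 1):
            total = sum(weights[k])
            props[k] = total / n
            means[k] = sum(w * v for w, v in zip(weights[k], values)) / total
            var = sum(w * (v - means[k]) ** 2
                      for w, v in zip(weights[k], values)) / total
            sds[k] = max(math.sqrt(var), 1e-3)  # floor stops a component collapsing
    return sorted(zip(props, means, sds), key=lambda comp: comp[1])


random.seed(1)
# Hypothetical data: 30% of clients change by about 3 SD, 70% do not change.
scores = [random.gauss(3.0, 1.0) if random.random() < 0.3 else random.gauss(0.0, 1.0)
          for _ in range(3000)]

components = fit_two_gaussians(scores)
for prop, mean, sd in components:
    print(f"proportion {prop:.2f}, mean {mean:.2f}, SD {sd:.2f}")
```

Even this toy version needs starting values, repeated passes through the data and a guard against a subpopulation collapsing onto a single point. It works here because the sample is large, the two subpopulations are well separated and the two-Gaussian model really is the one that generated the data.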

The key issue for me is that one or more mixture models may often be highly plausible in our field. If the single population model is not an accurate model then the findings from our single model analyses may be misleading, but I don’t remember ever having seen this mentioned in reports!

Try also

Analysis of Variance (ANOVA)
Estimate, estimation
Gaussian (“Normal”) distribution
Null hypothesis significance testing (NHST) paradigm
Population
Sampling and sample frame
Subpopulation

Chapters

Not mentioned in the OMbook.

Online resources

None currently.

Dates

First created 15.viii.24.
