Data entry: two out-of-range item scores can really affect Cronbach’s alpha

This little saga started over a year ago, when I helped at a workshop that a psychological therapies department held on how they might improve their use of routine outcome measures. They were using the CORE-OM plus a sensible local measure that added details they wanted and for which they weren’t seeking comparability with other data.

In the lunch break someone told me that s/he had CORE-OM data from a piece of work done in another NHS setting (with full research governance approval!). The little team who had put a lot of work into this small pragmatic study felt stymied because the Cronbach alpha for their CORE-OM data was .65, and they were worried that this meant that perhaps the CORE-OM didn’t work well for their often highly disturbed clientèle. They had stopped there but thought of asking me about it.

My reaction was that I shared their concern that self-report measures, not just the CORE-OM, may not have the same psychometric properties, and may not work as well, in severely disturbed client groups as in the less disturbed or non-clinical samples in which they’re usually developed. However, I hadn’t thought that would bring alpha down that low, and I wondered if they had forgotten to reverse score the positively cued items.

As everyone’s crazily busy, I didn’t hear anything for a long while. Then I got a message: they had checked and the coding was definitely right, so would I have a look at their data in case it really was about the client group? They knew I was interested in how severity, chronicity and type of disturbance may affect clients’ use of measures.

I agreed and received the well-anonymised data. About 700 participants had completed all the items and the alpha was indeed .65 (not that I really doubted them; I just like to recheck everything!). So I checked the item score ranges, though I hadn’t really expected much by way of data entry errors. There wasn’t much: just two out-of-range item scores in over 23,000 (about 700 participants × 34 items). One was 11 and the other was 403. Changing them to missing, and hence dropping two participants, resulted in an alpha of .93 with a parametric 95% confidence interval from .93 to .94, i.e. absolutely typical for CORE-OM data.
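Both steps here, computing alpha and checking item ranges, are easy to script. The sketch below is an illustration in Python, not the code used for the analysis above. It computes Cronbach’s alpha on complete cases as α = k/(k − 1) × (1 − sum of the item variances / variance of the total scores), lists any scores outside the CORE-OM’s 0 to 4 item range, and recodes those to missing before keeping only complete cases. The function names and the toy data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: one list of item scores per person, all the same length, no missing values.
    Population variances are used throughout; the n versus n - 1 choice cancels in the ratio.
    """
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])  # must be > 0
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def out_of_range(rows, lo=0, hi=4):
    """(person index, item index) for every non-missing score outside lo..hi."""
    return [(i, j)
            for i, row in enumerate(rows)
            for j, score in enumerate(row)
            if score is not None and not lo <= score <= hi]


def complete_cases_in_range(rows, lo=0, hi=4):
    """Recode out-of-range scores as missing (None), then keep only complete rows."""
    kept = []
    for row in rows:
        recoded = [s if s is not None and lo <= s <= hi else None for s in row]
        if None not in recoded:
            kept.append(recoded)
    return kept


# Toy data (hypothetical): four people, three items; the last person has an 11.
data = [[0, 1, 1], [2, 2, 3], [4, 3, 4], [1, 11, 0]]
print(out_of_range(data))                                      # [(3, 1)]
print(len(complete_cases_in_range(data)))                      # 3
print(round(cronbach_alpha(complete_cases_in_range(data)), 2))  # 0.96
```

Running the range check first costs almost nothing and, as here, points straight at the rows that need checking against the paper forms.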

I would never have believed that just 0.008% incorrect items could affect alpha that much, even though one of them was 403 when the maximum item score is 4: I was wrong! Well, perhaps it’s not quite that low a percentage. If that 11 was a 1 for the one item (item 22) plus another 1 that should have gone into item 23, then perhaps many of the remaining items for that client were wrong too; the same goes for the 403 in item 28. After all, 1, 4, 0 and 3 are all possible item scores on the CORE-OM. That would take the incorrect entries up to about 0.08% (roughly 20 entries: items 22 to 34 for one client and 28 to 34 for the other). However, if something like a failure to hit the carriage return is the explanation, then there should have been one or more missing items at the end of the entries for that client, and their data would never have made it into the computation of alpha. Perhaps a really badly out-of-range item at a rate of just 0.008% is enough to bring alpha down this much. Only checking back to the original data will tell. I hope they still have it.
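Whether a single wild value can do this much is easy to explore by simulation. The sketch below is illustrative only: it generates synthetic one-factor data for 700 hypothetical people on 34 items scored 0 to 4 (the factor and noise settings are assumptions chosen to give a CORE-OM-like alpha, not estimates from any real data), computes alpha, then puts a single 403 into item 28 for one person and recomputes. It also includes a hypothetical helper that flags out-of-range values whose digits are all valid item scores, the pattern you would expect if several scores were typed into one cell. It does not reproduce the figures above and says nothing about what actually happened in this dataset.

```python
import random
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases (population variances; the choice cancels)."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate(n_people=700, n_items=34, seed=1):
    """Synthetic one-factor item data scored 0..4 (assumed settings, not real data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        severity = rng.gauss(0, 0.7)  # shared person-level factor
        rows.append([min(4, max(0, round(2 + severity + rng.gauss(0, 1))))
                     for _ in range(n_items)])
    return rows


def possibly_run_together(value, lo=0, hi=4):
    """Hypothetical check: could this out-of-range value be several valid
    single-digit scores typed into one cell (e.g. 11 -> 1, 1 or 403 -> 4, 0, 3)?"""
    digits = str(value)
    return len(digits) > 1 and all(d.isdigit() and lo <= int(d) <= hi for d in digits)


clean = simulate()
dirty = [row[:] for row in clean]
dirty[0][27] = 403  # item 28 (0-based index 27) of one person

print(f"alpha, clean data:      {cronbach_alpha(clean):.2f}")
print(f"alpha, one 403 entered: {cronbach_alpha(dirty):.2f}")
print(possibly_run_together(11), possibly_run_together(403), possibly_run_together(57))  # True True False
```

The arithmetic behind the effect is simple: 403 is about 400 points from an item mean of around 2, so in a sample of 700 that one entry adds over 200 (400² / 700) to both that item’s variance and the total-score variance. With 34 items that each have a variance of roughly 1, that inflated term dominates the sum of the item variances, and alpha falls accordingly.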

OK, but does this merit a blog post? (Well, I’ve got to start somewhere!) I think there are some points of interest:

  • it shows just how influential a few out-of-range scores can be
  • it shows that alpha can sometimes detect this, and hooray for the people involved: they did calculate alpha, and they sensed that something was so wrong that they couldn’t just go ahead with the analyses they had planned
  • it does show, though, that simple range checks on the items would have been a quicker and more certain way of detecting what was at the root of this
  • it shows that, though I think you should always run all the range and coherence checks that make sense for the data …
  • … it’s stronger still to have duplicate data entry, but which of us can afford that?
  • even if you can do duplicate entry (assuming that the clients complete the measures on paper), you should use a data entry system that, as far as possible, detects impossible or improbable data at the point of entry
  • (and if clients enter their data directly, please make sure the system does that checking too, and in a user-friendly way)
  • but while absurd sums of money are put into healthcare data systems and into funding psychological therapy RCTs, where is the money to fund good data entry, clinician research and practice-based evidence?
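For anyone who can manage duplicate entry, comparing the two passes is simple to script, and so is refusing an impossible score at the moment it is keyed. Below is a minimal Python sketch of both, assuming each entry pass is stored as a dictionary from a (hypothetical) client ID to that client’s list of item scores. It illustrates the idea rather than any particular data entry system.

```python
from itertools import zip_longest


def compare_double_entry(first, second):
    """List every disagreement between two independent entry passes.

    first, second: dicts mapping client ID -> list of item scores.
    Returns (client ID, item number, first value, second value) tuples;
    item number is None when a client appears in only one pass.
    """
    problems = []
    for client in sorted(set(first) | set(second)):
        a, b = first.get(client), second.get(client)
        if a is None or b is None:
            problems.append((client, None, a, b))
            continue
        # zip_longest pads a shorter entry with None, so dropped items show up too
        for item, (x, y) in enumerate(zip_longest(a, b), start=1):
            if x != y:
                problems.append((client, item, x, y))
    return problems


def parse_item_score(text, lo=0, hi=4):
    """Point-of-entry check: accept only a whole number in lo..hi, otherwise raise."""
    text = text.strip()
    if not text.isdigit() or not lo <= int(text) <= hi:
        raise ValueError(f"item score must be a whole number from {lo} to {hi}, got {text!r}")
    return int(text)


pass_1 = {"C001": [1, 2, 3], "C002": [0, 0, 4]}
pass_2 = {"C001": [1, 2, 3], "C002": [0, 4, 0]}
print(compare_double_entry(pass_1, pass_2))  # [('C002', 2, 0, 4), ('C002', 3, 4, 0)]
print(parse_item_score(" 3 "))               # 3
```

A check like `parse_item_score` would have rejected both the 11 and the 403 as they were typed, which is the point of catching errors at entry rather than at analysis.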

To finish on a gritty note about data entry: at least twenty years ago, before I discovered S+ and R, I mainly used SPSS for statistics, and back then, for a while, SPSS had a “data entry module”. It was slow, which was perhaps why they dropped it, but it was brilliant: you could set up range checks and all the coherence checks you wanted (pregnant male: I think not).

After that died I tended to enter my data into spreadsheets, and until about a year ago I was encouraging colleagues I work with around the world to use Excel (yes, I tried encouraging them to use Libre/OpenOffice, but everyone had and knew Excel and often weren’t allowed to install anything else). They or I would build data checking into the spreadsheets to the extent that Excel allows, and I wrote data checking code in R (https://www.r-project.org/) to double check that and to catch things we couldn’t in Excel. I still use that for one huge project but it’s a nightmare: updates of Windoze and Excel seem to break backwards compatibility, M$’s way of handling local character sets seems to create problems, its data checking seems to break easily, and I find it almost impossible to lock spreadsheets so that people can enter data but not change anything else. I’m sure there are Excel magicians who can do better, but I’m equally sure there are better alternatives.

At the moment, with Dr. Clara Paz in Ecuador, we’re using the open source LimeSurvey survey software hosted on the server that hosts all my sites (thanks to Mythic Beasts for excellent open source based hosting). If you have a host who gives you raw access to the operating system, LimeSurvey is pretty easy to install (and I think it runs on nasty closed source systems too!). Its user interface isn’t the easiest, but so far we’ve been able to do most things we’ve wanted to with a bit of thought, and the main thing is that it’s catching data entry errors at entry and has proved totally reliable so far.
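The kind of range and coherence checking described here can be sketched in a few lines. The version below is a Python illustration, not the R code mentioned above, and it checks data after export rather than at entry: it reads a CSV export and applies a list of named rules, one range check on the item columns and one coherence check in the spirit of “pregnant male: I think not”. The column names (id, sex, pregnant, item01, item02) and the sample rows are hypothetical; a real export would need its own.

```python
import csv
import io


def bad_item(cell, lo=0, hi=4):
    """True if a non-blank item cell is not a whole number in lo..hi (blank = missing)."""
    cell = cell.strip()
    return cell != "" and not (cell.isdigit() and lo <= int(cell) <= hi)


# Each rule: (label, function returning True when a row breaks the rule).
RULES = [
    ("item score outside 0-4",
     lambda row: any(bad_item(v or "") for k, v in row.items() if k and k.startswith("item"))),
    ("pregnant male",
     lambda row: row.get("sex") == "M" and row.get("pregnant") == "yes"),
]


def check_rows(reader):
    """Return (file line number, id, rule label) for every rule each row breaks.

    Line numbers assume no field contains an embedded newline.
    """
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for label, broken in RULES:
            if broken(row):
                problems.append((line_no, row.get("id", "?"), label))
    return problems


SAMPLE_CSV = (
    "id,sex,pregnant,item01,item02\n"
    "A1,F,yes,2,3\n"
    "A2,M,yes,1,1\n"
    "A3,F,no,403,\n"
)
for problem in check_rows(csv.DictReader(io.StringIO(SAMPLE_CSV))):
    print(problem)
# (3, 'A2', 'pregnant male')
# (4, 'A3', 'item score outside 0-4')
```

Keeping the rules as a plain list of labelled functions makes it easy to add a new coherence check without touching the loop, though a system like LimeSurvey that refuses bad values at entry avoids having to chase them afterwards.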