Data entry: two out of range item scores can really affect Cronbach’s alpha

This little saga started over a year ago when I helped at a workshop a psychological therapies department held about how they might improve their use of routine outcome measures. They were using the CORE-OM plus a sensible local measure that added details they wanted and for which they weren’t seeking comparability with other data.

In the lunch break someone told me s/he had CORE-OM data from a piece of work done in another NHS setting (with full research governance approval!). The little team that had put a lot of work into a small pragmatic study felt stymied because the Cronbach alpha for their CORE-OM data was .65 and they were worried that this meant that perhaps the CORE-OM didn’t work well for their often highly disturbed clientèle. They had stopped there but thought of asking me about it.

My reaction was that I shared the concern about self-report measures, not just the CORE-OM, perhaps not having the same psychometrics, not working as well, in severely disturbed client groups as in the less disturbed or non-clinical samples in which they’re usually developed. However, I hadn’t thought that would bring the alpha down that low and wondered if they had forgotten to reverse score the positively cued items.

As everyone’s crazily busy I didn’t hear anything for a long while but then got a message that they had checked and the coding was definitely right. Would I have a look at their data in case it really was about the client group, as they knew I was interested in how severity, chronicity and type of disturbance may affect clients’ use of measures?

I agreed and received the well anonymised data. About 700 participants had completed all the items and the alpha was .65 (not that I really doubted them, I just like to recheck everything!). So I checked the item score ranges though I hadn’t really thought there were likely to be many data entry errors. There weren’t: just two out of range items in over 23,000. One was 11 and the other was 403. Changing them to missing, and hence dropping two participants, resulted in an alpha of .93 with a parametric 95% confidence interval from .93 to .94, i.e. absolutely typical for CORE-OM data.
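To see how one or two absurd values can do this, here is a minimal sketch in Python. It is not the analysis described above and it does not use the real data: it simulates about 700 respondents answering 34 items scored 0 to 4 (the length and scoring of the CORE-OM), then plants a 403 and an 11 in two cells. The simulation model, the seed and the function names are all assumptions made for illustration.

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete respondents (one list of item scores each)."""
    k = len(rows[0])
    # Sum of the variances of the individual items
    item_vars = sum(variance([row[i] for row in rows]) for i in range(k))
    # Variance of the respondents' total scores
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_vars / total_var)


def simulate(n_people=700, n_items=34, seed=1):
    """Simulated (NOT real) data: a latent level per person plus item noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        level = rng.gauss(2.0, 1.0)
        # Round to whole scores and clip to the legal 0..4 range
        rows.append([min(4, max(0, round(level + rng.gauss(0.0, 1.0))))
                     for _ in range(n_items)])
    return rows


clean = simulate()
dirty = [row[:] for row in clean]
dirty[0][27] = 403  # item 28 for one respondent
dirty[1][21] = 11   # item 22 for another

alpha_clean = cronbach_alpha(clean)
alpha_dirty = cronbach_alpha(dirty)
print(f"alpha, clean data:        {alpha_clean:.2f}")
print(f"alpha, two bad item cells: {alpha_dirty:.2f}")
```

The mechanism is visible in the formula: a single 403 hugely inflates the variance of its own item, so the ratio of summed item variances to total score variance rises and alpha falls, even though every other cell is untouched.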

I would never have believed that just 0.008% incorrect items could affect alpha that much, even if one was 403 when the item upper score limit is 4: I was wrong! Well, perhaps it’s not quite that low a percentage. If that 11 was a 1 for the one item (item 22) and another 1 which should have gone into item 23, then perhaps many of the remaining items for that client were wrong; the same goes for the 403 for item 28: after all 1, 4, 0 and 3 are all possible item scores on the CORE-OM. That would take the incorrect entries up to 0.08%. However, if something like failure to hit the carriage return is the explanation then there should have been one or more missing items at the end of the entries for that client and their data would never have made it into the computation of alpha. Perhaps a really badly out of range item at a rate of just 0.008% is enough to bring alpha down this much. Only checking back to the original data will tell. I hope they still have the original data.
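The two percentages can be reproduced with a little arithmetic. This is a sketch of one reading that gives those figures, not necessarily how they were first worked out: it assumes 700 respondents and the 34 items of the CORE-OM, and it assumes that a slipped entry would spoil every item from the bad one to the end of that respondent’s row.

```python
N_ITEMS = 34     # the CORE-OM has 34 items
N_PEOPLE = 700   # "about 700" complete respondents (assumed exact here)
TOTAL_CELLS = N_ITEMS * N_PEOPLE  # 23,800, i.e. "over 23,000"


def percent_of_cells(n_cells):
    """Express a number of item cells as a percentage of all cells."""
    return 100 * n_cells / TOTAL_CELLS


# Just the two out of range cells
pct_two = percent_of_cells(2)

# Assumed reading: everything from item 22 onward is wrong for one person
# and everything from item 28 onward is wrong for the other
shifted_cells = (N_ITEMS - 22 + 1) + (N_ITEMS - 28 + 1)  # 13 + 7 = 20
pct_shifted = percent_of_cells(shifted_cells)

print(f"two bad cells:       {pct_two:.3f}%")
print(f"{shifted_cells} shifted cells:    {pct_shifted:.2f}%")
```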

OK, but does this merit a blog post? (Well, I’ve got to start somewhere!) I think there are some points of interest.

  • it shows just how influential a few out of range scores can be
  • it shows that alpha can sometimes detect this and hooray for the people involved that they did calculate alpha and sensed that something was so wrong that they couldn’t just go ahead with the analyses they had planned
  • it does show though that simple range checks on items were a quicker and more certain way of detecting what was at root here
  • it shows that you should always do all the range and coherence checks that you can think of that make sense for the data …
  • … though it’s stronger still to have duplicate data entry, but which of us can afford this?
  • even if you can do duplicate entry (assuming that the clients complete the measures on paper) you should use a data entry system that as far as possible detects impossible or improbable data at the point of entry
  • (and if you do have direct entry by clients please make sure it does that entry checking and in a user-friendly way)
  • but while absurd sums of money are put into healthcare data systems and into funding psychological therapy RCTs, where is the money to fund good data entry, clinician research and practice based evidence?
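The simple range check in the list above takes only a few lines. This is a minimal sketch in Python, assuming the data are held as one list of item scores per respondent with `None` for a missing item; the function name and the toy data are made up for illustration.

```python
VALID_SCORES = {0, 1, 2, 3, 4}  # legal scores for a CORE-OM item


def out_of_range(rows, valid=VALID_SCORES):
    """Return (row_index, item_number, value) for every illegal, non-missing score."""
    problems = []
    for row_index, row in enumerate(rows):
        # Items are numbered from 1, as on the questionnaire
        for item_number, value in enumerate(row, start=1):
            if value is not None and value not in valid:
                problems.append((row_index, item_number, value))
    return problems


# Toy data (NOT real): three respondents, four items each
toy = [
    [0, 1, 4, 2],
    [3, 11, None, 1],
    [2, 2, 403, 0],
]
for row_index, item_number, value in out_of_range(toy):
    print(f"row {row_index}, item {item_number}: {value}")
```

Run before anything else, a check like this points straight at the offending cells, which is quicker and more certain than inferring a problem from a strange alpha.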

To finish on a gritty note about data entry: at least twenty years ago, before I discovered S+ and R, I mainly used SPSS for statistics and back then, for a while, SPSS had a “data entry module”. It was slow, which was perhaps why they dropped it, but it was brilliant: you could set up range checks and all the coherence checks you wanted (pregnant male: I think not). After that died I tended to enter my data into spreadsheets and until about a year ago I was encouraging colleagues I work with around the world to use Excel (yes, I tried encouraging them to use Libre/OpenOffice but everyone had and knew Excel and often weren’t allowed to install anything else). They or I would write data checking into the spreadsheets to the extent that Excel allows and I wrote data checking code in R (https://www.r-project.org/) to double check that and to catch things we couldn’t in Excel. I still use that for one huge project but it’s a nightmare: updates of Windoze seem to break backwards compatibility, M$’s way of handling local character sets seems to create problems, its data checking seems to break easily and I find it almost impossible to lock spreadsheets so that people can enter data but not change anything else. I’m sure that there are Excel magicians who can do better but I’m equally sure there are better alternatives.

At the moment, with Dr. Clara Paz in Ecuador, we’re using the open source LimeSurvey survey software hosted on the server that hosts all my sites (thanks to Mythic Beasts for excellent open source based hosting). If you have a host who gives you raw access to the operating system LimeSurvey is pretty easy to install (and I think it runs on nasty closed source systems too!). Its user interface isn’t the easiest but so far we’ve been able to do most things we’ve wanted to with a bit of thought and the main thing is that it’s catching data entry errors at entry and has proved totally reliable so far.
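The coherence checks mentioned above (the pregnant male is the classic) are checks across fields rather than within one. This is a minimal sketch in Python of what such a check might look like at the point of entry; the field names (`age`, `sex`, `pregnant`) and the age limits are hypothetical, and it is not how SPSS, Excel or LimeSurvey implement their validation.

```python
def check_record(record):
    """Return a list of human-readable problems found in one entered record."""
    problems = []

    # Range check on a single field (hypothetical limits)
    age = record.get("age")
    if age is None or not 0 <= age <= 120:
        problems.append(f"age out of range: {age!r}")

    # Coherence check: two fields that cannot both be true together
    if record.get("sex") == "male" and record.get("pregnant") is True:
        problems.append("coherence: male recorded as pregnant")

    return problems


# Toy records (NOT real data)
good = {"age": 34, "sex": "female", "pregnant": True}
bad = {"age": 340, "sex": "male", "pregnant": True}

for name, record in (("good", good), ("bad", bad)):
    issues = check_record(record)
    print(name, "->", issues if issues else "no problems found")
```

The point of running this at entry rather than afterwards is that the person entering the data still has the paper form in front of them and can correct the slip there and then.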

Technical notes on moving PSYCTC.org to WordPress

For my own benefit, and in case it’s useful to anyone else, I’m putting a technical blog post here now which I hope will capture the main steps I’ve taken so far.

Location and URLs

I had the existing static HTML site at https://www.psyctc.org/ and I already had a WordPress installation underneath that (my personal site and blog: https://www.psyctc.org/pelerinage2016/). Mythic Beasts, the excellent service provider who have been supporting my internet presences for years, agreed that having one WordPress instance nested in a URL below another was a recipe for trouble so I opted to have the new WordPress instance at https://www.psyctc.org/psyctc/ .  Clearly, that’s a bit clumsy and if you don’t have a reason to do this you can probably simply install WordPress into your URL root.

Over the last five years I’ve had experience with a number of commercial WordPress themes in sites I’ve created and that’s left me rather unimpressed, with incompatibilities and poor support, so these days I prefer to use WordPress’s own free themes.  For this I’ve gone for Twenty Fifteen as it seemed to fit what I needed and, making my life simpler, it’s the one to which I’m also retheming and rejigging my CORE System Trust site.

I’m clear that the first thing anyone should do once they’ve chosen a sensible basic theme is to make a “child theme” so you minimise the risk of losing any customisation you make if the theme is updated.  For that I used the Child Theme Configurator Pro plugin. That’s a paid for extension of that company’s free plugin which wasn’t expensive and I felt was worth paying for, but you can make a child theme by hand or using their free plugin (and there are other plugins to do this: I don’t claim to have done more than some simple searching which led me to this one but so far I like it a lot).

Appearance customisation

I’ve tweaked the footer to remove “Proudly” from “Proudly powered by WordPress” as I’m very happy to acknowledge my huge debt to the WordPress world but “proudly” just isn’t my language: they’re the ones who can be proud of this, I’m just grateful! To hack that you have to edit footer.php, which the child theme plugin makes very easy, though it would be just as easy if you have pretty much any other route to that file.  It’s a small file, easy to understand, and you can find “Proudly powered” in it and edit that without knowing anything about PHP.

I used the standard customisation options to use the background colour I’d used on a lot of the old pages and used old gifs as identity icons.  In case you wonder what they are: they’re a nod to my proud (yes, “proud”) membership of the Institute of Group Analysis, and I created the gifs so long ago that I haven’t the faintest idea now what tools I used but they would have been done in MS-DOS!

I replaced the old index.html root html file of www.psyctc.org with a very short one that redirects to https://www.psyctc.org/psyctc/.  If you want that code it’s easy to find or you can right click that link above (in most browsers) and grab my file and edit it to point where you want it to (and not here!)  As you can see in my new WordPress home page, I renamed the old index.html file and put a link back to that so that anyone wanting the old site can get it easily.
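For anyone who would rather generate such a file than copy one, here is a minimal sketch in Python that builds a very short redirecting page. It assumes a plain HTML “meta refresh” redirect with a fallback link, which is one common way to do this and not necessarily what the actual file on this site contains; the function name is made up for illustration.

```python
from html import escape


def redirect_page(target_url):
    """Build a tiny HTML page that sends the browser on to target_url."""
    safe = escape(target_url, quote=True)  # safe to place inside attributes
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        # content="0; url=..." asks the browser to redirect immediately
        f'<meta http-equiv="refresh" content="0; url={safe}">\n'
        "<title>Moved</title>\n"
        "</head>\n<body>\n"
        # Fallback link in case the automatic redirect is blocked
        f'<p>This site has moved to <a href="{safe}">{safe}</a>.</p>\n'
        "</body>\n</html>\n"
    )


page = redirect_page("https://www.psyctc.org/psyctc/")
print(page)
```

Saving the returned text as index.html in the site root would give a page of this kind; change the URL passed in so that it points at your own site.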

Plugins

I’m trying not to use more plugins than I need and probably the child theme configurator wasn’t strictly necessary but it did help me feel safe and guided through that process and supported other things.  I’ve installed Google Fonts PRO to customise typography as I’ve used it on two other sites recently but so far I don’t think I’ve actually changed any fonts so perhaps that should go.  I’ve used the free plugin Contact Form Email for all my contact forms over the last five years: it’s worked perfectly as far as I can tell, is upgraded rapidly and has a large user base with good reviews so I felt no reason to change from that.  I don’t think the settings interface is particularly user friendly or intuitive but once you’ve used it a bit that’s no longer an issue.

One thing that’s new for me is to require people to subscribe to post comments on this blog and I’ve used the free Ultimate Member plugin to handle that.  One thing there that took quite a lot of sleuthing to work out was how you force subscribers to validate their Emails.  I think the setting is ridiculously hidden and I had assumed that if you activated the Email template asking people to validate their Email address that requirement would be set automatically.  I found the answer eventually at https://wordpress.org/support/topic/registration-confirmation-email-is-not-sent-2/ : you set it in the User Roles section of the Ultimate Member menu entries in the WordPress administration panel.

There is a selection window in that section that has to be switched to “Require Email Activation”.

I’ve installed TinyMCE Advanced and did manage to get it to give me control over font size in pages and posts but it’s clearly not working nicely with the new WordPress “Gutenberg” editor, which is a pain.  To be brutal, so far I’m pretty unconvinced by the early user experience with Gutenberg but it’s clear that WordPress are committed to it and I think I can see that it will have long term advantages so I’m resisting the temptation to revert to the classic editor.  I hope someone, whether it’s Tiny or WordPress or someone else, ends up putting the sort of controls into the default text editor that TinyMCE Advanced gave me pretty easily in the classic editor.

I added the free Relevanssi search plugin.  That was actually a mistake when I was trying to work out how to add “search” and eventually I settled for making it accessible from the “footer” widget menu in the theme.  (I put “footer” in scare quotes as it’s actually not what I consider a footer as located by the theme: this is the bottom part of the side menu.)  However, Relevanssi looks good so I’ll stick with it.

I have installed the free Google Analytics for WordPress by MonsterInsights plugin (what a name!) and created a Google Analytics account for the site.  I’d love not to use Google but I do sort of know how to set up and use this approach so it stays for now but is on my list to drop if I can.

Security and safety plugins

I’m not sure if Akismet is installed by default in WordPress now or was installed by Mythic Beasts when they did the WP installation for me (I’m getting lazy!) but I have activated it.  As I’m requiring subscription before posting a comment, and using Email subscription validation, it’s going to take a pretty determined real live spammer to post comments so perhaps I don’t need it.

As already noted, I’m using the free Ultimate Member plugin to manage subscriptions.  The installation had the free Login LockDown plugin installed; again I’m not sure if that comes by default or was added by Mythic Beasts but it works “by restricting the rate at which failed logins can be re-attempted from a given IP range” which sounds very sensible so that’s staying.  I’ve added UpdraftPlus which I use for backups of all my WordPress sites.  I use the paid for professional version and their cloud vault: not cheap but feels wise.

Finally, the free WP Super Cache plugin was installed and I’ve had that in all my WP sites and think it really does speed things up (but when you’re making configuration changes and they don’t seem to be working, always switch it off or flush its cache and retry before assuming you’re doing something wrong: I wasted fifteen minutes or so changing the names of the widgets and seeing no change in them before I remembered that one!) I love the company’s map of its employees!

Damn, thought I was finished for now and just remembered that I need to add a privacy statement. Enough here for now!

Moving PSYCTC.org to WordPress

Rather frighteningly (for me at least) the PSYCTC.org site is at least 23 years old.  Its origins were in some pages I created when I worked at St. George’s Hospital Medical School and it’s moved across umpteen hardware platforms since then.  Much of it is raw HTML files I created over the years, probably mostly with emacs on Linux systems.  That meant it was light and fast when that really mattered as people, including me editing it, might be using dial up modems to get to it.  I’m not sure there’s anywhere in the world now where people will be accessing things that way, and the raw HTML has meant that changing things was not easy.  I’ve finally decided to protect a bit of time to shift the site to WordPress which, though it has many faults to my mind, does make maintaining and developing a site easier than coding it in raw HTML!  I’ll never have the eye, skills or time to make things look beautiful but at least a minimally modified WordPress theme will look better than anything I can hand craft.  I will use this blog space to share technical challenges I hit with the site, and generally with using WordPress, where I think I found something that might be useful to others.  However, it’ll be mainly for thoughts about psychotherapy and counselling research and particularly my fascinations with psychometrics, with what I call “rigorous idiography” and with methodology pretty generally.