It’s what it says, and I think the recent movement not to separate the two, i.e. to keep an emphasis on integrity and not just on fraud, is sound.
Details #
So this is about the increasing recognition that a worryingly high proportion of the findings reported in the research literature, seemingly any research literature, are fraudulent and shouldn’t be there. In other words, the integrity of the research world is not what we want it to be, nor what we need it to be if we are to make recommendations for interventions based on reviewing the literature.
This ranges from simply making up case reports or data. At an early stage in my career I co-authored McCluskey, Evans, Lacey, Pearce & Jacobs (1991). Polycystic ovary syndrome and bulimia. Fertility and Sterility, 55, 287–291. Dr. Pearce had done the classification of the ovarian ultrasound scans and he was regarded as a real expert in that work. Later it transpired that a case report he published, with other co-authors, in which he claimed to have successfully reimplanted an ectopic fetus, was a complete fabrication. The General Medical Council struck him off the UK medical register, and a co-author of that paper, his head of department, who had accepted “gift authorship” on it, had to resign and retire. We were then subjected to a very scary, perhaps unnecessarily persecutory, “guilty unless you can prove innocence” investigation of our paper. So too were all co-authors of all papers he had ever co-authored. Fortunately we had kept good records and had used good practice to ensure that he was completely blind to the group membership of the women whose ovarian scans he was classifying. That meant our paper did not have to be retracted and our careers weren’t damaged. It was a salutary experience and underlined for me the importance of meticulous record keeping and optimal research design (where optimal design is possible).
That’s just a personal story of one extreme of fraud: pretty blatant fabrication. There are many other such stories, and one thing that has shocked me since then is how clear it is that not all institutions are as thorough as ours (then St. George’s Hospital Medical School) was in investigating all the work by researchers who have been proven to have fabricated findings. Another depressing fact is that less grandiose claims that may be fabricated are much less likely to be detected than a claim to have done something that has never been done before. (I confess that grandiose personal claims, and a grandiose personal style in the presentation of work, are now red flags for me.)
One nice paper comes at the issue of possible fabrication in quantitative research that reports p values: Nuijten, Hartgerink, van Assen, Epskamp & Wicherts (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226. https://doi.org/10.3758/s13428-015-0664-2. To quote their abstract in full:
This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package ‘statcheck’. statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.
Back in 2017, after reading that paper, I ran statcheck over all the copies of papers I have in my Zotero library and found few impossible p values (and was relieved to find none in any of my papers). Maybe I should be doing that for every paper I read that quotes p values. I guess I’m sceptical enough about the p-value/NHST paradigm that that wouldn’t involve that many papers, but I think it’s still more than I have time for (say an additional ten minutes per paper read). Sadly, the wise shift from NHST towards confidence intervals has the side effect that statcheck’s superb ability to check many hypothesis tests is becoming less useful as at least a minority of papers move away from the NHST approach to quantitative data.
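For anyone tempted to try something similar, here is a minimal sketch of that kind of check in R, assuming the statcheck package is installed and that the papers to be checked are PDFs sitting in one directory; the path is purely illustrative and the exact output columns vary a little between statcheck versions.

```r
# Minimal sketch: run statcheck over a directory of PDFs.
# Assumes install.packages("statcheck") has been run; the path below
# is illustrative, not a real location.
library(statcheck)

# checkPDFdir() pulls APA-style NHST results (e.g. "t(28) = 2.20, p = .03")
# out of every PDF in the directory and recomputes each p value from the
# reported test statistic and degrees of freedom.
results <- checkPDFdir("~/papers")

# One row per extracted result; the output flags results whose reported
# p value is inconsistent with the recomputed one, and "gross" or
# decision inconsistencies that cross the significance threshold.
head(results)
```

Running that over a whole library takes a while but is entirely automatic; the manual work is in reading the flagged rows and deciding whether each inconsistency is a typo, a rounding quirk or something more worrying.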
There are many other pieces of work showing that research integrity isn’t what we need it to be, and there are many ex cathedra pronouncements in journals about how the pressures on researchers to publish to keep their increasingly precarious jobs are exacerbating this situation. Sadly, I see no evidence that the academic publishing industry, which makes obscene profits from the situation, or the equally obscene managerialist, industrial model that now runs universities, has any genuine investment in changing it.
I think “reader beware”, holding to our own integrity, and trying to do all we can to promote integrity in others are the only small protections we have against the problem of research fraud.
Try also #
Hm, I have a feeling I am going to be expanding this area.
Confidence intervals (CIs)
Null hypothesis significance testing (NHST) paradigm
Chapters #
Perhaps wrongly, we didn’t touch on this in the OMbook.
Online resources #
None currently.
Dates #
First created 13.viii.24.