Tuesday, May 20, 2014

A science preregistration revolution? Not yet.

Within a span of about 24 hours, I received two opposing pieces of news concerning preregistered science. The basic idea is that with good preregistration of researchers' intents, science becomes more objective because researcher degrees of freedom are reduced (e.g., HARKing: Hypothesising After the Results are Known; p-hacking: trying different analyses, data, or variables until the hypothesized effect is found).
  • Yesterday, May 19, 2014, the journal Social Psychology published an issue stacked with preregistered replication studies. Not all of these replications were (fully) successful, including our own attempt to replicate the idea that a single exposure to music paired with an object transfers your attitude about the music to that object (i.e., Gorn’s 1982 concept of single-exposure musical conditioning). If you want to read more about this issue and the preregistration idea in (social) psychology, I suggest Chris Chambers’ blog in the Guardian. For a more critical voice, John Bohannon expressed the idea (or fear) that preregistration relates to academic bullying.
Though I participated in the preregistered replication effort for Social Psychology, I am still in doubt about the exact status of preregistration. I believe it is a good and even necessary tool for honest replication attempts. I am less sure we should use it for all of our research, provided we make sure to refrain from HARKing our exploratory analyses. In any case, I think we should move towards honest reporting. Preregistration will only foster scientific integrity if we link it with specific reporting guidelines. Of course a researcher can also run exploratory analyses that were not preregistered, but we should not conceal them with persuasive wording suggesting that everything was planned (cf. HARKing, p-hacking).

"Preregistration will only foster scientific integrity if we link it with specific reporting guidelines"

Sadly, this is not an opinion everybody shares…
  • The Lancet finally gave me an editorial decision on the Letter I submitted, which addressed precisely this lack of good preregistration and the persuasive framing authors use to cover up their HARKing. See an earlier blog post for the (astonishing) details. The Lancet decided not to publish the submitted commentary. Of course I understand that all kinds of editorial choices have to be made, but I did not receive a real explanation for the rejection. I do hope they acknowledge the seriousness of the issue I tried to address.

Above, I described HARKing and p-hacking as serious threats to scientific integrity. For the uninitiated: compare them to a sports game in which the rules change. Claiming a soccer victory not because you scored more goals but because you had a higher ball-possession percentage increases your degrees of freedom to an unacceptable extent. In a proper game those degrees of freedom are limited: when one team wins, that may come about in many different ways, but there is something common to all wins, namely scoring more goals than the opponent. In (empirical, quantitative) science, an implicit set of (bad) rules has long dominated "the game". The problem is that science does not pretend to be a mere game, so we need good rules for interpreting findings. The trouble with ill-executed preregistration is that researchers can pretend to follow the rule book while, in fact, they don't.
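To make the p-hacking part of this concrete, here is a minimal simulation (my own sketch, not from the post) of one common variant: measuring several outcome variables and reporting whichever happens to come out significant. Even when no real effect exists anywhere, testing five outcomes instead of one markedly inflates the chance of at least one "significant" result.

```python
import math
import random

random.seed(1)

def one_significant(n_per_group: int, n_outcomes: int) -> bool:
    """Simulate one null study that measures several independent outcomes;
    report whether ANY outcome crosses |z| > 1.96 (two-sided p < .05)."""
    crit = 1.96 / math.sqrt(n_per_group)  # z-test threshold for the sample mean
    for _ in range(n_outcomes):
        sample = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if abs(sum(sample) / n_per_group) > crit:
            return True
    return False

def false_positive_rate(n_outcomes: int, n_studies: int = 2000) -> float:
    """Fraction of null studies that yield at least one 'significant' outcome."""
    hits = sum(one_significant(30, n_outcomes) for _ in range(n_studies))
    return hits / n_studies

print(false_positive_rate(1))  # close to the nominal 0.05
print(false_positive_rate(5))  # roughly 1 - 0.95**5, i.e. about 0.23
```

A preregistered analysis plan removes exactly this freedom: the single primary outcome is fixed before the data are seen, so the nominal error rate is the real one.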

That is a serious threat to the preregistration idea – as demonstrated in the Morrison article I discussed in the commentary that the Lancet rejected. Such ill-suited practice conceals the researcher degrees of freedom, and concealing them is worse than not caring about them at all, I believe.

Submitted, but rejected commentary

References for the commentary:
1 Morrison AP, Turkington D, Pyle M, Spencer H, Brabban A, Dunn G, Christodoulides T, Dudley R, Chapman N, Callcott P, Grace T, Lumley V, Drage L, Tully S, Irving K, Cummings A, Byrne R, Davies LM, Hutton P. Cognitive therapy for people with schizophrenia spectrum disorders not taking antipsychotic drugs: a single-blind randomised controlled trial. Lancet 2014
3 Morrison AP, Wardle M, Hutton P, Davies L, Dunn G, Brabban A, Byrne R, Drage L, Spencer H, Turkington D. Assessing Cognitive Therapy Instead Of Neuroleptics: Rationale, study design and sample characteristics of the ACTION trial. Psychosis 2013; 5(1): 82-92.
4 Schulz KF, Altman DG, Moher D. Protocols, probity, and publication. Lancet 2009; 373: 1524.
5 Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, Michie S, Moher D, Wager E. Reducing waste from incomplete or unusable reports of biomedical research. Lancet 2014; 383: 267-76.


  1. Hi Tim, fair comment. I completely agree that extremely vague (or post-dated) pre-registration is useless - and potentially worse than useless because it creates an illusion of credibility without actually earning it. But that's not a concern for journal-based pre-registration (i.e. Registered Reports) because the protocol is peer reviewed and there is editorial continuity between the protocol and final submission. This mechanism can work just as well for novel research as for replications. I'm sure you're already aware of this, but for readers of your blog, we wrote a detailed piece on it here: http://orca.cf.ac.uk/59475/1/AN2.pdf

  2. Hi Chris, thanks for your comment. I love the article you linked (and its graph that visualizes p-hacking and HARKing in the science process)!
    Still, I am a bit pessimistic about the odds of cancelling out these practices in registered reports. Imagine a world in which registered reports have scaled up to a major publishing format. The editorial task of checking both the registration and the final manuscript will be very time-consuming. Thus, an author could quite possibly still sell something as a conceptually preregistered analysis; careful wording can blur the line between exploratory and confirmatory analyses. One way to deal with this is to demand a particular disclaimer or phrase.
    We should do this, I believe, because practice in medicine has shown us that authors will keep using the degrees of freedom at their disposal. This was already documented in 2004 (http://jama.jamanetwork.com/article.aspx?articleid=198809) and my recent experience with the Lancet paper shows that a decade later it is still true.

  3. Also, published protocols can just be ignored or abandoned for little to no reason.

    I wrote this comment elsewhere about another trial that was published in the Lancet, and it seems relevant to this blog post, so I thought I'd re-post it here if that’s okay. The full comment and references can be found at: http://www.bmj.com/content/347/bmj.f5963/rr/674255

    The PACE trial's published protocol defined 'recovery' as requiring an SF-36 Physical Functioning (SF36-PF) questionnaire score of at least 85 out of 100, while the trial's entry criteria required a score of 65 or under, which was taken to indicate that patients' fatigue was disabling[2]. The post-hoc criteria for recovery allowed patients with an SF36-PF score of 60 to be classed as recovered. This change was justified by the claim that a threshold of 85 would mean “approximately half the general working age population would fall outside the normal range.”[3] In fact, the data cited showed that the median score for the working age population was 100, less than 18% of the general working age population had a score under 85, and 15% had declared a long-term health problem[4,5].

    An SF36-PF score of 60 was claimed in the Lancet PACE paper to be the mean -1sd of the working age population, and thus a suitable threshold for ‘normal’ disability[6]. They had in fact used data which included all those aged over 65, reducing the mean physical function score and increasing the SD[4]. For the working age population the mean -1sd was over 70, requiring patients to score at least 75 to fall within this ‘normal range’[5]. Also, the trial's protocol makes it clear that the thresholds for recovery (including ≥85 for SF-36 PF) were intended to be more demanding than those for the mean -1sd, reporting that: “A score of 70 is about one standard deviation below the mean... for the UK adult population”[2].
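The statistical point here – that including the over-65s pulls the reference mean down and inflates the SD, lowering the "mean − 1 SD" threshold – can be shown with a toy calculation. All numbers below are invented for illustration; they are not the PACE or SF-36 normative data.

```python
import math

def pooled_mean_sd(groups):
    """groups: list of (weight, mean, sd) tuples; weights must sum to 1.
    Returns the mean and SD of the pooled (mixture) distribution."""
    mean = sum(w * m for w, m, _ in groups)
    # pooled variance = weighted within-group variance + between-group spread
    var = sum(w * (s ** 2 + (m - mean) ** 2) for w, m, s in groups)
    return mean, math.sqrt(var)

working_age = (0.8, 90.0, 18.0)  # hypothetical working-age SF36-PF scores
over_65     = (0.2, 65.0, 28.0)  # hypothetical lower-scoring over-65 subgroup

m, s = pooled_mean_sd([working_age, over_65])
print(round(working_age[1] - working_age[2]))  # working-age mean - 1 SD: 72
print(round(m - s))                            # pooled mean - 1 SD: noticeably lower
```

With these made-up figures the working-age threshold sits in the low 70s while the pooled one drops into the low 60s – the same direction of shift the comment describes for the published 'normal range'.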

    The post-hoc criteria for recovery so clearly overlapped with the trial's own criteria for severe and disabling fatigue that an additional element came into play, mandating that ‘recovered’ patients not also fulfil every aspect of the trial's criteria for CFS[3]. Even so, patients could still have been classed as recovered when reporting no change, or even a decline, in either one of the trial’s primary outcomes.

    Even using the loose post-hoc criteria for recovery, only 22% of patients were classed as recovered following treatment with specialist medical care and additional CBT or GET[3]. Regardless, the BMJ had reported that PACE showed CBT and GET “cured” 30% and 28% of patients respectively[7], a Lancet commentary claimed that about 30% recovered using a “strict criterion” for recovery[8], and a paper aimed at NHS commissioners stated PACE indicated a recovery rate of 30-40% for CBT and GET[9,10]. It is wrong for such misstatements of fact to be allowed to go on affecting how doctors treat their patients, how funding decisions are made, and the information that patients are provided with before deciding whether to consent to particular interventions.

    The changes to the outcome measures used in the PACE trial may not be “representative of a hidden agenda”[1], but they were misguided, justified by inaccurate claims, and have been misleading to others. The refusal to allow patients access to data on the outcome measures laid out in the trial’s protocol reflects a sad dismissal of their right to be informed about the medical treatments they are being encouraged to pursue[11,12,13].