Tuesday, May 20, 2014

A science preregistration revolution? Not yet.

Within a span of 24 hours or so, I received two opposing pieces of news concerning preregistered science. The basic idea of preregistration is that registering researchers' intentions in advance makes science more objective by reducing researcher degrees of freedom (e.g., HARKing: Hypothesizing After the Results are Known; p-hacking: trying different analyses, data, and variables until the hypothesized effect is found).
  • Yesterday, May 19, 2014, the journal Social Psychology published an issue stacked with preregistered replication studies. Not all of these replications were (fully) successful, including our own attempt to replicate the idea that a single joint exposure to music and an object transfers your attitude toward the music to that object (i.e., Gorn’s 1982 concept of single exposure musical conditioning). If you want to read more about this issue and the preregistration idea in (social) psychology, I would suggest Chris Chambers’ blog in the Guardian. For a more critical voice, John Bohannon expressed the idea (or fear) that preregistration relates to academic bullying.
Though I participated in the preregistered replication effort for Social Psychology, I am still in doubt about the exact status of preregistration. I believe it is a good and even necessary tool for honest replication attempts. I am not sure whether we should use it for all of our research, as long as we are sure to refrain from HARKing our exploratory analyses. In any case, I think we should move forward on honest reporting. Preregistration will only foster scientific integrity if we link it with specific reporting guidelines. Sure, a researcher can also do exploratory analyses that were not preregistered. But we should not conceal them with persuasive wording suggesting that all was planned (cf. HARKing, p-hacking).

"Preregistration will only foster scientific integrity if we link it with specific reporting guidelines"

Sadly, this is not an opinion everybody shares…
  • The Lancet finally gave me an editorial decision on the Letter I submitted, which addressed precisely the lack of good preregistration and the persuasive framing authors use to cover up their HARKing. See an earlier blogpost for the (astonishing) details. The Lancet decided not to publish the submitted commentary. Of course I understand that all kinds of editorial choices have to be made, but I did not receive a real explanation for the rejection. I sure hope they do acknowledge the seriousness of the issue I tried to address.

Above, I already talked about HARKing and p-hacking as a serious threat to scientific integrity. For the uninitiated: compare them to a sports game where the rules change. Claiming a soccer victory not because you scored more goals but because you had a higher ball possession percentage increases your degrees of freedom to an unacceptable extent. In a real game, your degrees of freedom are limited: when one team wins, we understand that this could be due to different factors, but at least there is something common to all wins, namely scoring more goals. In (empirical, quantitative) science, an implicit set of (bad) rules has long dominated "the game". The problem is that science does not pretend to be a mere game, so we should have good rules for the interpretation of findings. The problem with ill-executed preregistration of research is that researchers can start pretending to follow the rule book while, in fact, they don't.

That is a serious threat to the preregistration idea, as demonstrated in the Morrison article that I discussed in the commentary that The Lancet rejected. Such ill-suited practice conceals the researcher degrees of freedom, and concealing them is worse than not caring about them, I believe.

Submitted, but rejected commentary

References for the commentary:
1 Morrison AP, Turkington D, Pyle M, Spencer H, Brabban A, Dunn G, Christodoulides T, Dudley R, Chapman N, Callcott P, Grace T, Lumley V, Drage L, Tully S, Irving K, Cummings A, Byrne R, Davies LM, Hutton P. Cognitive therapy for people with schizophrenia spectrum disorders not taking antipsychotic drugs: a single-blind randomised controlled trial. Lancet 2014
3 Morrison AP, Wardle M, Hutton P, Davies L, Dunn G, Brabban A, Byrne R, Drage L, Spencer H, Turkington D. Assessing Cognitive Therapy Instead Of Neuroleptics: Rationale, study design and sample characteristics of the ACTION trial. Psychosis 2013; 5(1): 82-92.
4 Schulz KF, Altman DG, Moher D. Protocols, probity, and publication. Lancet 2009; 373: 1524.
5 Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, Michie S, Moher D, Wager E. Reducing waste from incomplete or unusable reports of biomedical research. Lancet 2014; 383: 267-76.

Friday, March 14, 2014

The pitfalls of pre-registration: The Morrison et al CBT paper

There has been quite a fuss about the recently published (online prepublication) Lancet paper on cognitive therapy for unmedicated schizophrenia patients (see below for a reading list of posts about the study). This paper reports on a longitudinal (9 to 18 months) follow-up study of a treatment group and a control group, and its basic claim is that the cognitive therapy treatment group fares better than the control group. Though I contributed to a letter to the editor on a study in the same domain that had tremendous reporting flaws, this research is well outside of my discipline. Still, the criticism of the Lancet paper sparked my interest in it, and even to an outsider there are strange things going on with said manuscript. In this post I will focus on the peculiar timeline of events, which shows one of the greatest pitfalls of scientific pre-registration. Though I am greatly in favor of preregistration for a lot of studies, I do think that the mass application of preregistration might lead to new problems. One evident problem is that merely referring to a preregistered study protocol does not imply that the protocol was sensible or that the researchers actually followed it.

The summary
It took quite some investigative academic journalism, I think, but the summary of the study's timeline is presented below. Plain grey timeline moments are based on assumptions following the different pieces of evidence that are available; those with a colored border are based on one of the four public sources.
Timeline

Friday, February 21, 2014

Facebook's Whatsapp deal might all be about usability

OK, so Facebook bought Whatsapp for an incredible amount of money. Two reasons have been hypothesized as to the why of that deal: money or data.

Money? The figures do not add up. With 450 million users each paying 1 dollar per year, you would need either a very long-term horizon or sky-high ambitions for growing the user base.
Data? Sure, there is a lot of data involved in Whatsapp. But, seriously, 35 dollars per user for additional data to use and sell to advertisers? That is a lot of money. Facebook already has a lot of data. They would be better off spending that money on an army of statisticians and saving the other 18.9 billion for another deal.
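
As a back-of-the-envelope check on the money argument: the time needed to earn back the purchase from subscription fees alone is simply the price paid per user divided by the yearly fee per user. Below is a minimal Python sketch that uses only the 35 dollars per user and the 1 dollar per year mentioned above. The function name is made up for illustration, and the calculation assumes a constant user base while ignoring costs, growth, and the time value of money.

```python
def payback_years(price_per_user, fee_per_user_per_year):
    """Years of subscription fees needed to recoup the price paid per user.

    Illustrative only: assumes a constant user base and ignores costs,
    growth, and the time value of money.
    """
    if fee_per_user_per_year <= 0:
        raise ValueError("fee must be positive")
    return price_per_user / fee_per_user_per_year


if __name__ == "__main__":
    # Figures taken from the post: 35 dollars per user, 1 dollar per year.
    print(payback_years(35, 1))
```

Under these rough assumptions the fees alone would take decades to cover the price, which is why the subscription money by itself seems an unlikely motive.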

So, what could be a reason? Usability or, in other words, the attractiveness of Facebook as a sub-internet. Due to its own success, Facebook has become a massive network crowded with information. The expansion of the social medium called Facebook is gradually reaching a point where you can say that it has become an entire subsystem, much like how HTTP once was only a part of "the internet" (next to usenet, FTP, e-mail) but then gradually became a pars pro toto. This is all fine if you are looking at the business figures, but it has a serious drawback for the end user. The stream of information is now so immense that Facebook has to ration the timeline (well, they make money out of it as well, of course, with paid distribution of updates) and that many users find themselves overwhelmed. This makes it less attractive for users to spend time on Facebook, and it partly explains the unfollowing of friends.

Whatsapp is a perfect solution for that. Integrating it into Facebook will install a sublayer that contains only proximal Facebook friends. Indeed, your average contact list on Whatsapp will likely be much smaller than your list of Facebook friends. It is your closer social inner circle.

Now, imagine a mobile Facebook app layout where Whatsapp is a card within the app. It will contain short status updates of your proximal inner circle. You'll be very interested in that part of the app and engage a lot with it. Around that info will be your more distant Facebook circle, possibly with brand related posts and Facebook ads for monetization. And you can even imagine regular ads around that one as a third circle. Building the platform as such will ensure a continued interest in the use of Facebook as a whole, due to the interest in the proximal circle of contacts. There are two assumptions here. First, Facebook should try to be the best alternative for both the Whatsapp sub-app and the overall network experience. Second, people should still be willing to buy smartphones the size of a 7" tablet, because a card can only be fitted within an app on such a screen size or larger.
If the above is the way Facebook will integrate Whatsapp, then it is actually a very smart move by Facebook to build their own Google+ based on actual consumer uses. Google+ already has the functionality of different circles, it is closely linked with the sharing of pictures and videos (cf. the recent YouTube integration), and it has its messaging/calling application with Hangouts. The major drawback of Google+ is that Google built a very strong communication/social platform without consumer usage. I think Facebook is moving to a very similar platform, but is buying the consumer usage first and then making it a platform.

Monday, February 17, 2014

Don't get all psychotic on this paper: Had I (or we) Been A Reviewer (HIBAR)

Academic publishing sometimes displays a strange parallel with advertising. There's the celebrity endorsement by renowned authors, a credibility inference evoked by the academic journal and the assumed thorough peer review, and the sales talk in the paper's abstract.
Original article with some marketing claims added

As researchers, we should of course know to read beyond that abstract, but a recent Twitter discussion pointed me towards a paper that has a very blatant discrepancy between its sales talk and the article's actual core. The paper appeared to be hard to understand due to a substandard use of statistics and equally bad reporting of the study's methods. This resulted in a letter to the editor written by Daniel Lakens (@lakens), Stuart Ritchie (@StuartJRitchie) and Keith Laws (@Keith_Laws). The story behind that letter to the editor is interesting in its own right. Be sure to read about it on Daniel's blog.

Below, you'll find a summary of our most important issues with the original paper and a list of other issues that were not included in the letter to the editor. I guess that for many students (irrespective of their field of interest) it would be a good exercise to go through the paper and find at least some of these errors themselves. Indeed, as I already said in the Twitter discussion, the paper would not pass as a Master's thesis (at least, I hope so). This, of course, is troubling with regard to the paper itself, but also with regard to the review process, which should be able to reject such a write-up.

Tuesday, February 11, 2014

"A/B testing"? That's just slang for "experimental research".

A/B testing originated in advertising as a way to tell which of two types of communication (like two versions of an advertisement) was most effective in driving sales. Since the seminal book by Hopkins on "Scientific Advertising" (1923), the technique has been used in retail as well (e.g. in deciding on effective store design) and most recently in the design of websites. It is with regard to the latter that the term has rapidly gained even more attention, and many (mostly commercial) websites now regularly do A/B testing to test certain design options (e.g. should one use the words "Buy Now" or "Order" for the check-out box). Some refer to such a website design test as a "split test" because the test actually splits the visitors of the website into different groups, each seeing a different version of the website, and monitors key performance indicators such as check-outs, likes, or clicks.

Why is this just the same as experimental research? The gist of A/B testing is that you manipulate at least one aspect (e.g., call-to-action, color, piece of text, positioning, ...) of your website. Experimental research would call that aspect your independent variable. To manipulate it, you have to design at least two versions of the website, each with a different value of the design variable you want to test (e.g., a "buy now" vs. an "order" call-to-action). Each version of the website is called a condition. You then randomly assign visitors to the conditions so that you can observe visitors' behavior in each. This observed behavior is what one would label the dependent measure, as its value is expected to depend at least partly on the value of the manipulated design variable. In sum, this is the description of the most basic experiment (and of the majority of A/B tests).
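
To make the parallel concrete, here is a minimal Python sketch of such a basic experiment. It is an illustration under stated assumptions, not a description of how any particular A/B testing tool works: the function names, the simulated conversion rates, and the choice of a two-proportion z-test for the analysis are all mine. The call-to-action is the independent variable, random assignment creates the conditions, and the check-out is the dependent measure.

```python
import math
import random


def assign_condition(rng, conditions):
    """Randomly assign one visitor to a condition (a value of the independent variable)."""
    return rng.choice(conditions)


def run_ab_test(n_visitors, true_rates, seed=0):
    """Simulate visitors; true_rates maps condition -> assumed check-out probability.

    Returns condition -> [checkouts, visitors].
    """
    rng = random.Random(seed)
    conditions = tuple(true_rates)
    counts = {c: [0, 0] for c in conditions}
    for _ in range(n_visitors):
        condition = assign_condition(rng, conditions)
        counts[condition][1] += 1
        # Dependent measure: did this (simulated) visitor check out?
        if rng.random() < true_rates[condition]:
            counts[condition][0] += 1
    return counts


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic and two-sided p-value for a difference in two conversion rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


if __name__ == "__main__":
    # Hypothetical check-out rates, chosen purely for illustration.
    counts = run_ab_test(10_000, {"buy now": 0.11, "order": 0.10}, seed=42)
    (buys_a, n_a), (buys_b, n_b) = counts["buy now"], counts["order"]
    print(counts, two_proportion_z(buys_a, n_a, buys_b, n_b))
```

Note that the analysis plan (which measure, which test) is fixed before looking at the data here, which is exactly the kind of discipline experimental research has learned to value.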

You see, the basics of the two are the same. They even share a bit of history. Though experimental research is older than A/B testing, its first landmark publication is from the same era as Hopkins' book: Fisher's "The Arrangement of Field Experiments" dates back to 1926. Despite that similarity, I do believe that current A/B testing has not yet benefitted from the long tradition and the theoretical advancements academics have made since then. Experimentation has been a core research method in agriculture, biomedicine, psychology, communication, etc. This resulted in crucial insights into the design, execution, measurement, and analysis of experiments. If these advances are being applied in A/B testing, then they are surely not widely shared.
(I will follow up on this post with one or more posts on specific experimentation insights that could be worthwhile from an A/B testing perspective)

Why does the above actually matter?
For students it implies that there might be a lot of transferable skills hiding in your courses. Try to understand experimental research, its subtleties, and the creativity involved with it. You might find it even more relevant after graduation, when you are confronted with A/B testing whatever content you will be producing professionally. For testers it implies that tests could benefit a lot from the extensive academic tradition of experimental research.

(Disclaimer: A great post by @peeplaja inspired me for this post. Thanks to @ElienDJ for tweeting about that post).