|Original article with some marketing claims added|
As researchers, we should of course know to read beyond the abstract, but a recent Twitter discussion pointed me towards a paper with a very blatant discrepancy between its sales talk and the article's actual core. The paper proved hard to understand because of its substandard use of statistics and equally poor reporting of the study's methods. This resulted in a letter to the editor written by Daniel Lakens (@lakens), Stuart Ritchie (@StuartJRitchie) and Keith Laws (@Keith_Laws). The story behind that letter to the editor is interesting in its own right. Be sure to read about it on Daniel's blog.
Below, you'll find a summary of our most important issues with the original paper and a list of other issues that were not included in the letter to the editor. I guess that for many students (irrespective of their field of interest) it would be a good exercise to go through the paper and find at least some of these errors themselves. Indeed, as I already said in the Twitter discussion, the paper would not pass as a Master's thesis (at least, I hope so). This, of course, is troubling with regard to the paper itself, but also with regard to the review process, which should be able to reject such a write-up.
Shame on the authors, reviewers and editor for sloppy reporting of data. http://t.co/9n2JXJLrV4 HT @Keith_Laws Wouldn't pass as a MA thesis
— Tim Smits (@TimSmitsTim) January 14, 2014
The original paper, by Douglas Turkington and colleagues, was published in the Journal of Nervous and Mental Disease and discussed cognitive behavioral techniques for psychosis patients. The innovation in this study was that the authors tried to show that case managers with little training, rather than certified psychotherapists, could also deliver such a therapy with positive results. The study is "explorative", which means that there is no control group involved to causally test the effects. Rather, the researchers wanted to see whether they found improvements within the test group of patients who were exposed to their case managers' cognitive behavioral techniques.
The core problem is that too few statistics are reported to understand what the data are showing. The important statistics are supposedly summarized in Table 2 (below), but there are quite a few issues with them. It is unclear whether the confidence intervals are around the reported effect sizes (as @asehelene assumed in her blogpost on the same paper) or around the actual dependent variables. Neither of these two options actually seems to fit the reported confidence intervals. You'll find our detailed comments on this in the letter once it is published.
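To make that kind of sanity check concrete, here is a minimal sketch of how one could test whether a reported interval can plausibly belong to a reported estimate. The function name `check_interval` and all numbers are hypothetical illustrations, not values from the paper. The only assumptions are that a confidence interval should contain the estimate it is built around and that, for a mean difference, it should be roughly centered on it.

```python
def check_interval(estimate, lower, upper):
    """Compare a reported point estimate with a reported confidence interval.

    Returns a tuple (contains, offset):
      contains -- True if the estimate lies inside [lower, upper]
      offset   -- estimate minus the interval midpoint (0 means centered)
    """
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    contains = lower <= estimate <= upper
    midpoint = (lower + upper) / 2
    return contains, estimate - midpoint


# Hypothetical numbers, NOT taken from the paper's Table 2.
contains, offset = check_interval(estimate=1.2, lower=0.1, upper=0.9)
print(f"estimate inside interval: {contains}, offset from midpoint: {offset:.2f}")
```

An estimate that falls outside its own interval, or far from its midpoint, suggests that the interval was computed for a different quantity than the one it is printed next to.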
Now, for the extended report of concerns:
-The paper lacks a clear naming of concepts and variables. In the abstract it states “The primary outcome measure was overall symptom burden...” (p.29). It is unclear which reported variable this refers to. Maybe it is the “good clinical outcome” that is mentioned in the results (though it is unclear how this measure was constructed). Perhaps it refers to some of the variables (most likely hallucinations and delusions) in the aforementioned Table 2.
-The abstract specifically mentions three effect sizes that were “medium to large”. The reported Cohen's d values range from 0.87 to 1.60. By Cohen's conventional benchmarks (0.8 and above counts as large), these are all large effect sizes, which raises questions about the accuracy of these values and/or the authors' ability to interpret them.
-The case managers involved in providing the cognitive behavioral techniques are mentioned on page 31. The article refers to Table 1 for further demographics on these case managers. Though it would be nice to have such a Table 1 showing demographics about the case managers (certainly because the authors themselves report great variability in case managers but do not seem to include this variability as a covariate in their analyses), this table is missing. Instead, Table 1 is about the patients.
-The Procedure section has one sentence on the concept of “fidelity”, which was not previously introduced in the article. No rationale is given for why it should be included or what role it is assumed to play. However, more than half of the Results section deals with precisely this variable. In those results on the fidelity measure, it appears that its mean score was 31, and this is deemed “comparable” to a previous study where the mean score was 38.84 and in which 30 was judged to be the cut-off for an acceptable intervention. The comparability of these figures should be addressed with appropriate tests if the authors think it is important.
-The authors report they performed t-tests and Wilcoxon signed rank tests but the reader does not get any information on these statistics. Hence, it is also impossible to judge whether the statistics were done in an appropriate way.
-The figures are plentiful but do not contribute much. Moreover, Figure 5 is missing definitions of the axes, which makes it impossible to interpret. Sometimes a graph is worth a lot in explaining difficult findings, but for these simple effects it would have been at least as informative to just get some cell means and standard deviations.
-The discussion states that “effect sizes and the symptom response are similar to those reported when psychiatric nurses delivered six sessions of CBT techniques to the patient and three sessions to the main caregiver” (p.32), but no statistics are given to support this claim.
For a paper that is only four pages long, the above is an awkwardly long list of concerns. These concerns only involve the reporting of the study. The data might be fine, and maybe we can learn a lot from this explorative study, but it seems dangerous to trust the claims made in the abstract given all of the above.
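The point about the “medium to large” effect sizes is easy to verify. The sketch below classifies a Cohen's d value against Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large). The function name `classify_cohens_d` and the "negligible" label for values below 0.2 are my own assumptions for illustration; only the two endpoints of the reported range, 0.87 and 1.60, come from the paper.

```python
def classify_cohens_d(d):
    """Label an effect size using Cohen's conventional benchmarks.

    0.2 = small, 0.5 = medium, 0.8 = large. The "negligible" label for
    values below 0.2 is an illustrative choice, not part of the benchmarks.
    """
    size = abs(d)  # the sign only encodes the direction of the effect
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# The two endpoints of the range of effect sizes reported in the paper.
for d in (0.87, 1.60):
    print(f"d = {d:.2f}: {classify_cohens_d(d)}")
```

Both ends of the reported range land in the "large" category, so none of the reported effects is "medium" by these benchmarks.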
|Excerpt from original abstract|