Excess Success – Research Practices in Psychology
A new paper adds to the continuing discussion of research practices in psychology. The paper (citation below), by Gregory Francis and in press at Psychonomic Bulletin and Review, analyzes the last several years of published papers in Psychological Science, a premier outlet in psychology, and in essence asks if there is “too much success” in the reported studies.
The analysis uses the “test for excess significance” (TES) (Ioannidis & Trikalinos, 2007). The intuition is that if you run a number of experiments – N – measuring an effect of a certain size, then you can compute how likely it is that all N experiments reject the null. So, if the probability of detecting the effect in a single experiment (that is, the power of the experiment) is, say, one in three, then the probability that two independent experiments both detect it is the product of the two probabilities, or one in nine. If one finds more successful rejections of the null than one would expect, given the power to reject the null, this suggests that something is amiss. From the analysis alone one can’t say where the excess success comes from, only that there is a bias in favor of positive results. Francis uses a cutoff of 0.1 for the TES value. As he puts it: “A power probability below this criterion suggests that the experiment set should be considered biased, and the results and conclusions are treated with skepticism.”
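To make the arithmetic concrete, here is a minimal sketch of the product-of-powers idea. It is only an illustration, not Francis's actual procedure: it assumes every experiment in the set reported a significant result, that the experiments are independent, and that the power of each one is already known. The function names and the power values are hypothetical, and the real analysis has to estimate power from the reported data.

```python
from math import prod

# Cutoff quoted from Francis; a value below it suggests bias.
TES_CUTOFF = 0.1


def tes_probability(powers):
    """Probability that every experiment rejects the null.

    Assumes the experiments are independent, so the joint
    probability is simply the product of the individual powers.
    """
    return prod(powers)


def looks_biased(powers, cutoff=TES_CUTOFF):
    """True when the joint probability falls below the cutoff."""
    return tes_probability(powers) < cutoff


# The example from the text: two experiments, each with a
# one-in-three chance of rejecting the null.
two_studies = [1 / 3, 1 / 3]
print(tes_probability(two_studies))  # roughly one in nine

# Hypothetical set: four experiments, each with power 0.5.
four_studies = [0.5] * 4
print(tes_probability(four_studies))  # 0.0625
print(looks_biased(four_studies))     # True
```

The point of the sketch is how quickly the product shrinks: even with a coin-flip chance of success in each study, four successes in a row already falls below the 0.1 cutoff.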
He ran the TES on papers published in Psychological Science between 2009 and 2012 (inclusive) that reported four or more studies – the minimum number for the TES analysis – and found that 82% of the 44 papers that met the inclusion criteria had TES values below the cutoff, suggesting a substantial degree of “excess success” in the journal.
I’m confident that the paper will stimulate a great deal of discussion. My interest for the remainder of this post is in a possible pattern in the TES data. When I first read the paper, my eye was caught (slightly ironically) by the short title of one of the papers investigated, The Wolfpack Effect, by my friend Brian Scholl and colleagues, which I wrote a little post about around the time it came out. This paper was one of the eight that surpassed the .1 threshold.
I looked a bit more closely into some of the others that similarly had TES values above .1. The largest TES value, .426, was also in the area of perception, looking at how people can quickly assign stimuli to categories (e.g., “animal”). The next largest TES value, .348, was another perceptual study, having to do with the way that objects are represented in the visual system. Two other papers had to do with, first, another effect in vision – how the color of light affects visual processing of fear-inducing stimuli – and, second, an effect in audition.
So five of the eight successes, as indexed by TES, are from the field of perception. The other three were not, having to do with predictors of subjective well-being, reducing prejudice, and appreciation of others’ help. One paper in the area of perception – about visual rivalry – didn’t fare as well. Neither did a paper looking at the possibility that people see objects they want as being closer to them.
So perception didn’t run the table, but, still, without looking very closely at all the papers in question, it seemed to me that the low-to-medium level perception work distinguished itself in the analyses. (I might add that another paper I posted about didn’t do as well as the Scholl work.) The balance of the papers covered a fairly wide range of topics. To take just two to illustrate, one paper (TES = .041) presented six studies that purported to show that “[h]andling money (compared with handling paper) reduced distress over social exclusion and diminished the physical pain of immersion in hot water.” A second paper (TES = .036) purported to show that when “religious themes were made implicitly salient, people exercised greater self-control, which, in turn, augmented their ability to make decisions in a number of behavioral domains that are theoretically relevant to both major religions and humans’ evolutionary success.”
In any case, from the results that Francis reports, I don’t think any strong inferences can be drawn. To my eye, it looks like perceptual work does better than the other areas, but more systematic work will need to be done.
It seems to me that it’s worth knowing if some subfields score better in this respect because it speaks to the explanation for the problems. As Francis puts it: “Unless there is a flaw in the TES analysis…there are two broad explanations for how there could be such a high rate of apparent bias among the articles in Psychological Science: malfeasance or ignorance.” It doesn’t seem to me that there’s any reason to think that people in perception are any more ethical than people in other areas. If that’s true – though of course it might not be – then the place to look for the source of the problem is not in malfeasance.
Are there other candidate explanations? Could there be fewer opportunities for researcher degrees of freedom in perception? Could it have to do with the nature of theories in perception, compared to other areas?
I’m not really sure. But it could be that finding patterns in different areas of psychology might be useful for determining the sorts of best practices that will ameliorate these sorts of issues. My guess is that this paper will stimulate many profitable conversations.
Francis, G. (in press). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin and Review.
Ioannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical Trials, 4, 245-253.