Why are psychologists becoming defensive about poor reproducibility?

Last week, the reproducibility project published their work attempting to replicate 100 recent findings published in some top psychological journals. The general flavor of the findings and subsequent media coverage is that all of psychological research is untrustworthy, and there has been a fair bit of hand-wringing and hand-waving as a result. What can we take from this?

All psychological research is untrustworthy

This isn’t true at all. Only about 65% of the research appears untrustworthy…

What conclusions can we draw from the studies included in the reproducibility project?

Nosek et al. had the resources to run 100 studies. That’s a lot of data, and gives us the potential to learn a lot about something. What should that something be?

One possibility is that if we were really interested in knowing about a given effect in some study, then we could spend the resources for all 100 studies replicating that effect. Looking at the distribution of the effect across each of these replications would allow us to draw some pretty good conclusions about what the real size of the effect is. However, Nosek et al. decided to do something a little different than this.
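To make the "run it 100 times" option concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration rather than anything from the actual project: a hypothetical two-group study, a made-up true standardized effect of 0.3, and 30 participants per group.

```python
import random
import statistics

def simulate_study(true_d, n_per_group, rng):
    """Simulate one two-group study and return its observed standardized effect."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    # Pooled standard deviation for two equally sized groups.
    pooled_sd = ((statistics.variance(control) + statistics.variance(treated)) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

rng = random.Random(2015)   # fixed seed so the sketch is repeatable
TRUE_D = 0.3                # hypothetical true effect, chosen for illustration
N_PER_GROUP = 30            # hypothetical sample size per group

# Replicate the same (hypothetical) study 100 times.
estimates = [simulate_study(TRUE_D, N_PER_GROUP, rng) for _ in range(100)]
mean_estimate = statistics.mean(estimates)
spread = statistics.stdev(estimates)

print(f"mean of 100 estimates: {mean_estimate:.2f}")
print(f"spread (SD) of the estimates: {spread:.2f}")
print(f"smallest / largest estimate: {min(estimates):.2f} / {max(estimates):.2f}")
```

Any single run can land a long way from the true value, but the average across the 100 runs should sit close to it, which is why this design would tell us a lot about one effect.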

Rather than run the same study 100 times, they picked 100 studies, and ran each of them once. This, instead, tells us something about published psychological research in general. In general, we can say that it seems like we have a problem with reproducibility. In general, we can say that published research probably overestimates effect sizes. In general, we can say that because both of these things are true, any study that did not replicate in this batch should receive a little extra scrutiny.

However, the fact that one of these studies did not replicate does not mean that it is not replicable, or that the effect doesn’t exist. It simply means that we should adjust our beliefs about these effects. Specifically, if some previous finding was published, but now it didn’t replicate, then maybe we should suspect that the true effect is a lot smaller than it was in the original report.

This gets at a larger issue - this idea that effects are binary and can either exist or not exist. Humans are complicated, and there’s no way that the null hypothesis is ever strictly true. Instead, we should be concerned with whether the effect is big enough to care about/whether we’ve got enough resources to detect it. Running two studies doesn’t say much about any given phenomenon. I mean, of course two studies say more than one study, which says more than no studies, but if they’re both reasonably well executed, then the most we can say is that our best estimate of the effect is going to be somewhere in between what our two results were.

Going back to the replicability work, it seems very likely that we should not weight the two instances of each study (the original and the replication) equally. Indeed, we should probably give a bit more weight to the replication, since, as we now know (or at least, as we now should know), there are a whole bunch of selection effects acting on the initial publication, whereas the replications do not share these problems in any way that I’m aware of.
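Here is a rough sketch of why a selection effect on the original would justify leaning on the replication. The setup is entirely assumed: a made-up true effect of 0.2, 30 people per group, a normal approximation for the sampling error of the effect size, and a crude "publish only if significant and positive" filter standing in for the real (messier) selection process.

```python
import random
import statistics

rng = random.Random(2015)   # fixed seed so the sketch is repeatable
TRUE_D = 0.2                # hypothetical true effect, chosen for illustration
N_PER_GROUP = 30            # hypothetical sample size per group
SE = (2 / N_PER_GROUP) ** 0.5   # approximate standard error of the effect size
Z_CRIT = 1.96               # two-sided 5% cutoff under the normal approximation

def observed_effect():
    """Draw one study's observed effect (normal approximation, an assumption)."""
    return rng.gauss(TRUE_D, SE)

published = []      # originals that passed the significance filter
replications = []   # one unfiltered replication per published original

for _ in range(5000):
    original = observed_effect()
    # Crude publication filter: only significant, positive results get out.
    if original / SE > Z_CRIT:
        published.append(original)
        # The replication gets reported no matter how it turns out.
        replications.append(observed_effect())

mean_published = statistics.mean(published)
mean_replication = statistics.mean(replications)

print(f"true effect:                {TRUE_D:.2f}")
print(f"mean published original:    {mean_published:.2f}")
print(f"mean replication:           {mean_replication:.2f}")
print(f"share of originals published: {len(published) / 5000:.2f}")
```

Under these assumptions the published originals overshoot the true effect by a wide margin, while the replications, which face no filter, average out near it. Real selection is less tidy than a single cutoff, so treat this as an illustration of the direction of the bias and not its size.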

What do we think about psychology now?

Lisa Feldman Barrett recently suggested that psychology is not in a crisis. She’s half right. First, this is not a problem unique to psychology - the ability to cherry-pick findings combined with too much analytic flexibility exists in lots of fields. And second, this is only a crisis if the Nosek et al. paper doesn’t change how we conduct and communicate our science. Changes need to come from and happen in a lot of places, but change is needed.

Science is hard. Like, really hard. We often don’t know that we’re making mistakes (and we always do) until after the fact. That’s kind of unfortunate, but it’s part of the human condition, so it seems like we’re stuck with it. We’re going to do bad science sometimes, and we’re almost always going to have some real problems in trying to figure out the Truth.

But you know what? That’s okay! It should be hard! People are complicated creatures! If it wasn’t hard, then there would be very little reason to do this work - we could just look around and see the Truth in an obvious way. The problem comes in when people take this really difficult, complicated pursuit of the Truth, and they brush away the complicated bits in the interest of creating cute stories about some finding. Sometimes it’s forgivable to do this in the context of writing a popular science book, or if it’s a piece intended for the general public, but there’s absolutely no excuse for this to happen in our scientific papers. There are many strong incentives for scientists to create these stories in their papers - it leads to better press coverage, more grant money, more publications (even other scientists like nice stories), better and more prestigious jobs, more recognition from your peers, and all the other things that people usually want out of their professional lives.

What’s a researcher to do?

Make sure you still enjoy analyzing data and thinking about complicated problems. If you’re into that, then keep at it. If you’re not, then you should find a different job.
In general, be critical and disbelieving of other people’s findings. In fact, be critical and disbelieving of your own findings.
If you, like me, came into this field (or any other) because you were excited about a set of findings that turned out to be wrong, or overstated, don’t fret too much. There are a lot of interesting things to study, and they’re all complicated and exciting and stimulating. Let this be a lesson to not fall in love with a finding, because it will break your heart.
Do your work in a way that doesn’t contribute to the problem. Preregister your studies, post your data and code, focus on estimating effects and quantifying uncertainty. Don’t focus on the p-value. Don’t be afraid to submit findings that are unclear or messy or confused because that’s the way science works and that’s what people are.

If you’re a student and you follow these things, then it could very well turn out that you have a more difficult time finding an academic position than if you hadn’t. I think things are changing, but change comes slowly. It would be a bummer to not get the job you want, but if you have to sacrifice what you know to be good work in order to get that job, then is it really worth it? Ultimately, I think that it doesn’t much matter where you do your work. Academia, industry, government, and non-profits all have their perks and downsides. If you’re generally happy with what you do, you’ll probably find a way to be happy wherever you land. If you’re generally miserable, then you’ll find a way to be miserable.

What should Psychology do?

Aside from the obvious (reconsider research and publication practices), I think that if we just stick to this defensive response, then we’re really blowing an opportunity here. Other fields are plagued with these problems too. Is it really so hard to say that sometimes we don’t know what we’re doing, and that figuring out humans is tricky business? Psychology could really take the lead in tackling some of this troubling business in the way people conduct science. Let’s collectively agree that it doesn’t do any good to get defensive over our work, okay?

A good chunk of what’s going on here is about human behavior. How about we spend a little bit of time studying how people do science? That seems like a pretty important question, right? What about thinking about how we train scientists? I’m sure there are some people who are concerned with these questions, but they’re certainly less visible than the kind of stuff that makes it into the pop-science books.

The summary

Psychological studies don’t always replicate. If you’re a psychologist and that surprises you, then you haven’t been paying attention.

We can spend all day talking about why something doesn’t replicate, and finding reasons for why we saw an effect here but not there, but that’s missing the point. The real message here is that it’s time to adjust our perception of what good science is. Newsworthy = surprising = less believable. Good science is not about how exciting the story is, but about how convinced you are by the work. Sometimes good, careful, and believable work will create good stories, but the focus should always be on the former.

Finally, let’s not be defensive about the difficulty of our work. Let’s be forthright. We’re doing the best we can, and we will continually do better.

Written on September 4, 2015