The Reproducibility Crisis of Psychology and What It Is Trying to Tell Us

By Doug Marman

Over the last few years, a raging crisis has hit the field of psychology: Most published studies can’t be replicated by others. For example, 100 experiments published by highly respected psychology journals were recently tested and only 36% produced results in agreement with the original reports.[1] This is called the “reproducibility crisis.”

It’s a complicated problem. It isn’t caused by fraud, except in rare cases. Many factors are involved, as explained by this article. For example, designing psychology experiments is more difficult than it sounds, and drawing conclusions often involves complex statistical analysis. Even the experiments aimed at reproducing experiments have been found wanting.[2]

This has created a rift among psychologists: half say that the problem is more about the way reproducibility tests are run, while the other half feel “the academic ground give up beneath their feet.” This led one reporter to ask:

“Crisis or not, if we end up with a more rigorous approach to science, and more confidence in what it tells us, surely that is a good thing?”[3]

No, I don’t think that is the answer. In fact, I believe it will make the reproducibility problem worse. The rigorous approach of traditional science is part of the problem. It is time to put a spotlight on how objectivity can interfere with psychology experiments. Otherwise, we are going to continue casting doubt on valid scientific experiments.

Take, for example, an experiment that is literally a textbook case:[4] In the 1980s, Fritz Strack and his co-workers showed that when a person smiles, it improves their mood. Many well-known psychologists, such as William James, and scientists, such as Charles Darwin, have said that expressions create emotions. It makes sense. The challenge was how to design an experiment that scientifically verifies this.

You can’t just ask people to smile, because that automatically makes them conscious of what they’re doing. That will invalidate the results. Strack and his co-workers needed to find a way to get people to move their mouths into a smile, or a pout, without them knowing what they were doing. They found an ingenious solution.

When they asked people to hold a pen in their mouths, with their mouths closed, they automatically moved their faces into a sort of pout. When they asked another group to hold a pen between their teeth without closing their lips, they naturally formed a smile. The subjects had no idea what the test was really about. They were told that the experiment was studying people trying to do two things at the same time. They needed to hold the pen in their mouths while evaluating a series of Far Side cartoons.

Images from an experiment that tested the influence of smiling versus pouting.

The results showed that the group with smiles found the cartoons funnier than the group who was pouting. In other words, just putting your face into a smile naturally brightens your day.

The experiment has been verified countless times over the last twenty-five years, by many researchers. Some have expanded and tested the idea in new ways, besides smiles and pouts, and found similar results. For example, if you take a confident stance, in front of a group, you feel more confident.

So, Strack volunteered his classic study for testing by a team of researchers who wanted to reproduce psychology experiments. He wasn’t concerned. It had already been validated.

Unfortunately, results from the replication experiment contradict Strack’s conclusion. The new test was run by seventeen scientists, across eight countries, using 2,000 subjects. They found no evidence that an unintentional smile or pout made any difference in the funniness of cartoons.[5]

How can this be?

Strack questions the conclusions and the set-up of the experiments. He voiced his concerns even before the testing began, after looking over their approach. At first, as I read Strack’s complaints, it felt like he was trying to defend his original work. But a number of things made me question my first impression.

First, Strack himself offered his experiment to be tested for replication and willingly supplied his original notes and evidence. Second, it had been confirmed successfully many times by other researchers. Third, he questioned the impact of the replication experimenters excluding the results of 600 subjects because they felt those subjects were holding the pens incorrectly or their answers were too wildly divergent. Did their decision to exclude certain results introduce a bias? Fourth, Strack pointed out that many of the subjects were psychology students. Since this was a textbook case, they could have recognized the experiment and its true purpose. That would have prevented them from acting naturally. They should never have been involved.

But it was the fifth point he made that jolted my attention. Strack said that he didn’t like the addition of cameras in the room watching the subjects because it could make the participants self-conscious. That jogged my memory. I had seen this scenario before.

It was one of the most famous early studies in psychology. In 1897, George Stratton strapped on a pair of lenses over his eyes that inverted and reversed his field of view.[6] He knew that our eyes have built-in lenses that produce the same effect: All of the images hitting our retinas are flipped upside-down and reversed. Stratton wanted to see if his mind would naturally find a way to invert and correct his vision.

Sure enough, after five days of looking through inverting lenses, he saw everything as right-side-up. After a week, his new vision felt completely normal.

The results were so startling that hundreds of follow-on experiments were run to reproduce the results. Many did, but some could not. For example, David Linden, a hundred years later, called Stratton’s theory of achieving upright vision a myth.[7] This has created an ongoing controversy.

I studied dozens of experiments with inverting lenses to find an explanation for what was going on. Why were the results so different? I finally found an answer in the longest study ever performed (40 days).[8] Ivo Kohler discovered, unexpectedly, that when he tried to examine the subjects every day with a battery of clinical tests, it interfered with their ability to adapt. They actually regressed.[9]

At first, Kohler thought lab tests would help show the progress his subjects were displaying. Just as Linden did, Kohler brought them in for examination on a daily basis. However, the tests made things worse. The subjects reverted, losing the gains they had made. What was going on, he wondered. Kohler had to alter his tests before figuring out the problem. As soon as the experiments were designed to resemble the everyday world, the problem disappeared:

“When the subject was asked to ‘aim’ at something, or to put up his hands in protection when danger threatened…he made correct responses. But when he was asked, ‘Please point this marker in the direction the light is coming from,’ errors occurred.”[10]

That’s when Kohler realized that the subjects were adapting instinctively to the real world. The moment they tried to think critically and objectively about what they were seeing, it broke their “perceptual set.” They reverted back to pre-experimental ways of seeing the world. Asking them to analyze what they were doing prevented them from adapting.

This was hard to understand, Kohler wrote. It took weeks to solve the mystery. For example, after fourteen days of fencing practice, subjects with inverting lenses were able to respond to their opponent’s blade without errors. When it came to fencing, the correct reaction was all that mattered. But if he asked them the question, “Where do you see the rapier point?” it forced them to think critically about what they were experiencing, breaking their lens of perception. They immediately reverted back to old ways of seeing. His question interfered with their instinctive responses.

Getting the subjects to think objectively about what they were doing prevented them from adapting to upright vision. This was the mistake Linden had made. Even though Linden ran his experiment thirty years after Kohler, he didn’t realize the negative impact of objectivity. No wonder all his subjects failed to achieve upright vision.

This is the same effect that cameras can have on subjects. Strack was right: Cameras would make them conscious of being recorded and lead them to see what they were doing objectively. That makes the experience less natural. On top of this chilling effect of cameras, all of the instructions telling the subjects what to do were presented by a recorded video, in a closed room with no other people, making the experience even more sterile and impersonal.

Can this explain why the subjects showed no positive effects from their unintentional smiles? I think it does. Remember, Strack was trying to study an unconscious effect. He designed his experiments specifically to avoid any interference of conscious thought on the part of the subjects. If moving their mouths into the shape of a smile influences their mood, it is going to happen unconsciously. This means they need to feel at ease and natural, or it isn’t going to work. Thinking critically and objectively about what they were doing is going to interfere.

Think of the irony: Subjecting the subjects of psychology experiments to rigorous, clinical objectivity prevented the very thing they were trying to study—natural responses. They intentionally used cameras and pre-recorded instructions to eliminate outside biases, and without knowing it they introduced a new bias that was just as powerful—objectivity.

Imagine what would happen to a loving relationship if you started analyzing your life partner or lover objectively. Do you think your relationship is going to get better or worse? Is it going to warm up or cool down your natural and playful back-and-forth exchanges?

Psychology research projects have noted the detrimental impact of objectivity on natural relationships. For example, in the last few decades, psychologists have looked closer at the way people learn new skills. John Flach, Professor of Psychology at Wright State University, offers an interesting illustration for how skill-based learning works: Look at the process a child goes through when first learning how to walk, then how to skate on ice, next how to do a handstand, and finally how to walk on stilts.

Each skill needs a “different type of coordination pattern,” a different way of acting to achieve control.[11] In other words, each skill requires a different lens of perception, a different way of seeing, to master it. Children learn this unconsciously through trial and error.

Skill-based learning starts with actions. Trying something gives the child feedback, such as falling on their faces or flipping onto their backs. Then they try a new approach. With each loop of trial and error they gradually figure out how to balance and how to move. Learning at this stage is non-verbal and not mediated by thought: The child can’t explain how to balance on stilts. They don’t know how they learned to walk on their hands or skate on ice. They just did it.

This natural learning process is the best way to acquire new skills. No one teaches babies how to talk. They learn it themselves by making sounds and hearing the sounds they make. They learn how to use their bodies the same way: They form working relationships with their muscles and cells. They figure it out without thinking about it.

This is different from academic study, where we consciously think to understand new ideas and what they mean. Our natural process for learning new skills, on the other hand, is largely unconscious and critical thinking can interfere with this natural process.

Psychology experiments are not easy to design. The more rigorous and objective you make them, the more artificial they become, preventing the natural responses you are looking for. You end up learning less about how people act in the real world and more how they behave in a clinical lab.

This is why, as I said above, I believe more objectivity will make the reproducibility crisis worse, not better. What is needed is a better understanding of our lenses of perception, and where to use them. For example, objectivity, as a way of seeing, shouldn’t be the goal of science, but rather a tool for double-checking and verifying our experiments. If we want our relationships with others and with our bodies to be natural and spontaneous, we need a relational lens instead, not objectivity.

Over the last century, psychologists have tried to become more rigorous and objective—to become more like physicists. At the same time physicists have come to realize that objectivity can’t explain the behavior of subatomic particles. This is the lesson they learned from quantum mechanics: How you set up an experiment alters the results, and there is nothing you can do to avoid this. In other words, there is no such thing as a fully objective perspective because all measurements influence the outcome.

This same principle applies to the study of natural human responses. It can’t be avoided. Objectivity and critical analysis can and will interfere. If we understand this better, I believe psychology experiments will become easier to reproduce.

I think Katie Palmer got it right when she said that the reproducibility crisis comes down to this:

“The field [of psychology] may have to think differently about how it thinks about itself.”


[1] Open Science Collaboration (over 260 co-authors), “Estimating the Reproducibility of Psychological Science,” Science, August 28, 2015: Vol. 349, Issue 6251.

[2] Daniel T. Gilbert, Gary King, Stephen Pettigrew, Timothy D. Wilson, “Comment on ‘Estimating the Reproducibility of Psychological Science,’” Science, March 4, 2016: Vol. 351, Issue 6277.

[3] Ed Yong, “Psychology’s Replication Crisis Can’t Be Wished Away,” The Atlantic, March 4, 2016.

[4] Fritz Strack, Leonard L. Martin, Sabine Stepper, “Inhibiting and Facilitating Conditions of the Human Smile: A Nonobtrusive Test of the Facial Feedback Hypothesis,” Journal of Personality and Social Psychology, Vol 54(5), May 1988, 768-777.

[5] Daniel Engber, “Sad Face,” Slate magazine, August 28, 2016.

[6] George M. Stratton, “Vision without Inversion of the Retinal Image,” Psychological Review 4, no. 4 (1897), p. 341-360.

[7] David E. J. Linden, Ulrich Kallenbach, Armin Heinecke, Wolf Singer, Rainer Goebel, “The Myth of Upright Vision,” Perception 28, no. 4 (1999), p. 469-481. Also posted at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.294.9093&rep=rep1&type=pdf.

[8] Ivo Kohler, The Formation and Transformation of the Perceptual World, tr. Harry Fiss (New York: International Universities Press, 1964).

[9] Doug Marman, “Lenses of Perception: A Surprising New Look at the Origin of Life, the Laws of Nature, and Our Universe,” (Ridgefield, Washington, Lenses of Perception Press, 2016), p. 88-90.

[10] Ivo Kohler, The Formation and Transformation of the Perceptual World, p. 153-155.

[11] John M. Flach and Fred Voorhorst, “What Matters?: Putting Common Sense to Work,” (Dayton, Ohio, Wright State University Libraries, 2016), p. 104-105.



6 thoughts on “The Reproducibility Crisis of Psychology and What It Is Trying to Tell Us”

  1. Pingback: Reproducibility Crisis in Psychology | Perspicacity

  2. Doug, I found this very interesting. All psychologists are trained from the very beginning to be aware of the sensitivity of performance to very subtle “biases” introduced by the experimenter and by the context (demand characteristics). The majority stance to this is to increasingly “objectify” the experimental context to eliminate bias and confounding variables. In other words, the response is to eliminate all “subjectivity” in the pursuit of experimental control. Thus, your example of using cameras and video — reducing contact between the participant and the experimenter. However, there is an increasing minority who feel that in the pursuit of “control” experimental psychology is essentially “killing” or taking the life out of the phenomenon of interest. This minority tends to try to bring more of the natural context into the experimental situation – to enhance ecological validity.
    I agree with your assessment. In response to the replication crisis, psychology as a field is tending toward increasing objectification and trivialization of human experience. There is a creeping ‘scientism’ that is likely to be very counterproductive with respect to our goals to gain a deeper understanding of human experience. In pursuit of better reproducibility it is likely that the field will end up objectifying all human experience out of psychology! I am afraid that increased interest in neuroscience is a symptom of this pursuit of objectification of human experience. John Flach

    • Thanks for your comment, John. The concern that I hear in your response is similar to what Leonard Martin, a psychologist who worked with Strack on the original experiment, said to a reporter whom I referred to in my article:

      Leonard Martin agrees with Strack’s concerns and says the replicators didn’t fully follow their procedure. The work was so sloppy, he argued via email, that “the real story here may not be about the replicability of the pen in the mouth study or the replicability of psychology research in general but about the current method of assessing replicability.” Given that such efforts can alter established findings in the field and tarnish people’s reputations, he said that “Project Replication” should be very careful: “If the current lack of rigor continues, then psychology may find itself in its own version of the McCarthy era.”

      I think Katie Palmer, who I quoted in my article, captured the meaning of this issue well when she wrote this:

      What this is really about is how psychology sees itself—and how that vision could affect what scientists think of the Reproducibility Project, positive or negative. “There is a community of researchers who think that there is just no problem whatsoever and a community of researchers who believe that the field is seriously in crisis,” says Jonathan Schooler, a psychologist at UC Santa Barbara. “There is some antagonism between those two communities, and both sides each have a perspective that may color the way they’re seeing things.”

      Nosek [who led the effort to run the replication project that tested 100 experiments] feels it, too. “You think it’s slightly antagonistic?” he says.

      At its heart, both sides were driven to write these papers because they frickin’ love psychology. “What I want to observe is high reproducibility,” says Nosek. “That is better for us, the findings, and the field.” But that love is also what drove him to found the Center for Open Science—he saw things going wrong in his field and wanted to help fix them. Noble, but it may have driven the design and interpretation of the 100 replications in a way that would underestimate replication rates.

      What surprises me from all of this is that there isn’t a greater awareness of the impact of objectivity and critical analysis on natural behavior. This is something we experience in our daily lives. The old children’s story, about the millipede that couldn’t walk anymore after it tried to think about the order in which it moved its feet, shows that even children can understand this.

      Thanks.

      Doug.

  3. Lens blindness can be very debilitating, and this particular variety – 3rd person lens blindness, sometimes called “scientism” – is one of the worst. When you’ve gotten responses from psychologists, and when you can find the time, I hope you’ll publish some of them, pointing out their “objectivity” bias. Unfortunately don’t expect them to agree – it will take years, decades, to overcome this problem. Perhaps someday I’ll get around to reading the replication report myself, but for now I’m rather confident your conclusions are valid. Clearly you’ve done your homework!

  4. Doug, your analysis is very convincing. But it’s hard to believe psychologists don’t already know that our natural reactions may be altered when we’re on camera, or examined every day with a battery of clinical tests. If the solution to the reproducibility crisis is so easy, what does that say about the ability of these psychologists who apparently have been baffled by it? Are you sure their 3rd-person lens blindness is so strong? I hope one of them responds to your post, addressing this issue.

    • George, if you read the replication report, you can see that there is one thing that is at the top of their list, in importance: To make the experiment as objective as possible. They also want to reproduce the original experiment as faithfully as possible, but the most important thing is not letting personal biases creep in to influence the results. This is why they put all the explanations for the subjects on a video and didn’t have anyone else in the room with them, when running their test. They included a camera to watch and make sure they were following the rules. This was their focus.

      Strack and most psychologists are quite aware of the subtle ways that people can be affected. That’s why Strack pointed out the problem with the cameras. He was focused on trying to get natural responses. Those running reproducibility tests, however, are focused on making the experiments objective and impartial. This is the same tactic that David Linden took in his inverting lens experiments. He was so focused on objective analysis that he didn’t realize he was interfering with the natural process of learning a new skill.

      Here is an example of just how strong this 3rd-person blindness can be: When Strack first ran his test, he was impressed by the positive results, but he didn’t feel that it was convincing enough to publish. So he ran a second test, to reproduce his original experiment. He made one change with the second test: He now gave the subjects two questions instead of one. They had to rank how funny the cartoons were and they needed to rate how amused they felt with the cartoons. He asked the two questions this way for a specific reason: He thought that the first question would be taken as more of an objective ranking, while the second question would be seen as more about their personal enjoyment. Sure enough, the first question showed no correlation to whether they were smiling or not, but the second question produced even more positive results than the first experiment. This was the validation he was looking for, before publishing the results.

      However, when a reporter, who questioned the validity of Strack’s experiment, looked at what had happened, he felt that Strack had just invalidated his first experiment, not further validated it. Why? Because he ignored the effect created by objectivity versus natural response. He was going on the assumption that science is all about objectivity. The first question should have shown a positive correlation, but it didn’t. He viewed this as a failure. This goes to show you how subtle the problem is and how often it is overlooked. It is a case of 3rd-person blindness, as you say.

      I’ve already heard some positive feedback from this paper, but I’m going to forward it on to others, in hopes that others might respond as well.

      Thanks.

      Doug.
