The Reproducibility Crisis of Psychology and What It Is Trying to Tell Us

By Doug Marman

Over the last few years, a raging crisis has hit the field of psychology: Most published studies can’t be replicated by others. For example, 100 experiments published by highly respected psychology journals were recently tested and only 36% produced results in agreement with the original reports.[1] This is called the “reproducibility crisis.”

It’s a complicated problem. It isn’t caused by fraud, except in rare cases. Many factors are involved, as explained by this article. For example, designing psychology experiments is more difficult than it sounds, and drawing conclusions often involves complex statistical analysis. Even the experiments aimed at reproducing experiments have been found wanting.[2]

This has created a rift among psychologists: half say that the problem lies more in the way reproducibility tests are run, while the other half feel “the academic ground give up beneath their feet.” This led one reporter to ask:

“Crisis or not, if we end up with a more rigorous approach to science, and more confidence in what it tells us, surely that is a good thing?”[3]

No, I don’t think that is the answer. In fact, I believe it will make the reproducibility problem worse. The rigorous approach of traditional science is part of the problem. It is time to put a spotlight on how objectivity can interfere with psychology experiments. Otherwise, we are going to continue casting doubt on valid scientific experiments.

Take, for example, an experiment that is literally a textbook case:[4] In the 1980s, Fritz Strack and his co-workers showed that when a person smiles, it improves their mood. Many well-known psychologists, such as William James, and scientists, such as Charles Darwin, have said that expressions create emotions. It makes sense. The challenge was how to design an experiment that scientifically verifies this.

You can’t just ask people to smile, because that automatically makes them conscious of what they’re doing. That will invalidate the results. Strack and his co-workers needed to find a way to get people to move their mouths into a smile, or a pout, without them knowing what they were doing. They found an ingenious solution.

When people were asked to hold a pen in their mouths, with their mouths closed, their faces automatically moved into a sort of pout. When another group was asked to hold a pen between their teeth without closing their lips, they naturally formed a smile. The subjects had no idea what the test was really about. They were told that the experiment was studying people trying to do two things at the same time. They needed to hold the pen in their mouths while evaluating a series of Far Side cartoons.

Images from an experiment that tested the influence of smiling versus pouting.

The results showed that the group with smiles found the cartoons funnier than the group that was pouting. In other words, just putting your face into a smile naturally brightens your day.

The experiment has been verified countless times over the last twenty-five years, by many researchers. Some have expanded and tested the idea in new ways, besides smiles and pouts, and found similar results. For example, if you take a confident stance, in front of a group, you feel more confident.

So Strack volunteered his classic study for testing by a team of researchers who wanted to reproduce psychology experiments. He wasn’t concerned. It had already been validated.

Unfortunately, the results of the replication experiment contradicted Strack’s conclusion. The new test was run by seventeen labs across eight countries, using 2,000 subjects. They found no evidence that an unintentional smile or pout made any difference in the funniness of cartoons.[5]

How can this be?

Strack questions the conclusions and the set-up of the experiments. He voiced his concerns even before the testing began, after looking over their approach. At first, as I read Strack’s complaints, it felt like he was trying to defend his original work. But a number of things made me question my first impression.

First, Strack himself offered his experiment to be tested for replication and willingly supplied his original notes and evidence. Second, it had been confirmed successfully many times by other researchers. Third, he questioned the replication team’s decision to exclude the results of 600 subjects because they appeared to be holding the pens incorrectly or gave answers that diverged too widely. Did the choice to exclude certain results introduce a bias? Fourth, Strack pointed out that many of the subjects were psychology students. Since this was a textbook case, they could have recognized the experiment and its true purpose. That would have prevented them from acting naturally. They should never have been involved.

But it was the fifth point he made that jolted my attention. Strack said that he didn’t like the addition of cameras in the room watching the subjects because it could make the participants self-conscious. That jogged my memory. I had seen this scenario before.

It was one of the most famous early studies in psychology. In 1897, George Stratton strapped a pair of lenses over his eyes that inverted and reversed his field of view.[6] He knew that our eyes have built-in lenses that produce the same effect: All of the images hitting our retinas are flipped upside-down and reversed. Stratton wanted to see if his mind would naturally find a way to invert and correct his vision.

Sure enough, after five days of looking through inverting lenses, he saw everything as right-side-up. After a week, his new vision felt completely normal.

The results were so startling that hundreds of follow-on experiments were run to reproduce them. Many succeeded, but some did not. For example, David Linden, a hundred years later, called Stratton’s theory of achieving upright vision a myth.[7] This has created an ongoing controversy.

I studied dozens of experiments with inverting lenses to find an explanation for what was going on. Why were the results so different? I finally found an answer in the longest study ever performed (40 days).[8] Ivo Kohler discovered, unexpectedly, that when he tried to examine the subjects every day with a battery of clinical tests, it interfered with their ability to adapt. They actually regressed.[9]

At first, Kohler thought lab tests would help show the progress his subjects were making. Just as Linden would later do, Kohler brought them in for examination on a daily basis. However, the tests made things worse. The subjects reverted, losing the gains they had made. What was going on, he wondered. Kohler had to alter his tests before figuring out the problem. As soon as the experiments were designed to resemble the everyday world, the problem disappeared:

“When the subject was asked to ‘aim’ at something, or to put up his hands in protection when danger threatened…he made correct responses. But when he was asked, ‘Please point this marker in the direction the light is coming from,’ errors occurred.”[10]

That’s when Kohler realized that the subjects were adapting instinctively to the real world. The moment they tried to think critically and objectively about what they were seeing, it broke their “perceptual set.” They reverted to pre-experimental ways of seeing the world. Asking them to analyze what they were doing prevented them from adapting.

This was hard to understand, Kohler wrote. It took weeks to solve the mystery. For example, after fourteen days of fencing practice, subjects with inverting lenses were able to respond to their opponent’s blade without errors. When it came to fencing, the correct reaction was all that mattered. But if he asked them the question, “Where do you see the rapier point?” it forced them to think critically about what they were experiencing, breaking their lens of perception. They immediately reverted to old ways of seeing. His question interfered with their instinctive responses.

Getting the subjects to think objectively about what they were doing prevented them from adapting to upright vision. This was the mistake Linden had made. Even though Linden ran his experiment thirty years after Kohler, he didn’t realize the negative impact of objectivity. No wonder all his subjects failed to achieve upright vision.

This is the same effect that cameras can have on subjects. Strack was right: Cameras would make them conscious of being recorded and lead them to see what they were doing objectively. That makes the experience less natural. On top of this chilling effect of cameras, all of the instructions telling the subjects what to do were presented by a recorded video, in a closed room with no other people, making the experience even more sterile and impersonal.

Can this explain why the subjects showed no positive effects from their unintentional smiles? I think it does. Remember, Strack was trying to study an unconscious effect. He designed his experiments specifically to avoid any interference of conscious thought on the part of the subjects. If moving their mouths into the shape of a smile influences their mood, it is going to happen unconsciously. This means they need to feel at ease and natural, or it isn’t going to work. Thinking critically and objectively about what they were doing is going to interfere.

Think of the irony: Subjecting the subjects of psychology experiments to rigorous, clinical objectivity prevented the very thing the researchers were trying to study: natural responses. The replication team intentionally used cameras and pre-recorded instructions to eliminate outside biases, and without knowing it they introduced a new bias that was just as powerful: objectivity.

Imagine what would happen to a loving relationship if you started analyzing your life partner or lover objectively. Do you think your relationship is going to get better or worse? Is it going to warm up or cool down your natural and playful back-and-forth exchanges?

Psychology research projects have noted the detrimental impact of objectivity on natural relationships. For example, in the last few decades, psychologists have looked closer at the way people learn new skills. John Flach, Professor of Psychology at Wright State University, offers an interesting illustration of how skill-based learning works: Look at the process a child goes through when first learning how to walk, then how to skate on ice, next how to do a handstand, and finally how to walk on stilts.

Each skill needs a “different type of coordination pattern,” a different way of acting to achieve control.[11] In other words, mastering each skill requires a different lens of perception, a different way of seeing. Children learn this unconsciously through trial and error.

Skill-based learning starts with actions. Trying something gives children feedback, such as falling on their faces or flipping onto their backs. Then they try a new approach. With each loop of trial and error they gradually figure out how to balance and how to move. Learning at this stage is non-verbal and not mediated by thought: Children can’t explain how to balance on stilts. They don’t know how they learned to walk on their hands or skate on ice. They just did it.

This natural learning process is the best way to acquire new skills. No one teaches babies how to talk. They learn it themselves by making sounds and hearing the sounds they make. They learn how to use their bodies the same way: They form working relationships with their muscles and cells. They figure it out without thinking about it.

This is different from academic study, where we consciously think to understand new ideas and what they mean. Our natural process for learning new skills, on the other hand, is largely unconscious, and critical thinking can interfere with it.

Psychology experiments are not easy to design. The more rigorous and objective you make them, the more artificial they become, preventing the natural responses you are looking for. You end up learning less about how people act in the real world and more about how they behave in a clinical lab.

This is why, as I said above, I believe more objectivity will make the reproducibility crisis worse, not better. What is needed is a better understanding of our lenses of perception, and where to use them. For example, objectivity, as a way of seeing, should be treated not as the goal of science but as a tool for double-checking and verifying our experiments. If we want our relationships with others and with our bodies to be natural and spontaneous, we need a relational lens instead, not objectivity.

Over the last century, psychologists have tried to become more rigorous and objective, to become more like physicists. At the same time, physicists have come to realize that objectivity can’t explain the behavior of subatomic particles. This is the lesson they learned from quantum mechanics: How you set up an experiment alters the results, and there is nothing you can do to avoid this. In other words, there is no such thing as a fully objective perspective because all measurements influence the outcome.

This same principle applies to the study of natural human responses. It can’t be avoided. Objectivity and critical analysis can and will interfere. If we understand this better, I believe psychology experiments will become easier to reproduce.

I think Katie Palmer got it right when she said that the reproducibility crisis comes down to this:

“The field [of psychology] may have to think differently about how it thinks about itself.”


[1] Open Science Collaboration (over 260 co-authors), “Estimating the Reproducibility of Psychological Science,” Science, August 28, 2015: Vol. 349, Issue 6251.

[2] Daniel T. Gilbert, Gary King, Stephen Pettigrew, Timothy D. Wilson, “Comment on ‘Estimating the Reproducibility of Psychological Science,’” Science, March 4, 2016: Vol. 351, Issue 6277.

[3] Ed Yong, “Psychology’s Replication Crisis Can’t Be Wished Away,” The Atlantic, March 4, 2016.

[4] Fritz Strack, Leonard L. Martin, Sabine Stepper, “Inhibiting and Facilitating Conditions of the Human Smile: A Nonobtrusive Test of the Facial Feedback Hypothesis,” Journal of Personality and Social Psychology, Vol 54(5), May 1988, 768-777.

[5] Daniel Engber, “Sad Face,” Slate magazine, August 28, 2016.

[6] George M. Stratton, “Vision without Inversion of the Retinal Image,” Psychological Review 4, no. 4 (1897), p. 341-360.

[7] David E. J. Linden, Ulrich Kallenbach, Armin Heinecke, Wolf Singer, Rainer Goebel, “The Myth of Upright Vision,” Perception 28, no. 4 (1999), p. 469-481. Also posted at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.294.9093&rep=rep1&type=pdf.

[8] Ivo Kohler, The Formation and Transformation of the Perceptual World, tr. Harry Fiss (New York: International Universities Press, 1964).

[9] Doug Marman, “Lenses of Perception: A Surprising New Look at the Origin of Life, the Laws of Nature, and Our Universe,” (Ridgefield, Washington, Lenses of Perception Press, 2016), p. 88-90.

[10] Ivo Kohler, The Formation and Transformation of the Perceptual World, p. 153-155.

[11] John M. Flach and Fred Voorhorst, “What Matters?: Putting Common Sense to Work,” (Dayton, Ohio, Wright State University Libraries, 2016), p. 104-105.