Wednesday, August 6, 2014

It just makes it, for the time being, less credible.


You can find the most interesting things in the strangest of places. Slate is not one of my routine knowledge destinations: too biased, trivial, first world problemy, prejudiced, and otherwise profoundly unserious. But you just never know.

"Why Psychologists’ Food Fight Matters" ("Important findings" haven’t been replicated, and science may have to change its ways), by Michelle N. Meyer and Christopher Chabris, is an excellent summary of the state of play regarding the replication of scientific studies. I endorse practically all of Meyer and Chabris's observations and recommendations.

Their case study is the rampant non-replicability of dramatic findings in the social sciences, but they properly observe that the issue is prevalent in virtually all fields, just to a marginally lesser degree.
The recent special issue of Social Psychology was an unprecedented collective effort by social psychologists to do just that—by altering researchers’ and journal editors’ incentives in order to check the robustness of some of the most talked-about findings in their own field. Any researcher who wanted to conduct a replication was invited to preregister: Before collecting any data from subjects, they would submit a proposal detailing precisely how they would repeat the original study and how they would analyze the data. Proposals would be reviewed by other researchers, including the authors of the original studies, and once approved, the study’s results would be published no matter what. Preregistration of the study and analysis procedures should deter p-hacking, guaranteed publication should counteract the file drawer effect, and a requirement of large sample sizes should make it easier to detect small but statistically meaningful effects.

The results were sobering. At least 10 of the 27 “important findings” in social psychology were not replicated at all. In the social priming area, only one of seven replications succeeded.

[snip]

The incivility and personal attacks surrounding both this latest replication attempt (and prior attempts) may draw the attention of researchers away from where it belongs: on producing the robust science that everyone needs and deserves. Of course, researchers are human beings, not laboratory-dwelling robots, so it’s entirely understandable that some will be disappointed or even feel persecuted when others fail to replicate their research. For that matter, it’s understandable that some replicators will take pride and satisfaction in contributing to the literature by challenging the robustness of a celebrated finding.

But worry over these natural emotional responses should not lead us to rewrite the rules of science. To publish a scientific result is to make a claim about reality. Reality doesn’t belong to researchers, much less to any single researcher, and claims about it need to be verified. Critiques or attempts to replicate scientific claims should always be—and usually are—about reality, not about the researchers who made the claim. In science, as in The Godfather: It’s not personal, it’s business.
Meyer and Chabris observe:
A final salutary change is an overdue shift of emphasis among psychologists toward establishing the size of effects, as opposed to disputing whether or not they exist. The very notion of “failure” and “success” in empirical research is urgently in need of refinement. When applied thoughtfully, this dichotomy can be useful shorthand (and we’ve used it here). But there are degrees of replication between success and failure, and these degrees matter.

For example, suppose an initial study of an experimental drug for cardiovascular disease suggests that it reduces the risk of heart attack by 50 percent compared to a placebo pill. The most meaningful question for follow-up studies is not the binary one of whether the drug’s effect is 50 percent or not (did the first study replicate?), but the continuous one of precisely how much the drug reduces heart attack risk. In larger subsequent studies, this number will almost inevitably drop below 50 percent, but if it remains above 0 percent for study after study, then the best message should be that the drug is in fact effective, not that the initial results “failed to replicate.”
I think they are right to call out effect size as an unaddressed issue, but I would place even more emphasis on it.

Say we have a new treatment that reduces the risk of mortal Condition X. That, on its own, is not enough. We also need to know the effect size, as M&C point out. Knowing that the new treatment reduces the risk of Condition X by 30% is important. But it doesn't stop there. What is the baseline probability of Condition X? If we have a 50% chance of succumbing to Condition X, then a 30% reduction in that risk is material and might well be worthwhile.

On the other hand, if Condition X is real but rare, then perhaps not. If there is only a 0.05% chance of developing Condition X, then a 30% relative reduction amounts to an absolute drop of just 0.015 percentage points, which may not be worth the treatment's cost or side effects.
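To make the arithmetic concrete, here is a minimal sketch of the relative-versus-absolute-risk calculation. The figures are simply the illustrative numbers used above, not results from any actual study, and the function names are my own.

def absolute_risk_reduction(baseline_risk, relative_reduction):
    # Absolute drop in risk when a treatment cuts a baseline risk by a relative fraction.
    return baseline_risk * relative_reduction

def number_needed_to_treat(baseline_risk, relative_reduction):
    # Average number of people who must be treated to prevent one case.
    return 1.0 / absolute_risk_reduction(baseline_risk, relative_reduction)

# Common Condition X: 50% baseline risk, 30% relative reduction
print(absolute_risk_reduction(0.50, 0.30))    # 0.15    -> 15 percentage points
print(number_needed_to_treat(0.50, 0.30))     # ~6.7    -> treat about 7 people to prevent one case

# Rare Condition X: 0.05% baseline risk, same 30% relative reduction
print(absolute_risk_reduction(0.0005, 0.30))  # 0.00015 -> 0.015 percentage points
print(number_needed_to_treat(0.0005, 0.30))   # ~6667   -> treat thousands of people to prevent one case

The same 30% relative reduction looks very different once the baseline risk is taken into account.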

I alluded to the credulity of policy advocates in "Does religion make you impressionably gullible or rigorously skeptical?" and pointed out the elements you look for in a rigorous study, elements usually notably absent in the social sciences (randomized participant selection, double blinding, null hypothesis testing, large sample sizes, longitudinal design, etc.). As M&C note, that doesn't make social science any less important than it is or could be; it just makes it, for the time being, less credible.

It is interesting to juxtapose this call for intellectual rigor with an example of magical thinking that I came across this morning.

Roslyn Chavda is an African American professor of political science. She was hired some years ago by the University of New Hampshire on an affirmative action basis to increase professorial diversity. After six years, her annual contract was not renewed and she was taken off the tenure track. The basis for the decision was low research productivity and poor student reviews of her teaching.

She claimed that "At no point did they attempt . . . to . . . help me with teaching, help me with publishing, take me under their wing." She did not dispute the evidence of poor performance but sought to excuse it by claiming that it was the result of others not intervening to improve her performance. She claimed racial discrimination, gender discrimination, and status discrimination, claims which the court summarily dismissed as lacking any evidence. In her deposition, she affirmed more than once that she had no proof: "I have no evidence for this. I think my race set me apart from them, not from my perspective but from theirs."

The court's summary is:
Chavda has produced no evidence of any racial animus on the part of any of her colleagues in the political science department. She has produced evidence that her colleagues knew that the only reason the department was able to hire her was her race. But, she has not produced any evidence that any member of the department was displeased by the circumstances of Chavda’s hiring or harbored any animosity toward African Americans specifically or people of color generally. Although she refers to “venom” hurled by her colleagues, the only venom of which she provides any evidence consists of comments about her deficiencies in teaching, scholarship, and interactions with colleagues in the department.

The court dismissed the entire suit.

The upshot is that a person was hired in a racially discriminatory fashion (affirmative action), was later fired for poor performance, and then claimed racial discrimination with zero evidence beyond her own speculation that her colleagues must have been biased against her in order to fire her.

The link to the M&C article is this leitmotif of feelings and emotions as determinants of reality, rather than replicable empirical evidence.

When independent researchers failed to replicate key research she had produced, Prof. Simone Schnall, one of the social psychologists whose findings were challenged in the special issue, responded by telling Science (the magazine) that
the entire process made her feel “like a criminal suspect who has no right to a defense and there is no way to win.” The Science article covering the special issue was titled “Replication Effort Provokes Praise—and ‘Bullying’ Charges.” Both there and in her blog post, Schnall said that her work had been “defamed,” endangering both her reputation and her ability to win grants. She feared that by the time her formal response was published, the conversation might have moved on, and her comments would get little attention.
The focus here is on the negative consequences to her and on how she feels about the process, rather than on whether her experimental result was accurate.

Similarly, Roslyn Chavda focuses not on her actual performance, or on whether there was actual discrimination, but on how she feels and how the outcome inconveniences her.

We are all fallible; we all make unintentional mistakes and, occasionally, really boneheaded decisions. What is disturbing is the departure from reality displayed here, where the focus is not on what is true or false but on feelings, emotions, and imagined conditions. Some of the biggest traitors to the age of reason find safe harbor in the very institutions that ought to be cultivating a passion and thirst for truth.
