Statistical Significance

I have just been arguing with myself over an Op-Ed that appeared in the New York Times yesterday, which in part described a controversy surrounding a recent trial of an AIDS vaccine in Thailand. The controversy centers on whether the results of the study were statistically significant; that is, whether the authors truly found that the vaccine prevented infections, or whether the apparently favorable result was a product of chance. Apparently, some outside observers felt that, when the research team first announced their results, they had cherry-picked the statistical test that cast those results in the most favorable light.

At first, I was inclined to support the author of the Op-Ed, Seth Berkley, in defending the study. I think that it goes without saying that finding an AIDS vaccine is worth a significant monetary investment, because it would save many lives.

However, one sentence of the Op-Ed left me taken aback:

This illustrates why the controversy over statistical significance is exaggerated. Whether you consider the first or second analysis, the observed effect of the Thai candidates was either just above or below the level of statistical significance. Statisticians will tell you it is possible to observe an effect and have reason to think it’s real even if it’s not statistically significant. And if you think it’s real, you ought to examine it carefully.

I read that to my wife, and she put words to my own thoughts, saying, “Yes, there is a word for thinking that something is real when it is not statistically significant: bias.”

Statistics is useful because it allows us to quantify how certain we are that something is real, independent of our desire for it to be real. When the result of an experiment falls near the boundary of statistical significance, all one can infer is that one doesn’t have enough information to securely confirm or refute the original hypothesis. If the hypothesis is about something critical, then one should get more data.

The attitude expressed in the quoted paragraph concerns me, because when faced with a result that lies on a statistical margin, some researchers are tempted to try a number of statistical tests and then report only the test for which the result is significant. The problem with trying a number of statistical tests is that it increases the chance that the randomness of nature will produce a spurious positive result. The order in which the results from the Thai AIDS vaccine trial were released apparently raised exactly this concern among some scientists.
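To put a rough number on that concern, here is a minimal sketch (my own toy illustration, not anything from the trial): if each test has a 5% false-positive rate and the tests are independent, the chance that at least one of them comes up "significant" by luck alone grows quickly with the number of tests tried.

```python
# Toy illustration of the multiple-comparisons problem (my own numbers,
# not the Thai trial's analyses): with a 5% false-positive rate per test,
# the chance that at least one of several independent tests looks
# "significant" purely by luck grows quickly.
alpha = 0.05  # per-test false-positive probability

for n_tests in (1, 2, 3, 5, 10):
    p_spurious = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:2d} tests: {p_spurious:.0%} chance of at least one false positive")
```

Real analyses are rarely independent of one another, so these numbers overstate the effect somewhat, but the direction is the point: the more tests one gets to choose from after the fact, the less the best-looking result means.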

So, I took a deep breath and checked the numbers. The initial press release claimed that there was only a 4% probability that the reduced infection rate in the vaccinated group was a product of chance, whereas the two later analyses put that probability at 8% and 16%.

According to the Wall Street Journal, biologists consider a <5% probability that a result arose by chance to be the key threshold for judging that a result is probably real. This seems reasonable. For that matter, even the <16% probability that the result is spurious sounds pretty good to me, especially given that AIDS is a life-or-death situation, and anything that would help save those lives is important to pursue.

I have used a range of probability cutoffs in my own studies. I have reported signals that had a 10% probability of being caused by chance when the existence of the signal was mundane. For example, I called any periodic signal with a <10% chance of being spurious a "detection" when I was writing my thesis on X-ray bursts from neutron stars, because it was already well established that the phenomenon occurred, and I was simply trying to build a sample. However, for a surprising result, my colleagues and I would demand a higher statistical significance. When I discovered a neutron star in a Chandra observation of the young star cluster Westerlund 1, I had to show that there was a <0.1% chance that the neutron star appeared there by accident before the referees would agree that it was actually in Westerlund 1 (and even then, I had to explain the statistics to the referees twice).

However, that is enough about me. The point is, there is no "rule" as to what statistical significance level is reliable. One could be fooled by a one-in-a-million result if one is unlucky. Rather, it is a matter of what will convince one's audience given the importance of the result (OK, I suppose that is the bias that upset me earlier), and whether the presentation of the result is faithful to any lingering uncertainties.

Unfortunately, the AIDS result will continue to be controversial, because it is marginal. To quote the Wall Street Journal article,

Observers noted that the result was derived from a small number of actual HIV cases. New infections occurred in 51 of the 8,197 people who got the vaccine, compared with 74 of the 8,198 volunteers who got placebo shots.
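For what it’s worth, here is a quick back-of-the-envelope check one can run on those quoted counts. This is only a sketch using a standard two-by-two test; the trial’s published 4%, 8%, and 16% figures came from different, pre-specified analyses of different subsets of volunteers, so this will not reproduce any one of them exactly, though it should land in the same few-percent ballpark.

```python
# Back-of-the-envelope check on the quoted counts (a sketch only; the
# trial's published 4%, 8%, and 16% figures came from different,
# pre-specified analyses of different subsets of volunteers).
from scipy.stats import fisher_exact

vaccinated = (51, 8197 - 51)   # infected, not infected
placebo    = (74, 8198 - 74)

# One-sided question: is the infection rate lower in the vaccinated group?
odds_ratio, p_value = fisher_exact([vaccinated, placebo], alternative="less")
print(f"p-value  : {p_value:.3f}")

# Crude efficacy estimate: fractional reduction in the infection rate.
efficacy = 1 - (51 / 8197) / (74 / 8198)
print(f"efficacy : {efficacy:.0%}")
```

The crude efficacy estimate works out to roughly 30%, which is the same figure quoted at the end of this post.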

These numbers dampen my initial enthusiasm, especially given the spectacular successes of past vaccines. I was recently reading about Louis Pasteur’s vaccine work in a collection of biographies of 19th-century scientists (The Golden Age of Science, edited by Bessie Zaban Jones). Pasteur developed a number of very successful vaccines, including one for anthrax that reduced mortality from the disease among oxen and sheep from 10% to 1%, and one for rabies that decreased the mortality rate in humans from over 15% to about 1%. These are huge, statistically significant improvements. The initial samples that established the efficacy of these vaccines didn’t need to be large: the first anthrax study involved 50 sheep, half of which were vaccinated, and nearly all of the vaccinated animals survived exposure to anthrax. It is somewhat disheartening that what gets hyped as progress in medicine now can be so slight in comparison.

It’s a shame that the results were released the way that they were, and I don’t think that Seth Berkley’s Op-Ed will help. Both the way the results were rolled out and the Op-Ed trigger the destructive instincts of people (like me) who want to see justice meted out to scientists who hype up their work by playing with statistics.

However, one also doesn’t want to reject a promising vaccine just because one resents feeling played. Looking into this, two numbers jumped out at me. First, the most conservative estimate is still that there is a >84% chance that this vaccine works. That isn’t bad. Second, even if the AIDS vaccine “only” reduces infection rates by 30%, that could prevent up to 1,000,000 infections a year. That would be huge.
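For completeness, here is the arithmetic behind those two closing figures, reading the quoted p-values the same informal way the rest of this post does. The annual-infections number is my own rough assumption, not a figure from the Op-Ed, so adjust it to taste.

```python
# Arithmetic behind the two closing figures (my own sketch; the annual
# infection count is an assumed round number, not a figure from the Op-Ed).
worst_p = 0.16                    # least favorable of the published probabilities
print(f"1 - {worst_p:.2f} = {1 - worst_p:.0%}")   # the ">84% chance it works" figure

efficacy = 0.30                   # the "only 30%" reduction quoted above
annual_new_infections = 2.7e6     # assumed global ballpark for the late 2000s
prevented = efficacy * annual_new_infections
print(f"~{prevented:,.0f} infections prevented per year")
```

With an assumption in that ballpark, the result is on the order of several hundred thousand infections prevented per year, the same order of magnitude as the Op-Ed’s "up to 1,000,000."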