Statistics


I have just been arguing with myself over an Op-Ed in the New York Times yesterday, which partly described a controversy surrounding a recent trial of an AIDS vaccine in Thailand. The controversy centers on whether the results of the study were statistically significant — that is, whether the authors truly found that the vaccine prevented infections, or whether the apparently-favorable result was a product of chance. Apparently, some outside observers felt that, when the research team first announced their results, they had cherry-picked the statistical test that cast their results in the most favorable light.

At first, I was inclined to support the author of the Op-Ed, Seth Berkley, in defending the study. I think that it goes without saying that finding an AIDS vaccine is worth a significant monetary investment, because it would save many lives.

However, one sentence of the Op-Ed left me taken aback:

This illustrates why the controversy over statistical significance is exaggerated. Whether you consider the first or second analysis, the observed effect of the Thai candidates was either just above or below the level of statistical significance. Statisticians will tell you it is possible to observe an effect and have reason to think it’s real even if it’s not statistically significant. And if you think it’s real, you ought to examine it carefully.

I read that to my wife, and she put words to my own thoughts, saying, “Yes, there is a word for thinking that something is real when it is not statistically significant: bias.”

Statistics is useful because it allows us to quantify how certain we are that something is real, independent of our desire for it to be real. When the result of an experiment falls near the boundary of statistical significance or insignificance, all that one can infer is that one doesn’t have enough information to securely confirm or refute the original hypothesis. If the hypothesis is about something critical, then one should get more data.

The attitude expressed in the above paragraph concerns me, because when faced with a result that lies on a statistical margin, there is a tendency for some researchers to try a number of statistical tests, and to report only the test in which the result is significant. The problem with trying a number of statistical tests is that it increases the chance that the randomness of nature will produce a spurious positive result. The order in which the data from the Thai AIDS vaccine trial were released apparently raised exactly this concern among some scientists.
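One can see the inflation directly with a toy simulation. This is a minimal sketch, not anything resembling the trial's actual analysis: it draws two groups from the same distribution (so any apparent effect is spurious), applies three common two-sample tests chosen purely for illustration, and declares a "detection" whenever the most favorable of the three comes out significant.

```python
# A minimal sketch (not the trial's actual analysis): two groups drawn from
# the SAME distribution, so any apparent effect is spurious.  Trying several
# tests and keeping the best one inflates the false-positive rate above the
# nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, alpha = 10_000, 0.05

false_positives = 0
for _ in range(n_sims):
    a = rng.normal(size=50)
    b = rng.normal(size=50)
    pvals = [
        stats.ttest_ind(a, b).pvalue,     # Student's t-test
        stats.mannwhitneyu(a, b).pvalue,  # Mann-Whitney U test
        stats.ks_2samp(a, b).pvalue,      # Kolmogorov-Smirnov test
    ]
    if min(pvals) < alpha:  # report only the most favorable test
        false_positives += 1

print(f"False-positive rate when cherry-picking the best of three tests: "
      f"{false_positives / n_sims:.3f}")
```

The three tests are correlated, so the inflation is not a full factor of three, but the rate still comes out noticeably above the nominal 5%.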

So, I took a deep breath, and checked the numbers. The initial press release claimed that there was only a 4% probability that the reduced infection rate in the vaccinated group was a product of chance, whereas the later results gave chances of 8% and 16% that the result was a product of chance.

Apparently, according to the Wall Street Journal, biologists consider a <5% probability that a result could originate in chance to be the key level of significance for judging that a result is probably real. This seems reasonable. For that matter, even the 16% chance that the result is spurious sounds pretty good to me, especially given that AIDS is a life-or-death situation, and anything that would help save those lives is important to pursue.

I have used a range of probability cutoffs in my studies. I have reported signals that have had a 10% probability of being caused by chance when the existence of the signal was mundane. For example, I called any periodic signal with a <10% chance probability a "detection" when I was writing my thesis on X-ray bursts from neutron stars, because it was already well-established that the phenomena occurred, and I was simply trying to build a sample. However, for a surprising result, my colleagues and I would demand a higher statistical significance. When I discovered a neutron star in a Chandra observation of the young star cluster Westerlund 1, I had to show that there was a <0.1% chance that the neutron star was there by accident before the referees would agree that the neutron star was actually in Westerlund 1 (and even then, I had to explain the statistics to the referees twice).

However, that is enough about me. The point is, there is no "rule" as to what statistical significance level is reliable. One could be fooled by a one-in-a-million result if one is unlucky. Rather, it is a matter of what will convince one's audience given the importance of the result (OK, I suppose that is the bias that upset me earlier), and whether the presentation of the result is faithful to any lingering uncertainties.

Unfortunately, the AIDS result will continue to be controversial, because it is marginal. To quote the Wall Street Journal article,

Observers noted that the result was derived from a small number of actual HIV cases. New infections occurred in 51 of the 8,197 people who got the vaccine, compared with 74 of the 8,198 volunteers who got placebo shots.
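For what it's worth, the raw counts quoted above are enough for a back-of-the-envelope significance check. The sketch below applies a simple Fisher's exact test to the 2×2 table of infections; the trial's actual analyses were more involved, so treat this only as a sanity check of the ballpark.

```python
# A back-of-the-envelope check of the raw counts quoted above; the trial's
# real analyses were more involved, so this is only a sanity check.
from scipy import stats

infected_vaccine, total_vaccine = 51, 8197
infected_placebo, total_placebo = 74, 8198

table = [
    [infected_vaccine, total_vaccine - infected_vaccine],
    [infected_placebo, total_placebo - infected_placebo],
]
odds_ratio, p_value = stats.fisher_exact(table)

rate_v = infected_vaccine / total_vaccine   # ~0.62%
rate_p = infected_placebo / total_placebo   # ~0.90%
print(f"Apparent efficacy: {1 - rate_v / rate_p:.0%}")  # roughly 30%
print(f"Fisher's exact p-value: {p_value:.3f}")         # right around the 4-5% boundary
```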

These numbers dampen my initial enthusiasm, especially given the spectacular successes of past vaccines. I was recently reading about Louis Pasteur’s vaccine work in a collection of biographies of 19th century scientists (The Golden Age of Science, edited by Bessie Zaban Jones). Pasteur developed a number of very successful vaccines, including one for anthrax that reduced the mortality rate of oxen and sheep from 10% to 1%, and one for rabies that decreased the mortality rate in humans from over 15% to about 1%. These are huge, statistically significant improvements. The initial samples that established the efficacy of these vaccines didn’t need to be large. The first anthrax study was of 50 sheep, half of which were vaccinated, and nearly all of which survived exposure to anthrax. It is somewhat disheartening that what gets hyped as progress in medicine now can be so slight in comparison.

It’s a shame that the results came out the way that they did, and I don’t think that Seth Berkley’s Op-Ed will help. Both trigger the destructive instincts of people (like me) who want to see justice meted out to scientists who hype up their work by playing with statistics.

However, one also doesn’t want to reject a promising vaccine just because one resents feeling played. Looking into this, two numbers jumped out at me. First, the most conservative estimate is still that there is a >84% chance that this vaccine works. That isn’t bad. Second, even if the AIDS vaccine “only” reduces infection rates by 30%, that could prevent up to 1,000,000 infections a year. That would be huge.

The recent Supreme Court decision on Ricci v. DeStefano has sparked a lot of talk about affirmative action and “reverse discrimination.” I tend to agree with suggestions that affirmative action should be based on things other than race, such as income level, the educational opportunities available where one grew up, and whether one’s parents went to college.

However, I have been bothered by all the emphasis on “reverse discrimination.” My hunch has been that this doesn’t do justice to the magnitude of the discrimination that some minorities face. So, I tried to put some numbers on discrimination. The place I knew I could start was affirmative action in college admissions.

I attempted to make a crude estimate of how race-based admissions might decrease the chances that a member of a well-represented majority would be accepted to an elite university. To do this, I needed to compare the admission rates of under-represented minorities to those of the general population. Say that a fraction f of students are admitted to a school. That means that of M students who apply, N=fM are admitted. Now say that I know g, the fraction of those admitted students who are under-represented minorities. To estimate the effect of racial preferences, I will assume that under-represented minority students were admitted at some factor x times the rate of non-minority students. I happened to be able to find estimates for these numbers relatively easily, which is why I took this approach.

I then wanted to estimate the fraction of non-minorities admitted as a function of x, the strength of racial preferences. The number of non-minorities admitted is n=(1-g)N. If minorities are admitted at x times the rate of other applicants, they must make up roughly a fraction g/x of the applicant pool to account for a fraction g of the admits, so the number of non-minority applicants is m=(1-g/x)M. The admissions rate for non-minorities is then f’ = n/m = [(1-g)/(1-g/x)](N/M) = [(1-g)/(1-g/x)]f. By changing x, I can tell how the fraction of admitted non-minorities would be affected by affirmative action.

So, I found some admissions numbers for Harvard in the Boston Globe. In 2009, Harvard admitted f=7% of applicants. Other Ivy League schools admitted about 10%, so this seems like a good number to work with. Of the students accepted to Harvard, g=22% were from under-represented minorities. This seems to hold true generally at the Ivy League and University of California Schools, so I think this is also a good number to work with.

I was not able to find numbers for x for Harvard, but I needed to make some assumptions to construct any sort of argument. So, I tried to find numbers from other elite schools. From an article in UCLA’s student newspaper, it would appear that minorities are admitted at a rate up to x=2 times higher than the rates of other students (for MIT). This is roughly supported by the factor-of-two decline in under-represented minorities attending UCLA and UC Berkeley after California ended race-based admissions.

So, if I assume that under-represented minorities are equally qualified as other students, but are given favorable treatment, so that x=2, I get f’ = (1-0.22)/(1-0.22/2) × 7% ≈ 6%. So, in this scenario, affirmative action lowers a non-minority applicant’s chance of admission by roughly 1 percentage point.

If I wanted to take a worst-case scenario and assume that minorities are not in fact equally qualified, then I need to do a different calculation. I must emphasize that this is not at all what I believe — this is simply to get a mathematical bound. Let’s pretend that an elite school suddenly decides to admit no minorities. The number of non-minorities admitted could then go up to n=N, while the applicant pool still contains the minority fraction, so m=(1-g/x)M as before. Therefore, f” = f/(1-g/x) = 7%/(1-0.22/2) ≈ 8%. So, by changing the rates of admissions for minority students, I could conceivably change the rate of admission for non-minority students to an elite school like Harvard by as little as 1 percentage point, and at most 2.
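Putting the whole estimate in one place, here is a minimal sketch of the calculation, using the numbers assumed above (f = 7% and g = 22% from the Boston Globe figures, and the assumed preference factor x = 2):

```python
# The crude estimate above in code, using the assumed numbers f=7%, g=22%.
f = 0.07   # overall admission rate (Harvard, 2009)
g = 0.22   # fraction of admitted students from under-represented minorities

def non_minority_rate(x):
    """f': admission rate for non-minorities, given preference factor x."""
    return (1 - g) / (1 - g / x) * f

def no_minority_bound(x):
    """f'': worst-case rate if the school admitted no minorities at all."""
    return f / (1 - g / x)

x = 2  # assumed: minorities admitted at twice the non-minority rate
print(f"f   = {f:.1%}  (overall)")
print(f"f'  = {non_minority_rate(x):.1%}  (with preferences)")   # ~6%
print(f"f'' = {no_minority_bound(x):.1%}  (no-minority bound)")  # ~8%
```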

Fractionally, this is a significant change in the chance that an individual in the majority would get admitted. However, because the absolute chances of getting into Harvard are small, the actual chance that any individual non-minority would be affected by affirmative action is slight — a couple of percentage points at most.

To put this in perspective, I looked into other forms of racial discrimination. A pair of economists did an experiment in 2004, in which they tried to judge the effects of discrimination by sending resumes to employers in the Chicago and Boston areas. The resumes were made to be identical, except that half had stereotypically “white” names (such as Emily Walsh), and the other half had stereotypically African-American names (such as Lakisha Washington). The resumes with white names got callbacks in 10% of the cases, while those with African-American names got callbacks in only 6% of cases. So, the effect is about twice as large as the worst-case scenario for “reverse discrimination” under affirmative action.

What about looking at the most dramatic racial disparity in American life — prison populations? In 2008, a black man was 6.6 times more likely to be in prison than a white man, and a Hispanic man was 2.4 times more likely to be in prison than a white man. Some of this has to do with the violence of the inner cities. However, a big part of the problem is that, although drug use rates are very similar for all racial groups, black men are sentenced for drug offenses at 13 times the rate of white men.

I have to mistrust the motives of anyone who decries “reverse discrimination,” but won’t spend equal breath bemoaning the remnants of racial discrimination in the work force. And if one wants to address those issues, it should be in small breaths taken between shouting about the big issue: that unequal law enforcement policies undermine black and Hispanic communities, and deprive their children of opportunity.

Apparently, in the circles of conservative commentators and blog trolls, the claim has been going around that the Earth has been cooling over the last decade. Now, there are lots of places one can go to find correct information on the web, so I’ve tended not to spend much time responding to ridiculous claims. However, I heard yesterday on NPR that the most popular global warming blog is written by a group that denies global warming is caused by people. Clearly, more voices are needed to explain why most scientists do believe that humans cause global warming.

[Figure: the global average surface temperature record over the past 50 years]

The suggestion that the Earth is cooling is a misrepresentation of the data (see the figure). Global average temperatures have increased by about 0.13 °C per decade over the past 50 years.

However, one should not expect to see this trend year-to-year, because yearly temperature measurements vary by about 0.1 °C to 0.2 °C. These yearly temperature variations are caused by several things. For instance, El Niño tends to cause surface temperatures to rise, because it redistributes heat from the ocean into the atmosphere. Volcanic eruptions, such as the one from Mount Pinatubo in 1991, lower surface temperatures by putting chemicals that reflect sunlight into the upper atmosphere.

As a result of the yearly variations, one should only expect to see a trend in global temperatures over timescales of a decade or two. Statistically speaking, one can only see a trend in the data when the error in the mean over a long time period is about 3 times smaller than the trend. The error in the mean scales as the yearly errors divided by the square root of the number of measurements. If the yearly errors are about equal to the trend per decade, as is the case for global warming, then it would take about 10 years to measure the mean. Measuring a trend would take longer: 20-30 years. Indeed, in 1979, scientists were not sure that a global warming signal had been seen, because they could only look back a few decades. Nearly thirty years later, in 2007, the trend was clear, because almost a hundred years of data were available.
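The scaling argument can be made concrete with a few lines of code. This is a rough sketch under the stated assumptions (independent yearly errors of about 0.15 °C around a 0.013 °C/yr trend; real climate noise is correlated year to year, which only lengthens the wait): it asks how many years of data are needed before a least-squares slope becomes a ~3-sigma detection.

```python
# How long until a 0.013 deg C/yr trend stands out above ~0.15 deg C of
# yearly scatter?  Assumes independent yearly errors (an idealization).
import numpy as np

sigma = 0.15   # year-to-year scatter, deg C (assumed)
trend = 0.013  # deg C per year, i.e. 0.13 deg C per decade

for n_years in range(5, 41):
    t = np.arange(n_years)
    # Standard error of a least-squares slope fit to unit-spaced points:
    slope_error = sigma / np.sqrt(np.sum((t - t.mean()) ** 2))
    if trend / slope_error >= 3:
        print(f"About {n_years} years of data for a 3-sigma trend")
        break
```

With these inputs the loop stops at about 25 years, consistent with the 20-30 year estimate above.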

What then of the last 10 years? The data do look like the temperature has been roughly constant, but as I mentioned above, this is because 10 years is only enough to measure a mean, not a trend. There certainly cannot be evidence that the Earth is cooling on a 10-year timescale.

Moreover, cherry-picking 1998 and 2008 to claim a global cooling signal, as I heard the conservative commentator Deroy Murdock do on the Tavis Smiley show, is ignorant at best, and dishonest at worst. In terms of global temperature, 1998 was the hottest year on record, tied with 2005. It is thought that a strong El Niño made 1998 so hot. 2008 was “only” the eighth hottest year on record. Since 1998 set the record, almost any year will look cooler next to it, so technically the commentator was correct. However, 8 of the 9 hottest years measured were after 2000; 15 or 16 of the 20 hottest years were since 1990; and at most one of the 20 hottest years was before 1980. The last decade was hot!
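To see just how easy this kind of cherry-picking is, consider a toy simulation using the same assumed numbers as above (a 0.013 °C/yr trend with 0.15 °C of yearly scatter): generate 30-year series that warm steadily on average, then compare the record year from the first two decades against the final year, mimicking the 1998-to-2008 comparison.

```python
# A toy illustration of endpoint cherry-picking: even in series that warm
# steadily on average, the record year from the first 20 years often beats
# the final year, so a "cooling" claim is easy to manufacture.
import numpy as np

rng = np.random.default_rng(42)
trend, sigma, n_years, n_sims = 0.013, 0.15, 30, 10_000

spurious_cooling = 0
for _ in range(n_sims):
    temps = trend * np.arange(n_years) + rng.normal(scale=sigma, size=n_years)
    if temps[:20].max() > temps[-1]:   # a past record year hotter than "now"
        spurious_cooling += 1

print(f"Warming series where cherry-picking shows 'cooling': "
      f"{spurious_cooling / n_sims:.0%}")
```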

Therefore, although the claim that the Earth was cooler in 2008 than in 1998 is technically true, I think that anyone who brings it into the global warming debate is warping the science, and defying logic.

I believe strongly that energy conservation is crucial to securing our energy future, so you might think that I would be happy with last week’s announcement by the Obama administration that it will be implementing standards to make light bulbs more efficient. Instead, though, the way the announcement was made has bothered me, because its impact is actually pretty tiny.

The problem is, as Obama stated in his speech, that lighting consumes only about 7% of U.S. energy use. The new standards will not eliminate the energy used for lighting; they will only make it more efficient. How much more efficient? We can cut the press conference numbers Obama used down to size.

First, the speech stated that in the most optimistic analysis, the savings over 30 years are equivalent to powering all American homes for 10 months. That means we will be changing our residential energy use by (10/12)/30 = 2.8%. Moreover, residential uses account for only 21% of total U.S. energy use (according to the Energy Information Administration), so this plan should cut total U.S. energy use by only 0.6%.

Similarly, the speech stated that over 30 years, the savings will be equivalent to taking 166 million cars off the road for one year. Why not phrase this as taking roughly 5 million cars off the road for 30 years, or about 2% of the 250 million cars on the road? Well, with this number, the savings seem even smaller. Passenger vehicles account for 17% of U.S. energy use, so the savings may well be only 0.3%.
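Here is the arithmetic from the last two paragraphs in one place, a quick sanity check rather than an official analysis (the energy-use shares are the figures quoted above):

```python
# Cutting the press-conference numbers down to size.  The shares of U.S.
# energy use are the figures quoted in the text above.
residential_share = 0.21   # residential share of U.S. energy use
vehicle_share = 0.17       # passenger-vehicle share of U.S. energy use

# "Powering all American homes for 10 months," spread over 30 years:
residential_saving = (10 / 12) / 30              # ~2.8% of residential use
print(f"Cut to residential use: {residential_saving:.1%}")
print(f"Cut to total U.S. use:  {residential_saving * residential_share:.1%}")  # ~0.6%

# "166 million cars off the road for one year," spread over 30 years:
cars_per_year = 166e6 / 30                       # ~5.5 million cars
fleet_fraction = cars_per_year / 250e6           # of ~250 million cars
print(f"Fraction of the fleet:  {fleet_fraction:.1%}")                  # ~2.2%
print(f"Cut to total U.S. use:  {fleet_fraction * vehicle_share:.1%}")  # ~0.3-0.4%
```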

I’ve been reading the book, Sustainable Energy – Without the Hot Air by David MacKay, and he wittily explains the logical flaw in arguments like the one made in Obama’s speech. The problem is, Obama’s writers took paltry numbers and multiplied them by big ones, to make the impact seem bigger. The real effect has been put better by Prof. MacKay:

If everyone does a little, we’ll achieve only a little.

The cap-and-trade legislation that Obama has been helping move through Capitol Hill will be much more effective (if Congress doesn’t get in the way too much). Unfortunately, reducing energy use at the consumers’ end will take serious, broad-based efforts that are much bigger than changing light bulbs, and that I am only beginning to appreciate. . .


The New York Times ran an editorial last week that caught my attention. The editorial was about a paper comparing the happiness of men and women in the US over the past 50 years or so. I was curious about how one would conduct such a study, and whether it could actually reveal statistically meaningful conclusions. I can’t access the paper at home, but I was able to get the original at work (during a break, of course!).

The paper was titled, “The Paradox of Declining Female Happiness” (by Betsey Stevenson and Justin Wolfers). The authors used three resources: the General Social Survey (1972-2006), the Virginia Slims Survey of American Women (about every 5 years between 1972 and 2000), and the Monitoring the Future 12th grade study.

[Figure: Happiness in the General Social Survey, as reported by Stevenson and Wolfers (2009).]

I’ve reproduced the first figure, which seems to be the anchor of their argument. It is based on the General Social Survey. In the top half of the figure, the population is divided into three groups: very happy, pretty happy, and not too happy. The authors note a negative trend in the fraction of very happy women, from roughly 40% to 33% over the last 35 years. The fraction of very happy men stays about the same, at around 33%. The figures don’t contain any error estimates for the individual points, but by eye the year-to-year variations would seem to imply that they are around 5% for the very happy fraction.
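Since the figures lack error bars, one can at least estimate what they should be. The sketch below computes the binomial standard error on a reported fraction for a few sample sizes; I don't know the per-wave number of women in the survey, so the values of n here are purely illustrative.

```python
# Binomial standard error on a survey fraction, for illustrative
# (hypothetical) numbers of respondents per survey wave.
import math

def fraction_error(p, n):
    """Standard error on a fraction p estimated from n respondents."""
    return math.sqrt(p * (1 - p) / n)

p = 0.40  # fraction reporting "very happy"
for n in (100, 400, 1500):
    print(f"n = {n:4d}: {p:.0%} +/- {fraction_error(p, n):.1%}")
# n = 100 gives ~5% errors, comparable to the scatter eyeballed above;
# n = 1500 would give ~1.3%, which would make a 7-point decline clearly
# significant.  The real per-wave sample size determines which regime applies.
```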

I am not familiar with the statistic that the authors apply, an “ordered probit regression”. A “probit” appears to be the inverse of the cumulative normal distribution, but I don’t know how that is applied to a regression. By my eye, though, any trends are slight, and probably not all that significant. The bottom figure makes no intuitive sense to me. The other figures in the paper are similar, although the trends are less pronounced. In the 12th grade survey (leaving aside whether 18 year olds can be thought of as “women” in anything but the legal and biological senses), the trend seems to be that boys get a bit happier, and girls stay equally happy.

The authors develop these data into a theme. They can split the data finer, and find similar trends in some of their categories of age, race, employment status, and marital status. They also look at European countries and find trends they call consistent, although this is really because the uncertainties are so large that no clear trends are evident.

The only plot that showed any clear trend was in suicide rates for men and women: men have 4 times the suicide rate of women. However, that probably has more to do with the fact that men choose methods of suicide that are more likely to succeed.

In the New York Times editorial, Ross Douthat seemed impressed with how measured the prose of the article was in laying out possible reasons for this slight trend. I found myself cynically thinking that a better title would have been “Men and Women are About Equally Happy.”

I am interested in statistics, but my knowledge is limited to astronomical applications. I would love for someone more knowledgeable to comment on the content of this paper. . .
