We are routinely bombarded by claims that have been “proven” with statistics. Today’s column offers tips for judging these claims.
- Surprising results get headlines. “Did you hear that hurricanes with female names are more deadly? Who knew?!!” An Internet search for this report from last week yields thousands of citations.
- “That’s why autism is on the rise!! It’s the vaccines!” The 1998 study making this claim got a lot more ink than The Lancet’s retraction, which came only after the journal learned that the results were fraudulent.
Pure fabrication may be rare, but many studies are published with claims that should be served with many grains of salt. The first question to ask: “Is there enough data?”
- “Is there evidence that property values fall when school budgets are rejected by the voters?” Asked that question last week, I had to disappoint my caller: As relatively few school budget votes fail each year, the sample size is too small to test the connection between budget votes and home values.
Let’s consider the “deadly female hurricane” finding. When I was growing up, all hurricanes had female names. So by necessity the study starts with the period following 1978, when male names went into the lineup—that’s 53 hurricanes. As the study excluded Katrina, the researchers were left with 52 instances in which a hurricane could be assigned either a male or a female name. If this were a simple event—the only outcome is the death toll and the only variable is the gender of the assigned name—then it is possible that you could draw a statistically valid conclusion from 52 observations.
But a hurricane is anything but a simple event. The death toll depends on many factors: the intensity of the storm, the population density of the affected communities, the time of day when the hurricane hit the community at greatest risk, whether the hurricane’s course was predicted in time, the geographic extent of the storm, and so on. The gender of the storm’s name may be one factor affecting the death toll, but it is only one among many, and the more factors you must disentangle, the more observations you need. The researchers’ hypothesis could be perfectly correct. But with so little data, the association is simply unsupportable. The University of Illinois researchers insist that the study was not intended to be a joke. It should have been.
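To see how far chance alone can carry a “name effect” in a sample this small, here is a minimal simulation sketch in Python (assuming the numpy and statsmodels libraries; every number in it is invented for illustration and none comes from the study). The death toll is generated from intensity and population density alone, each storm is given a male or female name at random, and we watch how much the fitted “name effect” wanders by chance with 52 storms versus 5,200.

```python
# Sketch with invented numbers: the death toll is driven only by storm
# intensity and population density; the "female name" flag is pure noise.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def chance_name_effect(n_storms):
    intensity = rng.uniform(1, 5, n_storms)        # hypothetical storm strength
    density = rng.uniform(10, 1000, n_storms)      # hypothetical people per square mile
    female_name = rng.integers(0, 2, n_storms)     # name gender assigned at random
    deaths = 2 * intensity + 0.05 * density + rng.normal(0, 20, n_storms)
    X = sm.add_constant(np.column_stack([intensity, density, female_name]))
    return sm.OLS(deaths, X).fit().params[3]       # fitted "name" coefficient

for n in (52, 5200):
    effects = [chance_name_effect(n) for _ in range(2000)]
    print(f"{n} storms: chance 'name effect' swings by about +/- {2 * np.std(effects):.1f} deaths")
```

With 52 storms, chance alone produces apparent name effects on the order of ten deaths in either direction; with a hundred times as many storms, the spurious swing shrinks to about one death.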
We care about the size of the sample because statistics is all about relative probability—how likely is it that what we observe could have occurred by chance?
Suppose you speculate that cars painted white are more visible to other drivers and thus are involved in fewer accidents. You compile car color for the 110 accidents that occurred in New Hampshire in 2009 and discover that 10% of the vehicles involved were painted white. If 15% of the cars registered in New Hampshire are white, the evidence supports your hypothesis.
So is the matter proven? No—in 110 accidents, it would not be that surprising for the white-car-accident rate to be small just by chance. Encouraged by the NH result, you study all 33,808 motor vehicle accidents that occurred nationwide in 2009 and get the same result. That certainly strengthens your case.
So NOW is the matter proven? Nothing is ever “proven” to a statistician. But with a sample of 33,808 accidents, it is very unlikely that white cars would show a substantially lower accident rate just by chance.
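For readers who want to put a number on “very unlikely,” here is a minimal sketch in Python of the kind of test a statistician might run (assuming a recent SciPy; the accident figures are simply the ones used above). It asks: if 15% of registered cars are white, how probable is a white-car share of 10% or less purely by chance?

```python
# Sketch: how likely is a white-car share of 10% or less, by chance alone,
# if 15% of registered cars are white? (Figures taken from the column.)
from scipy.stats import binomtest

for n_accidents in (110, 33_808):
    white_count = round(0.10 * n_accidents)   # 10% of the accidents involved white cars
    test = binomtest(white_count, n_accidents, p=0.15, alternative="less")
    print(f"{n_accidents:>6} accidents: probability of a share this low by chance = {test.pvalue:.4f}")
```

With 110 accidents the answer comes out to roughly one chance in ten, hardly damning evidence; with 33,808 accidents it is vanishingly small.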
Statisticians turn “guilt by association” into science. The probability that a particular association occurred by chance is reported in terms of levels of “significance.” A finding reported at the 10% significance level, for example, is one that would have occurred by chance, even if no real connection exists, about 10 times out of 100. The 5% significance level means that the observed result would have occurred by chance just 5 times out of 100. Those 10 or 5 random associations, which have nothing to do with causation, are called “false positives”: it appears that there is a connection, but none really exists.
In this era of “big data,” expect that studies of disease will mine new data sets for associations between specific conditions or diseases and other characteristics of the population such as diet or lifestyle. We’ve learned a lot from this approach in the past—and we have high hopes for the future.
But be aware of the risk. Suppose we want to know what factors are associated with rheumatoid arthritis (RA). If we are testing at the 10% significance level, we can expect that about 10 of every 100 “explanatory” variables that actually have nothing to do with RA will appear to be associated with it just by chance. At the 5% level, 5 of every 100 “promising” associations will turn out to be a dead end. Thus we may find, in a given dataset, that RA is statistically associated with a high-protein diet (e.g. Atkins or paleo) or with growing up in an older home. Data “mining” can be helpful because some of the observed associations are “true” and may lead to a new understanding of the cause of disease. But we know in advance that some of the correlations are “spurious.”
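A short simulation sketch in Python makes the arithmetic concrete (assuming numpy and scipy; the disease indicator and the “lifestyle” variables are pure invention). It screens 100 variables that have nothing whatsoever to do with the disease and counts how many clear the 10% significance bar anyway.

```python
# Sketch with invented data: screen 100 "lifestyle" variables that have
# nothing to do with the disease and count the chance hits at the 10% level.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_people = 2_000
has_ra = rng.random(n_people) < 0.05          # hypothetical disease indicator, ~5% prevalence

chance_hits = 0
for _ in range(100):                          # 100 candidate variables, all pure noise
    lifestyle = rng.normal(size=n_people)     # e.g. grams of protein per day, age of childhood home
    p_value = ttest_ind(lifestyle[has_ra], lifestyle[~has_ra]).pvalue
    chance_hits += p_value < 0.10

print(f"{chance_hits} of 100 irrelevant variables look 'associated' at the 10% level")
```

On a typical run, close to ten of the hundred irrelevant variables look “associated,” which is exactly the false-positive rate the significance level promises.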
Good science is based on replication: if, say, RA and an older homestead are associated in one data set, that should simply spur further study, not a press release. The vaccine scare illustrates the damage that can be done by the premature release of unconfirmed results. The year 2014 has seen the highest incidence of measles in the United States in 18 years. The World Health Organization reports that the incidence of measles in Europe was up 348% between 2007 and 2013.
Tyler Vigen, a Harvard law student, has posted a collection of statistical associations for our amusement (see http://www.tylervigen.com/): Want to reduce divorce in Maine? It’s all about margarine. Troubled about the mysterious die-off of bee colonies? Look no further than juvenile arrests for marijuana possession.
As “big data” brings us access to ever more numbers, let’s remember that statistical correlation is just math. Correlation is a beginning, not a conclusion.