Statistical error

Tom Siegfried, the editor of Science News, has published a blistering indictment of statistical methods in science and medicine. I am moved to speak for the defense.

Siegfried discusses a number of specific cases, mainly drawn from the biomedical literature, where faulty statistical reasoning has led to unreliable or erroneous conclusions. I don’t want to quibble over the particulars of those cases; I’ll concede that science provides plentiful examples of statistical analyses gone wrong. Indeed, I could add to Siegfried’s list. But I see these events mainly as failures to use the tools of statistics properly; Siegfried suggests that the problem goes deeper. If I understand him correctly, he believes that the tools themselves are defective and that science would be better off without statistics. Here is how he begins his essay:

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.

This argument strikes me as so totally wrong-headed that I have a hard time believing Siegfried is serious about it.

To begin with, the snide remarks about Las Vegas and crapshoots are off-target. The branch of mathematics with roots in the study of gambling is not statistics but the theory of probability. The two fields are closely allied, but they’re not identical. Early statistical ideas came out of astronomy and geodesy, with later developments in the social sciences, genetics and agriculture. If you really must find some vaguely disreputable locale for statistics, the apt choice is not the casino but the brewery (a notable Student of statistics worked for Guinness).

More disturbing than this minor historical flub is Siegfried’s vision of a lost golden age of “rigorous mathematical methods,” debased by the seductive wiles of statistics. I don’t believe there was any such fall from grace. Siegfried doesn’t tell us much about the nature of his prestatistical mathematical paradise, but since he mentions Galileo and Newton, I suppose he may be thinking of classical mechanics as an exemplar of lost innocence. It’s true that the study of planetary orbits and ballistic trajectories does offer up some pithy mathematical laws that purport to be exact descriptions of nature:

[Image: mechanics-eqns.png (equations of classical mechanics)]

We don’t usually attach error bars to these expressions, or hedge our bets by saying “Force is equal to mass times acceleration within one standard deviation.” But where do such “exact” laws come from? When Galileo performed his experiments with balls rolling down an inclined plane, the measured data did not exactly conform to a parabolic trajectory. Likewise with Newton’s inverse-square law: No real-world observations precisely follow the form 1/r^2–not unless the experiment has been fudged. Making the leap from experimental data to mathematical law requires a process of statistical inference, where we extract some plausible model from the data and attribute any departures from the model to measurement error.

In the time of Galileo and Newton, tools for statistical inference were crude; by the time of Gauss, they were much sharper. In 1801 the newly discovered planetoid Ceres was observed for 41 days before it was lost in the glare of the sun. Astronomers hustled to predict where and when it would reappear in the sky. Among all the attempts, the clear winner was the prediction of Gauss, whose advantage in this competition was not so much superior astronomy as superior statistics. His secret was the method of least squares, which he later backed up with a comprehensive theory of measurement error, introducing the idea of the normal distribution.
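A minimal sketch of the least-squares idea, applied to an inclined-plane experiment of the kind Galileo ran. Everything here is an assumption made for illustration: the model d = a * t^2, the "true" coefficient, the noise level and the simulated measurements are invented, and the function names are hypothetical. It shows only how a single "exact" coefficient can be extracted from data that never follow the law exactly.

```python
import random


def fit_quadratic_coefficient(times, distances):
    """Least-squares estimate of a in the model d = a * t**2 (no intercept).

    Minimizing sum((d_i - a * t_i**2)**2) over a gives the closed form
    a = sum(x_i * d_i) / sum(x_i**2), where x_i = t_i**2.
    """
    xs = [t * t for t in times]
    numerator = sum(x * d for x, d in zip(xs, distances))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def simulate_measurements(a_true, times, noise_sd, seed=0):
    """Invented data: the exact law plus Gaussian measurement error."""
    rng = random.Random(seed)
    return [a_true * t * t + rng.gauss(0.0, noise_sd) for t in times]


# Hypothetical experiment: eight timings, true coefficient 2.0, noisy ruler.
times = [0.5 * k for k in range(1, 9)]
distances = simulate_measurements(a_true=2.0, times=times, noise_sd=0.3)

a_hat = fit_quadratic_coefficient(times, distances)
# Whatever the fitted law does not explain is attributed to measurement error.
residuals = [d - a_hat * t * t for t, d in zip(times, distances)]

print(f"estimated a = {a_hat:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

No measured point lies exactly on the fitted parabola, yet the estimate lands close to the coefficient used to generate the data; the residuals are the "departures from the model."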

Later still, statistics had a role in showing that the “exact” mathematics of Newtonian celestial mechanics is not exact after all. It took careful observations–and careful statistical analysis of those observations–to quantify a tiny anomalous precession in the perihelion of Mercury, explained by general relativity but not by classical gravitation.

Statistics is no “mutant form of math”; it’s the way that science answers the fundamental and inescapable question, “How do we know what is true?” I really can’t imagine how science could survive without statistics. What would replace it? Divination?

Siegfried complains that statistical tools offer no certainty–that when a result is reported as statistically significant at the 2-sigma level (or in other words with a P value of about 0.05), there’s still a 1-in-20 chance that it’s a meaningless fluke. Quite so; that’s essentially the definition of a 2-sigma P value. But the uncertainty is not some methodological malfunction. It reflects the true limits of our knowledge. The strength of statistical reasoning is that it makes those limits explicit.
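The correspondence between sigma levels and P values is easy to check. The sketch below assumes a normally distributed test statistic and a two-sided test, which is one common convention and not something the essay specifies; the function name is hypothetical.

```python
import math


def two_sided_tail_probability(n_sigma):
    """Probability that a normal variable falls more than n_sigma standard
    deviations from its mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2.0))


for n_sigma in (1.0, 1.96, 2.0, 3.0):
    p = two_sided_tail_probability(n_sigma)
    print(f"{n_sigma:>4} sigma -> P = {p:.4f}")
# Roughly: 1 sigma -> 0.3173, 1.96 sigma -> 0.0500,
#          2 sigma -> 0.0455, 3 sigma -> 0.0027
```

Under these assumptions, 1 sigma leaves about 32 percent in the tails (68 percent inside), while 2 sigma leaves a little under 5 percent, which is why "2-sigma" and "P = 0.05" are used almost interchangeably.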

Again, I’ll readily agree that standards of statistical practice should be strengthened, and that weak or faulty conclusions are too common in some areas of the published literature. But the claim that “any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical” is, in essence, illogical. Yes, we have lies, damn lies and statistics. But we also have lies and damn lies about statistics.


9 Responses to Statistical error

  1. John says:

    I agree with the main point of your article. As Fred Mosteller said “It’s easy to lie with statistics. It’s easier to lie without them.”

    But I believe there’s plenty of room for improvement in the kinds of statistical methods we use. Traditional statistics is stretched to its limits or beyond by some new types of data.

  2. Ian says:

    I did not read the first article on which you are writing. However, I think he was probably not slamming stats but rather some scientists’ sloppy use of stats.

  3. Frak says:

    I find it amusingly ironic that your critique focuses on the first 3 paragraphs — just the lead-in — and disregards most of the article’s 60+ paragraphs. Are you applying some sort of statistical 5% rule here?

    On a more serious note: Siegfried’s lead-in does make several inflammatory statements. The rest of the article is factual and discusses well-known mis-uses of statistical methods, including summaries and citations of more thorough research on the issue. I do not see how someone reading the article in its entirety could take your critique seriously.

  4. Kaiser says:

    Brian: Agree with your take… I have a similar critique on my blog. http://bit.ly/bgSqnZ

    Frak: the lead-in of an article lays out the argument the author is trying to make, and is absolutely the most important part of any piece. If the rest of the article is inconsistent with the lead-in (which I agree is true in many spots), then the author has failed to support his thesis.
    No one is saying that statistical methods do not have limitations. The so-called “facts” in the article are subject to similar limitations, and in most cases, he only gave one side of the issue. I find the analysis to be superficial – and he offers no semblance of a solution.

  5. David S. Mazel says:

    A thank you to Brian for pointing out the article. I come to this blog regularly and enjoy Brian’s writings as well as the links he provides.

    I read Brian’s post as well as the article and frankly I’m not surprised by either. The reality is that statistics and probability are not well understood, even by experts. And, as Mr. Siegfried notes, many scientists who compute statistics haven’t the faintest idea of what they are doing or why. They run some software, get a result, and write a report. The public is expected and encouraged to accept the result.

    The reality is that analysts (and I am one and have worked on system testing) use statistics because it was mandated somewhere by someone to do so. The applicability and validity are not important; what’s important is to get a number. Further, the distributions on which statistics rest are completely disregarded in most studies. You simply have to read reports to see this and Mr. Siegfried provides ample examples both of mis-applied statistics and wrong interpretations of results.

    To see that probability is so greatly misunderstood one has only to look at the Monty Hall problem. Wikipedia has a good entry on this and, frankly, the results are counter-intuitive such that mathematicians have argued about it furiously. To see more of this, there are articles in, I think, the current issue of either Mathematics Magazine or the College Mathematics Journal. Plus, the Mathematical Intelligencer just had an article about Bayesian methods that also shows the counter-intuitive nature of this approach.

    With reference to statistics, we can see the shortcomings in a simple example if we look at some data, say a sample, and compute (blindly) the mean and standard deviation. These calculations are simple to do, but to use them as a descriptor means the data have to be Gaussian, and sometimes data are and sometimes they are not. Yet how often do we see a plot of data with the Gaussian overlaid so that one can see (and with analysis, know) whether the Gaussian is the right distribution? Not often. (For that matter, how often does a reader even have the raw data for his/her own analysis?)

    In short, Mr. Siegfried is doing science a service by pointing out the issues with statistics and probability. Hopefully, scientists will be more careful to report just what a statistic implies AND what it does not imply.

    Brian, you, too, are doing us all a service with your posts and comments.

  6. Jim Ward says:

    Is it science’s failure to come up with new physical laws? Why are medicine and economics still relying on statistics?

  7. David Eisner says:

    I shouldn’t be surprised that we can read the same words and come to entirely different conclusions about the intended meaning. My original interpretation of Siegfried’s article (which I read before your blog post) was that by “mutant form of math” he meant the “widespread misuse of statistical methods.”

    Consider the modifying phrase in this sentence: “Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted.” To me this demonstrates that, up to this point, Siegfried is criticizing the *incorrect* use of statistical tests. That is, there are two problems: a) incorrect application of statistical methods, and b) misinterpretation of correctly applied methods. Neither is an indictment of statistics per se. The rest of the article seems to confirm this interpretation.

    One of the money grafs: “Correctly phrased, experimental data yielding a P value of .05 means that there is only a 5 percent chance of obtaining the observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct). But many explanations mangle the subtleties in that definition. A recent popular book on issues involving science, for example, states a commonly held misperception about the meaning of statistical significance at the .05 level: “This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.””

    This has tripped me up, too. It’s a subtle point.

    By the way, you write “when a result is reported as statistically significant at the 1-sigma level (or in other words with a P value of 0.05)” — isn’t the 1-sigma level closer to 68%, not 95%?

  8. brian says:

    @David Eisner: Oops! Make that 2-sigma, not 1-sigma. And note that the error is all mine, not Siegfried’s.

  9. 0x69 says:

    Regarding Tom Siegfried’s view of “rigorous mathematical methods”: somehow I think he was not thinking of classical mechanics, but rather of the mathematical methods involved in classical mechanics, and of the math of that time in general. Let me explain.
    To my knowledge, Galileo was fascinated by the works of Descartes, which in turn helped Newton and Leibniz to create calculus. Indeed, the methods of classical physics are calculus-based. For example, the well-known F = ma can be written as the time derivative of a body’s momentum, F = dp/dt. That being said, I rather think that Siegfried’s “mathematical paradise” is … calculus. But I may be wrong about his view.

    Regarding statistics: I agree, in one way or another we need statistical methods. And there are cases where, without statistics, we could say nothing about the problem at hand. For example, I can’t imagine a better alternative to the Maxwell distribution, which helps to predict the properties of gases:
    http://en.wikipedia.org/wiki/Maxwell%E2%80%93Boltzmann_distribution
    According to Wikipedia, the Maxwell distribution was the first-ever statistical law in physics. After that, a whole new field of physics based on statistics was born:
    http://en.wikipedia.org/wiki/Statistical_mechanics
    And there are a lot of physical theories that use a statistical or probabilistic approach; take a look at quantum mechanics or chaos theory… Without statistics, whole fields of physics would have to be destroyed, which would be ridiculous given that these fields have produced remarkable ideas and new technological inventions. So statistics should survive.