The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives by Stephen Thomas Ziliak
My rating: 5 of 5 stars
Why Statistics (and Economics) Are So Unpleasant
Recently in another GR review I made a remark about the absence of a coherent aesthetic in statistics. In response I received several messages criticising my choice of words. ‘What does aesthetics have to do with the practical use of statistical analysis?’ was the polite form of the question. The answer is ‘quite a bit, really.’ In fact it is the lack of aesthetic coherence which makes statistical analysis so dangerous as well as wrong. McCloskey’s book, by demonstrating the utter arbitrariness of statistical significance, gives part of the problem from the point of view of economics. But the issue is actually more profound than she sets out.
Many years ago I had a teacher, Russell Ackoff, at the University of Pennsylvania. Early in his career he had written a textbook called Scientific Method: Optimizing Applied Research Decisions. In it he laconically criticised social scientists for using tests of statistical significance which were unrelated to the cost of error in their analysis. His point was obvious to all but statisticians: tiny errors affect consequential decisions far more than large errors affect trivial ones. If you don’t know what the consequences of being wrong are, you can’t conduct responsible statistical analysis.
No one paid much attention to Ackoff in 1962, and I doubt whether many have paid much attention to McCloskey since her book’s publication in 2008. One possible reason for the lack of impact is that both Ackoff and McCloskey present statistical method as an economic issue. While the issue is indeed partly economic, it is not one that the discipline of economics chooses to address. To do so would undermine most of the discipline’s empirical research of the last century. Besides, any cost-benefit analysis of statistics would have to be carried out with the very statistical techniques being held to account, by the people most invested in them. Prospects for a breakthrough have therefore always been slim.
A fundamental term in all economics is ‘value.’ Classical economics derives value from what it calls preferences, typically expressed in terms of what it calls ‘utility’ (a transparently aesthetic concept), or in the case of business, ‘profit’ (less obviously so, but also an aesthetic idea). In order to make its scholastic logic work, it fixes preferences and its derived factors as some sort of divinely revealed and protected order of things, regardless of the obvious fact that preferences change continuously and no one actually has any definite idea of what constitutes corporate profits.
In fact ‘value’ is not fundamentally an economic concept but an aesthetic one. That is to say, economic value is a sub-species of the aesthetic. More specifically, value is a criterion of aesthetic choice as it is employed in economics. ‘Aesthetic’ is the more general term for such a criterion and applies not just to economics but to all choices of consequence. An aesthetic is more than a preference; it is a rule of choice about what is more important, more desirable, more inherently valuable. It is also articulated to some degree (though not necessarily by the one employing it), so it is not merely a response to the presentation of alternative courses of action. And through its articulation an aesthetic evolves continuously.
From an aesthetic perspective, therefore, the statistical problem of economics is no longer one of some sort of complex, quite likely impossible, cost/benefit analysis. Rather it is one of aesthetic compatibility - or lack of it. This is easy to understand without complex analysis about the significance of significance tests or the costs of being wrong. Only the most elementary understanding of what statistics are is necessary.
A statistic is a partial description of some set of numbers, usually the result of some sort of social science observation or experiment. I say ‘partial’ because there are an infinite number of statistics that might be used to describe any such set of numbers. When it is used to judge, that is to evaluate, a set of numbers, a statistic is an aesthetic.
The most common statistic is the mean, or average, of the set. The mean is what is called the first moment. The second is the variance (strictly, the second central moment), which measures the spread of the numbers around the mean; its square root is the standard deviation. The greater the variance, the less certainty there is about the mean of the set. Much of statistical analysis is directed toward establishing just how certain, how significant, a given statistic is.
But it is crucial to recognise that the mean and the variance are only two possible statistics. Many others may be relevant to a meaningful description of the set of numbers. The third moment, called skew, for example captures the asymmetry of the set: whether its extreme values lie predominantly above or below the mean. Skew is highly relevant for estimating things like ‘worst case’ conditions, what happens if things really go wrong. The fourth moment is kurtosis, not something one reads about on the back of a box of cereal but important for understanding how heavy the tails of the set are, that is, how much of its spread comes from rare, extreme values.
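For readers who want to see these quantities concretely, here is a minimal Python sketch that computes the first four moments of a small set of numbers. It assumes population (divide-by-n) formulas and standardises skew and kurtosis by powers of the standard deviation; other conventions (sample corrections, or ‘excess’ kurtosis, which subtracts 3 so that a normal distribution scores 0) give somewhat different numbers. The function names are my own, for illustration only.

```python
def central_moment(xs, k):
    """k-th central moment: the average of (x - mean) ** k over the set."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def describe(xs):
    """Mean, variance, skewness and kurtosis of a list of numbers.

    Population (divide-by-n) moments; skewness and kurtosis are divided by
    powers of the standard deviation so they carry no units.
    """
    mean = sum(xs) / len(xs)
    var = central_moment(xs, 2)
    sd = var ** 0.5
    return {
        "mean": mean,
        "variance": var,
        "skewness": central_moment(xs, 3) / sd ** 3,
        "kurtosis": central_moment(xs, 4) / sd ** 4,  # plain, not "excess", kurtosis
    }


print(describe([1, 2, 3, 4, 10]))
```

For the set [1, 2, 3, 4, 10] this gives a mean of 4, a variance of 10, a positive skew of about 1.14 (the single large value stretches the tail to the right) and a kurtosis of about 2.79.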
There are fifth, sixth, seventh, and higher moments, each giving slightly more information about the set of numbers in hand, and each of potential relevance in arriving at a decision about what to do. The mean expected return on a business venture, for example, might be very high. Its variance might be low as well, so it looks like a winner. But if its skew puts a long tail on the downside, a rare extreme that could bankrupt the whole company, perhaps it’s not such a good idea.
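The business-venture point can be sketched with two hypothetical ventures, each described by ten equally likely payoffs (the figures are invented for illustration, say in millions). They are constructed so that mean and variance are identical and only the sign of the skew differs: judged by the first two moments alone they are indistinguishable, yet only one of them can lose money.

```python
def moments(xs):
    """Return (mean, variance, skewness) using population (divide-by-n) formulas."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    skew = sum((x - mean) ** 3 for x in xs) / n / var ** 1.5
    return mean, var, skew


# Hypothetical payoffs over ten equally likely scenarios.
venture_a = [4] * 9 + [14]   # usually modest, occasionally a big win
venture_b = [6] * 9 + [-4]   # usually a little better, occasionally a loss

for name, payoffs in [("A", venture_a), ("B", venture_b)]:
    mean, var, skew = moments(payoffs)
    print(f"{name}: mean={mean:.2f} variance={var:.2f} "
          f"skew={skew:+.2f} worst={min(payoffs)}")
```

Both ventures have a mean of 5 and a variance of 9; A’s skew is about +2.67 and B’s about -2.67, and only B’s worst case is a loss.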
And that’s the nub of the aesthetic problem in statistics: how much mean is worth how much variance, is worth how much skew, and so on? There certainly is no rational, objective, universal answer, no criterion for equating various statistics. Nor is there even a criterion, a rule of judgement, for establishing which descriptive statistics are relevant at all. Each moment is a distinct aesthetic, completely independent of and contrary to every other. Nothing in the science of statistics suggests even the possibility that these contrary aesthetics are theoretically or practically reconcilable. And no statistician or economist has ever been bold enough to propose that there might be such a reconciliation. They choose instead to duck the issue completely.
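The absence of any criterion for trading the moments off against each other can also be made concrete. The sketch below scores two hypothetical ventures (summary figures invented for illustration) with a simple linear trade-off that rewards mean and skew and penalises variance. The weights are assumptions of exactly the kind the argument says no statistical theory can supply, and which venture ‘wins’ depends entirely on them.

```python
# Hypothetical summary statistics for two ventures: (mean, variance, skewness).
ventures = {
    "X": (10.0, 4.0, -1.0),
    "Y": (8.0, 1.0, 0.5),
}


def score(stats, w_mean, w_var, w_skew):
    """Linear trade-off: reward mean and skew, penalise variance."""
    mean, var, skew = stats
    return w_mean * mean - w_var * var + w_skew * skew


def preferred(weights):
    """Name of the venture with the highest score under the given weights."""
    return max(ventures, key=lambda name: score(ventures[name], *weights))


# Each weighting is an arbitrary choice; nothing in the data picks one.
for weights in [(1, 0, 0), (1, 0.2, 0), (1, 1, 0), (1, 0.2, 3)]:
    print(weights, "->", preferred(weights))
```

With these figures the weightings (1, 0, 0) and (1, 0.2, 0) favour X, while (1, 1, 0) and (1, 0.2, 3) favour Y. The linear form of the score is itself just one more arbitrary choice.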
Furthermore, from my own experience I can attest that any attempt to establish subjective preferences for ‘trading off’ the various statistics against each other, in the manner of a utility function for example, is doomed to failure. Not only do people not experience life (including business) in terms of statistical categories, any such preferences are altered as soon as they are articulated. The end result is contradictory and nonsensical. No one recognises, much less believes, the resulting aesthetic.
In short, the insignificance of statistical significance can be most effectively demonstrated not by the fact that the measures of significance are unrelated to the cost of error but by the fact that the statistics used in analysis have no significance to each other. I challenge any statistician or economist to show that this conclusion is unwarranted. Nevertheless I defer to Deirdre if she insists on making the case for cost.
Postscript: The approach outlined above is, I believe, also one that is compatible with the growing trend called Behavioural Economics. Traditional or Classical Economics starts out from a set of first principles and then logically derives its conclusions as virtual economic ‘laws.’ One law for example is that consumers and corporate executives are ‘rational’, that is, they act according to the implications of first principles. If they are empirically observed not to act this way, they are irrational and will eventually be forced to conform to rational norms or they will be economically punished.
This method of traditional economics has a marked resemblance to medieval scholastic method. It is not rational but rationalistic. It presumes the validity of its own analysis and has no real way of testing itself or learning. Behavioural Economics is equivalent to the Baconian revolution in medieval thought. It seeks first the way in which people conduct themselves economically, and then poses the question ‘Why?’ In other words, Behavioural Economics presumes that there is a possibly undisclosed rationale, a purpose, and then tries to articulate what that purpose might be.
It is perhaps in the area of Financial Economics that the use of statistics is most abused and most dangerous. One consequence was the financial crash of 2008, created directly by the use of statistical models built on fundamentally incompatible aesthetics. As far as I am aware, many people are still paid a great deal of money to continue doing the same thing. This behaviour is driven by the aesthetic of the Goldman Sachses of the world, who have persuaded many others that they are acting in those others’ interests. They aren’t.