All I Really Need for Data Analysis I Learned in Kindergarten — Means, Medians, and Modes
One sometimes hears, 'We can't have everyone be above average!', especially when some new education policy comes to the table. (See the humorous 'No Child Left Behind.') This statement is true.
But sometimes folks make a very similar statement that seems like it should follow from the same argument. For example, the latest Freakonomics podcast made this mistake[1]:
DUBNER: Well, that is a common sentiment. But the fact is that most of us don't drive anywhere near as safely as we think. Get this Kai, about 80 percent of drivers rate themselves above average, which is, of course, statistically not possible. And believe me, if we found out that human error by, let's say, public-radio hosts was causing 1 million deaths worldwide — my friend Kai, I would replace you with a computer in a heartbeat.
(Emphasis mine.)
If we lived in a world that only allowed symmetric densities, the Freakonomics folks would be right. But Dubner makes the error of conflating the median of a distribution with its mean. The two coincide for symmetric densities[2], but for a generic density they need not.
For a quick refresher, if the density of a random variable $X$ is given by $f(x)$, then its mean (or expected value) is given by $E[X] = \int_{\mathbb{R}} x f(x)\,dx$.
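For contrast, here is a short sketch of the median in the same notation. The symbol $m$ is introduced here for illustration, and $F(x) = \int_{-\infty}^{x} f(t)\,dt$ is the cumulative distribution function. A median is any point where $F$ reaches one half, while the fraction of the population sitting above the mean is whatever $F$ leaves over at $E[X]$:

$$F(m) = \frac{1}{2}, \qquad P(X > E[X]) = 1 - F(E[X]).$$

The second quantity equals $\frac{1}{2}$ only when the mean happens to be a median as well, which is exactly what symmetry buys us and what a skewed density takes away.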
Take for example a random variable $X$ distributed according to the Gamma distribution, with density $f(x)$ given by $f(x) = \begin{cases} \dfrac{1}{\Gamma(k)\,\theta^{k}}\, x^{k-1} e^{-x/\theta} & : x \geq 0 \\ 0 & : x < 0. \end{cases}$
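To see how far the mean and median of a Gamma can drift apart, here is a minimal Python sketch using only the standard library. It uses the closed-form mean $k\theta$, evaluates the CDF with a power series for the regularized incomplete gamma function, and finds the median by bisection on $F(x) = \frac{1}{2}$. The function names and the shape values tried are assumptions chosen for illustration, not parameters taken from the post. Which side of the mean counts as "better" is also an assumption: if $X$ measured something like a driver's error rate, smaller would be better.

```python
import math

def gamma_cdf(x, k, theta):
    """CDF F(x) of a Gamma(k, theta) variable, using the power series for the
    regularized lower incomplete gamma function (fine for moderate x/theta)."""
    if x <= 0:
        return 0.0
    z = x / theta
    # P(k, z) = z^k e^{-z} / Gamma(k+1) * sum_{n>=0} z^n / ((k+1)...(k+n))
    term, total, n = 1.0, 1.0, 1
    while term > 1e-17 * total:
        term *= z / (k + n)
        total += term
        n += 1
    return math.exp(k * math.log(z) - z - math.lgamma(k + 1)) * total

def gamma_median(k, theta, tol=1e-12):
    """Solve F(x) = 1/2 numerically by bisection (relative tolerance tol)."""
    lo, hi = 0.0, k * theta
    while gamma_cdf(hi, k, theta) < 0.5:  # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k, theta) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fraction_below_mean(k, theta):
    """P(X < E[X]) for X ~ Gamma(k, theta), whose mean is k * theta."""
    return gamma_cdf(k * theta, k, theta)

# Illustrative shape values (hypothetical, not from the original post).
for k in (0.1, 1.0, 5.0):
    theta = 1.0
    print(f"k={k}: mean={k * theta:.4f}, "
          f"median={gamma_median(k, theta):.4f}, "
          f"fraction below mean={fraction_below_mean(k, theta):.4f}")
```

With $k = 1$ the Gamma reduces to the exponential distribution, where the fraction below the mean is $1 - 1/e \approx 0.63$ whatever the value of $\theta$. In this sketch the small shape $k = 0.1$ pushes that fraction to roughly 0.83, so under the smaller-is-better reading more than 80 percent of the population would be better than average.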
We didn't have to work very hard to make Dubner's impossibility a reality. While we learn about means, medians, and modes[4] in middle school, it doesn't hurt to come back to them once we have a few more tools in our toolbox.
[1] Then again, the Freakonomics folks aren't exactly known for their mathematical rigor. Or even their belief that math comes in handy for science.
[2] Assuming the mean exists. See the Cauchy distribution for a symmetric density that doesn't have a mean, but does have a well-defined median.
[3] The mean has a nice closed-form expression. The median requires a nasty-ish integral, so we have to solve $F(x) = \frac{1}{2}$ numerically.
[4] For completeness, a mode (which need not be unique) of a distribution is a local maximum of the density $f(x)$. This is why we talk about distributions being bimodal. (This sort of behavior is common, for example, in the distribution of exam grades.)