The Bayesian Conspiracy
We started reading about Markov Chain Monte Carlo (MCMC, as the kids call it) in a statistics reading group I attend. In the process of learning the MCMC material, I have decided to actually learn some Bayesian inference techniques[1]. I might write about the experience, though probably not. I can't imagine that a generally pro-frequentist writer discussing the pros and cons of Bayesian methods would draw much of an audience. But I do have a few thoughts[2].
Learning this material reminded me of my first exposure to Bayes' Theorem. I first read about it in this post by Eliezer Yudkowsky. In the post, he presents a lucid discussion of using Bayes' theorem in what is, at this point, a standard situation. Suppose you have a test for a disease. The test has a 99.9% true positive rate. That is, the probability that the test is positive, given that you have the disease, is 99.9%. This leads most people (myself included, at times) to believe that if you test positive, you must have the disease. But this gets the probability backwards. What we want to know is the probability that we have the disease, given that we test positive. In general, conditional probabilities aren't symmetric, that is, \(P(B | A) \neq P(A | B)\), so we're asking the wrong question. There are two ways to handle the fact that we're asking the wrong question. Eliezer's method is to develop an intuition. His post does this very well, using numerical examples. As he claims,
While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract.
I'll take the 'too abstract' route, inverting Laplace's suggestion, "probability is nothing but common sense reduced to calculation," and hoping that we can refine our common sense by applying probabilistic rules. I won't go into all (or even most) of the details. But here's the setup. Let \(+\) be the event that you test positive for the disease. Let \(D\) be the event that you have the disease, and \(D^{c}\) be the event that you don't have the disease. Then Bayes' magical formula is just:
\(P(D | +) = \frac{P(+ | D) P(D)}{P(+ | D) P(D) + P(+ | D^{c}) P(D^{c})}\)
In words: we have to combine the prior probability of the disease in the population with the probability that we test positive, given the disease (and normalize things so that our answer lies between 0 and 1, as all probabilities must). When I state the result like this, it's not all that impressive. And that's because it's not[3]. At this point, it's just a consequence of the definition of conditional probability and something called the law of total probability, which itself says that the probability of something occurring is the sum of the probabilities of all the different ways it could occur. Again, these are not deep results: you'll learn them in the first week of a graduate course in probability theory, after you get done proving a few results about \(\sigma\)-algebras.
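If you want to see how little machinery is involved, here's a quick numerical check in Python. The 99.9% true positive rate comes from the setup above; the 0.1% prevalence and 1% false positive rate are numbers I'm assuming purely for illustration.

```python
# Bayes' theorem, numerically. Sensitivity is from the setup above;
# prevalence and false positive rate are assumed for illustration.
p_d = 0.001            # P(D): prior probability of disease (assumed)
p_pos_given_d = 0.999  # P(+ | D): true positive rate
p_pos_given_dc = 0.01  # P(+ | D^c): false positive rate (assumed)

# Denominator: the law of total probability, summing over the two ways
# a positive test can occur (with the disease and without it).
p_pos = p_pos_given_d * p_d + p_pos_given_dc * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(f"P(D | +) = {p_d_given_pos:.3f}")  # about 0.091
```

With those (made-up) numbers, a positive result from a 99.9%-sensitive test still leaves you with only about a 9% chance of having the disease, because the disease is rare and the false positives swamp the true ones. The prior matters.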
And yet Yudkowsky and Nate Silver go around flaunting this formula as though it's the flagship of modern statistics. Imagine: something nearly 300 years old. And to an impressionable youth[4], this is misleading.
Bayesian statistics is something else entirely. A valid approach to statistics[5], sure, but it shouldn't be equated with Bayes' theorem. It's a completely different philosophical approach to applying probability to the real world. An approach you can take or leave. But again, Bayes' theorem has nothing to do with it.
I realize now that the title for this post has two meanings. The first, and original, meaning has to do with Eliezer's initiation of those who read his post into the 'Bayesian Conspiracy.' Which, again, as an impressionable youth, really drew me in. The other part of the conspiracy is that, in terms of Bayes' theorem, there's not much there there. Despite what Nate Silver would have you believe in his The Signal and the Noise, where he comments
Recently, however, some well-respected statisticians have begun to argue that frequentist statistics should no longer be taught to undergraduates. And some professions have considered banning Fisher's hypothesis test from their journals. In fact, if you read what's been written in the past ten years, it's hard to find anything that doesn't advocate a Bayesian approach. (p. 260)
Never mind that William Briggs isn't an especially well-respected statistician (compared to, say, Cox, Efron, Tibshirani, and Wasserman, all of whom still use frequentist methods because the methods work). Or that an arXiv paper doesn't make for an authoritative source. Or the fact that I really, truly, highly doubt that Silver has ever read an article from The Annals of Statistics.
Silver's story sounds spectacular. "The frequentists have pulled the wool over our eyes all this time! If we just switched to Bayesian methods, all would be well."
The truth, however, is a lot simpler. \(P\)-values fail, not because they are wrongheaded, but because most scientists don't understand what they mean. As always, never attribute to malice that which is adequately explained by stupidity. This isn't a conspiracy. It's a failure of education.
A failure of education that Silver's (misinformed) slander doesn't improve upon.
[1] Since MCMC is used a lot to sample from the posterior distribution that shows up in Bayesian inference, the two topics go hand in hand. Of course, you don't need a Bayesian setting to use MCMC. You might be interested in, say, developing a hydrogen bomb or performing numerical weather prediction.
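For the curious, the core idea is small enough to fit in a few lines. Here's a bare-bones random-walk Metropolis sampler in Python; the standard normal target and the step size are arbitrary choices I've made for illustration.

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: draws from a density known only up to
    a normalizing constant (which is exactly the posterior situation)."""
    rng = np.random.default_rng(seed)
    x = x0
    log_p = log_target(x)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal()
        log_p_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
        samples[i] = x
    return samples

# Example: sample from a standard normal via its unnormalized log density.
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_samples=10_000)
```

The point is that you only ever need the target density up to a normalizing constant, which is why the method pairs so naturally with posteriors.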
[2] My main thought, which I'll get out of the way now, is that the posterior 'distribution' is still random. That is, we evaluate \(f(\theta | x^{n})\) at our random sample, so the posterior is really \(f(\theta | X^{n})\), a stochastic process (for fixed \(\theta\), it is a random variable, and for fixed \(X^{n}\), it is a function of \(\theta\)). If we keep this in mind, things like maximum a posteriori (MAP) estimates aren't so mysterious: they're still random variables, just like maximum likelihood estimates, with sampling distributions and everything.
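To make that concrete, here's a toy sketch under assumptions I've picked for illustration: a normal likelihood with known variance and a normal prior, where the MAP estimate has a closed form (a precision-weighted average of the sample mean and the prior mean). Simulate fresh data over and over, and the MAP estimate traces out a sampling distribution, just as a maximum likelihood estimate would.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, n = 1.0, 20                        # true parameter, sample size
prior_mean, prior_var, noise_var = 0.0, 1.0, 1.0

def map_estimate(x):
    # Normal likelihood + normal prior: the MAP (= posterior mean here)
    # is a precision-weighted average of the sample mean and prior mean.
    w = (n / noise_var) / (n / noise_var + 1 / prior_var)
    return w * x.mean() + (1 - w) * prior_mean

# New data, new MAP: repeating the experiment traces out the sampling
# distribution of the posterior's maximizer, just like an MLE's.
maps = [map_estimate(rng.normal(theta_true, np.sqrt(noise_var), size=n))
        for _ in range(5000)]
print(f"mean = {np.mean(maps):.3f}, sd = {np.std(maps):.3f}")
```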
[3] It was, however, impressive when Thomas Bayes first proposed it. But that was 300 years ago. Get over it.
[4] That is, me. I'm not really sure when I first read Yudkowsky's post. I know I was reading Overcoming Bias before Less Wrong budded off from it, and that happened in 2009. If I had to guess, I probably started reading sometime after 2006 and before 2009. Let's say 2007.
[5] Though, again, often presented as if it's the solution to all of humanity's woes. When, well, it isn't.