Base Rates, Conservative Pundits, Engineers, and Terrorists
Attention Conservation Notice: I explain something everyone learns in an introductory stats class, while using potentially controversial examples because they're currently in the news.
First watch this clip from The Colbert Report on Fox News's reaction to the Trayvon Martin backlash.
Now you can skip the rest of my post, because Colbert nails it. Some of the statistics quoted by the pundits on Fox News:
"Young black men commit homicides at a rate 10 times greater than whites and Hispanics combined." - Bill O'Reilly
"African Americans make up 13% of the population but commit more than half of all murders." - Some angry white dude
Colbert points out that of 42 million African Americans in the US, only 4,149 were arrested for murder in 2011. That comes out to about 0.0099% of African Americans, or roughly 1 in 10,000. Which, in case you didn't notice, is a very small number.
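As a quick sanity check on that percentage, here is a minimal Python sketch that uses only the two figures quoted above. The variable names are mine, and treating "arrested for murder in one year" as a stand-in for "murderer" is a simplifying assumption, not something the segment claims.

```python
# Figures quoted from the Colbert segment above.
population = 42_000_000   # African Americans in the US
murder_arrests = 4149     # arrested for murder in 2011

# Assumption: one year of arrests is a rough proxy for "is a murderer".
share = murder_arrests / population

print(f"as a fraction:   {share:.7f}")
print(f"as a percentage: {share * 100:.4f}%")
print(f"roughly 1 in {round(1 / share):,}")
```

Whichever way the number is written, it sits around one in ten thousand.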
Both 'sides' (though I don't know if Colbert counts as a side) are using statistics. And (one would hope that) both sides are using valid statistics. So why the discrepancy? Because we have to think about what question we'd like to answer, and also how these numbers fit into answering that question.
Colbert is reporting something akin to \(P(\text{Murderer} \ \ | \ \ \text{African American})\). This is presumably the probability someone living in Florida should care about while deciding whether or not to pull a gun on someone. Colbert points out that this probability is very, very low. Most African Americans are not murderers. (The fact that we have to point this out is very, very sad.)
The pundits are reporting something akin to \(P(\text{African American} \ \ | \ \ \text{Murderer})\), which should not be what you care about when you're in a dark alley. Of course, these two things are related to each other by the definition of conditional probability (which, for reasons of historical accident, people still call "Bayes's Theorem"),
\[\begin{align} P(\text{Murderer} \ \ | \ \ \text{African American}) &= \frac{P(\text{African American} \ \ | \ \ \text{Murderer}) P(\text{Murderer})}{P(\text{African American})} \\ & \propto P(\text{African American} \ \ | \ \ \text{Murderer}) P(\text{Murderer}) \end{align}\]
It's that pesky \(P(\text{Murderer})\) that ruins the pundits' (terrible, terrible) argument. The base rate of murderers in the US is, well, pretty dang low. We just don't go around killing each other in the US. And we're doing it less and less.
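To make the role of the base rate concrete, here is a rough Python sketch that plugs the quoted figures into the definition of conditional probability, \(P(B \mid A) = P(A \mid B) P(B) / P(A)\). It rests on several assumptions of mine: the pundits' "more than half" is taken as exactly 0.5, one year of murder arrests stands in for "murderer", and the function and variable names are purely illustrative. The base rate it prints is only what the quoted numbers imply together, not an official statistic.

```python
def invert_conditional(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Return P(B | A) = P(A | B) * P(B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_given_b * p_b / p_a

# Quoted figures; all of them are rough.
p_aa = 0.13                        # P(African American)
p_aa_given_m = 0.5                 # P(African American | Murderer), assumed exactly one half
p_m_given_aa = 4149 / 42_000_000   # P(Murderer | African American), arrests as a proxy

# Base rate implied by those three numbers, by rearranging the identity:
# P(Murderer) = P(Murderer | AA) * P(AA) / P(AA | Murderer)
p_m = p_m_given_aa * p_aa / p_aa_given_m

# Starting from the pundits' number and multiplying by the base rate
# recovers Colbert's number.
recovered = invert_conditional(p_aa_given_m, p_m, p_aa)

print(f"P(AA | Murderer)           = {p_aa_given_m:.2f}")
print(f"implied P(Murderer)        = {p_m:.2e}")
print(f"P(Murderer | AA) recovered = {recovered:.2e}")
```

The two statistics are perfectly compatible: a conditional probability of one half in one direction shrinks to about one in ten thousand in the other once it is multiplied by a base rate on the order of a few in a hundred thousand.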
This sort of wrong-headed thinking came up in another story shared with me: There's a Good Reason Why So Many Terrorists Are Engineers. Gambetta and Hertog, the authors of the article cited in this 'news' story, perform the exact same statistical sleight of hand. The main result they present is the probability of a person being an engineer, given that they're a terrorist. That is not the claim they are arguing for, namely that the probability of a person being a terrorist, given that they're an engineer, is high (for some definition of high). Again, this is the same 'proof' we did above, exchanging 'African American' for 'Engineer' and 'Murderer' for 'Terrorist'.
If I had to speculate (based on the number of Middle Eastern students getting PhDs in engineering at UMD), I would guess that, of the people who get undergraduate and graduate degrees in Muslim countries, engineers are highly overrepresented. Gambetta and Hertog don't address this directly in their paper. Instead, they look at the percentage of engineers in the entire labor force (not just those who have a college degree or higher). This is obviously not the same thing, but they pass it off as if it is.