Cosma's Greatest Hits
I had the good fortune to attend four lectures by Cosma Shalizi over the past two days. As anyone who has been so unlucky as to listen to me rant about statistics knows, Cosma is one of my academic idols. I know him mostly through his research, his blog, and his pedagogical efforts. I've had the pleasure of corresponding with him via email a few times.
As a groupie of sorts, I noticed a few recurring topics in his talks that I'll record here for posterity. With tongue firmly in cheek, I'll call these Cosma's Greatest Hits. These are paraphrases, but I think they get the point across.
Introductory statistics and experimental methods courses are terrible. No wonder many scientists proudly flaunt their hatred of statistics. Modern statistics is cool. And even physicists should learn more about it.
Big data without theory is lame. We can't defeat the curse of dimensionality with more data. Inferring things from high-dimensional data is intrinsically hard, and we'll have to be clever to get anything useful out of it.
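To make the curse of dimensionality concrete, here's a quick numpy sketch (my illustration, not Cosma's): as the dimension grows, the distances from a random point to its nearest and farthest neighbors become nearly equal, which is part of why naive nearest-neighbor-style inference breaks down in high dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# For each dimension, draw 500 uniform points and compare the nearest
# and farthest distances from one reference point to all the others.
ratios = {}
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    ratios[d] = dists.min() / dists.max()
    print(d, round(ratios[d], 3))
```

The min/max distance ratio creeps toward 1 as the dimension grows: every point is about equally far from every other point, so "closeness" carries less and less information.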
Linear regression for its own sake is silly. Unless we have a good theory supporting a linear model, we have better tools. The world is not linear: it's about time we started accepting that fact.
Bootstrapping is pretty damn cool.
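It really is. For anyone who hasn't seen it: resample your data with replacement, recompute your statistic each time, and the spread of those recomputed values estimates the sampling variability, no asymptotic formula required. A minimal percentile-interval sketch for the median:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(size=100)  # stand-in for an observed sample

# Resample with replacement, recompute the median each time.
boots = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(2000)
])

# Percentile bootstrap 95% interval for the median.
lo, hi = np.percentile(boots, [2.5, 97.5])
print(round(lo, 3), round(np.median(data), 3), round(hi, 3))
```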
Asymptotic goodness-of-fit estimators are nice. But at the end of the day, cross-validation works much better.
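Here's what that looks like in practice, in a sketch of my own: use k-fold cross-validation to compare polynomial fits of different degrees on data from a cubic. Held-out error penalizes both the too-rigid and the too-flexible model.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 100)
y = x ** 3 - x + rng.normal(scale=0.1, size=100)

def cv_mse(degree, k=5):
    """Mean held-out squared error over k folds."""
    idx = rng.permutation(x.size)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((y[fold] - np.polyval(coef, x[fold])) ** 2))
    return np.mean(errs)

scores = {d: cv_mse(d) for d in (1, 3, 10)}
print({d: round(s, 4) for d, s in scores.items()})
```

Degree 3 wins: the line underfits badly, and degree 10 chases noise. No asymptotic theory needed, just held-out prediction error.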
Machine learning is computational statistics that only cares about predictive ability.
Correlation isn't causation. But we have ways to get at causation now.
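A toy simulation of the classic confounding story (my example, and only the simplest corner of modern causal inference): x and y have no causal effect on each other, but a common cause z makes them strongly correlated. Adjusting for z, here by regressing it out of both, makes the association vanish.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)       # common cause (confounder)
x = z + rng.normal(size=n)   # x is caused by z, not by y
y = z + rng.normal(size=n)   # y is caused by z, not by x

c_raw = np.corrcoef(x, y)[0, 1]

# Regress the confounder out of both variables; the residual
# correlation is what's left after "controlling for" z.
rx = x - z * np.dot(x, z) / np.dot(z, z)
ry = y - z * np.dot(y, z) / np.dot(z, z)
c_adj = np.corrcoef(rx, ry)[0, 1]

print(round(c_raw, 3), round(c_adj, 3))
```

The raw correlation is about 0.5 even though neither variable touches the other; after adjustment it's essentially zero. Knowing *which* variables to adjust for is the hard part, and that's where the newer causal machinery earns its keep.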