Cosma's Greatest Hits
I had the good fortune to attend four lectures by Cosma Shalizi over the past two days. As anyone who has been so unlucky as to listen to me rant about statistics knows, Cosma is one of my academic idols¹. I know him mostly through his research, his blog, and his pedagogical efforts. I've had the pleasure of corresponding with him via email a few times.
As a groupie of sorts, I noticed a few recurring topics in his talks that I'll record here for posterity. With tongue firmly in cheek, I'll call these Cosma's Greatest Hits. These are paraphrases, but I think they get the point across.
Introductory statistics and experimental methods courses are terrible. No wonder many scientists proudly flaunt their hatred of statistics. Modern statistics is cool. And even physicists should learn more about it.
MaxEnt isn't magic. And if you really want the unbiased, maximum entropy estimator, just use the data.
Big data without theory is lame. We can't defeat the curse of dimensionality with more data. Inferring things from high-dimensional data is intrinsically hard, and we'll have to be clever to get anything useful out of it.
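To put a rough number on that (my gloss via a standard minimax result, not his exact words): for nonparametric regression of a twice-differentiable function in \(d\) dimensions, the best achievable mean-squared error shrinks like \(n^{-4/(4+d)}\). At \(d = 100\), cutting the error in half costs roughly \(2^{26}\), about seventy million, times more data.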
\(R^2\) is a dumb metric of goodness of fit. If we care about prediction, why not report mean-squared error on a testing set²?
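Concretely, and purely as my own illustration (toy data, scikit-learn assumed), the alternative reporting is a one-liner: hold out a test set and quote its mean-squared error next to the in-sample \(R^2\).

```python
# A minimal sketch, not from the talks: nonlinear truth, ordinary least
# squares fit, then in-sample R^2 reported next to held-out MSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)  # the world is not linear

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
fit = LinearRegression().fit(X_train, y_train)

print("in-sample R^2:", fit.score(X_train, y_train))
print("held-out MSE: ", mean_squared_error(y_test, fit.predict(X_test)))
```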
Linear regression for its own sake is silly. Unless we have a good theory supporting a linear model, we have better tools. The world is not linear: it's about time we started accepting that fact.
Bootstrapping is pretty damn cool.
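For anyone who hasn't seen it, here's the whole trick in a sketch of my own (plain NumPy, toy exponential data): resample with replacement, recompute the statistic, and read the uncertainty off the resampled distribution.

```python
# A minimal bootstrap sketch: a percentile confidence interval for a median.
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=100)  # stand-in for real observations

boot_medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(5000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"sample median: {np.median(data):.2f}")
print(f"bootstrap 95% interval: ({lo:.2f}, {hi:.2f})")
```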
Asymptotic goodness-of-fit estimators are nice. But at the end of the day, cross-validation works much better.
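As a sketch of what that looks like in practice (my example, assuming scikit-learn): pick a polynomial degree by \(k\)-fold cross-validated mean-squared error rather than by an in-sample or asymptotic criterion.

```python
# A minimal cross-validation sketch: 5-fold CV error for competing models.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(150, 1))
y = X[:, 0] ** 3 - X[:, 0] + 0.5 * rng.standard_normal(150)

for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: CV MSE = {mse:.3f}")
```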
Machine learning is computational statistics that only cares about predictive ability.
Correlation isn't causation. But we have ways to get at causation now.
(He didn't get to talk about this, but...) Power laws aren't everywhere. And 'scientists' should stop fitting lines to log-log plots and learn some undergraduate statistics.
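The better alternative isn't exotic, either. For a continuous power-law tail above a known cutoff \(x_{\min}\), the maximum-likelihood estimate of the exponent has a closed form, \(\hat{\alpha} = 1 + n \left[\sum_i \ln(x_i/x_{\min})\right]^{-1}\), which is the estimator pushed in Clauset, Shalizi, and Newman's paper on power-law distributions. A quick sketch of my own (NumPy only, \(x_{\min}\) taken as given, no goodness-of-fit testing):

```python
# A minimal sketch: maximum-likelihood exponent for a continuous power law,
# checked on synthetic data drawn by inverse-CDF sampling.
import numpy as np

def powerlaw_mle(x, x_min):
    """MLE of alpha for p(x) proportional to x**(-alpha), x >= x_min."""
    tail = x[x >= x_min]
    return 1.0 + tail.size / np.sum(np.log(tail / x_min))

rng = np.random.default_rng(3)
alpha_true, x_min = 2.5, 1.0
u = rng.uniform(size=10_000)
x = x_min * (1.0 - u) ** (-1.0 / (alpha_true - 1.0))  # inverse-CDF sampling

print("true alpha:", alpha_true)
print("MLE alpha: ", powerlaw_mle(x, x_min))
```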
1. Some have even called him my 'math crush.'
2. In response to a question about how we should go about changing the culture around linear regression and \(R^2\), Cosma responded, "Well, it's not as if we can give The Elements of Statistical Learning to your clients and expect them to read it."