Statistics, Data Science, and Silver
I've commented in the past on the choice of Nate Silver as the invited speaker at this year's Joint Statistical Meetings. I have been very critical of Silver, mostly because of some choice words he had for modern statistics in his popular science book The Signal and the Noise.
Silver's address to the JSM has come and gone, and I have to say, I'm very impressed with how he handled the opportunity. Since I didn't attend the JSM, and the JSM doesn't seem to broadcast its presentations (for shame!), I have gotten my coverage of Silver's talk secondhand. Let's take this post as a pretty good indication of what Silver chose to talk about.
First, I'm happy that Silver admitted he is not a statistician. I've written in the past about the difference between a statistician and someone who uses statistics. We need both sorts of people, just as we need mechanical engineers and car mechanics, and computer scientists and software engineers.
The rest of his talk addressed the role of statistics in journalism, which is certainly a topic Nate Silver knows a great deal about. Again, see this post at Revolutions for the eleven points Silver covered. Almost all of them are admonishments for those journalists who would also become quantitatively literate. This is great news, especially given the current state of the journalistic enterprise.
Apparently during the Q&A (mediated via Twitter!), Silver was asked for his thoughts about the possible distinction between data science and statistics. To my surprise, he commented that "data scientist" is a "sexed up" term for statistician. I'm of two minds about that, given the divide I've decided to impose between statisticians and people who use statistics. At various times in the past year, I've made the claim that data science is the same thing as statistics, but I would also hesitate to claim that all data scientists are statisticians. And many of the more notable data scientists seem to share that hesitation. That's a strange contradiction to deal with: a field is really a sub-discipline of statistics, and yet its main practitioners are not statisticians.
I suppose this isn't too perplexing. Large parts of empirical science could be considered sub-branches of statistics. For instance, 'econometrics' is a fancy name for applying statistical models to economic problems. Psychometrics is another case of the exact same situation, except that it deals with psychological measurements. Data science has just widened the scope to anything that involves data (though 'data' itself is perhaps ill-defined), and thus must necessarily use statistics, since statistics is the branch of mathematical engineering that deals with noisy data coming from the real world. But that doesn't make those people statisticians.
I've said in the past that I aspire to be a statistician, and I still stand by that aspiration. But I also aspire to hone my schlep skills so that I can answer cool questions about the real world that no one has thought to ask. There's certainly room for both sorts of skills, and no reason why a person with one should necessarily have the other. As this article explains, a PhD prepares you for doing research, but not necessarily for doing industry-scale data science.