Down with Coefficients of Variation! Or, How to Catch a Predator with Statistics

We attended a talk by David Eagleman as part of the Santa Fe Institute's lecture series. Eagleman is a neuroscientist at Baylor College of Medicine. He gave a brilliant talk on the implications of our growing knowledge about neuroscience for law. The Santa Fe version isn't up yet, but this talk looks more or less identical.

Towards the end of his talk, he discussed recidivism rates for sexual offenders as a prediction problem. Of course, once you start doing prediction based on data, you're well into the domain of statistics (or, if you're a computer scientist, machine learning). He proposed various covariates (features) that might be useful as predictors for recidivism. It was refreshing to hear him talk about the regression question (really a classification question) as a prediction problem instead of a causal problem. After all, we all know that correlation is not causation, and that regression on its own can only answer prediction problems.

This is all fine, but then the analysis took a turn-for-the-typical. He discussed the predictive ability of psychiatrists / parole officers vs. a statistical algorithm, and concluded that the first group does terribly compared to a statistical approach. This wouldn't surprise me. But the way he decided to measure how well either did was by using \(r\), the correlation coefficient. As we all know, this is a terrible way to go about measuring the performance of a predictive algorithm: a correlation tells you how strongly predictions and outcomes move together, not how often the predictions are right. If you want to know how well an algorithm will predict compared to a person, why not report the accuracy rate on a test set? Of course, this requires the researcher to build their model on a training set, and then test it on a held-out (i.e. not used in the training of the model) test set. After all, in this case we're interested in how well we'll be able to predict recidivism for a new individual, not how well we did on a set of individuals whose outcomes we already know. This (very simple) idea from statistics and machine learning doesn't seem to have entered the lexicon of researchers in psychology, sociology, economics, etc.
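To make the train/test idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data are synthetic (a single made-up risk score per person), the "model" is just a threshold halfway between the two class means, and the function names are hypothetical. It is not the method from the talk or from any paper.

```python
import random


def make_synthetic(n, seed=0):
    """Generate (score, outcome) pairs. Purely synthetic, not real recidivism data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        outcome = 1 if rng.random() < 0.5 else 0
        # The score is centered at 1.0 when the outcome is 1, and at 0.0 otherwise.
        score = rng.gauss(float(outcome), 1.0)
        data.append((score, outcome))
    return data


def train_test_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and hold out a fraction of it for testing."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Fit the 'model': the midpoint between the two class means, using training data only."""
    scores0 = [s for s, y in train if y == 0]
    scores1 = [s for s, y in train if y == 1]
    return (sum(scores0) / len(scores0) + sum(scores1) / len(scores1)) / 2.0


def accuracy(data, threshold):
    """Fraction of cases where 'score above threshold' matches the outcome."""
    correct = sum(1 for s, y in data if int(s > threshold) == y)
    return correct / len(data)


data = make_synthetic(1000)
train, test = train_test_split(data)
threshold = fit_threshold(train)

in_sample_accuracy = accuracy(train, threshold)  # optimistic: the model has seen these cases
held_out_accuracy = accuracy(test, threshold)    # the number that matters for new individuals

print(f"in-sample accuracy: {in_sample_accuracy:.3f}")
print(f"held-out accuracy:  {held_out_accuracy:.3f}")
```

With a one-parameter model like this the two numbers will usually be close. The gap tends to grow as the model gets more flexible or as more variables are selected using the same data, which is exactly when the held-out number earns its keep.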

Unfortunately, the version of the talk I managed to find doesn't include the statistic on how parole officers and psychiatrists perform. But I did manage to find one of the papers that presents a statistical prediction tool for recidivism. The researchers use discriminant analysis, presumably of the linear variety (I don't think they say). They do model selection using stepwise removal of variables, which is pretty good for 1993. Now we have better tools, like the LASSO. They even apply a Bonferroni correction to handle multiple testing.
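For readers who haven't met it, the Bonferroni correction is simple enough to sketch in a few lines: with \(m\) tests and an overall significance level \(\alpha\), each individual test is compared against \(\alpha / m\). The function name and the p-values below are made up for illustration; they are not taken from the paper.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return, for each p-value, whether it survives a Bonferroni correction."""
    m = len(p_values)
    if m == 0:
        return []
    per_test_threshold = alpha / m  # each test gets an equal share of alpha
    return [p <= per_test_threshold for p in p_values]


# Four hypothetical tests: the per-test threshold is 0.05 / 4 = 0.0125.
example_p_values = [0.001, 0.02, 0.04, 0.3]
decisions = bonferroni_reject(example_p_values)
print(decisions)  # [True, False, False, False]
```

Note that 0.02 and 0.04 would each pass an uncorrected 0.05 cutoff, but not the corrected one.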

They do eventually report their accuracy rate (about 75%), and comment on more important things like false positive (people the model predicted would commit crimes, but didn't) and false negative (people the model predicted wouldn't commit crimes, but did) rates. But they report these values only for in-sample predictions. That is, they never take the ever-so-important step of holding aside a test set to make sure they haven't fooled themselves.
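As a sketch of what reporting these quantities looks like, the following computes accuracy, false positive rate, and false negative rate from a list of true outcomes and a list of predictions. I'm assuming the usual conventions here (false positive rate = false positives / all actual negatives, false negative rate = false negatives / all actual positives); the paper may define its rates differently. The function name and the eight labels are invented for the example, and in practice both lists should come from a held-out set.

```python
def confusion_rates(y_true, y_pred):
    """Accuracy, false positive rate, and false negative rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted to reoffend, didn't
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted not to reoffend, did

    def ratio(numerator, denominator):
        # Report None rather than dividing by zero when a class is absent.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Made-up outcomes (1 = reoffended) and predictions for eight people.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
rates = confusion_rates(y_true, y_pred)
print(rates)
# {'accuracy': 0.75, 'false_positive_rate': 0.25, 'false_negative_rate': 0.25}
```

The two error rates are worth reporting separately because their costs differ: a false positive keeps someone locked up who would not have reoffended, while a false negative releases someone who does.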

Overall, these researchers are doing some sophisticated work. I can't blame psychologists for tripping up on a few important statistical points¹. I just want to point out that, if it should ever be the case that these sorts of statistical tools are used to predict recidivism (which, given the dismal performance of parole officers and jurors, I would hope for), we should make sure we have meaningful numbers to back up their use.

  1. Those points being, in no particular order: the failure to test their predictor on a held-out set, the unnecessary use of the correlation coefficient \(r\), and the use of a linear classification method (though this has more to do with the era in which the study was done).