High School Chemistry — Accuracy and Precision
High school chemistry was the first time I heard the words 'accuracy' and 'precision' in a scientific context. The terms were presented by way of a bullseye example (more on that later), and the Wikipedia article on the topic captures the gist pretty well:
In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity's actual (true) value. The precision of a measurement system, also called reproducibility or repeatability, is the degree to which repeated measurements under unchanged conditions show the same results. Although the two words reproducibility and repeatability can be synonymous in colloquial use, they are deliberately contrasted in the context of the scientific method.
See the Wikipedia article for the standard bullseye figures.
At the time (some nine years ago now), my friends and I would frequently get confused about which was which. Did precision relate to how close to the bullseye you were? Did accuracy relate to how closely grouped the points were? The names themselves didn't really clarify anything. Precise and accurate sound, to the teenage ear, indistinguishable.
While reading through Nate Silver's The Signal and the Noise, I had the (strikingly obvious) realization that these terms could easily be framed in probabilistic language1. I don't think 10th-grade-me would have understood this framing. But that's his fault, not mine.
Here's the setup. Suppose we have some true value, \(\theta\), that we would like to measure. Maybe it's the speed of light in a vacuum, or the volume of water in our volumetric flask. Call our measurement of that value on a single trial \(\mathbf{M}\), and take \(\mathbf{M}\) to be given by
\[\mathbf{M} = \theta + \mathbf{N}.\]
Here, \(\mathbf{N}\) is 'noise.' For example, it might be noise introduced by the measurement apparatus. Basically, \(\mathbf{N}\) will capture all of the things that could go wrong in the process of making the experimental measurement.
Let's choose to model \(\mathbf{N}\) in a simple way. I do this because it makes operationalizing precision and accuracy much easier. We'll assume that
\[\mathbf{N} \sim \text{Normal}(\mathbf{b}, \Sigma)\]
(Hold for all of the comments about why Normal theory is the devil.)
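To make the model concrete, here's a minimal sketch in Python (with numpy); the particular values of \(\theta\), \(\mathbf{b}\), and \(\Sigma\) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.0, 0.0])          # true value (the center of the bullseye)
b     = np.array([0.5, -0.3])         # bias of the measurement process (hypothetical)
Sigma = np.array([[0.10, 0.02],
                  [0.02, 0.10]])      # covariance of the noise (hypothetical)

# One trial: M = theta + N, with N ~ Normal(b, Sigma)
N = rng.multivariate_normal(mean=b, cov=Sigma)
M = theta + N
print(M)
```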
Okay, so we're assuming that the errors in our measurement are normal with mean \(\mathbf{b}\) and covariance \(\Sigma\). And with that, we're done. The precision of our measurement \(\mathbf{M}\) has to do with how 'small' \(\Sigma\) is. To keep the bullseye example in play, take \(\mathbf{M}\) to be a vector in \(\mathbb{R}^{2}\), so that \(\Sigma\) is the \(2 \times 2\) matrix of covariances
\[(\Sigma)_{ij} = \text{Cov}(N_{i}, N_{j}).\]
The diagonal entries \(\text{Var}(N_{1})\) and \(\text{Var}(N_{2})\) control how tightly the measurements cluster in each coordinate. This makes precision a relative term: you can always be more or less precise, depending on how small you can make \(\text{Var}(N_{1})\) and \(\text{Var}(N_{2})\).
Accuracy, on the other hand, has to do with \(\mathbf{b}\), the suggestively named bias of our measurement process. If \(\mathbf{N} \sim \text{Normal}(\mathbf{b}, \Sigma)\), then \(\mathbf{M} \sim \text{Normal}(\theta + \mathbf{b}, \Sigma)\). In other words, \(\mathbf{M}\) picks up the bias in our measurement apparatus. No amount of averaging our measurements will get us the right value. Sorry, Law of Large Numbers: we'll always be off from \(\theta\) (on average) by \(\mathbf{b}\). Thus, accuracy is absolute (according to our model): we're either accurate or we're not.
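As a sanity check on this claim, here's a short sketch (same made-up \(\theta\), \(\mathbf{b}\), and \(\Sigma\) as above) showing that the sample mean of many repeated measurements settles near \(\theta + \mathbf{b}\) rather than \(\theta\), while the sample covariance estimates \(\Sigma\):

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.0, 0.0])
b     = np.array([0.5, -0.3])
Sigma = np.array([[0.10, 0.02],
                  [0.02, 0.10]])

# Many repeated trials of M = theta + N
M = theta + rng.multivariate_normal(mean=b, cov=Sigma, size=100_000)

print(M.mean(axis=0))           # close to theta + b = [0.5, -0.3], not theta
print(np.cov(M, rowvar=False))  # close to Sigma
```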
Now some pictures to clarify this result. We stick with the bullseye example, because it really is nice. As the Wikipedia article states, we can be:
- precise and accurate, which means that \(\Sigma\) is small and \(\mathbf{b} = \mathbf{0}\)
- precise and not accurate, which means that \(\Sigma\) is small and \(\mathbf{b} \neq \mathbf{0}\)
- not precise and accurate, which means that \(\Sigma\) is large and \(\mathbf{b} = \mathbf{0}\)
- not precise and not accurate, which means that \(\Sigma\) is large and \(\mathbf{b} \neq \mathbf{0}\)
For some reason, the standard textbooks (i.e. the textbooks I learned high school chemistry from) focus on only a few of these cases. All four cases, in the order above, are presented below.
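Here's a minimal sketch (Python with numpy and matplotlib; the specific choices of 'small' and 'large' \(\Sigma\) and of the nonzero bias are arbitrary) that simulates one scatter of measurements for each case, in the same order:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

theta = np.array([0.0, 0.0])   # the bullseye
small = 0.05 * np.eye(2)       # 'small' Sigma: precise
large = 0.50 * np.eye(2)       # 'large' Sigma: not precise
bias  = np.array([1.0, 1.0])   # nonzero b: not accurate

cases = [
    ("precise, accurate",         small, np.zeros(2)),
    ("precise, not accurate",     small, bias),
    ("not precise, accurate",     large, np.zeros(2)),
    ("not precise, not accurate", large, bias),
]

fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharex=True, sharey=True)
for ax, (title, Sigma, b) in zip(axes, cases):
    M = theta + rng.multivariate_normal(mean=b, cov=Sigma, size=200)
    ax.scatter(M[:, 0], M[:, 1], s=10)
    ax.scatter(*theta, marker="x", color="red")  # mark the true value
    ax.set_title(title)
    ax.set_aspect("equal")
plt.show()
```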
So there we have it: a simple statistical operationalization of the (often hard to remember) terms precision and accuracy. Another favorite topic of mine from my days as a chemist-in-training is propagation-of-error computations. Those again become easy to deal with when we reframe the computation in terms of probability. Otherwise, they're a hodgepodge of non-obvious heuristics to look up in a lab manual2.
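As a toy illustration of that reframing (the numbers here are hypothetical), suppose we compute a density \(\rho = m / V\) from a noisy mass and a noisy volume. Instead of memorizing the division rule for relative errors, we can just simulate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical measurements: mass in grams, volume in mL, each with its own noise
mass   = rng.normal(loc=10.0, scale=0.05, size=n)   # m ~ Normal(10.0, 0.05^2)
volume = rng.normal(loc=25.0, scale=0.10, size=n)   # V ~ Normal(25.0, 0.10^2)

density = mass / volume   # rho = m / V, once per simulated trial

# The spread of the simulated densities is the 'propagated' error
print(density.mean(), density.std())

# For comparison, the textbook first-order rule:
# sigma_rho ~= rho * sqrt((sigma_m / m)^2 + (sigma_V / V)^2)
print((10.0 / 25.0) * np.sqrt((0.05 / 10.0) ** 2 + (0.10 / 25.0) ** 2))
```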
Fun fact of the day: Galileo was one of the first scientists to propose averaging as a method of getting more correct (in the sense of accurate) measurements from a noisy scientific apparatus. Before him, a lot of scientists cherry-picked the outcomes that seemed most reasonable. I'm sure a lot of scientists still do.↩
This was more or less how they were treated in my college chemistry lab courses. Which is fair: there's barely enough time to teach chemistry, let alone instill the prerequisite knowledge of probability and statistics. Especially given the (surprising) disdain many chemistry majors have for mathematics.↩