# Delta, Taylor, and Physical Chemistry — How do errors propagate?

I took Physical Chemistry my junior year of undergrad^{1}, basically a watered-down course in classical thermodynamics with a modicum of statistical mechanics. The lecture portion of the course was fantastic. The lab portion, less so. (There's a reason I became a computational scientist.) We spent hours measuring out reagents, timing reactions, and computing equilibrium constants. *Why* we did all of those things is beyond me. I think mostly to learn proper lab technique.

After we finished up in the wet lab, the next step was to write up a lab report. For your general amusement, here's an example report on the chemical kinetics of the reaction between 2,4-dinitrochlorobenzene and piperidine^{2}. I hated writing those, as I now hate writing journal articles.

As a part of the lab writing process, though not in the lab above, we had to do this terrible thing called 'propagation of errors.' Since we're fallen beings and our instruments are faulty, we worry about the inaccuracies in our measurements. We usually have some idea about how inaccurate our measurements are. The lore goes something like this: if you have a ruler that only measures to the nearest millimeter, your uncertainty in the measurement is around 0.5 millimeters. However, we would also like to *derive* things from our measurements. (We are trying to do *science* after all, and not big data.) And thus, we need to have some idea about how uncertain we should be about our derived quantities.

And this is where propagation of errors comes in. At the time (a mere seven years ago), propagation of error formulas seemed like the sort of thing best left to tables. How could anyone derive these themselves? I had vague notions that the formulas seemed a lot like partial derivative formulas I'd seen in multivariate calculus (just a semester before), with some squares thrown in, but couldn't see a connection beyond that. Years would pass.

And then I trained to become a passable mathematician and statistician. Now I know my intuition about partial derivatives was correct, and the missing ingredient was a bit of probability theory. (Again: how probability theory doesn't get taught to students of the hard sciences in a formal manner is beyond me. We have to get past teaching statistics only in lab methods courses and start giving Kolmogorov et al. their due.)

Here's the setup^{3}. We have two quantities, call them \(X\) and \(Y\), that we measure. These might be the voltage and current in a circuit, where we're interested in computing the resistance^{4}. We'll treat these as random variables. Again, we are fallen beings in an imperfect world. And probably a bit hung over from that party the night before. As such, we need to specify things like means and variances for the random variables. Since we typically only measure something a single time in a chemistry lab, we'll take the mean values to be the measured values (why not?), and the variances to be set by our instrument precision (the 0.5 mm for a ruler demarcated in millimeters). What we really want is to know the uncertainty (i.e. variance) in \(Z = g(X, Y)\), our derived quantity. Again, in the electrical example, we might compute \(R = g(V, I) = \frac{V}{I}\), and want to say something about how certain we are about the resistance.

How can we do this, using only our knowledge about \(g\) and the variances of \(X\) and \(Y\)? A Taylor expansion, of course! Recall that a smooth function of two variables looks more or less like a plane if we get close enough to its surface. Thus, we can write out a linear approximation (truncating the Taylor expansion at the linear term) of \(g\) *about the mean values* of \(X\) and \(Y\) as \[ Z = g(X,Y) \approx g(\mu_{X}, \mu_{Y}) + (X - \mu_{X})\frac{\partial g}{\partial X}(\mu_{X}, \mu_{Y}) + (Y - \mu_{Y})\frac{\partial g}{\partial Y}(\mu_{X}, \mu_{Y}).\] We have now written \(Z\) as an affine transformation of \(X\) and \(Y\). And variances behave very nicely with respect to affine transformations. In particular, \[ \text{Var}\left(a + \sum_{j = 1}^{p} b_{j} X_{j}\right) = \sum_{j = 1}^{p} b_{j}^{2} \text{Var}(X_{j}) + 2 \sum_{i < j} b_{i} b_{j}\text{Cov}(X_{i}, X_{j}).\] (We could wrap this all up very nicely using linear algebra into a quadratic form. But that's likely to scare the chemistry majors. Heck, it scared *me* in my senior-year mathematical statistics class.) Applying this to our linear approximation of \(Z = g(X, Y)\), we get that \[ \text{Var}(Z) \approx \left(\frac{\partial g}{\partial X}(\mu_{X}, \mu_{Y})\right)^{2} \text{Var}(X) + \left(\frac{\partial g}{\partial Y}(\mu_{X}, \mu_{Y})\right)^{2} \text{Var}(Y) + 2 \text{Cov}(X, Y) \frac{\partial g}{\partial X}(\mu_{X}, \mu_{Y}) \frac{\partial g}{\partial Y}(\mu_{X}, \mu_{Y}).\] Typically, we don't have any reason to believe that the errors in \(X\) and \(Y\) are correlated, say, if we've measured them with two different instruments^{5}, so the covariance term drops out and our expansion is just in terms of the weighted variances (weighting by the squared partial derivatives) of \(X\) and \(Y\). From this point, it's easy enough to compute the propagation of errors formula for an arbitrary \(g\). Though of course we must always keep in mind that we've made two *approximations*. First, we've truncated the Taylor expansion of \(g\) at the linear terms.
Second, we're ultimately substituting in our *measured* values of \(X\) and \(Y\) for \(\mu_{X}\) and \(\mu_{Y}\). Typically, these approximation errors are overwhelmed by stochastic errors (i.e. measurement noise). Besides, what else are we to do?
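To make the resistance example concrete, here's a minimal sketch in Python with *hypothetical* measured values and instrument uncertainties (the numbers are mine, not from any actual lab), comparing the linearized variance against a brute-force Monte Carlo check:

```python
import math
import random

# Hypothetical measurements: V = 10.0 V and I = 2.0 A, with instrument
# uncertainties (standard deviations) of 0.1 V and 0.05 A.
V, I = 10.0, 2.0
sigma_V, sigma_I = 0.1, 0.05

# R = g(V, I) = V / I, so dg/dV = 1/I and dg/dI = -V/I^2,
# each evaluated at the measured (mean) values.
dg_dV = 1.0 / I
dg_dI = -V / I**2

# Propagation of errors, assuming Cov(V, I) = 0 (different instruments):
var_R = dg_dV**2 * sigma_V**2 + dg_dI**2 * sigma_I**2
sigma_R = math.sqrt(var_R)

# Sanity check: simulate noisy measurements and compute Var(V/I) directly.
random.seed(1)
samples = [random.gauss(V, sigma_V) / random.gauss(I, sigma_I)
           for _ in range(100_000)]
mean_mc = sum(samples) / len(samples)
var_mc = sum((r - mean_mc)**2 for r in samples) / (len(samples) - 1)

print(f"linearized sigma_R  = {sigma_R:.4f}")
print(f"Monte Carlo sigma_R = {math.sqrt(var_mc):.4f}")
```

The two numbers agree to a couple of decimal places, which is the linear approximation earning its keep: the higher-order Taylor terms only matter when the relative uncertainties get large.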

This is all related to something statisticians call the delta method. The delta method considers the limiting distribution of a transformation of a random variable whose own limiting distribution is known to be normal. The intuition is relatively straightforward: if \(X\) is normal, then \(a X + b\) is also normal (normal random variables live in the stable family of distributions). Thus, if we have a transformation \(g\) that is *almost* linear, then \(g(X) \approx a X + b \) for some \(a\) and \(b\), and thus \(g(X)\) should have an approximately normal distribution. Recall that our Taylor expansion is taken about the mean value of \(X\). Thus, if \(X\) spends most of its time around \(\mu_{X}\), the linear approximation will be very good, and the normal approximation will be good as well. Proving this result takes a decent amount of algebraic manipulation, and a proof can be found in Larry Wasserman's *All of Statistics*.
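A quick way to see this intuition in action is to simulate it. A small sketch, with \(g(x) = \log x\) and hypothetical values of \(\mu\) and \(\sigma\) chosen so that \(X\) stays close to its mean: the delta method predicts \(g(X)\) is approximately normal with standard deviation \(|g'(\mu)|\,\sigma = \sigma/\mu\).

```python
import math
import random

# X ~ N(mu, sigma^2) with small sigma, transformed by g(x) = log(x).
# Linearizing, g(X) ≈ g(mu) + g'(mu)(X - mu), so g(X) should be roughly
# N(log(mu), (sigma/mu)^2), since g'(x) = 1/x.
mu, sigma = 2.0, 0.05
predicted_sd = sigma / mu  # |g'(mu)| * sigma

random.seed(2)
gx = [math.log(random.gauss(mu, sigma)) for _ in range(100_000)]
mean_gx = sum(gx) / len(gx)
sd_gx = (sum((v - mean_gx)**2 for v in gx) / (len(gx) - 1)) ** 0.5

print(f"delta-method sd = {predicted_sd:.5f}")
print(f"Monte Carlo sd  = {sd_gx:.5f}")
```

With a larger \(\sigma\), \(X\) wanders into regions where \(\log\) is visibly curved, and the agreement degrades; that's exactly the "spends most of its time around \(\mu_X\)" caveat.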

The delta method comes in handy when dealing with maximum likelihood estimators for distributions that satisfy certain regularity (i.e. smoothness) assumptions. In these cases, we know that \(\hat{\theta}\) is asymptotically normal, and thus \(g\left(\hat{\theta}\right)\) will also be asymptotically normal (given a few assumptions on \(g\)). As usual, the Taylor expansion has saved us a great deal of work, both in terms of theory and computation.
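As an illustration of my own (not from the lab manual): for i.i.d. Exponential data with rate \(\lambda\), the MLE is \(\hat{\lambda} = 1/\bar{x}\), asymptotically \(N(\lambda, \lambda^{2}/n)\) by the Fisher information \(n/\lambda^{2}\). Taking \(g = \log\), the delta method then gives \(\text{Var}(\log \hat{\lambda}) \approx (1/\lambda)^{2} \cdot \lambda^{2}/n = 1/n\), free of \(\lambda\) entirely. A simulation bears this out:

```python
import math
import random

# Simulate many datasets of n i.i.d. Exponential(rate = lam) draws,
# compute log(lam_hat) for each, and compare the spread to the
# delta-method prediction sd = 1/sqrt(n).
lam, n, reps = 3.0, 400, 5_000

random.seed(3)
log_lam_hats = []
for _ in range(reps):
    xs = [random.expovariate(lam) for _ in range(n)]
    log_lam_hats.append(math.log(n / sum(xs)))  # log of MLE 1/x-bar

mean_ll = sum(log_lam_hats) / reps
sd_ll = (sum((v - mean_ll)**2 for v in log_lam_hats) / (reps - 1)) ** 0.5

print(f"delta-method sd = {1 / math.sqrt(n):.5f}")
print(f"Monte Carlo sd  = {sd_ll:.5f}")
```

This kind of variance-stabilizing transformation is one of the standard payoffs of the delta method: the standard error of \(\log \hat{\lambda}\) depends only on the sample size.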

Looking over the laboratory textbook we used in Physical Chemistry, I am surprised by how much *more* probability theory and statistical theory is present than I remember. I suppose I didn't have ears to hear at the time (I thought I hated statistics), and didn't have eyes to see (i.e. I didn't have the requisite background for any of the material to make sense). But that said, this textbook doesn't even hint at the derivation of the propagation of errors formula. Apparently linear approximations are above and beyond what a standard undergraduate taking a physical chemistry course is expected to know. (Which is outrageous.)