The Darth Vader Rule — Or, Computing Expectations Using Survival Functions
Actuaries are a smart lot. Their job is to take results from probability theory and statistics and apply them to risk and uncertainty in the real world. In fiction, they form a secret society whose duty is to prophesy the future. They sometimes get a bad rap. But with all of the exams they have to take, there is no doubt that they know a thing or two about probability computations.
Which is why it shouldn't come as a surprise that it was within actuarial lore that I found an answer to a simple question that has been bothering me over the past two days: how can you compute the expectation of a positive random variable using only its cumulative distribution function? Of course, we all know we could compute the expectation as \[ E[X] = \int_{0}^{\infty} x \, d F_{X}(x),\] which almost gets at what I'm looking for. But the claim I came across, first here, was that an alternative expression for the expectation is \[ E[X] = \int_{0}^{\infty} (1 - F_{X}(x)) \, dx. \] The source, and everywhere else I looked on the internet, only offered the advice to 'integrate the first expression by parts,' which I tried, to no avail, for two or three hours[1].
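To make the claim concrete before deriving it, here's a quick check against an example of my own choosing (it isn't in the source): if \(X\) is exponentially distributed with rate \(\lambda\), then \(1 - F_{X}(x) = e^{-\lambda x}\), and \[ \int_{0}^{\infty} (1 - F_{X}(x)) \, dx = \int_{0}^{\infty} e^{-\lambda x} \, dx = \frac{1}{\lambda},\] which is indeed the familiar expectation of an exponential random variable.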
After proposing the derivation to some (mathematician) friends and waiting a few days, I had the inspiration to search for 'computing expectations using survival functions', since, in the jargon of the actuarial sciences, the complementary cumulative distribution function \(S(x) \equiv 1 - F_{X}(x)\) is called the survival function[2], and we are, after all, computing an expectation using this function.
Lo and behold, this brought me to a formal proof of the result I needed, which the authors Muldowney, Ostaszewski, and Wojdowski puckishly call the Darth Vader Rule. For posterity's sake, and perhaps to make this result slightly easier to find on the internet, here's the basic derivation[3].
Suppose we have a continuous random variable \(X\) whose range / support is \([0, \infty)\). Assume that the expectation of this random variable exists. We begin with the usual definition of expectation, \[ E[X] = \int_{0}^{\infty} x \, d F_{X}(x),\] and then we integrate by parts. We'll use the usual[4] integration by parts formula, \[ \int u \, dv = uv - \int v \, du.\] We'll take \(u\) to be \(x\), and thus we get that \(du = dx\). It's the \(dv\) term that turns out to really matter. We'll take \(dv = f_{X}(x) \, dx\). Here's where the tricky part comes in. We need an antiderivative of \(f_{X}(x)\). The natural thing is to assume that this is precisely the cumulative distribution function \(F_{X}(x)\). Of course, this is almost right, since \(F_{X}(x) = \int_{-\infty}^{x} f_{X}(t) \, dt = F(x) - F(-\infty),\) where \(F\) is an antiderivative of \(f_{X}(x)\). But of course[5], antiderivatives are only defined up to a constant, and the constant matters here: the naive choice \(v = F_{X}(x)\) produces a boundary term \(x F_{X}(x)\big|_{0}^{\infty}\) and an integral \(\int_{0}^{\infty} F_{X}(x) \, dx\) that both diverge, since \(F_{X}(x) \to 1\). So we instead take \(v = F_{X}(x) - 1 = -(1 - F_{X}(x))\), and substituting into our integration by parts formula, we find \[ \int_{0}^{\infty} x \, f_{X}(x) \, dx = - x (1 - F_{X}(x))\big|_{x = 0}^{\infty} + \int_{0}^{\infty} (1 - F_{X}(x)) \, dx,\] which is almost what we want, except for that pesky '\(uv\)' term.

It's clear that evaluating at \(x = 0\) gives us \(0\). But what about \(x = \infty\)? The \(x\) term will grow unboundedly, and \(1 - F_{X}(x)\) will approach \(0\), so we're in a case where the limit has the indeterminate form \(0 \cdot \infty\). As is, we can't say anything about this limit, but we can hope that \(1 - F_{X}(x)\) decays to zero faster than \(x\) grows to infinity. We might try L'Hopital's rule here. This is the moment where I would allow my students to storm ahead, so that they might know the frustration of a good idea not working out.
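To spare you a little of that frustration, here's the dead end worked out (a computation of my own, not from the paper): rewriting the product as a quotient and applying L'Hopital's rule yields \[ \lim_{x \to \infty} x (1 - F_{X}(x)) = \lim_{x \to \infty} \frac{1 - F_{X}(x)}{1/x} = \lim_{x \to \infty} \frac{-f_{X}(x)}{-1/x^{2}} = \lim_{x \to \infty} x^{2} f_{X}(x),\] which is yet another \(0 \cdot \infty\) form, and an even less pleasant one.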
Actually proving that \[ \lim_{x \to \infty} x (1 - F_{X}(x)) = 0\] requires a bit of analytic trickery. Since I didn't come up with the trick, I urge you to see the second page of Muldowney, Ostaszewski, and Wojdowski's paper. With this result in hand, we've completed the derivation and found that, indeed, \[ E[X] = \int_{0}^{\infty} (1 - F_{X}(x)) \, dx = \int_{0}^{\infty} S(x) \, dx.\] A non-obvious result. But a nice one.
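If you'd rather trust a computer than an analytic trick, here's a minimal numerical sanity check in Python (my own sketch, using SciPy and an exponential distribution as a stand-in for \(X\); any positive random variable with a finite mean would do):

```python
# Check the Darth Vader Rule numerically: E[X] should equal the
# integral of the survival function S(x) = 1 - F(x) over [0, inf).
import numpy as np
from scipy import integrate, stats

dist = stats.expon(scale=2.0)  # X ~ Exponential with E[X] = 2

# The usual definition: E[X] = integral of x * f(x) over [0, inf)
mean_from_density, _ = integrate.quad(lambda x: x * dist.pdf(x), 0, np.inf)

# The Darth Vader Rule: E[X] = integral of S(x) over [0, inf)
mean_from_survival, _ = integrate.quad(dist.sf, 0, np.inf)

print(mean_from_density, mean_from_survival)  # both ~2.0
```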
Also for posterity, there is a discrete analog of this result. Suppose that \(X\) is a discrete random variable whose range is the non-negative integers. In this case, the expectation of \(X\) is given by \[ E[X] = \sum_{n = 0}^{\infty} n P(X = n) = \sum_{n = 0}^{\infty} P(X > n) = \sum_{n = 0}^{\infty} (1 - P(X \leq n)),\] which is precisely the analog of the result derived above.
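The same numerical sanity check works in the discrete case; another sketch of my own, with a Poisson random variable standing in for \(X\):

```python
# Check the discrete analog: E[X] = sum over n >= 0 of P(X > n).
from scipy import stats

dist = stats.poisson(mu=3.5)  # X ranges over {0, 1, 2, ...} with E[X] = 3.5

# Truncate the infinite sum once the tail probabilities are negligible.
tail_sum = sum(dist.sf(n) for n in range(200))  # sf(n) = P(X > n)

print(tail_sum)  # ~3.5
```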
I will leave the proof of this result as an exercise for the reader. I'm fairly certain it's given as an exercise in Grimmett and Stirzaker's excellent Probability and Random Processes.
1. Does this indicate I should get a life? Most likely.

2. For perhaps obvious reasons. Namely, if we let \(T\) be the time for some object to fail, then \(S(t)\) is the proportion of objects in a population that would still survive at time \(t\).

3. Most of the sources I found left the derivation at "and now integrate by parts," which, as is usually the case in mathematics, sweeps a lot of the hard work under the rug.

4. Or at least 'usual' for those who learned calculus from Stewart's Early Transcendentals. I don't know how common this notation is outside of that text, but I imagine it's relatively common.

5. 'Of course,' this wasn't obvious to me, or I wouldn't have spent several hours trying to work out esoteric properties of cumulative distribution functions. But as a good mathematician, I have to pretend that all of these things come easily and naturally to me.