Poisson's Urns
I have been discussing the second half of Poisson's researches. Let us now turn to the first, and to the celebrated law of large numbers. The connection is 'r', the 'average' reliability of a juror. Poisson's mathematical problem was well understood for situations which could be modeled with each juror having the same reliability as every other. But this, as Ostrogradsky had observed, is preposterous. Some jurors are more reliable than others. Poisson's law of large numbers was devised in order to solve just this problem. He studied as a model the situation in which reliability varies from juror to juror, but in which there is some law (Poisson's word) or probability distribution (our phrase) for reliability among French jurors.
Abstractly, the situation was this. Jacques Bernoulli's famous theorem, published posthumously in 1713, applied to repeated drawings, with replacement, of balls from an urn with black and white balls. Let the proportion of black balls in the urn be p. We take this to be the probability of drawing a black ball. Draw a ball, note its colour, replace it and shake the contents. We can consider a sequence of many such draws, and the relative frequency with which black is drawn on such a sequence. We can ask, what is the probability that the relative frequency is within some 'error' e of p? Bernoulli could answer, and that answer became well known. But suppose one is considering a population of urns in which the proportion of black balls varies from urn to urn? We choose an urn at random, and then draw a ball at random. Once again, in repeated urn/ball draws, there will be a relative frequency with which black is drawn. Let q be the overall proportion of black balls in the urns. Can we make any statement about the probability that the relative frequency of drawing black, in urn/ball selections, is within some small 'error' of q? Yes. The precise statement is what Poisson called the law of large numbers.
— Ian Hacking, from The Taming of Chance, p. 102
Let's see if we can get close to the result alluded to by Hacking. As usual, I'll start by being overly formal. By the end, we'll see a surprising (to me, at least) result about Poisson's ball and urn(s) problem.
Suppose we have a population of urns that we select from at random. For simplicity, we'll assume that we have a continuum of urns, and that the proportion of black balls in each urn is distributed uniformly on \([0, 1]\). Once we've chosen an urn, we'll draw a single ball from it, note its color, and then place it back into the urn. Then we'll choose a different urn at random, and continue in this way until we've noted the colors of balls from \(K\) urns. Because we have a continuum of urns where the success probabilities are uniform on \([0, 1]\), the overall proportion of black balls in the urns is just the expected value of this uniform distribution, namely \(1/2\). We are then asking: if we randomly select urns, and then randomly select balls from those urns, will the frequency of black balls converge to \(1/2\)?
Let \(P_{k}\) be the success probability for a given urn. We have already said that \(P_{k} \sim U(0, 1)\). Once we've chosen an urn, the probability of drawing a black ball is fixed at \(P_{k} = p_{k}\), and thus the outcome of the draw \(B_{k}\) is Bernoulli(\(p_{k}\)). We can summarize these relationships by saying \[\begin{align*} P_{k} &\sim U(0, 1) \\ B_{k} | P_{k} = p_{k} &\sim \text{Bernoulli}(p_{k}). \end{align*}\] And now we ask \[ \frac{1}{K} \sum_{k = 1}^{K} B_{k} \xrightarrow{???} E[P] = 1/2.\]
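Before doing any math, here's a quick simulation sketch of my own (using numpy; the number of draws and the seed are arbitrary choices) of the urn/ball procedure, just to see what the empirical frequency of black looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 100_000

# Step 1: choose an urn by drawing its black-ball proportion P_k ~ U(0, 1).
p = rng.uniform(0.0, 1.0, size=K)

# Step 2: draw one ball from the chosen urn: B_k | P_k = p_k ~ Bernoulli(p_k).
b = rng.random(K) < p

print(b.mean())  # empirical frequency of black; should be close to E[P] = 1/2
```

With this many draws, the printed frequency should come out very close to \(1/2\), which at least suggests the answer is yes.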
The answer is yes. And the quickest way to get there (that I've thought of) is to determine the marginal distribution of the \(B_{k}\). That is, we know the marginal distribution of \(P_{k}\), and we know the conditional distribution of \(B_{k}\) given \(P_{k}\). So we can combine those two to get the joint distribution of \((B_{k}, P_{k})\) and then marginalize out \(P_{k}\) to get the marginal distribution of \(B_{k}\). I was surprised by the result, though perhaps I should not have been.
Performing a few computations, we see that \[ f(b, p) = f(p) f(b | p) = 1 \cdot p^{b} (1 - p)^{1-b}\] and thus \[ f(b) = \int_{0}^{1} 1 \cdot p^{b} (1 - p)^{1-b} \, dp.\] At this point, we could dust off our NIST Digital Library of Mathematical Functions and look up the Beta function. Or we can note that we only care about the cases \(b = 0\) (which gives us the probability of seeing a white ball) and \(b = 1\) (which gives us the probability of seeing a black ball). For example, substituting in \(b = 1\), we get \[ f(1) = \int_{0}^{1} 1 \cdot p \, dp,\] which is just the expected value of \(P\), namely \(1/2\). Thus, we see that the \(B_{k}\) are marginally Bernoulli(\(E[P]\)). Of course, having seen this trick with \(P \sim U(0, 1)\), we notice that we can pull the same trick for \(P\) distributed according to any well-behaved distribution on \([0, 1]\). This is because we'll always have \[ f(1) = \int_{S(P)} f(p) \cdot p \, dp = E[P],\] where \(S(P)\) is the support of \(P\), namely \(S(P) \subseteq [0, 1]\); in the original example, \(S(P) = [0, 1]\).
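As a concrete check of that last claim (an example of my own, not part of the original problem), suppose instead that \(P \sim \text{Beta}(2, 3)\), so that \(f(p) = 12\, p (1 - p)^{2}\) on \([0, 1]\). Then \[ f(1) = \int_{0}^{1} 12\, p (1 - p)^{2} \cdot p \, dp = 12 \int_{0}^{1} p^{2} (1 - p)^{2} \, dp = 12 \cdot \tfrac{1}{30} = \tfrac{2}{5},\] which is exactly \(E[P] = 2/(2 + 3) = 2/5\), so each draw is marginally Bernoulli(\(2/5\)).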
And this is the result that I found weird. We're choosing the success probabilities according to some distribution \(P\), and then choosing the balls based on those success probabilities. And yet marginally, if we just look at the \(B_{k}\), they look exactly as if we had sampled them directly from a Bernoulli(\(E[P]\)) distribution. In other words, we might as well skip straight to sampling \(B_{k}\) instead of first sampling \(P_{k}\) and then sampling \(B_{k} | P_{k} = p_{k}\). It wasn't obvious to me that this should be the case. And yet that is what's going on, even if we take \(P\) to be discrete. For example, suppose we have ten urns, each with a different proportion of black balls, and we choose among them with equal (or, for that matter, unequal) probabilities. On each draw, it will be precisely as if we had sampled from a single urn whose proportion of black balls is the (possibly weighted) average of the proportions across all of the urns.
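Here's another quick sketch of my own illustrating that discrete case (the ten proportions and the urn-selection weights below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten urns, each with its own proportion of black balls,
# chosen with unequal probabilities.
props = np.linspace(0.05, 0.95, 10)       # black-ball proportion in each urn
weights = rng.dirichlet(np.ones(10))      # arbitrary urn-selection probabilities

K = 100_000
urns = rng.choice(10, size=K, p=weights)  # pick an urn on each draw
b = rng.random(K) < props[urns]           # draw one ball from the chosen urn

print(b.mean())         # empirical frequency of black
print(weights @ props)  # weighted average of the proportions, i.e. E[P]
```

The two printed numbers should be nearly identical: the draws behave just as if they came from a single urn with the weighted-average proportion of black balls.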
And now Poisson's result is straightforward to show. Since the \(B_{k}\) are i.i.d. Bernoulli(\(E[P]\)) random variables, we can apply either the weak or the strong law of large numbers directly to the sequence \(B_{1}, B_{2}, \ldots\) and we get what we want. Namely, \[ \frac{1}{K} \sum_{k = 1}^{K} B_{k} \xrightarrow[\text{or } a.s.]{P} E[P] = 1/2.\]
So Poisson's problem really reduces to Bernoulli's problem, if we look at it in the right way.