# Of Camping and Information Lost

Suppose you have a lock with three slots on it, and each slot can take one of the digits 0 through 9. After opening the lock, you'll want to thwart any evil doers by somehow randomizing the digits. (Otherwise, why have the lock?) The best strategy is to scramble each of the digits. But suppose we take a lazier strategy and choose one of the digits at random and scramble it. How much information has our hypothetical safe cracker lost about the actual lock combination by this randomization process?

This problem (after some mathematization) falls squarely in the realm of information theory. We can rephrase it as follows: if we have a lock combination \(X\) drawn uniformly from \(\{ 0, 1, \ldots, 9\}^{3}\), and then we generate \(Y\) by the scrambling process above, how much information have we lost about \(X\) once we know \(Y\)? Recalling that the mutual information, the amount of information we *gain* about \(X\) by knowing \(Y\) (or vice versa) is

\[ I[X; Y] = H[X] - H[X | Y], \]

we see that the quantity we're interested in computing is really \(H[X | Y]\), the information we've 'lost.'

Let's start by simplifying things a bit to sure up our intuition. Suppose that instead of a decimal-based lock, we have a binary lock with three bits. We'll take the initial lock combination to be uniform on \(\{0, 1\}^{3}\). Now suppose we use a simple procedure to randomize our lock. Instead of choosing one of the bits at random, we'll always randomize the first bit. Intuition tells us that the randomization process should lead to the loss of one bit of information about the true combination \(X\). How can we check that our intuition is guiding us correctly?

To compute \(H[X | Y]\), we need the conditional mass function \(p(x | y)\), the probability the original lock combination was \(x\) given that we observed the scrambled combination \(y\). What we know going into the problem are \(p(x)\), since we've assumed the original lock was chosen completely randomly^{1}, and \(p(y | x)\), since we know that the scrambled combination will take the values \(0x_{1}x_{2}\) and \(1x_{1}x_{2}\) with equal probability, where \(x_{0}x_{1}x_{2}\) is the original lock combination. We get \(p(x|y)\) from these in the usual way, by noting that \[ \begin{align}
p(x | y) &= \frac{p(y | x) p(x) }{p(y)} \\
&= \frac{\frac{1}{2} \cdot \frac{1}{8}}{\frac{1}{8}} = \frac{1}{2}
\end{align}\] for all \((x, y)\) that 'make sense' (i.e. those where \(x\) could be the unscrambled version of \(y\)), and zero otherwise. We already saw that \(p(x | y)\) will be \(\frac{1}{2}\) for such \((x, y)\), and since \(x\) is uniform on \(\{0, 1\}^{3}\), each of the 8 possible combinations has equal probability, namely \(\frac{1}{8}\). Moreover, since \(X\) is uniform, given our scrambling procedure it follows that \(Y\) will also be uniform.

We have everything we need (again, \(p(x | y)\)) to compute \(H[X | Y]\). We can visualize \(p(x | y)\) as a stochastic matrix \(\mathbf{W}\), with each row corresponding to a particular value of \(y\) and each column corresponding to a particular value of \(x\) and each entry corresponding to \(p(x | y)\). For each \(y\) value, there are two possible \(x\) values, and there are 8 possible \(y\) values. Thus, \(\mathbf{W}\) will have 16 non-empty entries, each with probability equal to \(\frac{1}{2}\). Thus, we can easily compute \[\begin{align} H[X | Y] &= -\sum_{(x, y) \in \{0, 1\}^{3} \times \{ 0, 1\}^{3}} p(x,y) \log_{2} p(x | y) \\ &= -16\cdot\frac{1}{16} \cdot \log_{2} \frac{1}{2} \\ &= 1, \end{align}\] which gives us 1 bit, as we expected.

Now to the case of interest: where we randomize which bit we scramble. Intuitively, we imagine that this should lead to more information lost than in the case where we always scramble the same bit. Thus, the information lost should certainly be greater than 1 bit.

And now I've lost steam. The procedure for finding \(H[X | Y]\) in this case is exactly the same. We can already specify \(p(x)\) and \(p(y | x)\), though \(p(y | x)\) will be a bit more complicated. In particular, \(Y\) will no longer be uniform on \(\{ 0, 1\}^{3}\), so we'll have to compute \(p(y)\) by marginalizing down from \(p(x, y) = p(x) p(y | x)\). We also have to be a little bit more careful about the number of non-zero entries in our new \(\mathbf{W}\). But once we do what amounts to bookkeeping, we'll find that for this randomization procedure, \[ H[X | Y] = 1.7925\ldots \text{ bits},\] which is an improvement over 1 bit, but not as good as 3 bits (which we would get if we completely randomized the lock each time). We have bought ourselves 0.7925 more bits of security by randomizing which bit we scramble.

Knowing people, the original combination probably wasn't chosen completely randomly. For instance, in the decimal lock case, I imagine people are more hesitant to use combinations with repeated digits like '111' or '222.'↩