Paley-Zygmund and Inconsistent Estimators

Cosma Shalizi was kind enough to share a proof that non-vanishing variances implies inconsistent estimators, related to my previous post on spectral density estimators¹. The proof is quite nice, and like many results in mathematical statistics, involves an inequality. Since a cursory Google search doesn't reveal this result, I post it here for future Googlers.

We start by noting the Paley-Zygmund inequality, a 'second moment' counterpart to Markov's inequality:

Theorem: (Paley-Zygmund) Let \(X\) be a non-negative random variable with finite mean \(m = E[X]\) and variance \(v = \text{Var}(X)\). Then for any \(a \in (0, 1)\), we have that

\[P(X \geq a m) \geq (1 - a)^{2} \frac{m^{2}}{m^{2} + v}.\]

We're interested in showing: if \(\text{Var}\left(\hat{\theta}\right)\) is non-vanishing, then \(\hat{\theta}\) is not a consistent estimator of \(\theta\). Consistency² of an estimator is another name for convergence in probability. To show convergence in probability, we'd need to show that for all \(\epsilon > 0\), \[ P\left(\left|\hat{\theta}_{n} - \theta\right| > \epsilon\right) \xrightarrow{n \to \infty} 0.\] Thus, to show that the estimator isn't consistent, it is sufficient to find a single \(\epsilon > 0\) such that we don't converge to 0. To do so, take \(X = (\hat{\theta}_{n} - \theta)^{2}\). This is clearly greater than or equal to zero, so we can apply the Zygmund-Paley inequality. Substituting into the inequality, we see \[P\left(\left(\hat{\theta}_{n} - \theta\right)^{2} \geq a m\right) \geq (1 - a)^{2} \frac{m^{2}}{m^{2} + v}\] where \[m = E\left[\left(\hat{\theta}_{n} - \theta\right)^{2}\right] = \text{MSE}\left(\hat{\theta}_{n}\right) = \text{Var}\left(\hat{\theta}_{n}\right) + \left\{ \text{Bias}\left(\hat{\theta}_{n}\right)\right\}^{2}\] and \[ v = \text{Var}\left(\left(\hat{\theta}_{n} - \theta\right)^{2}\right).\] If we assume the estimator is asymptotically unbiased, then \(m\) reduces to the variance of \(\hat{\theta}_{n}\). Next, we choose a particular value for \(a\), say \(a = 1/2\). Substituting, \[P\left(\left(\hat{\theta}_{n} - \theta\right)^{2} \geq \frac{1}{2} m\right) \geq \frac{1}{4} \frac{m^{2}}{m^{2} + v}.\] Finally, note that \[ \left\{ \omega : \left(\hat{\theta}_{n} - \theta\right)^{2} \geq \frac{1}{2} m\right\} = \left\{ \omega : \left|\hat{\theta}_{n} - \theta\right| \geq \sqrt{\frac{1}{2} m}\right\}.\]

Let \(m'\) be the limiting (non-zero) variance of \(\hat{\theta}\). Then for any \(\epsilon_{m'}\) we choose such that \[ \epsilon_{m'} \leq \sqrt{\frac{1}{2} m'},\] we see that \[ P\left(\left|\hat{\theta}_{n} - \theta\right| > \epsilon_{m'}\right) \geq P\left(\left(\hat{\theta}_{n} - \theta\right)^{2} \geq \frac{1}{2} m\right) \geq \frac{1}{4} \frac{m^{2}}{m^{2} + v} \geq 0\] and thus \[ P\left(\left|\hat{\theta}_{n} - \theta\right| > \epsilon_{m'}\right) \not\to 0. \] Thus, we have a found an \(\epsilon > 0\) such that the probability does not converge to 0, and \(\hat{\theta}_{n}\) must be inconsistent.

Which, admittedly, was posted ages ago. I promise I'll post more frequently.↩
Well, weak consistency. Strong consistency is about convergence to the correct answer almost surely. There is also mean-square consistency, which is a form of convergence in quadratic mean. Have I mentioned that good statistics requires a great deal of analysis?↩