Estimating the Power Spectral Density: Part I, The Periodogram, and Why Not to Use It

If you give a physicist, electrical engineer, or applied harmonic analyst a time series (signal), one of the first things they'll do is compute its Fourier transform¹. This is a reasonable enough thing to do: the Fourier transform gives you some idea of the dominant 'frequencies' in the time series, and might give you important information about the large scale structure of the signal. For example, if you compute the Fourier transform of a pure sinusoid, and plot its squared modulus (power spectrum), you'll see a delta function at the frequency of the sinusoid. Similarly, if you compute the power spectrum of the sum of sinusoids, you'll get out a mixture of delta functions, with a delta function at each frequency. This is useful, especially if you signal really is just the sum of sines and cosines: you've reduced the information you have to keep (the entire signal) to a few numbers (the frequencies of the sinusoids).

But sometimes, you don't have any good reason to believe that the time series you are looking at can be decomposed (nicely) into sines and cosines. In fact, the thing you're looking at may be a realization from some stochastic experiment, in which case you know that nice representations of that realization may be hard to come by. In these cases, we can treat the time series as a realization from some stochastic process \(\{ X_{t}\} \)sampled at \(T\) time points, \(t = 1, 2, \ldots, T\). That is, we observe \(X_{1}, X_{2}, \ldots, X_{T}\) and want to say something meaningful about it. It's still quite natural to take the Fourier transform of the time series and square it with the hope that some structure pops out. Since we have the signal sampled at discrete collection of points, what we're really going to compute is the discrete Fourier transform² of the data, \[ d(\omega_{j}) = n^{-1/2} \sum_{t = 1}^{T} X_{t} e^{-2 \pi \omega_{j} t},\] where \(\omega_{j} = \frac{j}{T}\), and \(j = 0, 1, 2, \ldots, T - 1\). The discrete Fourier transform is a means to an end, namely the periodogram of the data, \[ I(\omega_{j}) = |d(\omega_{j})|^{2},\] which is the squared modulus of the discrete Fourier transform. The periodogram is the discrete analog to the power spectrum, in the same way that the discrete Fourier transform is the discrete analog to the Fourier transform.

If we were observing a purely deterministic signal, then we'd be done. Ideally, as we collect more and more data from a signal that isn't changing over time, we'll be able to resolve our estimate of the spectral density better and better.

If the signal is stochastic in nature, things are more complicated. It turns out that the periodogram is an unbiased but inconsistent estimator for the spectral density of a stochastic process. In particular³, one can show that asymptotically, at each Fourier frequency \(\omega_{j}\), while \[E[I(\omega_{j})] = f(\omega_{j}),\] that is, the estimator is unbiased, it also holds that \[ \text{Var}(I(\omega_{j})) = (f(\omega_{j}))^{2} + O(1/n).\] That is, the variance of the estimator approaches a (possibly non-zero) constant, depending on the true value of the spectral density⁴. In other words, even as you collect more and more data, the variance of your estimator for \(f\) at any point will not go down! The estimator remains 'noisy' even in the limit of infinite data!

The way to solve this is to 'share' information at the various \(\omega_{j}\) about \(f(\omega_{j})\) by smoothing the \(I(\omega_{j})\) values. That is, we perform a regression of \(I(\omega_{j})\) on \(\omega_{j}\) to get a new estimator \(J(\omega_{j})\) that will no longer be unbiased (since \(f(\omega_{j})\) is generally not equal to \(f(\omega_{j} \pm \Delta \omega)\)), but can be made to have vanishing variance (since as we collect more and more data, we can 'borrow' from more and more values near \(\omega_{j}\) to estimate \(f(\omega_{j})\) using \(I(\omega_{j}))\). This is the usual bias-variance tradeoff that seems to occur everywhere in statistics: we sacrifice unbiasedness to get reduction in variance in hopes that overall we reduce the mean-squared error of our estimator.

Since this is a regression problem (again, regressing \(I(\omega_{j})\) on \(\omega_{j}\)), we can solve it using any of favorite regression techniques⁵. As I said in the footnote below, Chapter 7 of Nonlinear Time Series by Fan and Yao has a nice survey of various standard (and some more modern) techniques for doing this regression / smoothing. One favorite method, especially for physicists who are hungry to find 1/f noise everywhere, is to fit a line through the periodogram. This is a form of regression / smoothing, but it seems to put the cart before the horse: if you go searching for a line, you'll find it. Then again, there are worse things you could do with a linear regression.

I've added 'inconsistency of the periodogram' to my list of data analysis mistakes not to make in the future. Lesson learned: always smooth your squared DFT!

Postscript: There are other ways to think about why we should smooth the periodogram that ring more true to experimentalists⁶. For some reason, I always find these stories (which ring more true with reality) less appealing than thinking about the problem statistically using a 'generative model' for the signal from the start.

Interestingly, if you give a physicist who specializes in nonlinear dynamics a time series, one of the first things they'll do is compute an embedding.↩
With any Fourier transform-type work, you always have to be careful about the \(2 \pi\) terms that show up here, there, and everywhere. I'm following the convention used in Shumway and Stoffer's Time Series Analysis and Its Applications.↩
See, for example, Section 2.4 of Nonlinear Time Series by Fan and Yao, where they give an excellent theoretical treatment of the problems using the raw periodogram, and Chapter 7 of the same text, where they offer up some very nice solutions.↩
This is a point I'm less clear on: if you have an asymptotically unbiased estimator, and its variance goes to zero, then the estimator is consistent (it converges in probability to its true value). You can show this relatively easily using Chebyshev's inequality. But I don't see any reason why the inverse should also be true: why is it that non-zero variance is enough to show that an estimator isn't consistent? Everything I've seen with regards to the non-zero asymptotic variance of the periodogram and its inconsistency states this fact without any more clarification. It must be something particular, or else the statement about asymptotically zero variance and consistency would be an 'if and only if' statement.↩
Though it should be noted (again, see Fan and Yao) that this is a very special kind of regression problem, since it is heteroskedastic (the variance of \(I(\omega_{j})\) depends on \(\omega_{j}\)), and the residuals are non-normal (in fact, they're asymptotically exponential). Thus, doing ordinary least-squares with a squared error objective function will be inefficient compared to taking these facts into account. Fan and Yao have a nice paper on how to do so.↩
Another view on why we should smooth, as pointed out in the Wikipedia entry on the periodogram: 'Smoothing is performed to reduce the effect of measurement noise.' Of course, 'measurement noise' turns out signal into a stochastic process as well. This is an interesting way to think about the problem, and probably the view taken by most applied scientists when they decide to smooth the periodogram. As a wannabe statistician, I prefer to think of the signal as a stochastic process from the outset, and realize that, even without 'measurement noise' (i.e. when all of the 'noise' is in the driving term of the dynamical equations), we still need to smooth in order to get out a reasonable guess at the power spectral density of the process. Collecting more data is not enough!↩