A Menagerie of Power Laws

I read this post today over at God Plays Dice, where Michael points to a post by Robert Krulwich (of Radiolab fame). In it, Krulwich discusses work done by Geoffrey West on how power laws seem to show up in the relationship between metabolic rates and quantities like body mass.

Michael ends with a caveat that power laws are overreported in the scientific literature (in my opinion, especially in the 'complex systems' literature). But he then points to Shalizi's1 wonderful So You Think You Have a Power Law -- Well Isn't that Special?, which links to a paper by Shalizi, Aaron Clauset, and Mark Newman that demolishes a lot of the supposed power laws reported in large parts of the complex systems literature.

There's a problem with connecting these two things (the West paper and the Shalizi paper), however. I posted a comment on Michael's post, so I'll just quote a part of that comment, with fancier MathJax to make some of the equations look nicer:

Second, I'm going to be a bit pedantic (being an (applied) mathematician-and-complex-systems-scientist-in-training myself) about the difference between Shalizi, et al.'s power law paper and the research Krulwich has linked to.

West's research is a question of regression: given that we observe a particular mass, what is the expected lifespan? That is, we seek to estimate \(E[L | M = m]\), where \(L\) is the lifespan and \(M\) is the mass of the organism. In this case, trying to fit a regression function that happens to take the form of a power (i.e. something like \(m^{-\alpha}\)) is perfectly appropriate. The regression function \(E[L | M = m]\) may well be approximated by this. West's research indicates this parametric form for the regression function is at least a good candidate.
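To make the regression framing concrete, here's a minimal sketch with synthetic data. The constants, the exponent, and the multiplicative-noise model below are all invented for illustration; they are not West's estimates or data. Taking logarithms turns the power-law regression into ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: lifespans following L = c * m^alpha with multiplicative
# noise. (Illustrative values only -- not West's actual data or exponents.)
true_c, true_alpha = 5.0, 0.25
mass = rng.uniform(1, 1000, size=200)
lifespan = true_c * mass**true_alpha * np.exp(rng.normal(0, 0.1, size=200))

# Taking logs linearizes the model:
#   log L = log c + alpha * log m + noise,
# so ordinary least squares recovers the exponent.
slope, intercept = np.polyfit(np.log(mass), np.log(lifespan), 1)
print(f"estimated alpha = {slope:.3f}, estimated c = {np.exp(intercept):.3f}")
```

Note that this is legitimate here precisely because we're estimating a conditional expectation, not a probability distribution.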

Shalizi and his coauthors are addressing a different question, something more like 'What is the distribution of outbound links for websites on the internet?' In this case, we seek to estimate the probability distribution (either the mass function or density function). We could do this parametrically by assuming that, say, the number of outbound links to a page should be power law distributed, and thus the mass function takes the form \[p(x) = C x^{-\alpha}\] where \(x\) is the number of outbound links and \(C\) is a normalization constant. What Shalizi, et al. are warning against is then fitting this model by plotting the empirical distribution function and tracing a line through it and then deriving an estimate of alpha from the slope of the fitted line. This is a silly approach that doesn't have any nice theoretical guarantees. We've known of a better way (maximum likelihood estimation) since at least the early 1900s, and maximum likelihood estimators do have provably nice properties.
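For the continuous case, the maximum likelihood estimator that Clauset, Shalizi, and Newman recommend even has a closed form: \(\hat{\alpha} = 1 + n \left[\sum_{i} \ln(x_i / x_{\min})\right]^{-1}\). Here's a sketch that draws from a power law and recovers \(\alpha\); for simplicity it assumes \(x_{\min}\) is known, whereas their paper also shows how to estimate it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw from a continuous power law p(x) = C x^{-alpha} for x >= x_min,
# via inverse transform sampling: X = x_min * U^{-1/(alpha - 1)}.
alpha, x_min, n = 2.5, 1.0, 50_000
x = x_min * rng.uniform(size=n) ** (-1.0 / (alpha - 1.0))

# Maximum likelihood estimator from Clauset, Shalizi & Newman:
#   alpha_hat = 1 + n / sum(ln(x_i / x_min))
alpha_hat = 1.0 + n / np.sum(np.log(x / x_min))
print(f"true alpha = {alpha}, MLE alpha_hat = {alpha_hat:.3f}")
```

Contrast this one-liner with the line-through-the-log-log-plot approach: the MLE is consistent and asymptotically efficient, and comes with a standard error of roughly \((\hat{\alpha} - 1)/\sqrt{n}\).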

The two problems (regression and parametric estimation) are clearly related, but they're asking different questions, and those questions admit different statistical tools. When I first heard about these problems, I conflated them as well. The fact that both uses of the monomial \(x^{-\alpha}\) are called 'power laws' certainly doesn't help. And why the name 'law' anyway? We usually don't invoke 'Gaussian laws' when we observe that certain things are typically normally distributed...

Of course, we also have models for why the power laws show up in the West-sense. Because if you can't tell a story, you're not doing science!


  1. Full disclaimer: Shalizi is one of my scientific / mathematical / statistical heroes, so I'll probably be linking to a lot of things by him. The fact that he showed up in Michael's post was an added bonus.

Biased Biases

During my introduction to psychology class at Ursinus College (because, well, I had to take something that semester), the professor covered a unit on cognitive biases. These are typically neat, and I dedicate a non-negligible portion of every week to reading blogs on becoming less wrong1 and overcoming biases2.

One of the biases, however, really bugged me. It's called the 'representativeness bias.' Here's an example from an overview at About.com:

For an illustration of judgment by representativeness, consider an individual who has been described by a former neighbor as follows: "Steve is very shy and withdrawn, invariably helpful, but with little interest in people, or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail." How do people assess the probability that Steve is engaged in a particular occupation from a list of possibilities (for example, farmer, salesman, airline pilot, librarian, or physician)? ... In the representativeness heuristic, the probability that Steve is a librarian, for example, is assessed by the degree to which he is representative of, or similar to, the stereotype of a librarian. - Amos Tversky & Daniel Kahneman, Judgment Under Uncertainty: Heuristics and Biases

But where's the mistake here? Well, it amounts to the opposite of what is known as the base rate fallacy. Basically, Tversky and Kahneman are answering an unconditional question, while the question posed seems to indicate we should answer using all of the evidence provided. For those versed in STAT100-level probability, the question asks for \(P(L | E)\), where \(L\) is the event that Steve is a librarian and \(E\) is the event that the various characteristics (Steve is shy, etc.) hold. This is precisely what a person should do: incorporate the evidence they observe when coming to a conclusion. But Tversky and Kahneman want the person to answer with \(P(L)\), the base-rate (unconditional) probability that Steve is a librarian.
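Bayes' theorem makes the distinction concrete: both the evidence and the base rate enter \(P(L | E)\). The numbers below are purely hypothetical, chosen only to illustrate the computation, not real occupational statistics:

```python
# Hypothetical numbers for illustration (not real occupational statistics).
# Suppose there are 20 farmers for every librarian, but Steve's description
# fits 90% of librarians and only 10% of farmers.
p_librarian = 1 / 21          # base rate P(L)
p_farmer = 20 / 21            # base rate P(F)
p_desc_given_librarian = 0.9  # P(E | L)
p_desc_given_farmer = 0.1     # P(E | F)

# Bayes' theorem: P(L | E) = P(E | L) P(L) / P(E), with P(E) expanded
# by the Theorem of Total Probability.
p_desc = (p_desc_given_librarian * p_librarian
          + p_desc_given_farmer * p_farmer)
p_librarian_given_desc = p_desc_given_librarian * p_librarian / p_desc
print(f"P(L) = {p_librarian:.3f}, P(L | E) = {p_librarian_given_desc:.3f}")
```

With these made-up numbers, the evidence moves the probability from about 0.05 to about 0.31: the base rate still matters, but so does the evidence, which is exactly what conditioning on \(E\) captures.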

I don't think using evidence should count as a bias.


  1. I should mention here that I really don't like Eliezer Yudkowsky's rampant pro-Bayesianism. I read this when I was an impressionable youth. Fortunately, a good course in probability theory dissuaded me of the misconception that there's anything magical about Bayes's Theorem: it's just an application of the definition of conditional probability and the Theorem of Total Probability. Bayesianism, on the other hand, is a topic for a different day.

  2. Fun bonus fact: If you search for Overcoming Bias in Google, you'll find that Google has mischosen the picture for Robin Hanson to be that of his son, here. At least, they did until I hit the 'Wrong' button on their site. Time will tell if they update their web crawler.

Randomness for Self-Improvement

A quick Google search of 'random reinforcement self help' and 'using randomness self help' returns nothing useful. Actually, the first set of terms returns this paper as its first hit, something I'm interested in reading and first discovered through Shalizi. Most of the other hits involve types of reinforcement schedules, including random ones.

But nothing in particular about actively designing a random reinforcement schedule. Which seems odd, since one of the best reinforcement schedules I've ever been subjected to (subjected myself to?) has been Facebook's. The arrival of wall posts can surely be modeled as some sort of random process (that would be an interesting project, in fact, and would only require mining my inbox). And their arrivals have induced in me a behavior, common to most Facebook users (I would imagine), of returning to Facebook several times a day. Interestingly, it doesn't seem to matter how infrequent the arrivals of these posts are; in the language of Poisson processes, the rate parameter \(\lambda\) seems to be inconsequential. Though I do find myself increasing my visits to the site immediately after I post something, obviously because that increases \(\lambda\) (as such, I suppose I should be talking about \(\lambda(t)\)) and thus my anticipation of a response.
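The "mine my inbox" project above might start with something like this: simulate (or, with real timestamps, fit) a homogeneous Poisson process by way of its exponential inter-arrival times. The rate of 3 posts per day is a made-up stand-in, not mined from any actual inbox:

```python
import numpy as np

rng = np.random.default_rng(2)

# Model wall-post arrivals as a homogeneous Poisson process with rate lam
# (posts per day): inter-arrival times are Exponential(lam), so we draw
# gaps and accumulate them. (lam = 3 is an invented, illustrative rate.)
lam, horizon = 3.0, 365.0
gaps = rng.exponential(scale=1.0 / lam, size=2000)
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < horizon]

# The MLE of the rate is simply (number of arrivals) / (observation window);
# with real data, arrival_times would come from the inbox timestamps.
lam_hat = len(arrival_times) / horizon
print(f"assumed rate = {lam}, estimated rate = {lam_hat:.2f} posts/day")
```

Checking whether the real inter-arrival times actually look exponential (or whether \(\lambda(t)\) spikes after I post something) would be the interesting part.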

I have a few ideas for behaviors that might be diminished by random reinforcement. But not many that would be improved by it. This isn't a new idea, either, as evidenced by this entry. But the idea might warrant further investigation.

And probably a more careful literature review than googling.

Digital Tsundoku

I've just collected all of the books I've ever bookmarked from Amazon (and have yet to purchase) into a single text file. There are 230 of them.

At the rate of six hours per book (a decent estimate of the average time I spend reading a book I manage to finish), that's 1380 hours of reading. Or 57 complete days.

A digital form of 'tsundoku', the Japanese word for 'buying books and not reading them; letting books pile up unread on shelves or floors or nightstands'.

Randomness for fun and profit

I now spend a part of Fridays going through my RSS reader (NetNewsWire), collecting a list of articles I would like to read. I then save all of those articles in a text file (called links.txt) and use a Python script, referenced via the bash alias lniks, to randomly choose a link and open it in Safari. Now (and this is relatively new), if the article is longer than a paragraph or so, I push it off to Instapaper, so that I can read it on my Nexus 7.
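The script itself is nothing fancy. Here's a minimal sketch of what such a picker might look like (my actual lniks.py may differ in the details):

```python
import random
import subprocess

def pick_link(links, rng=random):
    """Choose one article at random; return it plus the remaining links."""
    choice = rng.choice(links)
    return choice, [l for l in links if l != choice]

def open_in_browser(url):
    # On OS X, 'open' hands the URL off to the default browser
    # (Safari, in my case).
    subprocess.call(["open", url])

# Demo on an in-memory list; the real script reads links.txt, rewrites it
# without the chosen link, and then calls open_in_browser(choice).
links = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
choice, remaining = pick_link(links)
print(f"reading: {choice} ({len(remaining)} links left)")
```

Wiring it to a bash alias is just `alias lniks='python ~/bin/lniks.py'` in .bashrc.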

This is an example of actively incorporating randomness into my life. Before using the lniks script, I found that certain articles would go unread for a very long time. Admittedly, this meant that I got through the articles I most wanted to read first. But it left those other ones wasting away, never to be read.

Actually, I invented this system completely by accident. Before, I kept folders, by topic (MISC, SCI, TECH, META, etc.), that I stored the articles in. Once, I failed to save these folders, and had to reconstruct my weekly dose of articles through the 'History' feature in Safari. It turned out that when I saved the articles from 'History' into my 'Bookmarks' folder, Safari shuffled all of the articles around. "What a bother!" was my first thought. But as I started working my way through the articles, I found that my pace of consumption greatly increased, as I felt more comfortable deleting articles I didn't really have an interest in reading.

So, for a while, I recreated this 'failure to save bookmarks' experience, until I realized that it was the randomness of the bookmarks' presentation that I really wanted to capture. Thus, lniks.py, and my new habit of weekly article consumption.

Testing footnotes with MultiMarkdown.

Here is a preface to a footnote1.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc sed arcu hendrerit libero rutrum egestas. Ut ut dolor a justo pretium porttitor. Aliquam pharetra lacinia elit, egestas scelerisque metus volutpat sit amet. Pellentesque magna sapien, egestas eget dapibus at, sollicitudin nec dolor. Donec orci turpis, posuere eget consequat sit amet, pellentesque nec turpis. Nullam mauris eros, consequat nec dapibus in, hendrerit lobortis enim. Pellentesque et enim sit amet nisi elementum malesuada. Integer eu orci erat. Integer eu semper libero. Fusce sed sapien at neque rutrum elementum. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Pellentesque auctor elit et sapien pharetra nec luctus eros semper. Duis commodo molestie felis, et pellentesque lacus sagittis rhoncus. Donec tempus tristique nibh, adipiscing rhoncus sapien dignissim non. Quisque sed neque leo. Sed sapien diam, fringilla et rutrum eget, aliquet vel velit. Donec sed augue justo, ac ultricies justo. Pellentesque in dui nibh. Aenean dictum rhoncus purus, ut lobortis elit vehicula at. Nulla et ligula lacus. Sed a scelerisque tellus. Suspendisse potenti. Nullam ac fermentum orci. In vestibulum orci vitae turpis auctor non viverra tellus gravida. Curabitur sodales cursus nunc et varius. In hac habitasse platea dictumst. Nullam lobortis ornare nisi in pharetra. Fusce eu iaculis quam. Aenean blandit semper nunc nec facilisis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin at libero sit amet lacus tincidunt ultrices sit amet sed enim. Pellentesque tristique sodales blandit. Etiam tortor sapien, cursus in accumsan at, accumsan ac dui. Pellentesque eu dui metus. Sed eu risus id risus pellentesque egestas vestibulum in sapien. In auctor luctus orci et fringilla. Duis feugiat venenatis felis sed posuere. Pellentesque vitae tortor arcu, ut euismod sapien. Suspendisse placerat leo leo, nec imperdiet diam. 
Maecenas ipsum dui, condimentum et elementum id, consectetur sit amet nisl. Etiam sit amet est sapien, vitae molestie orci. Aenean pellentesque rhoncus dapibus. Donec bibendum orci eu libero vulputate pellentesque. Pellentesque est ante, consequat in consectetur sed, tempus et mauris. Ut non ultrices nunc. Suspendisse commodo porttitor sodales. Suspendisse viverra, elit non posuere tincidunt, elit lectus ullamcorper tellus, in convallis sem urna vitae sapien. Quisque ultricies posuere odio, quis porttitor orci dapibus vel. Sed tincidunt consequat felis, eget convallis augue vehicula sit amet. Aliquam dignissim augue sit amet sapien scelerisque vel volutpat arcu molestie. Donec velit metus, rhoncus at porta at, molestie et nunc. Quisque semper egestas ullamcorper. Suspendisse dapibus dictum tellus id hendrerit. Nulla et lorem mi. Cras porta elit vitae lorem feugiat faucibus. Praesent massa sapien, facilisis quis gravida et, volutpat a velit. Nam eros odio, pretium ut convallis at, interdum ut eros. Quisque egestas massa sit amet nunc consectetur fringilla. Integer ultricies felis in elit venenatis interdum. Nullam elit nisl, imperdiet a gravida et, semper id ligula. Sed dui lectus, pulvinar sed sollicitudin at, fermentum eu quam. Duis a metus neque, in imperdiet felis. Aliquam ac nibh orci, hendrerit tincidunt tellus.


  1. And here is the footnote text itself. ↩

Reading wasserman

I intentionally didn't capitalize Wasserman's name in the title of this post. Why? Because I've been reading a lot of his writing lately (see, for instance here, where he misspells 'hear' as 'here' at one point), and I've noticed that his posts have a lot of typos. I've also found several typos in his text All of Statistics (and even made it into the acknowledgements of his errata for the second edition, for noticing where he misspelled Mahalanobis).

My point isn't to critique his spelling. Actually, my point is the opposite of that. Wasserman is a brilliant statistician. And he has better things to do than proofread all of his blog posts / textbooks for trivial mistakes like these. He's too busy, you know, doing amazing statistics. That's a good indication of what I should be doing if I want to become even a slightly-above-average scientist. Push out more material. Write more. Formulate more ideas. Spelling mistakes be damned.

Which is the long-winded explanation for why I didn't capitalize Wasserman's name in the title of this post.

All of that said, if you have any interest in mathematics, statistics, machine learning, or just generally cool stuff, you should check out Wasserman's blog at Normal Deviate. (Great pun!)

Let me do the math...

One of my least favorite phrases. But let's see if this works:

\[ f_{X}(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{1}{2 \sigma^2} (x - \mu)^2}\]

Yep. That's the probability density function for a normal random variable, \( X \sim N(\mu, \sigma^{2}) \). The most math-y thing I could think of at the moment. Which shows you where my head is at.
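And since the formula is sitting right there, we might as well sanity-check it numerically. A quick sketch: evaluate the density as written above, confirm the standard normal peak is \(1/\sqrt{2\pi} \approx 0.3989\), and check that a crude Riemann sum integrates to 1:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """The density written above: N(mu, sigma^2) evaluated at x."""
    coef = 1.0 / math.sqrt(2.0 * math.pi * sigma**2)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

# The standard normal density peaks at 1/sqrt(2*pi) ~ 0.3989 ...
peak = normal_pdf(0.0)
# ... and should integrate to 1 (here, a Riemann sum over [-10, 10]).
total = sum(normal_pdf(-10 + 0.001 * i) * 0.001 for i in range(20000))
print(f"peak = {peak:.4f}, integral = {total:.4f}")
```

Not a proof of anything, but a comforting check that the MathJax and the math agree.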

PS - The formulas in this post were generated using MathJax. Not quite as pretty as LaTeX, but it at least uses the same syntax.