The New York Times has an interesting article today examining the curious fact that certain types of terrorist organizations have an unusually high ratio of engineers among their members. An interesting point to study, no doubt, but what caught my eye was this little blunder:
William A. Wulf, a former president of the National Academy of Engineering, is, no surprise, no fan of the Gambetta-Hertog theory. “If you have a million coin flips,” he says, “it’s almost certain that somewhere in those coin flips there will be 20 heads in a row.”
This numerical gaffe is a prime example of innumeracy, a favorite topic of mine, and it is doubly bad. First, the New York Times with its old-school print-format hubris regarding fact checking should not have let this slip by unnoticed. Second, the fact that the speaker is not just an engineer, but president of our National Academy, adds insult to injury.
Probability 101
The Wikipedia says that Numeracy is the ability to reason with numbers and other mathematical concepts. In today’s world, it should be considered as important as literacy. So let’s try doing some thinking about this problem.
What should first catch your eye in this is the meaning behind “20 heads in a row.” As a programmer, you are instinctively aware that 2 to the 20th power is roughly one million. This means that the chances of flipping a true coin and having it land heads up 20 times in a row is inded roughly one in a million. Does this mean that flipping a coin a million times renders such a streak “almost certain?” Of course not.
If the chance of flipping a single head is one in two, and I flip a coin two times, am I almost certain to see one head? No. If the chances of two heads in a row is one in four, am I almost certain to see a streak of two if I flip four times? Still we intuitively answer no. It seems likely, but nowhere near a certainty. So the task in front of us is to scale this equation up and see if it changes in character as we near one million.
Pinning it Down
Determining how likely this streak is requires a frequent ruse we employ in probability. Instead of calculating the probability directly, we determine out how likely it is not to occur, then subtract that value from one.
We know that the chance of the coin flip happening in the first 20 flips is 1/2^20. We’ll call this number p. Now let’s imagine a sequence of a million coin flips. The chance of a streak of 20 heads not starting at position one is 1-p. The chance of it not happening in the sequence of coins starting at position 2 is likewise 1-p. The same probability is true for every sequence of flips from position 1 to position 999,981, the last possible start of a streak of twenty.
The chances of not seeing a coin flip in every one of those positions is found by multiplying each of their values, leading to the rather unwieldy formula (1-p)^999,981. Unwieldy, perhaps, but your scientific calculator will quickly tell you it resolves to roughly 0.39. So the chances of seeing 20 heads in a row after a million coin flips is more or less 61%. Hardly “almost certain”.
Finding Almost Certain
I’d like to think that “almost certain” is somewhere in the neighborhood of 99%. I’ll leave the calculation as an exercise for the reader, but if your calculator has a log button you will be able to determine that you will need almost five million coin tosses to achieve near certainty. And when you think about it (using your beloved numeracy) that number seems a lot more realistic. Something that has a one in a million chance of occuring would seem to be only somewhat likely to occur in a million tries. Give me five million and it’s a sure thing.
Ironically, the Gray Lady just ran an ode to fact checking a few weeks ago. Apparently that department is short on people with any sort of mathematical fluency. Perhaps they should think about hiring an engineer or two?
11 users commented in " Innumeracy Revisited "
Follow-up comment rss or Leave a TrackbackA very useful shortcut for this kind of problem is to remember that ((p-1)/p)^p converges pretty quickly to 1/e (~= 0.37).
Especially if you round that to 2/5, you can estimate solutions to many, many “large number” probability problems.
@David:
I was fooling around with the numbers in a spreadsheet and I saw the rapid convergence you speak of. I wasn’t familiar with that identity, however, thanks!
- Mark
Your reasoning is not correct. Consider the simple case of at least two heads in a row when flipping 4 coins, which you use in the “Probability 101″ paragraph to debunk the informal argument that 20 heads in a row is “almost certain”. By listing the 16 possibilities and
counting which ones have 2 or more heads in a row, the correct probability can be seen to be 1/2. Your reasoning gives an incorrect answer of 1 – (3/4)^3.
The flaw in the reasoning is the assumption that the probability of coins 2 through n+1 being all heads is independent of whether coins 1 through n are all heads. You are implicitly making this independence assumption by multiplying the probabilities together.
Now, in the case of 20 in a row out of a million, your analysis may be approximately correct, because the events you are multiplying may be close to independent, but I’m not even sure of that. I certainly hope it’s close, now that you’re on record in the New York Times with this result ;-)
I think the correct math is actually relatively complex. See http://wizardofodds.com/askthewizard/images/streaks.pdf, who shows results of a discrete time Markov chain analysis of a similar problem.
@Andy:
You are correct about my algorithm being off. This is a good news/bad news story.
The good news (for me) is that my system actually overestimates the probability of the sequence. The main point I was making was that the NY Times should have sensed right away that the figure was way off base – and that is still the case.
The bad news is that my simple formula is no good – although it does provide an upper bound for the probability, it is not correct. Probability usually boils down to being able to count things, and many mistakes in probability problems come about when you either count something twice or miss it altogether. My mistake was counting twice, and I’ll be working out that in another post!
The really bad news is that I should have tested my algorithm using a simple case, as you did, in which case I would have seen the error right away.
But truthfully, whenever I have a post about innumeracy, I’m happy when people find mistakes. It helps me clear up my thinking and ideally raises the level of numeracy all around.
- Mark
Mark, after my original post, I threw together a quick random simulation and ran a few thousand runs of a million coin flips (aren’t computers today amazing!), and got a probability of about 38%, which, as you say, is even lower than the 60% you got, so your qualitative message still stands.
As an aside, I think your response to my correction should be trumpeted as a model of how to graciously accept corrections to errors in internet posts! I have to confess that my first draft of my post to you was not nearly as admirable, and I’m relieved that I edited it before posting.
Let’s hear it for raising the level of numeracy all around!
Full explanation of this is going to have to wait for another post, but here is the synopsis:
My assumption was that p, the probability of a 20 head streak was 1/2^20 at every position in the imaginary million tosses. This was incorrect. This value of p is only true for the toss starting at position 1.
For positions 2 through 999,981, the value of p declines based on a complicated function. The good news is that the value of p quickly converges to a value somewhere around 1/2097131.
Plugging those numbers in mean that the chances of seeing the streak of 20 heads in a million tosses is more like 38% – considerably less than my previous solution of 62%.
Gory details to follow.
- Mark
In Python:
agrees with 37.9% prob for 20 heads in a million flips.
You may be interested in Project Euler problem 316 which relates to this problem.
@wrongrook:
Thanks - this helps validate the number I got with my quick Java Bignum program.
I actually have a nice formula for the solution, along with a good explanation of how one arrives at it. However, the formula for the solution incorporates a 20-step Fibonacci number.
Mathematics says that the recurrence that defines the 20-step Fibonacci number has a closed-form solution, which I could then use to give the exact result. But I don't have the mental or computational horsepower to produce that solution.
- Mark
[...] couple of weeks back I took issue with a statement in the New York Times Magazine: If you have a million coin flips, it’s [...]
On December 30th 2010 user wrongrook published a very elegant python program to calculate the probability of a k heads run in n coin tosses. Unfortunately he did not offer any explanation or reference. I would be very grateful if provided us some comments or a reference to the algorithm used.
@E.A.Neonakis:
I believe that I can give you the most exhaustive treatement you will ever find in these two articles, which were followups to this one:
20 Heads In a Row – What Are the Odds?
A Big Problem That Doesn’t Need a Bignum
Leave A Reply
You can insert source code in your comment without fear of too much mangling by tagging it to use the iG:Syntax Hiliter plugin. A typical usage of this plugin will look like this:[c]
Note that tags are enclosed in square brackets, not angle brackets. Tags currently supported by this plugin are: as (ActionScript), asp, c, cpp, csharp, css, delphi, html, java, js, mysql, perl, python, ruby, smarty, sql, vb, vbnet, xml, code (Generic). If you post your comment and you aren't happy with the way it looks, I will do everything I can to edit it to your satisfaction.int main()
{
printf( "Hello, world!\n" );
return 1;
}
[/c]