When I finally wrapped up the mathematical modeling of the k-consecutive heads coin toss problem I had a concise formula:

To calculate the chances of k heads in a row appearing in n tosses, I just had to calculate a k-step Fibonacci number, and then divide it by an exponent of 2, and subtract the result from 1. Not complicated at all – but there was one catch. After a million coin tosses, the Fibonacci number and the exponent of 2 were approximately 300,000 digits long – way outside the scope of normal computations in languages like C++ and Java.
To get the exact result I needed, I used the BigInteger and BigDecimal Java classes to handle those big numbers, and had to wait about an hour to get the result. As it turns out, there is a much faster and easier way to do things
Simplifying the Computation
To solve this problem in a more reasonable matter, take a look at the part of the equation where all the real work is done:

In this equation, the p we are solving for is the probability that no runs of k consecutive heads will occur in n tosses of a fair coin. Although we might be able to find a closed-form equation that defines the k-step Fibonacci number, it is going to be a complicated polynomial, and the chances of us simplifying this equation are effectively nil.
So at first blush I seem to be stuck with calculating the top and bottom parts of the equation independently. But a little inspection of the equation shows that I can actually calculate p in an iterative fashion, starting with n=1, and working my way up to the desired result.
Calculating the Fibonacci Number
The first step in the calculation is writing the code to calculate k-step Fibonacci numbers. Recall that the k-step Fibonacci number n is found by summing the values of the previous k numbers – a recursive definition. The base cases for all values of k are: Fibk(n) is 1 when n is 1, and 0 when n is less than 1.
To calculate the value of Fibk in an iterative fashion, I make use of the the handy C++ container deque<T>. To calculate a k-step Fibonacci number, I make use of a deque<double> of size k that contains the previous k Fibonacci numbers. With the data structure set up like that, calculating the next Fibonacci number is simple:
q.push_front( accumulate( q.begin(), q.end(), 0 ) ); q.pop_back();
Note that after this calculation, q[ 0 ] contains the next new Fibonacci number, and the previous k-1 numbers are still stored in sequence in the deque. The call to pop_back() discards Fibk(n-k), which is not going to be needed for the next calculation.
Add the code to initialize the deque, and a loop, and we can print out Fibk(n) for values 1 through n:
deque<double> q(k, 0);
q[0] = 1.0;
for ( int i = 1 ; i < n ; i++ ) {
cout << q[ 0 ] << " ";
q.push_front( accumulate( q.begin(), q.end(), 0.0 ) );
q.pop_back();
}
cout << q[ 0 ] << endl;
Using the familiar value of k=2, and n=20, I get the following output:
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
The deque is perfect for this class - it lets us simulate a rolling window through the Fibonacci sequence by pushing new values on one end and popping aged ones off the other end. All this while providing constant time access to indexed elements. (Admittedly this code could be written just as efficiently with a list<double>.)
Divide By Two
This is a nice simple calculation, but it only gets me halfway to the goal. I'm calculating the top part of the fraction that computes p, but I need to divide by the bottom value as well. As it turns out, I can accomplish this in a way that allows me to iterate to the next value: before I add the values of the previous Fibonacci numbers to q[0], I just divide all of them by two. The resulting loop ends up looking like this:
deque<double> q(k, 0);
q[0] = 1.0;
q[1] = 1.0;
for ( int i = 1 ; i < n ; i++ ) {
q.push_front(0.0);
for ( int j = 1 ; j <= k ; j++ ) {
q[ j ] /= 2.0;
q[ 0 ] += q[ j ];
}
q.pop_back();
}
cout << q[ 0 ] << endl;
Note that I've slightly changed the initialization - the deque<double> is initialized with Fibk(2) - this is because the top half of the fraction uses Fibk(n+2), and the bottom uses 2n - the value of n for the Fibonacci calculation always has to be two ahead of the exponent of 2.
This calculation allows me to calculate the probabilities for large numbers of tosses in a matter of seconds, instead of the hour-plus it took using BigInteger. Using this code I was able to get the final word on just when we could be certain that a certain number of coin tosses would produce a run of 20 heads. If we want to be modestly certain, with 90% probability I can say that 4,828,842 coin tosses will produce a run of 20 heads.
If my life depended on it, and I wanted to get out to five nines, I can say that 24,144,137 coin tosses will produce a run of 20 heads with 99.999% probability. The output of kfib.cpp is shown here:
There is a 90% probability of seeing 20 consecutive heads in 4828842 tosses of a fair coin There is a 99% probability of seeing 20 consecutive heads in 9657666 tosses of a fair coin There is a 99.9% probability of seeing 20 consecutive heads in 14486490 tosses of a fair coin There is a 99.99% probability of seeing 20 consecutive heads in 19315313 tosses of a fair coin There is a 99.999% probability of seeing 20 consecutive heads in 24144137 tosses of a fair coin
And that's the last thing I am going to say about it.
5 users commented in " A Big Problem That Doesn’t Need a Bignum "
Follow-up comment rss or Leave a TrackbackDid you mention somewhere why you didn’t apply Markov Chains? Then this would be a very trivial problem, and also biased coins are easy to simulate.
Actually if double percision is enough, doesn’t the runtime depend only on the value k and not n? I’m not sure how matrix powers are implemented in for example LAPACK.
>Then this would be a very trivial problem,
>and also biased coins are easy to simulate.
I’m not sure what you’re getting at. The solution shown here is relatively simple. Using Markov chains you could derive a closed-form equation for this problem (I presume) but getting there would not be trivial. Point me to an example if you disagree.
>doesn’t the runtime depend only on the value k and not n?
I don’t see how the problem can be solved without having an O(n) component in the runtime, at least not using the equation that I am using. Again, if you had a closed form equation for the 20-step Fibonacci number, the dependence on n would evaporate, but getting there is beyond my means.
Hi, I kinda disagree. At least in Matlab everything seems to be very easy (use symbolic toolbox if you want to keep the rational numbers).
Fist initialize the transition matrix P, and make two vectors to pull out the relevant number (or you could store the answer to a different matrix and use indexes to access it):
I substract 0.9 to see the residual (which is close to zero) in a nice scientific notation: 1.2884e-008
If you want the formula for that, just calculate SVD of matrix P, raise the singular values to the symbolic power "a" and do the matrix multiplications symbolically. Without rounding the resulting answer is a bit long...
I noticed that this probability is very sensitive to the fairnes of the coin, for example if probability for heads is 51%, the probability for probability gettin 20 heads increases from 37.9254% to 50.0617%!
I like Markov Chains because in this approach I don't need to bother my brains too much, just describe the rules of the game. I hope I didn't do any mistakes.
As I lack anything but the most basic math skills, and don't have access to Matlab, your code is far from trivial for me.
Developing the exact formula for the problem was a pencil and paper affair, and when it was complete through I get a nice equation that is pretty easy to understand. And of course, it can be solved using simple programming.
So, I have to take your word that this Matlab code gets the right answer. Maybe someday I'll try to replicate in Maple so I have a better understanding, but for now I'll stick with the simple approach.
- Mark
Thank you for your article on probability of 20 heads in a row out of a million coin tosses.
If you tried to implement the Fibonacci step algorithm into a spreadsheet, it works fine up to 1000 coin tosses and up to 9 heads. Higher than that the intermediate terms start to approach 1.0E300.
---
However the inverse function can readily be seen (at least numerically). The inverse function says you must roughly double your number of coin tosses to get one additional number in the streak. This approximation is valid near 50%. If you want to know what the probability of 20 heads is out of a million tosses, then you need to examine the curve for 9 heads in a row crosses.
----
1e06/2^(20-9)=488.28
Inverting the curve
N=488 P=37.76%
N=489 P=37.82%
So my best guess for the probability is 37.8%
========
This post may be easier to understand if you look at the curves on the spreadsheet. I will send if you forward me an address
==================
If calculating the curve for 9 in a row is too difficult then you can give up a lot of fidelity by doing the curve for 2 in a row. With some diligence you can calculate the curve by hand.
Now 1e06/2^(20-2) =3.815, so you know the answer is between 37.5% and 50% (very rough approximation).
Leave A Reply
You can insert source code in your comment without fear of too much mangling by tagging it to use the iG:Syntax Hiliter plugin. A typical usage of this plugin will look like this:[c]
Note that tags are enclosed in square brackets, not angle brackets. Tags currently supported by this plugin are: as (ActionScript), asp, c, cpp, csharp, css, delphi, html, java, js, mysql, perl, python, ruby, smarty, sql, vb, vbnet, xml, code (Generic). If you post your comment and you aren't happy with the way it looks, I will do everything I can to edit it to your satisfaction.int main()
{
printf( "Hello, world!\n" );
return 1;
}
[/c]