A Big Problem That Doesn’t Need a Bignum

When I finally wrapped up the mathematical modeling of the k-consecutive coin toss problem I had a concise formula:

This is the equation for the probability of seeing k consecutive heads in n tosses of a coin.

To calculate the chances of k heads in a row appearing in n tosses, I just had to calculate a k-step Fibonacci number, and then divide it by an exponent of 2, and subtract the result from 1. Not complicated at all - but there was one catch. After a million coin tosses, the Fibonacci number and the exponent of 2 were approximately 300,000 digits long - way outside the scope of normal computations in languages like C++ and Java.

To get the exact result I needed, I used the BigInteger and BigDecimal Java classes to handle those big numbers, and had to wait about an hour to get the result. As it turns out, there is a much faster and easier way to do things

Simplifying the Computation

To solve this problem in a more reasonable matter, take a look at the part of the equation where all the real work is done:

This is the equation for the probablity of seeing k consecutive heads in n tosses of a coin, slightly modified. The modification is removing the start of the equation, '1 -', leaving the more difficult part.

In this equation, the p we are solving for is the probability that no runs of k consecutive heads will occur in n tosses of a fair coin. Although we might be able to find a closed-form equation that defines the k-step Fibonacci number, it is going to be a complicated polynomial, and the chances of us simplifying this equation are effectively nil.

So at first blush I seem to be stuck with calculating the top and bottom parts of the equation independently. But a little inspection of the equation shows that I can actually calculate p in an iterative fashion, starting with n=1, and working my way up to the desired result.

Calculating the Fibonacci Number

The first step in the calculation is writing the code to calculate k-step Fibonacci numbers. Recall that the k-step Fibonacci number n is found by summing the values of the previous k numbers - a recursive definition. The base cases for all values of k are: Fib_k(n) is 1 when n is 1, and 0 when n is less than 1.

To calculate the value of Fib_k in an iterative fashion, I make use of the the handy C++ container deque<T>. To calculate a k-step Fibonacci number, I make use of a deque<double> of size k that contains the previous k Fibonacci numbers. With the data structure set up like that, calculating the next Fibonacci number is simple:

  q.push_front( accumulate( q.begin(), q.end(), 0 ) );
  q.pop_back();

Note that after this calculation, q[ 0 ] contains the next new Fibonacci number, and the previous k-1 numbers are still stored in sequence in the deque. The call to pop_back() discards Fib_k(n-k), which is not going to be needed for the next calculation.

Add the code to initialize the deque, and a loop, and we can print out Fib_k(n) for values 1 through n:

deque<double> q(k, 0);
q[0] = 1.0;
for ( int i = 1 ; i < n ; i++ ) {
    cout << q[ 0 ] << " ";
    q.push_front( accumulate( q.begin(), q.end(), 0.0 ) );
    q.pop_back();
}
cout << q[ 0 ] << endl;

Using the familiar value of k=2, and n=20, I get the following output:

1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765

The deque is perfect for this class - it lets us simulate a rolling window through the Fibonacci sequence by pushing new values on one end and popping aged ones off the other end. All this while providing constant time access to indexed elements. (Admittedly this code could be written just as efficiently with a list<double>.)

Divide By Two

This is a nice simple calculation, but it only gets me halfway to the goal. I’m calculating the top part of the fraction that computes p, but I need to divide by the bottom value as well. As it turns out, I can accomplish this in a way that allows me to iterate to the next value: before I add the values of the previous Fibonacci numbers to q[0], I just divide all of them by two. The resulting loop ends up looking like this:

deque<double> q(k, 0);
q[0] = 1.0;
q[1] = 1.0;
for ( int i = 1 ; i < n ; i++ ) {
    q.push_front(0.0);
    for ( int j = 1 ; j <= k ; j++ ) {
        q[ j ] /= 2.0;
        q[ 0 ] += q[ j ];
    }
    q.pop_back();
}
cout << q[ 0 ] << endl;

Note that I’ve slightly changed the initialization - the deque<double> is initialized with Fib_k(2) - this is because the top half of the fraction uses Fib_k(n+2), and the bottom uses 2ⁿ - the value of n for the Fibonacci calculation always has to be two ahead of the exponent of 2.

This calculation allows me to calculate the probabilities for large numbers of tosses in a matter of seconds, instead of the hour-plus it took using BigInteger. Using this code I was able to get the final word on just when we could be certain that a certain number of coin tosses would produce a run of 20 heads. If we want to be modestly certain, with 90% probability I can say that 4,828,842 coin tosses will produce a run of 20 heads.

If my life depended on it, and I wanted to get out to five nines, I can say that 24,144,137 coin tosses will produce a run of 20 heads with 99.999% probability. The output of kfib.cpp is shown here:

There is a 90% probability of seeing 20 consecutive heads in 4828842 tosses of a fair coin
There is a 99% probability of seeing 20 consecutive heads in 9657666 tosses of a fair coin
There is a 99.9% probability of seeing 20 consecutive heads in 14486490 tosses of a fair coin
There is a 99.99% probability of seeing 20 consecutive heads in 19315313 tosses of a fair coin
There is a 99.999% probability of seeing 20 consecutive heads in 24144137 tosses of a fair coin

And that’s the last thing I am going to say about it.