If I had been drawing a paycheck for every hour I spent working on Project Euler’s problem 328, I think my summer vacation would already be paid for. But instead, after a long ten days or so of distraction, I’ll have to settle for the satisfaction of being number 38 or 39 to solve it.
In cause you haven’t visited Project Euler, it is a site dedicated to “challenging mathematical/computer programming problems”. A prototypical Project Euler challenge is one with a simple definition that is easy to solve for simple cases, but requires some ingenuity to scale up to the requirements given in the problem.
You're Getting Warmer
Problem 328 is a classic example of the genre. The basic setup is that of a number guessing game. Knowing that a number is some integer between 1 and N, your job is to make successive guesses until you get the answer. Each guess is answered with one of three conditions: low, high, or a match.
The twist in this problem is that your job is not to minimize the number of guesses you have to make, but rather, to minimize the sum of the guesses. For any range 1 to N, you have to select a path that minimizes the worst case cost.
As an example, if I was going to guess a number between 1 and 10, the best of the worst case strategies yields a value of 16. I get this score if the hidden number is either 8, 9, or 10. My first guess is 7, and in the worst case, my second guess of 9 nails it down to either 8, 9, or 10. Of course if the number is 6 or less I’ll get a lower score. Change the first guess to some other number, and you will always have a case which results in a score of greater than 16.
Figure 1 shows what the choice graph looks like when choosing a number between 1 and 20. The first number in each node is the guess. When a second number is present, it represents the accumulated cost at that point, working up from the leaf nodes. The choice at the top of the graph, 13, shows the cost for that problem: 49.
Choosing a number between 1 and 20
The Naive Approach
Solving this problem in small cases is nice and easy. Using a recursive formulation you can implement it in a single screenful of code. My test implementation in C++ is shown here - it calculates both the optimal first choice and the cost for a given range. By calling itself recursively, the problem solution is tidy and compact:
Like most recursive routines, it bails out early with one of three base cases which have trivial solutions. For all non-trivial solutions, the routine simply iterates through all possible guesses, calculating the cost of that choice and using recursion to calculate the cost of the two subproblems it creates.
Although this algorithm is simple and has a certain elegance, it has one big problem. A little examination will show that the runtime of this routine is asymptotically proportional to kN. Running on my desktop Linux system I was able to calculate best choices pretty quickly when N was under 30, but after that the runtime started ramping up drastically.
Getting There From Here
Since the solution to this algorithm requires calculating the best choice for numbers up to 200,000, there is no way that an O(kN) algorithm was going to fly. And that, of course, is the essence of a good Project Euler problem. Developing a solution for the simple cases is just the start.
After realizing that the naive solution won’t do it, you have to start looking at the problem from all angles. Can some optimization reduce it to a tractable polynomial problem? Or do you need a completely different approach. Perhaps the problem has a closed form solution that just requires pumping some numbers into an equation?
Eventually I was able to develop a solution that calculated all 200,000 value in less than a second. And while that would make an interesting post all on its own, it would be the epitome of bad form to spill the beans on an Euler Project solution.
Without giving away the secrets, however, I can tell you what was the most important factor for me in nailing down this problem: visualization.
To try to make some sense out of these paths through the choice tree, I turned to an old friend: Graphviz. This open source package makes visualization of data structures like binary trees a piece of cake.
Figure 1 is a simple graph created with graphviz. To really see the value of this package, examine the choice tree for N=100 in PDF format or SVG format, if your browser supports it. I spent a long inspecting these images, including some that had hundreds of nodes.
For this program, I didn’t even link to the graphviz library - I just created text files in the
correct format, then made a
system() call to the dot compiler program, which
creates the graphics files.
For this particular problem, graphviz is what guided me to my solution, and I don’t know of any other package, free or commercial, that could have done as well. It was the perfect tool for the job.
Now that I have put Problem 328 to bed, have my eye on Problem 304, known as Primonacci. This problem requires working with Fibonacci numbers with trillions of digits - numbers so big that they won’t fit in RAM on any computer I have access to.