Dr. Dobb's Portal April, 2007 Article on DDJ site

I’m an inveterate fan of wordplay of all sorts - puzzles, anagrams, crosswords. I’ve been known online by my anagrammatic name, SnorkelMan, all the way back to the ancient days of the text mode BBS. My continual hectoring of the staff at the Dallas Morning News over errors in their print version of the New York Times crossword puzzle led them to finally just give me the job of proofreading it. I spend way too much time on the crosswords and other puzzles, both online and in print. In other words, I’m a sucker for a good word puzzle.

Figure 1 - Dallas Morning News Article

My sense of wordplay was naturally piqued this weekend when I heard the latest weekly puzzle challenge from Will Shortz on NPR Weekend Edition. The challenge, from contributor David Edelheit, read as follows:

Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?

As sometimes happens, when I heard this puzzle, and the answer didn’t click immediately, my first thought was “I could write a program to solve this faster than I can figure it out myself.”

That’s a treacherous thought for a puzzler, because it immediately diverts that little thread in the back of your mind that is supposed to be solving the puzzle, instead putting it on the task of writing the program.

But it turned out to be an interesting problem in efficiency, and so I’m glad I went down that path.

First Pass

Most of my work these days is in C++, and while C++ doesn’t have the world’s best string manipulation facilities, I thought it had enough to do the job on this puzzle. Figuring that the problem was small enough to solve via brute force, I decided that the general course of the program would be to work my way through all `50*49/2` combinations of states, and test them against all `48*47/2` remaining combinations. That’s just a little more than a million operations, which ought to be child’s play. Thus, the basic program loop was going to look like this:

Still thinking brute force, I was looking for the simplest way to store the data so that it would be easy to compare, and turned to `std::multiset`. I knew that if I stored all the characters from states `i` and `j` in one `std::multiset<char>` object, and all the characters from states `m` and `n` in another, I could quickly compare one against the other with a simple equality operator.

So in the above loop, I inserted these lines after the first two `for` statements:

(Note that I had snagged the names of the states from one of the first sites in a Google search, and inserted them into an array of character pointers called `states`.)

I inserted a similar definition for `label2` inside the second set of two `for` statements, which means all I had left to do was a simple comparison of `label1` against `label2`:

I did a quick compile, tested the code, and sure enough, the contents of the multisets were indeed sorted concatentations of the letters of each pair of states. Time to run!

My first disappointment was seeing that, while the program was indeed running properly, it was going slow enough that it looked like it was going to take a sizable fraction of an hour to make it through the entire alphabet. I could just wait, but in this case I decided I could optimize faster than it would take to wait for the first results.

Second Pass

It’s pretty obvious that calculating the concatenation of state `m` and `n` in the innermost loop is full of wasted cycles, since it is repeatedly calculating the same state values. I knew it was inefficient, but I didn’t think it was going to matter too much.

Since it turned out that it did matter, I decided to precalculate all the values before entering the four-deep nested comparison loop, with code like this:

Then I didn’t have to do any computation in my main loop, I just had to modify the comparison line in the innermost loop:

This modified program did indeed speed things up considerably, bringing the run time down from a fraction of an hour to just a few seconds, even with a bit of progress tracing turned on.

Third and Final Pass

Although I could have stopped here, I thought it might be interesting to see how expensive the use of an associative container like `multiset `was compared to something simpler. I replaced the setup code so that it stored the data in an `std::vector<char>` instead of a `multiset`, on the theory that the comparison operator would run much faster on a `vector`. I had to add a call to sort the data after inserting it in the `vector`, which would be executed 50 times, but nowhere near as many times as the comparison operator.

The results were more or less as I expected. When run under Windows with default Release optimization, the `vector` version of the program ran about twice as fast as the `multiset` version. When compiled with g++ 3.3.5 with `-O2`, I saw roughly the same ratio of execution speeds.

No Spoilers

I’m not going to spoil the puzzle for you by giving away the answer. Let’s just say that it is a good word puzzle, and if you manage to arrive at the answer you’ll see why. If you don’t manage to solve it in your head, you can download the source, compile it, and get there by brute force, just like I did.

usa.cpp

But I don’t think it will count as a spoiler if I tell you that you don’t need a computer program to solve this problem.