![]() |
Dr. Dobb's Portal April, 2007 Article on DDJ site |
I'm an inveterate fan of wordplay of all sorts - puzzles, anagrams, crosswords. I've been known online by my anagrammatic name, SnorkelMan, all the way back to the ancient days of the text mode BBS. My continual hectoring of the staff at the Dallas Morning News over errors in their print version of the New York Times crossword puzzle led them to finally just give me the job of proofreading it. I spend way too much time on the crosswords and other puzzles, both online and in print. In other words, I'm a sucker for a good word puzzle.

Figure 1 - Dallas Morning News Article
My sense of wordplay was naturally piqued this weekend when I heard the latest weekly puzzle challenge from Will Shortz on NPR Weekend Edition. The challenge, from contributor David Edelheit, read as follows:
Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?
As sometimes happens, when I heard this puzzle, and the answer didn't click immediately, my first thought was "I could write a program to solve this faster than I can figure it out myself."
That's a treacherous thought for a puzzler, because it immediately diverts that little thread in the back of your mind that is supposed to be solving the puzzle, instead putting it on the task of writing the program.
But it turned out to be an interesting problem in efficiency, and so I'm glad I went down that path.
First Pass
Most of my work these days is in C++, and while C++ doesn't have the world's best string manipulation facilities, I thought it had enough to do the job on this puzzle. Figuring that the problem was small enough to solve via brute force, I decided that the general course of the program would be to work my way through all 50*49/2 combinations of states, and test them against all 48*47/2 remaining combinations. That's just a little more than a million operations, which ought to be child's play. Thus, the basic program loop was going to look like this:
-
for ( int i = 0 ; i <49 ; i++ )
-
for ( int j = i + 1; j <50 ; j++ )
-
for ( int m = 0 ; m <49 ; m++ )
-
if ( m != i && m != j )
-
for ( int n = m + 1 ; n <50 ; n++ )
-
if ( n != i && n != j )
-
//compare state i and j against state m and n
Still thinking brute force, I was looking for the simplest way to store the data so that it would be easy to compare, and turned to std::multiset. I knew that if I stored all the characters from states i and j in one std::multiset object, and all the characters from states m and n in another, I could quickly compare one against the other with a simple equality operator.
So in the above loop, I inserted these lines after the first two for statements:
-
std::multiset<char> label1;
-
char *p = states[ i ];
-
while (*p)
-
label1.insert( *p++ );
-
p = states[ j ];
-
while (*p)
-
label1.insert( *p++ );
(Note that I had snagged the names of the states from one of the first sites in a Google search, and inserted them into an array of character pointers called states.)
I inserted a similar definition for label2 inside the second set of two for statements, which means all I had left to do was a simple comparison of label1 against label2:
-
if ( label1 == label2 )
-
std::cout <<states[ i ] <<", "
-
<<states[ j ] <<", "
-
<<states[ m ] <<", "
-
<<states[ n ] <<"\n";
I did a quick compile, tested the code, and sure enough, the contents of the multisets were indeed sorted concatentations of the letters of each pair of states. Time to run!
My first disappointment was seeing that, while the program was indeed running properly, it was going slow enough that it looked like it was going to take a sizable fraction of an hour to make it through the entire alphabet. I could just wait, but in this case I decided I could optimize faster than it would take to wait for the first results.
Second Pass
It's pretty obvious that calculating the concatenation of state m and n in the innermost loop is full of wasted cycles, since it is repeatedly calculating the same state values. I knew it was inefficient, but I didn't think it was going to matter too much.
Since it turned out that it did matter, I decided to precalculate all the values before entering the four-deep nested comparison loop, with code like this:
-
std::multiset<char> letters[ 50 ][ 50 ];
-
for ( int i = 0 ; i <49 ; i++ )
-
for ( int j = i + 1 ; j <50 ; j++ ) {
-
char *p = states[ i ];
-
while (*p)
-
letters[ i ][ j ].insert( *p++ );
-
p = states[ j ];
-
while ( *p )
-
letters[ i ][ j ].insert( *p++ );
-
}
Then I didn't have to do any computation in my main loop, I just had to modify the comparison line in the innermost loop:
-
if ( letters[i][j] == letters[m][n] )
-
std::cout <<states[ i ] <<", "
-
...
This modified program did indeed speed things up considerably, bringing the run time down from a fraction of an hour to just a few seconds, even with a bit of progress tracing turned on.
Third and Final Pass
Although I could have stopped here, I thought it might be interesting to see how expensive the use of an associative container like multiset was compared to something simpler. I replaced the setup code so that it stored the data in an std::vector instead of a multiset, on the theory that the comparison operator would run much faster on a vector. I had to add a call to sort the data after inserting it in the vector, which would be executed 50 times, but nowhere near as many times as the comparison operator.
The results were more or less as I expected. When run under Windows with default Release optimization, the vector version of the program ran about twice as fast as the multiset version. When compiled with g++ 3.3.5 with -O2, I saw roughly the same ratio of execution speeds.
No Spoilers
I'm not going to spoil the puzzle for you by giving away the answer. Let's just say that it is a good word puzzle, and if you manage to arrive at the answer you'll see why. If you don't manage to solve it in your head, you can download the source, compile it, and get there by brute force, just like I did.
But I don't think it will count as a spoiler if I tell you that you don't need a computer program to solve this problem.
Recommended Reading
If you like puzzles, the books below ought to give you a few hours of entertainment. If you purchase them after clicking through on the links below, you'll be expanding your knowledge and supporting this site at the same time. Thanks. (Note: all links will open in a new window.)

20 users commented in " Puzzling "
Follow-up comment rss or Leave a TrackbackBecause the solution is so "obvious" in retrospect does not mean one doesn't need a computer to solve it. I'm reminded of the apocryphal story of Columbus, an egg, and some salt: sure it's easy once you know the answer. That being said, the brute force solution was not so hard. Five minutes and one line of APL covers it. A person on the Jsoftware Chat forum told me the hardest part was typing in all the states' names. I'm guessing he's not afraid of unnecessary work.
True enough Randy.
But it feels good as a puzzler if you make use of extra knowledge. In this case, part of the extra knowledge implicit in the puzzle is that the puzzle might not require a brute force solution. And if it doesn't, what are the obvious ways that it could be solved easily? Sometimes you get it, sometimes you don't...
>A person on the Jsoftware Chat forum told me the
>hardest part was typing in all the states’ names.
Need to talk to him about this thing called "cut and paste":)
A nice solution from Vince Huston, Plano, TX:
There's a very simple correct answer to this question that you might well be able to get from looking at a map of the US for a couple of minutes, but I'm not so sure that it's the same answer you're looking for.
Hi Graham,
That's the point I was making at the end in "No Spoilers". That knowledge that there IS an easy answer is probably enough to get the answer for most people.
This looked like something a shell script could solve, or at least get you most of the way. So I wrote one that sorts the letters of each state into a sorted string, creates a 2450-line file that has all the state pairs (including lots of redundant ones), does the sort-a-word on that file, and then a final sort {{{sort | uniq -c | grep -v '[12]'}}}.
Having figured it out from your last comment, I just eyeballed the word-sorted unique pair to confirm that guess, rather than add the code to correlate the word-sorted pair with the in-order state names that it came from.
Hi Doug,
The wordpress comment module can be fairly lame about special characters, but if you email me the script I'll either get it in your comment or add it as an attachment link.
[...] Via Reddit I found Mark Nelson’s post about a recent word puzzle from NPR’s Weekend Edition: [...]
Adam, that's similar to my first pass effort, which is pretty non-optimal, being an O(n4) solution. Vince Huston's is going to be quite a bit better performing.
Heh, oops. I just actually read all the comments and noticed Vince took the same approach I did, with very similar looking code. I guess I'll post my solution anyway (Ruby, yay!) even though it doesn't add too much. This solution runs in much less than one second, and took me only a two minutes to code (including debugging). I prettied it up a little bit to post here though...
If you are interested in seeing a Haskell solution, I wrote one to demonstrate a handy little clustering function that I use all the time. I'll not tempt the Worpress comment module to mangle the code (indentation is important in Haskell) so here's the link instead: http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox
Cheers! --Tom
if only I understood the first thing about Haskell!
I have a solution that while for the problem at hand might be overkill is never the less another approach to the problem.
In essence the solution consists of the following
Build a list of the unique 2 state pairs and combine the letters in the pair into string. store the list of the original strings in on table on a sql db and store a list of alphabettically sorted strings in another column in that same table. do a select(unsortedstring) where count(sortedstring > 1) order by sortedstring. you will then get the answer to the problem
[...] found out about it from Mark Nelson who, in turn, heard it on NPR. It’s not a terribly difficult riddle if you take a moment to [...]
Hi Mark!
Check out Anders Pearson's analysis about your approach to the problem and how he sees that relates to programming languages:
http://thraxil.org/users/anders/posts/2007/10/30/A-Simple-Programming-Puzzle-Seen-Through-Three-Different-Lenses/
An interesting read, as was this article.
I don't have any major issues with what Anders says, but I do have a couple of comments.
First, as to the start with a quad-nested loop. I frequently (especially when writing) start with the most elemental brute force implementation possible. There are a couple of really good reasons to do this.
A brute force solution is usually much easier to prove correct. It is often much easier to understand, especially for someone who is not an experiences programmer or developer. So it provides a really good starting point for a more optimized version of the algorithm. It's natural for a developer to jump directly into something more optimal, and it certainly makes sense to do that for regular work, but for an article I happen to like the slower approach.
Second, I'm not sold on the notion that a choice of a static or scripted language predisposes you to certain behaviors. I think it more likely that the facilities your language provides are what drives your choices.
As in this example. It's no secret that C/C++ (the language I use at work all day, and thus my language of choice) is lamentably weak on string handling. Further, associative arrays are a relatively new feature, and hashed associative arrays are still not part of the standard. (Unless TR1 has passed and I haven't received the news yet.)
So someone who writes Perl code all day is much more likely to have worked on problems analyzing text, using regular expressions, and using strings as keys into other data structures. It's not the scripting that does this, it would be just as true in a statically typed language that had great string support.
The interesting thing is that an optimal solution to this problem ends up looking similar in all the languages we've seen here. Even the unfamiliar Haskell looks vaguely familiar.
I'd really enjoy seeing some cases with language X, Y, and Z where the optimal C++ solution was not so good, and the optimal X/Y/Z solution looked amazingly different. Last time I did any functional programming was maybe 1975? and that seems like one place where you might be able to Think Different.
[...] from A Simple Programming Puzzle Seen Through Three Different Lenses regarding states puzzle from Mark Nelson, which was originally posted NPR. Here is my translation of Anders Pearson’s [...]
[...] I've confessed in the past, I'm a sucker for word puzzles. My recent post on a Will Shortz puzzle from NPR Morning Edition ended up provoking a surprising amount of comment, [...]
Hi. If I use a set of element, each element has two int fields, represents for all string_m and string_n for state[i][j] (string_i and string_j), then all elements of state[i][j] is also in state[m][n]. After we finish the pass for state[i][j] , we can mark that we solved state[m][n]. It can reduce many redundant computations. Sorry for my influent English, hope you understand my idea :)
Leave A Reply
You can insert source code in your comment without fear of too much mangling by tagging it to use the iG:Syntax Hiliter plugin. A typical usage of this plugin will look like this:[c]
Note that tags are enclosed in square brackets, not angle brackets. Tags currently supported by this plugin are: as (ActionScript), asp, c, cpp, csharp, css, delphi, html, java, js, mysql, perl, python, ruby, smarty, sql, vb, vbnet, xml, code (Generic). If you post your comment and you aren't happy with the way it looks, I will do everything I can to edit it to your satisfaction.int main()
{
printf( "Hello, world!\n" );
return 1;
}
[/c]