VC++ 10 Hash Table Performance Problems
Microsoft has never been a slacker in the C++ department - they’ve always worked hard to provide a top-notch, compliant product. Visual Studio 10 ships the current incarnation of their compiler and library, and for the most part it is up to their usual standards. It’s a great development environment, and I am a dedicated user, but I have to give Microsoft a demerit in one area: their C++11 hash containers have some serious performance problems - so much so that the Debug versions of the containers may well be unusable in your application.
Background
I first noticed the problem with unordered_map when I was working on the code for my updated LZW article. I found that when running in the debugger, my program would hang after exiting the compression routine. A little debugging showed that the destructor for my hash table was taking a long time to run. And by a long time, I mean it was approaching an hour!
This seemed pretty crazy. Destroying a hash table wouldn’t seem to be a complicated task. I decided to see if I could come up with a reasonable benchmark. I wrote a test program that does a simple word frequency count. As a starter data set, I used the first one million white space delimited words in the 2010 CIA factbook, as published by Project Gutenberg. This data set yields 74,208 unique tokens.
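Gathering that token list only takes a few lines of standard C++. This is a minimal sketch rather than the original code; the function name, file name, and default limit are illustrative:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read up to 'limit' whitespace-delimited tokens from a text file.
// The test data was the 2010 CIA factbook text from Project Gutenberg.
std::vector<std::string> read_tokens(const char *path,
                                     std::size_t limit = 1000000)
{
    std::vector<std::string> tokens;
    std::ifstream in(path);
    std::string word;
    while (tokens.size() < limit && in >> word)
        tokens.push_back(word);
    return tokens;
}
```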
I wrote a simple test rig that I used to test the word count program using four different containers:

- unordered_map indexed by std::string
- unordered_map indexed by std::string *
- map indexed by std::string
- map indexed by std::string *

The reason for testing with std::string * was to reduce the cost of copying strings into the hash table as it was filled, and then to reduce the cost of destroying those strings when the table was destroyed. The container declarations are sketched below.
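Here is a minimal sketch of how those four containers might be declared. The typedef names and the dereferencing hash/compare helpers for the pointer-keyed variants are my own illustrative choices, not code taken from the original program:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>

// Helpers for the pointer-keyed containers: hash and compare the text the
// pointer refers to, not the pointer value itself.
struct deref_hash {
    std::size_t operator()(const std::string *p) const {
        return std::hash<std::string>()(*p);
    }
};
struct deref_equal {
    bool operator()(const std::string *a, const std::string *b) const {
        return *a == *b;
    }
};
struct deref_less {
    bool operator()(const std::string *a, const std::string *b) const {
        return *a < *b;
    }
};

// The four word-count container types under test.
typedef std::unordered_map<std::string, unsigned> hashed_by_string;
typedef std::unordered_map<const std::string *, unsigned,
                           deref_hash, deref_equal> hashed_by_pointer;
typedef std::map<std::string, unsigned> tree_by_string;
typedef std::map<const std::string *, unsigned, deref_less> tree_by_pointer;
```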
I ran tests against map expecting to see a pretty big difference in performance. Because map is normally implemented using a balanced binary tree structure, it has O(log(N)) performance on insertions, while a sparsely populated hash table can have O(1) performance. By using fairly large data sets, I expected to see a big difference between the two.
I tried to eliminate a few obvious sources of error in my test function, and I used a template function so that I could use the same code on all the different container types:
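The original listing isn’t reproduced here; the following is a minimal sketch of what such a templated test function could look like, assuming the container typedefs shown earlier and a make_key helper that turns a token into the container’s key type (all names here are illustrative):

```cpp
#include <cstddef>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>

// Time filling one of the word-count containers and then destroying it,
// using the standard clock() function. MAP is one of the container types
// sketched above; make_key converts a token into that container's key type.
template <typename MAP, typename KEYFN>
void time_word_count(const std::vector<std::string> &tokens,
                     KEYFN make_key, const char *label)
{
    std::clock_t start = std::clock();
    {
        MAP counts;
        for (std::size_t i = 0; i < tokens.size(); i++)
            counts[make_key(tokens[i])]++;
        std::cout << label << " fill: "
                  << double(std::clock() - start) / CLOCKS_PER_SEC << "s\n";
        start = std::clock();
    }   // leaving this scope runs the container's destructor
    std::cout << label << " destroy: "
              << double(std::clock() - start) / CLOCKS_PER_SEC << "s\n";
}
```

For the std::string containers, make_key can simply return the token itself; for the pointer containers, it can return the token’s address, which is what avoids copying the string data into the table and then destroying it along with the table.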
I didn’t go overboard when it came to instrumenting this problem; I just used the timing functions built into the C++ library. On my Windows and Linux test systems, the values of CLOCKS_PER_SEC are both high enough that I’m not worried about granularity issues.
The First Results
I ran my test program in Visual C++ Release mode, using all the standard settings for a console application. For purposes of comparison, I ran the same program using g++ 4.6.1 on the same computer, booted up under Linux. For the set of 1,000,000 tokens, the results are shown below:
Task | VC++ 10 Release | g++ 4.6.1 -O3 |
---|---|---|
Fill unordered_map<string> | 0.41s | 0.11s |
Fill unordered_map<string const *> | 0.39s | 0.14s |
Destroy unordered_map<string> | 3.17s | 0.01s |
Destroy unordered_map<string const *> | 3.24s | 0.004s |
Fill map<string> | 0.83s | 0.53s |
Fill map<string const *> | 0.88s | 0.66s |
Destroy map<string> | 0.14s | 0.01s |
Destroy map<string const *> | 0.07s | 0.002s |
There are a few interesting points to take away from these tests:
- Microsoft's compiler is taking an exceptionally long time to destroy hashed containers - one order of magnitude greater than it took to create them, and two orders of magnitude greater than it takes g++ to do the same task.
- It doesn't look like constructing and destroying the strings is a big factor. Both compilers have roughly the same performance with both std::string and std::string *.
- Microsoft's behavior is counterintuitive, as it takes longer to construct and destroy containers using the pointer.
- The GNU compiler appears to be able to run through this exercise notably faster.
The time it takes to destroy the table is the real worry - having a C++ program hang for over 3 seconds while destroying a modestly large data structure is a serious problem, particularly when the same task completes in a few milliseconds with g++.
The Pathological Results
These concerns are nothing compared to what I see when running in debug mode. Setting my Visual Studio project to Debug mode, then running the same test, yields the results shown here:
Task | VC++ 10 Debug |
---|---|
Fill unordered_map<string> | 17.41s |
Fill unordered_map<string const *> | 17.08s |
Destroy unordered_map<string> | 505.36s |
Destroy unordered_map<string const *> | 505.99s |
Fill map<string> | 13.29s |
Fill map<string const *> | 13.15s |
Destroy map<string> | 0.94s |
Destroy map<string const *> | 0.18s |
Those numbers are hard to believe. Destroying a hash table takes a few milliseconds when using g++. In VC++ 10 Debug mode, it takes more than eight minutes!
Worse, we suddenly see that hashed containers are slower than the containers built on red-black trees. Again, this just doesn’t make sense.
The big problem with these numbers is that they make the debug build of the library effectively unusable for a lot of tasks. Regardless of how much extra checking the debug build does on your behalf, when it is this slow, it is just not useful.
A Workaround
I didn’t invest the time to try debugging Microsoft’s library, so I don’t really know where the time is being spent. I did try a few things to speed things up, and I found one technique that helps a lot. Before including any Microsoft header files, try entering this single line in your source:
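The exact line isn’t reproduced here; assuming the macro in question is the iterator-debugging switch, which is the usual suspect for this kind of debug-build slowdown in Visual Studio 2010, the definition would look something like this:

```cpp
// Assumed workaround, not a confirmed quote of the original line:
// turn off debug-build iterator checking before any standard header is
// included. VS2010's native spelling of the same knob is
// _ITERATOR_DEBUG_LEVEL.
#define _HAS_ITERATOR_DEBUGGING 0
```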
With this definition in place, the delete times return to the ballpark of the times seen when running in release mode. Of course, you give up some debugging support. I believe that an explanation of what this macro does might be found here.
In the final analysis, I think Microsoft has some serious work to do here. The performance of their hashed containers, and to a lesser extent the pre-C++11 associative containers, needs some serious examination. If the library is going to run this much slower than the competition, I need a good explanation why.