Hash Functions for C++ Unordered Containers
The container classes included in the C++ standard library serve as good illustrations of both the strengths and the weaknesses of the language. The strengths are obvious: efficient, type-safe containers with performance guarantees suitable for a huge variety of applications. And the weaknesses? Compiler error messages that redefine the term useless, and documentation that makes a mockery of the word.
In this article I’ll illustrate how you might bump into these problems using the
unordered_map container, as well showing you how to work past the problems. By
rights this basic hash map should be the first- or second-most used container in your arsenal,
but if you are less than a C++ savant, you might find yourself ditching it out of frustration.
It’s a little bit of an embarrassment to the C++ community that it didn’t have a hash table in the standard library until TR1 was published in 2005. In a perfect world the original standard should have contained hash map and hash set containers. But Alexander Stepanov didn’t include these containers in the original Standard Template Library, and the standardization committee was reluctant to bless containers that didn’t have a decent amount of mileage in the real world.
By 2005 there were enough non-standard implementations to allow the TR1 extension to confidently add four new template classes:
With basically the same semantics as their ordered counterparts (
multimap, etc.) and the ideal O(1) performance afforded by hashed indexing,
C++ finally had a feature that was basically table stakes for any high-level language created
since the mid-1980s.
The container classes in general, and
unordered_map in particular, are such a useful
part of the language that we generally want to introduce them as early as possible to people
learning C++. Admittedly, the underpinnings of template programming are out of the beginner’s
depth, but using the containers doesn’t require much knowledge about templates beyond an
understanding of a few syntax rules.
A simple piece of sample code that you might use to teach beginners how to use these containers is shown here:
This example works great in the classroom or in the documentation page for
unordered_map. But the new C++ programmer runs into trouble as soon as he or she
steps outside the classroom and tries to implement a real-world example. A very common change to
this program on its path to the real world will be in the use of a simple class or structure to
hold the person’s name. To keep it simple, I’ll just assume we want to keep first and last
names separate, and use the built-in
pair class to hold the name:
This seemingly small change generates seven errors in Visual C++, five in g++, and none of the errors point the user to the actual problem.
And what is the problem? It’s actually a simple one:
unordered_map doesn’t know how
to create a hash for the given key type of
Instead, the user is left to puzzle over things like this:
c:\program files\microsoft visual studio 10.0\vc\include\xfunctional(768): error C2440: 'type cast' : cannot convert from 'const Name' to 'size_t'
Or worse yet from g++:
/tmp/cc0B9FPH.o: In function `std::__detail::_Hash_code_base<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, int>, std::_Select1st<std::pair<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, int> >, std::equal_to<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::hash<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_hash_code( std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&) const'
The g++ compiler doesn’t seem to actually produce a sentence describing the error, I think it figures no human could parse it anyway.
The inscrutability of compiler errors in template classes is hardly something new - I was complaining about it all the way back in 1995. And the C++0x committee tried to do something about it. Had the C++0x Concepts feature been accepted, the compiler might instead have issued an error message looked more like this:
hash_test.cpp(15): type 'Name' does not have a hash function
Unfortunately, Concepts did not make it, and we are stuck with error messages that are of no help at all.
There are a couple of obvious places to try to figure out what these errors mean. Google would be one, and Visual Studio’s class documentation pages would be another.
Just as an experiment, try putting yourself in the place of a novice, and execute a search on:
error C2440: 'type cast' : cannot convert from 'const Name' to 'size_t'
or show some sophistication and change your search to:
"error C2440: type cast : cannot convert from" to size_t
You will find some good clues, but probably no solutions to your problem. Much of the published code deals with pre-standard hash tables using boost implementations, or early g++ hash tables, which are not going to help.
But let’s say you eventually figure out that you need to write a hash function for your
Name class. All you need to know are the mechanics. Where do you find out what
The logical place to do this would be in the Visual Studio unordered_map documentation page. This page has some good information in it, but nowhere does it address an undoubtedly common problem: how do I create a user-defined hash function for unordered_map?. And don’t even thing about getting anything useful from the g++ documentation page.
These stumbling blocks are the kind of thing that give C++ a reputation as one of the most
difficult languages to learn. To see the contrast, you might compare it to Java. The Java
programmer won’t ever run into this problem, because the language defines a default hash function
in the base class
Object. Any Java object may be used as a key in Java’s
HashMap generic class. While the base class definition may be far from optimal for
many cases, it will at least work, and won’t prevent the beginner from using the container in
C++ could have done the same thing when implementing the unordered containers, but it was constrained by both philosophy and language limitations. It would have been a real accomplishment to have overcome both, and to be honest, the people on the standards committee have to work hard enough as it is. Insisting on X-Ray vision and a cape for each member might be pushing it.
One Point of Light
So what is to be done? The language has structural problems that make it hard to issue good errors. Documentation is never as good as we like it. Are we stuck at the status quo?
While I’m not likely to change the Visual Studio documentation or the C++ compiler, I have found
that one small article, like this one, that has good SEO terms to describe the problem, can help
a lot of people. As an example, my
article from 2002 still gets hundreds of readers every week, and I hope most of them walk away
understanding how to use this function.
The same thing could end up being true for this article. I’ll spend the rest of it showing you
four good ways to define a hash function for use in
unordered_map under C++0x, and
with Google’s help, it may end up providing the missing manual for this particular problem.
Method 1 - A simple function
You’re used to seeing
unordered_map declared with two template parameters. But a
look at the help page shows that it actually takes five - the last three usually accept default
The third parameter to the definition is a function object type that is used by the class to
convert a key into a hash code. By default it is set to
unordered_map class calls
operator() on an object of that type in
order to get a hash code for a given key.
Note also that the several of constructors for
unordered_map also take a default
parameter which is an instance of this function object type.
So there are two places where we can provide some information about how to hash the key used in
unordered_map, but we normally don’t fill in these items. The reason that we
often don’t is that the C++ standard library already defines instances of
std::hash<T> for commonly used types. So I can write a program that contains a
line like this:
Which when run, produces:
The problem, of course, is that the standard library did not implement a version of
hash<pair<string,string>> - or any of the other infinite varieties of
user-defined classes. However, since my new
Name class is composed of two
string objects, and the standard library knows how to hash
objects, I can create a pretty good hash function of my own that looks like this:
As a general rule of thumb, if I have two hashes for independent variables, and I combine them using XOR, I can expect that the resulting hash is probably just as good as the input hashes.
Building this function is easy enough, but how do I actually use it with
unordered_map? Although the solution is fairly simple, if you find templates
confusing already, you aren’t liable fumble your way through to the answer without an enormous
amount of effort.
Basically, I have to modify my use of the map class in two places. First, I have to pass in a pointer to the hash function in the constructor of the map. The standard defines a constructor that takes an initial number of buckets and a hashing object as inputs. So the first step is to modify the declaration code to look like this:
This is only half the battle, however. The default implementation of
expects to be using a function object of type
std::hash<key> to calculate hashes,
and that is not what I passed in to the constructor. So I also have to add a third template
parameter to my declaration - a template parameter which matches the type of the function object
I am passing in to the constructor.
Creating the proper template parameter to match this simple hashing function requires more than
your usual level of library-fu. One way I’ve done this in the past is to wrap the function type
std::function template, which is defined in header
<functional>. When you do this, your map will have a hasher object that is an
function. Upon initialization of the map, the function object will be
assigned a pointer to
name_hash, which can then be called via the interface to
The resulting declaration will then look like this:
C++0x gives me a slightly easier way to do this, and the code below shows this
simpler alternative. By using the
decltype keyword, I can take the type of my
hash function and pass it is as a template parameter. Not only is the code simpler this way,
but I avoid defining one thing in two different places.
Method 2 - A simple function defined inline
In some cases where your hash function is short and sweet, you might want to take advantage of the new C++0x support for lambda expressions. In this particular case, using a lambda to define your hash function lets you define the hasher right where you use it - which may or may not provide clarity for the program. In this particular case, the finished result looks like this:
As far as the compiler is concerned, this program will probably generate nearly identical code to the previous one, so it is unlikely that you should prefer one over the other for reasons of efficiency.
I don’t use the lambda expression method for two reasons:
When using the lambda expression, I can't use
decltypein the template class definition to get the type of the hash function object. This means I have to manually enter it, which means I'm manually synching one definition between two places in my code - always an opportunity for trouble. As this is being written in 2011, lambda expressions are still unfamiliar to people, and aren't supported in a lot of compilers currently in use on production systems, so I save their use for places where they provide a marked improvement in either program structure or readability.
Your opinions may well differ.
Method 3 - A function object
Function objects are a way to package up functions so they can be called in a way that is often
convenient to a library writer. So even though you might not have ever dreamed up the idea of
creating a function object to provide a hash function to the library, this is the way
unordered_map prefers things. My definition of a function object to use with this
Name is shown here:
All I have done here is wrap up the hash function a class. And while that might not seem like
much of a big deal, it allows a class to use this function when all it has is the class
definition. When I define
unordered_map using this function object, my code is a bit
You can see that I still have to include a third template class parameter, but I don’t have to
pass in a reference to the function object in the constructor. This is because
unordered_map keeps track of the class definition, and when it comes time to actually
get a hash, it can simply construct the object on the fly and pass in the data. A sample
(hypothetical) class that does this might have code like this:
Putting together for the sample program I’ve been using throughout, you get this code:
This method clearly has some nice features. In return for having to package your hash function
inside a structure definition (and use the unfamiliar operator() ), you get the advantage of
unordered_map declaration that is considerably simpler than the ones used
in the previous two examples.
In terms of the generated code, this example is again probably going to generate nearly identical code to the previous examples, meaning that your preference towards using it should be based strictly on ease of coding, readability, and maintenance issues.
Method 4 - Specializing std::hash
When you use
unordered_map with all the default class parameters, it tries to use a
function object of type
std::hash<Key> to create your hash keys. As discussed
way back at the start of this program, this doesn’t work in many cases because nobody bothered to
create a specialization of the hash object for your specific class. In many cases, this is
because your specific class had not been invented yet.
It stands to reason that if you are providing a class that will be used by other people, it might be nice to actually create the instances of the hash class for your class. When you do that, and include the definition in the header file that defines your class, people will be able to use your class as a key for any of the unordered containers without any additional work.
Defining a specialization of
std::hash<T> for your class is really no
different than the method 3, shown immediately above. The only difference is that you have to use
hash for your object, and you have to define it as a specialization of that
template class, and finally, you have to hoist the whole thing into the std namespace.
As you can see, for the user of your class, this is the simplest of all possible solutions - no
changes are needed to get the default versions of
unordered_map to work properly. If
you are going to be using this class in multiple places, this is probably the best solution.
However, you need to be sure you aren’t going to pollute the std namespace with a hash function
that might conflict with those used by other classes.
One Last Source of Trouble
In addition to requiring a hash function, the unordered containers also need to be able to test
two keys for equality. The canonical way for them to do this is with a version of
operator==() defined at the global namespace. This is typically a function you are
used to having to construct when creating new classes, but if you overlook it, you will be up
against the same raft of incomprehensible compiler errors seen earlier in this article.
I didn’t have to deal with it in this article because the standard library already defines this
std::pair<T1,T2>. Of course, when using
std::pair you also
have to make sure you have an equality operator for T1 and T2.
This article showed you four different ways to define a hash function for use with a user-defined class and the unordered C++0x containers. Different people will choose one of the four methods for personal reasons. When it comes to performance, there is probably no reason to prefer one or the other.
None of the techniques described here are particularly difficult to implement, but it is always a surprise and a disappointment to see how hard it is to get this information out of the standard resources.