The container classes included in the C++ standard library serve as good illustrations of both the strengths and the weaknesses of the language. The strengths are obvious: efficient, type-safe containers with performance guarantees suitable for a huge variety of applications. And the weaknesses? Compiler error messages that redefine the term useless, and documentation that makes a mockery of the word.

In this article I’ll illustrate how you might bump into these problems using the unordered_map container, as well showing you how to work past the problems. By rights this basic hash map should be the first- or second-most used container in your arsenal, but if you are less than a C++ savant, you might find yourself ditching it out of frustration.

Hash Tables

It’s a little bit of an embarrassment to the C++ community that it didn’t have a hash table in the standard library until TR1 was published in 2005. In a perfect world the original standard should have contained hash map and hash set containers. But Alexander Stepanov didn’t include these containers in the original Standard Template Library, and the standardization committee was reluctant to bless containers that didn’t have a decent amount of mileage in the real world.

By 2005 there were enough non-standard implementations to allow the TR1 extension to confidently add four new template classes:

  • unordered_map
  • unordered_multimap
  • unordered_set
  • unordered_multiset

With basically the same semantics as their ordered counterparts (map, multimap, etc.) and the ideal O(1) performance afforded by hashed indexing, C++ finally had a feature that was basically table stakes for any high-level language created since the mid-1980s.

The container classes in general, and unordered_map in particular, are such a useful part of the language that we generally want to introduce them as early as possible to people learning C++. Admittedly, the underpinnings of template programming are out of the beginner’s depth, but using the containers doesn’t require much knowledge about templates beyond an understanding of a few syntax rules.

A simple piece of sample code that you might use to teach beginners how to use these containers is shown here:

//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>

using namespace std;

typedef string Name;

int main(int argc, char* argv[])
{
    unordered_map<Name,int> ids;
    ids["Mark Nelson"] = 40561;
    ids["Andrew Binstock"] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first << " : " << ii->second << endl;
    return 0;
}

This example works great in the classroom or in the documentation page for unordered_map. But the new C++ programmer runs into trouble as soon as he or she steps outside the classroom and tries to implement a real-world example. A very common change to this program on its path to the real world will be in the use of a simple class or structure to hold the person’s name. To keep it simple, I’ll just assume we want to keep first and last names separate, and use the built-in pair class to hold the name:

//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>

using namespace std;

typedef pair<string,string> Name;

int main(int argc, char* argv[])
{
    unordered_map<Name,int> ids;
    ids[Name("Mark", "Nelson")] = 40561;
    ids[Name("Andrew","Binstock")] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first.first 
        << " "
        << ii->first.second 
        << " : "
        << ii->second
        << endl;
        return 0;
}

This seemingly small change generates seven errors in Visual C++, five in g++, and none of the errors point the user to the actual problem.

And what is the problem? It’s actually a simple one: unordered_map doesn’t know how to create a hash for the given key type of std::pair. Instead, the user is left to puzzle over things like this:

c:\program files\microsoft visual studio 10.0\vc\include\xfunctional(768):
 error C2440: 'type cast' :
 cannot convert from 'const Name' to 'size_t'

Or worse yet from g++:

/tmp/cc0B9FPH.o: In function `std::__detail::_Hash_code_base<std::pair<std::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, 
std::pair<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, int>, 
std::_Select1st<std::pair<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, int> >, 
std::equal_to<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, 
std::hash<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_hash_code(
std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&) const'

The g++ compiler doesn’t seem to actually produce a sentence describing the error, I think it figures no human could parse it anyway.

The inscrutability of compiler errors in template classes is hardly something new – I was complaining about it all the way back in 1995. And the C++0x committee tried to do something about it. Had the C++0x Concepts feature been accepted, the compiler might instead have issued an error message looked more like this:

hash_test.cpp(15): type 'Name' does not have a hash function

Unfortunately, Concepts did not make it, and we are stuck with error messages that are of no help at all.

RTFM

There are a couple of obvious places to try to figure out what these errors mean. Google would be one, and Visual Studio’s class documentation pages would be another.

Just as an experiment, try putting yourself in the place of a novice, and execute a search on:

error C2440: 'type cast' : cannot convert from 'const Name' to 'size_t'

or show some sophistication and change your search to:

"error C2440: type cast : cannot convert from" to size_t

You will find some good clues, but probably no solutions to your problem. Much of the published code deals with pre-standard hash tables using boost implementations, or early g++ hash tables, which are not going to help.

But let’s say you eventually figure out that you need to write a hash function for your Name class. All you need to know are the mechanics. Where do you find out what they are?

The logical place to do this would be in the Visual Studio unordered_map documentation page. This page has some good information in it, but nowhere does it address an undoubtedly common problem: how do I create a user-defined hash function for unordered_map?. And don’t even thing about getting anything useful from the g++ documentation page.

These stumbling blocks are the kind of thing that give C++ a reputation as one of the most difficult languages to learn. To see the contrast, you might compare it to Java. The Java programmer won’t ever run into this problem, because the language defines a default hash function in the base class Object. Any Java object may be used as a key in Java’s HashMap generic class. While the base class definition may be far from optimal for many cases, it will at least work, and won’t prevent the beginner from using the container in real programs.

C++ could have done the same thing when implementing the unordered containers, but it was constrained by both philosophy and language limitations. It would have been a real accomplishment to have overcome both, and to be honest, the people on the standards committee have to work hard enough as it is. Insisting on X-Ray vision and a cape for each member might be pushing it.

One Point of Light

So what is to be done? The language has structural problems that make it hard to issue good errors. Documentation is never as good as we like it. Are we stuck at the status quo?

While I’m not likely to change the Visual Studio documentation or the C++ compiler, I have found that one small article, like this one, that has good SEO terms to describe the problem, can help a lot of people. As an example, my next_permutation() article from 2002 still gets hundreds of readers every week, and I hope most of them walk away understanding how to use this function.

The same thing could end up being true for this article. I’ll spend the rest of it showing you four good ways to define a hash function for use in unordered_map under C++0x, and with Google’s help, it may end up providing the missing manual for this particular problem.

Method 1 – A simple function

You’re used to seeing unordered_map declared with two template parameters. But a look at the help page shows that it actually takes five – the last three usually accept default values:

template<class Key,
    class Ty,
    class Hash = std::hash<Key>,
    class Pred = std::equal_to<Key>,
    class Alloc = std::allocator<std::pair<const Key, Ty> > >
    class unordered_map;

The third parameter to the definition is a function object type that is used by the class to convert a key into a hash code. By default it is set to std::hash. Internally the unordered_map class calls operator() on an object of that type in order to get a hash code for a given key.

Note also that the several of constructors for unordered_map also take a default parameter which is an instance of this function object type.

So there are two places where we can provide some information about how to hash the key used in the unordered_map, but we normally don’t fill in these items. The reason that we often don’t is that the C++ standard library already defines instances of std::hash<T> for commonly used types. So I can write a program that contains a line like this:

int main(int argc, char* argv[])
{
    cout << std::hash<const char *>()("Mark Nelson") << endl;
    return 0;
}

Which when run, produces:

134514544

The problem, of course, is that the standard library did not implement a version of hash<pair<string,string>> – or any of the other infinite varieties of user-defined classes. However, since my new Name class is composed of two string objects, and the standard library knows how to hash string objects, I can create a pretty good hash function of my own that looks like this:

size_t name_hash( const Name & name )
{
    return hash<string>()(name.first) ^ hash<string>()(name.second);
}

As a general rule of thumb, if I have two hashes for independent variables, and I combine them using XOR, I can expect that the resulting hash is probably just as good as the input hashes.

Building this function is easy enough, but how do I actually use it with unordered_map? Although the solution is fairly simple, if you find templates confusing already, you aren’t liable fumble your way through to the answer without an enormous amount of effort.

Basically, I have to modify my use of the map class in two places. First, I have to pass in a pointer to the hash function in the constructor of the map. The standard defines a constructor that takes an initial number of buckets and a hashing object as inputs. So the first step is to modify the declaration code to look like this:

unordered_map<Name,int> ids(100, name_hash );

This is only half the battle, however. The default implementation of unordered_map expects to be using a function object of type std::hash to calculate hashes, and that is not what I passed in to the constructor. So I also have to add a third template parameter to my declaration – a template parameter which matches the type of the function object I am passing in to the constructor.

Creating the proper template parameter to match this simple hashing function requires more than your usual level of library-fu. One way I’ve done this in the past is to wrap the function type inside the std::function template, which is defined in header <functional>. When you do this, your map will have a hasher object that is an instance of function. Upon initialization of the map, the function object will be assigned a pointer to name_hash, which can then be called via the interface to function.

The resulting declaration will then look like this:

unordered_map<Name,int,function<size_t( const Name & name )>> ids(100, name_hash );

C++0x gives me a slightly easier way to do this, and the code below shows this simpler alternative. By using the decltype keyword, I can take the type of my hash function and pass it is as a template parameter. Not only is the code simpler this way, but I avoid defining one thing in two different places.

//
// This program uses a simple user-defined function
// to provide a hash function for use in unordered_map
// 
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>
#include <functional>

using namespace std;

typedef pair<string,string> Name;

size_t name_hash( const Name & name )
{
    return hash<string>()(name.first) ^ hash<string>()(name.second);
}

int main(int argc, char* argv[])
{
	unordered_map<Name,int,decltype(&name_hash)> ids(100, name_hash );
	ids[Name("Mark", "Nelson")] = 40561;
	ids[Name("Andrew","Binstock")] = 40562;
	for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
		cout << ii->first.first 
                     << " "
                     << ii->first.second 
                     << " : "
                     << ii->second
                     << endl;
	return 0;
}

Method 2 – A simple function defined inline

In some cases where your hash function is short and sweet, you might want to take advantage of the new C++0x support for lambda expressions. In this particular case, using a lambda to define your hash function lets you define the hasher right where you use it – which may or may not provide clarity for the program. In this particular case, the finished result looks like this:

//
// This program uses a simple user-defined function
// to provide a hash function for use in unordered_map.
// The function is passed in as a lambda expression to
// the unordered_map constructor.
// 
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>
#include <functional>

using namespace std;

typedef pair<string,string> Name;

int main(int argc, char* argv[])
{
    unordered_map<Name,int,function<size_t ( const Name & name )>> 
    ids(100, []( const Name & name )
             {
                 return hash<string>()(name.first) ^ hash<string>()(name.second);
             } );
    ids[Name("Mark", "Nelson")] = 40561;
    ids[Name("Andrew","Binstock")] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first.first 
             << " "
             << ii->first.second 
             << " : "
             << ii->second
             << endl;
    return 0;
}

As far as the compiler is concerned, this program will probably generate nearly identical code to the previous one, so it is unlikely that you should prefer one over the other for reasons of efficiency.

I don’t use the lambda expression method for two reasons:

  1. When using the lambda expression, I can’t use decltype in the template class definition to get the type of the hash function object. This means I have to manually enter it, which means I’m manually synching one definition between two places in my code – always an opportunity for trouble.
  2. As this is being written in 2011, lambda expressions are still unfamiliar to people, and aren’t supported in a lot of compilers currently in use on production systems, so I save their use for places where they provide a marked improvement in either program structure or readability.

Your opinions may well differ.

Method 3 – A function object

Function objects are a way to package up functions so they can be called in a way that is often convenient to a library writer. So even though you might not have ever dreamed up the idea of creating a function object to provide a hash function to the library, this is the way unordered_map prefers things. My definition of a function object to use with this definition of Name is shown here:

struct hash_name {
    size_t operator()(const Name &name ) const
    {
        return hash<string>()(name.first) ^ hash<string>()(name.second);
    }
};

All I have done here is wrap up the hash function a class. And while that might not seem like much of a big deal, it allows a class to use this function when all it has is the class definition. When I define unordered_map using this function object, my code is a bit simpler:

unordered_map<Name,int,hash_name> ids;

You can see that I still have to include a third template class parameter, but I don’t have to pass in a reference to the function object in the constructor. This is because unordered_map keeps track of the class definition, and when it comes time to actually get a hash, it can simply construct the object on the fly and pass in the data. A sample (hypothetical) class that does this might have code like this:

template <class HashKey,
         class HashValue,
         class HashObject >
class HashMap {
...
void insert( HashKey &key, HashVal &val )
{
    size_t index = HashObject()( key );
...

Putting together for the sample program I’ve been using throughout, you get this code:

//
// This program uses a function object to define
// a hash function for use in unordered_map.
// 
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>
#include <functional>

using namespace std;

typedef pair<string,string> Name;

struct hash_name {
    size_t operator()(const Name &name ) const
    {
        return hash<string>()(name.first) ^ hash<string>()(name.second);
    }
};

int main(int argc, char* argv[])
{
    unordered_map<Name,int,hash_name> ids;
    ids[Name("Mark", "Nelson")] = 40561;
    ids[Name("Andrew","Binstock")] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first.first 
             << " "
             << ii->first.second 
             << " : "
             << ii->second
             << endl;
    return 0;
}

This method clearly has some nice features. In return for having to package your hash function inside a structure definition (and use the unfamiliar operator() ), you get the advantage of having an unordered_map declaration that is considerably simpler than the ones used in the previous two examples.

In terms of the generated code, this example is again probably going to generate nearly identical code to the previous examples, meaning that your preference towards using it should be based strictly on ease of coding, readability, and maintenance issues.

Method 4 – Specializing std::hash

When you use unordered_map with all the default class parameters, it tries to use a function object of type std::hash<Key> to create your hash keys. As discussed way back at the start of this program, this doesn’t work in many cases because nobody bothered to create a specialization of the hash object for your specific class. In many cases, this is because your specific class had not been invented yet.

It stands to reason that if you are providing a class that will be used by other people, it might be nice to actually create the instances of the hash class for your class. When you do that, and include the definition in the header file that defines your class, people will be able to use your class as a key for any of the unordered containers without any additional work.

Defining a specialization of std::hash<T> for your class is really no different than the method 3, shown immediately above. The only difference is that you have to use the name hash for your object, and you have to define it as a specialization of that template class, and finally, you have to hoist the whole thing into the std namespace.

//
// This program uses a specialization of
// std::hash<T> to provide the function
// object needed by unordered_map.
// 
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>
#include <functional>

using namespace std;

typedef pair<string,string> Name;

namespace std {
    template <>
        class hash<Name>{
        public :
            size_t operator()(const Name &name ) const
            {
                return hash<string>()(name.first) ^ hash<string>()(name.second);
            }
    };
};

int main(int argc, char* argv[])
{
    unordered_map<Name,int> ids;
    ids[Name("Mark", "Nelson")] = 40561;
    ids[Name("Andrew","Binstock")] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first.first 
             << " "
             << ii->first.second 
             << " : "
             << ii->second
             << endl;
	return 0;
}

As you can see, for the user of your class, this is the simplest of all possible solutions – no changes are needed to get the default versions of unordered_map to work properly. If you are going to be using this class in multiple places, this is probably the best solution. However, you need to be sure you aren’t going to pollute the std namespace with a hash function that might conflict with those used by other classes.

One Last Source of Trouble

In addition to requiring a hash function, the unordered containers also need to be able to test two keys for equality. The canonical way for them to do this is with a version of operator==() defined at the global namespace. This is typically a function you are used to having to construct when creating new classes, but if you overlook it, you will be up against the same raft of incomprehensible compiler errors seen earlier in this article.

I didn’t have to deal with it in this article because the standard library already defines this operator for std::pair. Of course, when using std::pair you also have to make sure you have an equality operator for T1 and T2.

Wrap-Up

This article showed you four different ways to define a hash function for use with a user-defined class and the unordered C++0x containers. Different people will choose one of the four methods for personal reasons. When it comes to performance, there is probably no reason to prefer one or the other.

None of the techniques described here are particularly difficult to implement, but it is always a surprise and a disappointment to see how hard it is to get this information out of the standard resources.