C++11: Range-based for and auto

Two really handy features in C++11 are the range-based for statement and the auto type specifier. The former allows you iterate over collections using a much more compact form of expression, and the latter takes some of the headache out of the complex type declarations encountered in the standard library. Both of these features have been available in g++ since release 4.6, and are now present in Visual Studio 11, so you can start using them today. (auto typed variables are available in earlier versions of both compilers.) In this post I’ll give you a description of how these new features works, and show you a concrete example of the positive effects they can have on your programs.

The value of containers

It’s hard to overstate the value of the containers in the C++ standard library. With the addition of the hash-based containers in TR1, I rarely if ever find myself tempted to roll my own, or use a third party library. The flexibility and power of the library created by Alexander Stepanov does everything I need.

Despite the technical merit of the container classes, newcomers are often hesitant about completely embracing them. One of the main reasons has to be the conceptual drag imposed by the use of iterators as the primary means of accessing the objects they contain. It’s not that there is anything complicated about the concept, but the syntax can be more than just a little annoying. Let me illustrate it with an example.

Anagramania

The listing below is a C++ program that reads through the Scrabble dictionary and determines which set of letters generates the most anagrams. I’m using C++ circa TR1, in which I have access to the unordered associative containers, but I don’t take any shortcuts to try to simplify the syntax. (The fact that I can write this program in one screen of simple code is a nice testament to the quality of the container library.)

The logic for the program is simple. I use an unordered_multimap called counts to hold the count of all anagram families in the dictionary, with its key being the sorted value of the scrabble word. This means that all words that are anagrams of one another will have the same key. I use an unordered_multimap called words to hold the list of all words that are anagrams of that key. Each time I process a word, I increment a value in counts and I add a new value to words.

After the input processing is done, I can just iterate over counts from top to bottom, looking for the highest count. When I have gone through the entire map, I have the sorted key that generates the most anagrams. Using that key, I query an unordered_multimap for a range of results. It returns two iterators in a pair<T1,T2> object, which I then use to iterate over the result set.

Even if you are familiar with the type system used by the containers and don’t make too many mistakes, just the magnitude of how much you have to type to get this to work is a bit of a downer. And the length of those type definitions doesn’t help make the concepts being used any clearer.

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <algorithm>
#include <unordered_map>

int main(int argc, char* argv[])
{
    std::ifstream data( "sowpods.txt" );
    std::unordered_map<std::string,int> counts;
    std::unordered_multimap<std::string,std::string> words;

    std::string s;
    while ( data >> s ) {
        std::string temp = s;
        std::sort(temp.begin(), temp.end() );
        counts[temp]++;
        words.insert( std::make_pair(temp,s) );
    }

    int max_count = -1;
    std::string max_string = "";
    for ( std::unordered_map<std::string,int>::iterator ii = counts.begin();
          ii != counts.end();
          ii++ )
    {
        if ( ii->second > max_count ) {
            max_count = ii->second;
            max_string = ii->first;
        }
    }
    std::cout << "The maximum anagram family has " << max_count << " members:\n";
    std::pair< std::unordered_multimap<std::string,std::string>::iterator,
	       std::unordered_multimap<std::string,std::string>::iterator> range;
    range = words.equal_range( max_string );
    for ( std::unordered_multimap<std::string,std::string>::iterator ii = range.first;
          ii != range.second;
          ii++ )
        std::cout << ii->second << " ";
    std::cout << std::endl;
    return 0;
}

Anagram finder circa TR1

Now let’s look at the two features that make major improvements to this program in C++11.

The auto Type Specification

The hard working committee members who hammered out the standard last year clearly listened to the millions of C++ programmers out there. While they were charting new waters for the language with things like move semantics and rvalue references, they were also making a lot of small changes that simply make the language a lot easier to work with. Maybe even a little more fun. The two things I find at the top of my list are the the use of auto type specifier and the for-range statement.

The auto keyword can be used in a number of different contexts, but in general it means that you can declare variables without having to enter a complete type. This solves some tricky problems for template programming, and it provides a convenience for awkward variable declarations. Most notably, it allows you to replace these two wordy lines of code:

std::pair< std::unordered_multimap<std::string,std::string>::iterator,
    std::unordered_multimap<std::string,std::string>::iterator> range;
range = words.equal_range( max_string );

with this much simpler single line:

auto range = words.equal_range( max_string );

In both cases, the type of range is the same - but by using the auto type specifier, we let the compiler replace all that typing with a bit of simple hand waving.

Bjarne Stroustrup has a good, concise explanation of auto in his C++11 FAQ, I recommend you spend the time to read it.

The Range-based for Statement

When working with standard library containers, one of the most common things we do is iterate over some or all of the container. This generally is done using a for or while loop with an iterator loop variable.

C++11 makes this type of iteration easier with new syntax injected into the for statement that has been around since 1969. The range-based for looks like this:

for ( declaration : expression ) statement

In this new statement, expression can be an initializer list, an array, or an object that implements container semantics. This means that the object returns an iterator-like object from a begin() and end() methods, or via a call to begin() and end() functions in the current or std namespace.

The variable declaration is either a reference or value of the type of variable held in the container, array, or initializer list. The for loop is executed from the beginning of the container to the end, with statement executed once per value returned by the iterator.

Although this is a completely new language feature, I think most C++ programmers will be comfortable with it from the first time they are able - it makes those iterations over containers clean and concise.

Putting It to Use

Although it didn’t really cut down on my code size in a big way, I first made use of the range-based for in the loop that reads in the data from the scrabble dictionary. My new version of the loop is shown here:

for ( const std::string &s : std::istream_iterator<std::string>( data ) )
{
    std::string temp = s;
    std::sort(temp.begin(), temp.end() );
    counts[temp]++;
    words.insert( std::make_pair(temp,s) );
}

The only big improvement here is that I was able to declare my string variable on first use, which is always my preference.

However, looking at this code, you might be wondering how it compiles. After all, the istream_iterator doesn’t have begin() or end() member functions.

That’s correct, and the reason it works is that I added a couple of convenience functions to my program that enable the use of this iterator type with the range-based for:

template<class T>
std::istream_iterator<T> begin(std::istream_iterator<T> &ii_stream)
{
    return ii_stream;
}

template<class T>
std::istream_iterator<T> end(std::istream_iterator<T> &ii_stream)
{
    return std::istream_iterator<T>();
}

I made use of a similar set of template functions to enable the use of the new for statement in my final output statement. I now iterate over the discovered members of the anagram family with two easy-to-read lines:

for ( const auto &map_entry : words.equal_range( ii->first ) )
    std::cout << map_entry.second << " ";

Compare this to the TR1 code that does the same thing, and I think you will see the real value of both auto and range-based for.

Iterating over the values returned from a multimap is a common task, enabled it by these convenient template functions:

template<class ITERATOR>
ITERATOR begin( std::pair<ITERATOR,ITERATOR> &range )
{
    return range.first;
}

template<class ITERATOR>
ITERATOR end( std::pair<ITERATOR,ITERATOR> &range )
{
    return range.second;
}

When I first implemented the functions for my C++11 program, I was halfway expecting to find that this functionality had already been added to the standard library - they really make a big improvement for a small investment. But no, I couldn’t find them, so we will be using our own versions for the time being.

The Final Product

My much improved anagram finder is shown below. In addition to the use of range-based for and auto type declarations, I changed the way I find the maximum element in the container. Now that lambdas are part of the language, there is no excuse for not using the standard library algorithms, and this code gives an illustration of how that works as well.

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <algorithm>
#include <unordered_map>

template<class ITERATOR>
ITERATOR begin( std::pair<ITERATOR,ITERATOR> &range )
{
    return range.first;
}

template<class ITERATOR>
ITERATOR end( std::pair<ITERATOR,ITERATOR> &range )
{
    return range.second;
}

template<class T>
std::istream_iterator<T> begin(std::istream_iterator<T> &ii_stream)
{
    return ii_stream;
}

template<class T>
std::istream_iterator<T> end(std::istream_iterator<T> &ii_stream)
{
    return std::istream_iterator<T>();
}

int main(int argc, char* argv[])
{
    std::ifstream data( "sowpods.txt" );
    std::unordered_map<std::string,int> counts;
    std::unordered_multimap<std::string,std::string> words;

    for ( const std::string &s : std::istream_iterator<std::string>( data ) )
    {
        std::string temp = s;
        std::sort(temp.begin(), temp.end() );
        counts[temp]++;
        words.insert( std::make_pair(temp,s) );
    }
    auto ii = std::max_element( counts.begin(), 
                                counts.end(), 
                                [](const std::pair<std::string,int> &v1, 
                                   const std::pair<std::string,int> &v2)
                                { 
                                    return v1.second < v2.second; 
                                } 
                              );
    std::cout << "The maximum anagram family has " << ii->second << " members:\n";
    for ( const auto &map_entry : words.equal_range( ii->first ) )
        std::cout << map_entry.second << " ";
    std::cout << std::endl;
    return 0;
}

Anagram finder in C++11

If I move the four convenience functions into a utility header file, I think you’ll agree that the new version of the code implements my algorithm in a very clean and concise way. The new language improvements make a huge difference in readability and convenience.

Of course, these two features are just one small part of a huge new standard, but for right now, they are the ones I turn to the most. How about you? Let me know!