This is a decorative image showing threads running within a process. It does not add any substantial information to the article. C++11 brings rich support for threading to the language, and one of the features that really works for me is the function template async. This provides a great mechanism for spinning off worker threads and easily collecting their results. In this article, I’ll show you how I used async to introduce threading to C++ beginners in a nice non-threatening fashion.

C/C++ and Threads

Even though the languages lacked standardized support for threading, people have been writing multithreaded C and C++ programs for quite a long time. Libraries like pthreads work pretty well and give you a shot at reasonable portability. But I found that beginners would often stumble when using these as their introduction to multithreading.

The biggest annoyance for beginners would have to be the ability to pass parameters to threads, and to return values from threads. The pthreads library handled this in a conventional way for a C API, requiring all threads to have the same function signature - which relied on void pointers, dynamic allocation, and casting. Plenty of places for the newcomer to stumble.

And returning data from a thread to the caller? You’re on your own. Ad hoc methods are easy enough to implement, but again, it’s just one more place to make a mistake.

C++11 async()

C++11 solves most of these problems quite nicely with its thread support. In particular, for straightforward worker thread applications, the new async function template is perfect.

When using async(), a thread is modeled as a standard C++ function, as is the case with most other libraries. However, you can pass your choice of parameters to the library - full type safety is observed, and you can pass values by copy or reference.

The function prototype for async is a bit of a mess, but in a nutshell, you call async with the following arguments:

  • A launch policy of either std::launch::async, which asks the runtime to create an asynchronous thread, or std::launch::deferred, which indicates you simply want to defer the function call until a later time (lazy evaluation.) This argument is optional - if you omit it your function will use the default policy.
  • The name of the function you are calling.
  • The arguments to be passed to the function. Normally these are passed just as you would when calling the function, but if you wish to pass an argument by reference, you need to wrap it in a call to std::ref().

The actual prototype is shown below:

template< class Function, class... Args >
std::future<typename std::result_of<Function(Args...)>::type>
    async( std::launch policy, Function&& f, Args&&... args );

You’ll note a few instances of C++11 variadic template syntax - the dot-dot-dot (not really an ellipsis) following the word class in the template type parameter list, and following the argument name Args. In both usages, this syntax means 0 or more arguments of this type. (How these variadic arguments are processed is a topic for another post.) The key point is that the function template async is instantiated to accept a typesafe list of arguments that have to match the function you are using to instantiate a thread.

My Teaching Example

To get students a feel for threading using C++11, I asked them to create a simple program called WordSearch that identifies possible Scrabble plays using a simple syntax. A single command line argument is passed to the program - a template expression for searching through the dictionary. I had the students identify specific characters with the actual letter to be played, and the period character for places where any letter could be played. A typical run might look like this:

markn@ubuntu:~ $ ./WordSearch ..x..n
Found 5 matches for ..x..n
sextan
sexton
sixain
taxman
taxmen
markn@ubuntu:~ $ 

The words are identified by brute force, working through all the entries in a copy of the Scrabble dictionary. (One of the nice things about using this pattern is that I can check the program output against grep, since the periods form a regular expression.)

To get started with the program, I asked the students to simply read the words from the Scrabble dictionary into a simple deque<string>:

ifstream f( "sowpods.txt" );
if ( !f ) {
    cerr << "Cannot open sowpods.txt in the current directory\n";
    return -1;
}
string word;
deque<string> backlog;
while ( f >> word )
    backlog.push_back( word );

I then asked them to create a function called find_matches that would locate all the matches in that deque, and return them to the caller in a vector<string>. A typical implementation might look like this:

vector<string> find_matches( string pattern, deque<string> &backlog )
{
    vector<string> results;
    while ( backlog.size() ) {
        string word = backlog.front();
        backlog.pop_front();
        if ( match( pattern, word ) )
            results.push_back( word );
    }
    return results;
}

The implementation of match() should be trivial.

Calling this routine and printing the results is easy enough now:

vector<string> words = find_matches( pattern, backlog );
cerr << "Found " << words.size()
     << " matches for " << pattern
     << endl;
for ( auto s : words )
    cout << s << "\n";

Converting to Multithreading

At this point we have a single-threaded program. Now, thanks to the excellent work of the C++11 committee, we can convert this to a threaded implementation in two easy steps.

First, we wrap the function call in an async wrapper:

   auto f1 = async( launch::async, find_matches, pattern, backlog );

The return type from async() is called a future. It doesn’t contain anything particularly useful immediately after the function call, but after the thread exits, it will provide a simple way to get the value returned by find_matches, the function that is now running as a separate thread. We can retrieve that value with a call to member function get():

    vector<string> words = f1.get();

Since the function was called with std::launch::async, the call to get() will presumably have to join() the thread, which will block until the thread is complete. After the thread has been joined, the future object returns the value returned by find_matches. It really couldn’t be much easier, could it? I’m able to retrieve a complex object as if it were returned directly from a thread, and I don’t need to write any declarations or create any glue code to make it happen.

Using Multiple Threads

This has all been pretty easy so far, but to actually learn something about threading, it helps to surface some of the issues that you will encounter when traveling down this path. To force the issue, I next ask my students to modify the program so that it runs three copies of find_matches at once.

Just calling async three times is not quite enough. I also have to make sure that I pass argument deque<string> backlog by reference. Since all three threads are going to be pulling data out of it simultaneously, I need to make sure that they are working on the same copy of the data.

Because of the way C++ infers the types of arguments, it isn’t quite possible for async to know that backlog is a reference argument. To make that happen, I need to include the <functional> header and wrap reference arguments in a call to ref. The result looks like this:

    auto f1 = async( launch::async, find_matches, pattern, ref(backlog) );
    auto f2 = async( launch::async, find_matches, pattern, ref(backlog) );
    auto f3 = async( launch::async, find_matches, pattern, ref(backlog) );

This fires up three threads more or less at once, and as my students are quick to see, will inevitably result in a crash. Standard library objects like deque are not threadsafe - making them so would probably violate the guiding principle that you don’t pay for things you don’t use.

The simplest solution to this problem is to use the mutex class added to C++11, and lock all access to the deque. A version of find_matches that uses a global mutex m results in code like this:

vector<string> find_matches( string pattern, deque<string> &backlog )
{
    vector<string> results;
    for ( ; ; ) {
        m.lock();
        if ( backlog.size() == 0 ) {
            m.unlock();
            return results;
        }
        string word = backlog.front();
        backlog.pop_front();
        m.unlock();
        if ( match( pattern, word ) )
            results.push_back( word );
    }
}

A slight increase in complexity, but with this lock in place the program runs to completion without trouble. A run with a slightly different pattern produced the following results:

markn@ubuntu~ $ ./WordSearch ..x..l.
Found 9 matches for ..x..l. in thread 1
maxilla
maxwell
mixable
mixedly
mixible
saxauls
sextile
sixfold
sixthly
Found 7 matches for ..x..l. in thread 2
buxomly
hexapla
vexedly
vexilla
vixenly
waxable
waxbill
Found 8 matches for ..x..l. in thread 3
boxball
boxfuls
fixable
fixedly
foxhole
taxable
taxably
textile
markn@ubuntu:~ $ 

Using process substitution under bash you can actually check that WordSearch is returning the same thing that grep sees when processing the data. For example:

./WordSearch ..x..l. 2> /dev/null | sort | diff - <(grep "^..x..l.$" sowpods.txt)

A very simple way to dip your toes in the water, courtesy of C++11.

When To Use async

In his C++11 FAQ, Bjarne Stroustroup has this to say about async:

...don't even think of using async() to launch tasks that do I/O, manipulates mutexes, or in other ways interact with other tasks. The idea behind async() is the same as the idea behind the range-for statement: Provide a simple way to handle the simplest, rather common, case and leave the more complex examples to the fully general mechanism.

I’m not sure I agree with this disclaimer. The async function is in no way a second-class citizen when it comes to launching threads. It simply lowers the barriers to using threads by providing a very nice mechanism for launching them and retrieving their return values.

You can see that in my example I use a mutex to guard access to a data structure. This is not made more dangerous by using async to launch my thread - I use the mutex exactly as I would if I had launched my threads using the thread class.

If your compiler supports async, I say use it!

Does Your Compiler Support Async?

No doubt these new C++11 features are quite cool, but in order to use them, your compiler has to support them. Does yours?

It’s a little hard to read the status of async by browsing the g++ or the Visual C++ 11 beta charts. Regardless of what they may say, this program does build and run properly with g++ 4.7 and the Visual C++ 11 beta.

Updating to g++ 4.7 on your Linux system can be somewhat tricky. Ideally, you would like to find a prebuilt version of the compiler that will work with your distribution. As an example, I was able to easily update my Ubuntu system with the instructions found here. The Visual Studio Community Edition is available to all for free - and installing alongside other versions of Visual C++ seems to work fine.

A little extra work maybe, but really, you should be knee-deep in C++11 by now. This is modern C++.

WordSearch.cpp Source

You can download a copy of sowpods.txt here. The full source for WordSearch.cpp is here:

//
// WordSearch.cpp
//
// When building with g++ 4.7 or later, use this command line:
//
//  g++ -pthread --std=c++0x 
//
#include <fstream>
#include <iostream>
#include <string>
#include <deque>
#include <vector>
#include <future>
#include <mutex>
#include <functional>

using namespace std;

std::mutex m;

inline bool match( const std::string &pattern, std::string word )
{
    if ( pattern.size() != word.size() )
        return false;
    for ( size_t i = 0 ; i < pattern.size() ; i++ ) 
        if ( pattern[ i ] != '.' && pattern[ i ] != word[ i ] )
            return false;
    return true;
}

vector<string> find_matches( string pattern, deque<string> &backlog )
{
    vector<string> results;
    for ( ; ; ) {
        m.lock();
        if ( backlog.size() == 0 ) {
            m.unlock();
            return results;
        }
        string word = backlog.front();
        backlog.pop_front();
        m.unlock();
        if ( match( pattern, word ) )
            results.push_back( word );
    }
}

template<class ASYNC>
void print_results( ASYNC &f, string &pattern, int threadno )
{
    vector<string> words = f.get();
    cerr << "Found " << words.size() 
         << " matches for " << pattern 
         << " in thread " << threadno
         << endl;
    for ( auto s : words )
        cout << s << "\n";
}

int main( int argc, char *argv[] )
{
    if ( argc < 2 ) {
        cerr << "Usage: WordSearch match-expression\n\n"
                "match-expression contains lower case letters and periods.\n"
                "The periods will match any character\n";
        return -1;
    }
    string pattern = argv[ 1 ];
    //
    // Load the words into the deque
    //
    ifstream f( "sowpods.txt" );
    if ( !f ) {
        cerr << "Cannot open sowpods.txt in the current directory\n";
        return -1;
    }
    string word;
    deque<string> backlog;
    while ( f >> word )
        backlog.push_back( word );
    //
    // Now process the words and print the results
    //
    auto f1 = async( launch::async, find_matches, pattern, ref(backlog) );
    auto f2 = async( launch::async, find_matches, pattern, ref(backlog) );
    auto f3 = async( launch::async, find_matches, pattern, ref(backlog) );
    print_results( f1, pattern, 1 );
    print_results( f2, pattern, 2 );
    print_results( f3, pattern, 3 );

    return 0;
}