C++11 brings rich support for threading to the language, and one of the features that really works for me is the function template async. This provides a great mechanism for spinning off worker threads and easily collecting their results. In this article, I’ll show you how I used async to introduce threading to C++ beginners in a nice non-threatening fashion.
C/C++ and Threads
Even though the languages lacked standardized support for threading, people have been writing multithreaded C and C++ programs for quite a long time. Libraries like pthreads work pretty well and give you a shot at reasonable portability. But I found that beginners would often stumble when using these as their introduction to multithreading.
The biggest annoyance for beginners would have to be the ability to pass parameters to threads, and to return values from threads. The pthreads library handled this in a conventional way for a C API, requiring all threads to have the same function signature – which relied on void pointers, dynamic allocation, and casting. Plenty of places for the newcomer to stumble.
And returning data from a thread to the caller? You’re on your own. Ad hoc methods are easy enough to implement, but again, it’s just one more place to make a mistake.
C++11 async()
C++11 solves most of these problems quite nicely with its thread support. In particular, for straightforward worker thread applications, the new async function template is perfect.
When using async(), a thread is modeled as a standard C++ function, as is the case with most other libraries. However, you can pass your choice of parameters to the library – full type safety is observed, and you can pass values by copy or reference.
The function prototype for async is a bit of a mess, but in a nutshell, you call async with the following arguments:
-
A launch policy of either
std::launch::async, which asks the runtime to create an asynchronous thread, or std::launch::deferred, which indicates you simply want to defer the function call until a later time (lazy evaluation.) This argument is optional – if you omit it your function will use the default policy.
The name of the function you are calling.
The arguments to be passed to the function. Normally these are passed just as you would when calling the function, but if you wish to pass an argument by reference, you need to wrap it in a call to std::ref().
The actual prototype is shown below:
template< class Function, class... Args >
std::future<typename std::result_of<Function(Args...)>::type>
async( std::launch policy, Function&& f, Args&&... args );
You’ll note a few instances of C++11 variadic template syntax – the dot-dot-dot (not really an ellipsis) following the word class in the template type parameter list, and following the argument name Args. In both usages, this syntax means 0 or more arguments of this type. (How these variadic arguments are processed is a topic for another post.) The key point is that the function template async is instantiated to accept a typesafe list of arguments that have to match the function you are using to instantiate a thread.
My Teaching Example
To get students a feel for threading using C++11, I asked them to create a simple program called WordSearch that identifies possible Scrabble plays using a simple syntax. A single command line argument is passed to the program – a template expression for searching through the dictionary. I had the students identify specific characters with the actual letter to be played, and the period character for places where any letter could be played. A typical run might look like this:
markn@ubuntu:~ $ ./WordSearch ..x..n Found 5 matches for ..x..n sextan sexton sixain taxman taxmen markn@ubuntu:~ $
The words are identified by brute force, working through all the entries in a copy of the Scrabble dictionary. (One of the nice things about using this pattern is that I can check the program output against grep, since the periods form a regular expression.)
To get started with the program, I asked the students to simply read the words from the scrabble dictionary into a simple deque<string>:
ifstream f( "sowpods.txt" );
if ( !f ) {
cerr << "Cannot open sowpods.txt in the current directory\n";
return -1;
}
string word;
deque<string> backlog;
while ( f >> word )
backlog.push_back( word );
I then asked them to create a function called find_matches that would locate all the matches in that deque, and return them to the caller in a vector<string>. A typical implementation might look like this:
vector<string> find_matches( string pattern, deque<string> &backlog )
{
vector<string> results;
while ( backlog.size() ) {
string word = backlog.front();
backlog.pop_front();
if ( match( pattern, word ) )
results.push_back( word );
}
return results;
}
The implementation of match() should be trivial.
Calling this routine and printing the results is easy enough now:
vector<string> words = find_matches( pattern, backlog );
cerr << "Found " << words.size()
<< " matches for " << pattern
<< endl;
for ( auto s : words )
cout << s << "\n";
Converting to Multithreading
At this point we have a single-threaded program. Now, thanks to the excellent work of the C++11 committee, we can convert this to a threaded implementation in two easy steps.
First, we wrap the function call in an async wrapper:
auto f1 = async( launch::async, find_matches, pattern, backlog );
The return type from async() is called a future. It doesn’t contain anything particularly useful immediately after the function call, but after the thread exits, it will provide a simple way to get the value returned by find_matches, the function that is now running as a separate thread. We can retrieve that value with a call to member function get():
vector<string> words = f1.get();
Since the function was called with std::launch::async, the call to get() will presumably have to join() the thread, which will block until the thread is complete. After the thread has been joined, the future object returns the value returned by find_matches. It really couldn’t be much easier, could it? I’m able to retrieve a complex object as if it were returned directly from a thread, and I don’t need to write any declarations or create any glue code to make it happen.
Using Multiple Threads
This has all been pretty easy so far, but to actually learn something about threading, it helps to surface some of the issues that you will encounter when traveling down this path. To force the issue, I next ask my students to modify the program so that it runs three copies of find_matches at once.
Just calling async three times is not quite enough. I also have to make sure that I pass argument deque<string> backlog by reference. Since all three threads are going to be pulling data out of it simultaneously, I need to make sure that they are working on the same copy of the data.
Because of the way C++ infers the types of arguments, it isn’t quite possible for async to know that backlog is a reference argument. To make that happen, I need to include the <functional> header and wrap reference arguments in a call to ref. The result looks like this:
auto f1 = async( launch::async, find_matches, pattern, ref(backlog) );
auto f2 = async( launch::async, find_matches, pattern, ref(backlog) );
auto f3 = async( launch::async, find_matches, pattern, ref(backlog) );
This fires up three threads more or less at once, and as my students are quick to see, will inevitably result in a crash. Standard library objects like deque are not threadsafe – making them so would probably violate the guiding principle that you don’t pay for things you don’t use.
The simplest solution to this problem is to use the mutex class added to C++11, and lock all access to the deque. A version of find_matches that uses a global mutex m results in code like this:
vector<string> find_matches( string pattern, deque<string> &backlog )
{
vector<string> results;
for ( ; ; ) {
m.lock();
if ( backlog.size() == 0 ) {
m.unlock();
return results;
}
string word = backlog.front();
backlog.pop_front();
m.unlock();
if ( match( pattern, word ) )
results.push_back( word );
}
}
A slight increase in complexity, but with this lock in place the program runs to completion without trouble. A run with a slightly different pattern produced the following results:
markn@ubuntu~ $ ./WordSearch ..x..l. Found 9 matches for ..x..l. in thread 1 maxilla maxwell mixable mixedly mixible saxauls sextile sixfold sixthly Found 7 matches for ..x..l. in thread 2 buxomly hexapla vexedly vexilla vixenly waxable waxbill Found 8 matches for ..x..l. in thread 3 boxball boxfuls fixable fixedly foxhole taxable taxably textile markn@ubuntu:~ $
Using process substitution under bash you can actually check that WordSearch is returning the same thing that grep sees when processing the data. For example:
./WordSearch ..x..l. 2> /dev/null | sort | diff - <(grep "^..x..l.$" sowpods.txt)
A very simple way to dip your toes in the water, courtesy of C++11.
When To Use async
In his C++11 FAQ, Bjarne Stroustroup has this to say about async:
…don’t even think of using async() to launch tasks that do I/O, manipulates mutexes, or in other ways interact with other tasks. The idea behind async() is the same as the idea behind the range-for statement: Provide a simple way to handle the simplest, rather common, case and leave the more complex examples to the fully general mechanism.
I’m not sure I agree with this disclaimer. The async function is in no way a second-class citizen when it comes to launching threads. It simply lowers the barriers to using threads by providing a very nice mechanism for launching them and retrieving their return values.
You can see that in my example I use a mutex to guard access to a data structure. This is not made more dangerous by using async to launch my thread – I use the mutex exactly as I would if I had launched my threads using the thread class.
If your compiler supports async, I say use it!
Does Your Compiler Support Async?
No doubt these new C++11 features are quite cool, but in order to use them, your compiler has to support them. Does yours?
It’s a little hard to read the status of async by browsing the g++ or the Visual C++ 11 beta charts. Regardless of what they may say, this program does build and run properly with g++ 4.7 and the Visual C++ 11 beta.
Updating to g++ 4.7 on your Linux system can be somewhat tricky. Ideally, you would like to find a prebuilt version of the compiler that will work with your distribution. As an example, I was able to easily update my Ubuntu system with the instructions found here. Visual C++ 11 is available to all for free – installing alongside other versions of Visual C++ seems to work fine.
A little extra work maybe, but really, you should be knee-deep in C++11 by now. This is modern C++.
WordSearch.cpp Source
You can download a copy of sowpods.txt here. The full source for WordSearch.cpp is here:
//
// WordSearch.cpp
//
// When building with g++ 4.7 or later, use this command line:
//
// g++ -pthread --std=c++0x
//
#include <fstream>
#include <iostream>
#include <string>
#include <deque>
#include <vector>
#include <future>
#include <mutex>
#include <functional>
using namespace std;
std::mutex m;
inline bool match( const std::string &pattern, std::string word )
{
if ( pattern.size() != word.size() )
return false;
for ( size_t i = 0 ; i < pattern.size() ; i++ )
if ( pattern[ i ] != '.' && pattern[ i ] != word[ i ] )
return false;
return true;
}
vector<string> find_matches( string pattern, deque<string> &backlog )
{
vector<string> results;
for ( ; ; ) {
m.lock();
if ( backlog.size() == 0 ) {
m.unlock();
return results;
}
string word = backlog.front();
backlog.pop_front();
m.unlock();
if ( match( pattern, word ) )
results.push_back( word );
}
}
template<class ASYNC>
void print_results( ASYNC &f, string &pattern, int threadno )
{
vector<string> words = f.get();
cerr << "Found " << words.size()
<< " matches for " << pattern
<< " in thread " << threadno
<< endl;
for ( auto s : words )
cout << s << "\n";
}
int main( int argc, char *argv[] )
{
if ( argc < 2 ) {
cerr << "Usage: WordSearch match-expression\n\n"
"match-expression contains lower case letters and periods.\n"
"The periods will match any character\n";
return -1;
}
string pattern = argv[ 1 ];
//
// Load the words into the deque
//
ifstream f( "sowpods.txt" );
if ( !f ) {
cerr << "Cannot open sowpods.txt in the current directory\n";
return -1;
}
string word;
deque<string> backlog;
while ( f >> word )
backlog.push_back( word );
//
// Now process the words and print the results
//
auto f1 = async( launch::async, find_matches, pattern, ref(backlog) );
auto f2 = async( launch::async, find_matches, pattern, ref(backlog) );
auto f3 = async( launch::async, find_matches, pattern, ref(backlog) );
print_results( f1, pattern, 1 );
print_results( f2, pattern, 2 );
print_results( f3, pattern, 3 );
return 0;
}
8 users commented in " C+11 – Threading Made Easy "
Follow-up comment rss or Leave a TrackbackI also wondered which are the reasons of the Stroustroup’s claim.
Maybe as the async are managed by the runtime environment (can create or not a thread), the waits in locks or in I/O operations can block the main thread and kill parallelism. So the pattern is: Use async only in simple task.
Good article!
Great article. In your class, do you recommend parallel version of containers later on in the course? Or, do you recommend hand-crafted solution of thread objects accessing STL containers with guarded mutex?
@Dat:
No, I don’t really think parallel versions of the containers are a good idea. We could make them work, but the internal locking and unlocking would sap performance in ways that wouldn’t be obvious.
My personal preference is for running all communications between active threads via a thread-safe messaging system. It would be great if that was built into the standard. But not this time.
- Mark
Regarding the FAQ entry – The idea is probably that async should only be used for “smaller” tasks – not for long communicating or active threads. Async only potentially executes the function in a new thread. It might start all functions passed to async in the calling thread. It might decide to queue the functions in one background thread or the other.
So the user has limited control and cannot make any further assumptions. So everything beyond simple tasks is something that should be with other means.
I do not think that this is a bad limitation: it allows the standard library to deal with async effectively and it still is a simple to use feature – as long as that limitation is kept i mind.
I don’t know if this is a good example for teaching concurrency.
FIrst the find_matches function modify the backlog which is not good even for single threaded programs, but it is even worst on multithreaded programs. This function should take as parameter a range and I will use a vector instead of a deque.
Once the function takes a range, it is easy to partition the backlog depending on the number of threads you want to use and give a separated partition to each thread (created by async, why not).
Maybe this is related to the FAQ’s Stroustroup.
i modifyed this program to allow for [numberOfThreads] different researches together. To do this deque must be const, so i simply modified find_matches this way:
and in main created asyncs with different patterns
This is definitely a bad example for teaching concurrency, while I do appreciate the effort for explaining the syntax and usage. While I am no expert at the topic as well, I do believe the algorithm is simple enough NOT to use threads at all, let alone using mutexes in such a way that threads almost always have to wait.
I copied over `/usr/share/dict/words`, duplicated the file 100 times and created an almost 250MB text file with words. I also wrote several programs, one using your approach of a `deque` backlog and continually popping elements from the front, one using a simple `vector` and have the three threads look for words in 3 partitions. Both programmes are compiled with and without concurrency enabled, using lots of `#ifdef`s. The result is startling: your approach, with concurrency enabled is the slowest, taking 75.4 seconds, specifically 49.3s system and 15.5s user, as reported by `time` utility. The other three take roughly the same time, about 5.0 seconds with 4.5s user and 0.5s system. This shows that concurrency is totally unsuited for this specific problem.
It also turns out that it is ill-advised to store all data in memory and then filtering them one by one, because whether the current entry is matched does not depend on past or future, increasing time and memory usage. Here is a 14% percent faster and much shorter solution:
I acknowledge that my tests are performed only once on only my own computer, a dual-core hyperthreaded Intel 2.7GHz machine, compiled with clang++-3.2 with -O4.
@kccqzy:
I have to admit that I find responses like yours puzzling. I mean, when you are trying to understand how concurrency works, do you need to come up with a problem the is optimally solved with multiple threads? How does this matter?
It somehow seems that when you read this article, you think that in the first paragraph I said "I will show you how to use multiple threads to do a word search faster than the single threaded approach." Of course I said no such thing. I said I would use it as a simple way to introduce concepts of concurrency.
If I was teaching first graders the names of colors, and I asked them to make a drawing with a blue dog and a green cow, would that matter? I guess it would to you.
Did Kernighan and Ritchie need to justify Hello World as a problem that was better solved in C than in other languages? No, it was simply an arbitrary illustration of how part of C worked.
To me, the object is to understand how the tools work, how to avoid common problems, how to understand how C++ implements it. The problem domain itself is a completely different topic - much harder and in many ways independent of the language.
But that's just my point of view.
- Mark
Leave A Reply
You can insert source code in your comment without fear of too much mangling by tagging it to use the iG:Syntax Hiliter plugin. A typical usage of this plugin will look like this:[c]
Note that tags are enclosed in square brackets, not angle brackets. Tags currently supported by this plugin are: as (ActionScript), asp, c, cpp, csharp, css, delphi, html, java, js, mysql, perl, python, ruby, smarty, sql, vb, vbnet, xml, code (Generic). If you post your comment and you aren't happy with the way it looks, I will do everything I can to edit it to your satisfaction.int main()
{
printf( "Hello, world!\n" );
return 1;
}
[/c]