<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mark Nelson &#187; C/C++</title>
	<atom:link href="http://marknelson.us/category/cc/feed/" rel="self" type="application/rss+xml" />
	<link>http://marknelson.us</link>
	<description>Programming, mostly.</description>
	<lastBuildDate>Fri, 13 Apr 2012 19:25:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>C++11: Range-based for and auto</title>
		<link>http://marknelson.us/2012/04/07/c11-range-based-for-and-auto/</link>
		<comments>http://marknelson.us/2012/04/07/c11-range-based-for-and-auto/#comments</comments>
		<pubDate>Sat, 07 Apr 2012 18:18:02 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Puzzles]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1511</guid>
		<description><![CDATA[Two really handy features in C++11 are the range-based for statement and the auto type specifier. The former allows you to iterate over collections using a much more compact form of expression, and the latter takes some of the headache out of the complex type declarations encountered in the standard library. Both of these features have [...]]]></description>
			<content:encoded><![CDATA[<p>Two really handy features in C++11 are the <i>range-based for statement</i> and the <i>auto type specifier</i>. The former allows you to iterate over collections using a much more compact form of expression, and the latter takes some of the headache out of the complex type declarations encountered in the standard library. Both of these features have been available in g++ since release 4.6, and are now present in Visual Studio 11, so you can start using them today. (auto-typed variables are available in earlier versions of both compilers.) In this post I&#8217;ll give you a description of how these new features work, and show you a concrete example of the positive effects they can have on your programs.<br />
<span id="more-1511"></span></p>
<h4>The value of containers</h4>
<p>It&#8217;s hard to overstate the value of the containers in the C++ standard library. With the addition of the hash-based containers in TR1, I rarely if ever find myself tempted to roll my own, or use a third party library. The library created by Alexander Stepanov has the flexibility and power to do everything I need.</p>
<p>Despite the technical merit of the container classes, newcomers are often hesitant about completely embracing them. One of the main reasons has to be the conceptual drag imposed by the use of iterators as the primary means of accessing the objects they contain. It&#8217;s not that there is anything complicated about the concept, but the syntax can be more than just a little annoying. Let me illustrate it with an example.</p>
<h4>Anagramania</h4>
<p>The listing below is a C++ program that reads through the Scrabble dictionary and determines which set of letters generates the most anagrams. I&#8217;m using C++ circa TR1, in which I have access to the unordered associative containers, but I don&#8217;t take any shortcuts to try to simplify the syntax. (The fact that I can write this program in one screen of simple code is a nice testament to the quality of the container library.)</p>
<p>The logic for the program is simple. I use an <code>unordered_map</code> called <code>counts</code> to hold the size of each anagram family in the dictionary, with its key being the sorted letters of the Scrabble word. This means that all words that are anagrams of one another will have the same key. I use an <code>unordered_multimap</code> called <code>words</code> to hold the list of all words that are anagrams of that key. Each time I process a word, I increment a value in <code>counts</code> and I add a new value to <code>words</code>.</p>
<p>After the input processing is done, I can just iterate over <code>counts</code> from top to bottom, looking for the highest count. When I have gone through the entire map, I have the sorted key that generates the most anagrams. Using that key, I query the <code>words</code> multimap for a range of results. It returns two iterators in a <code>pair&lt;T1,T2&gt;</code> object, which I then use to iterate over the result set. </p>
<p>Even if you are familiar with the type system used by the containers and don&#8217;t make too many mistakes, just the magnitude of how much you have to type to get this to work is a bit of a downer. And the length of those type definitions doesn&#8217;t help make the concepts being used any clearer.</p>
<pre>
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;iterator&gt;
#include &lt;algorithm&gt;
#include &lt;unordered_map&gt;

int main(int argc, char* argv[])
{
    std::ifstream data( &quot;sowpods.txt&quot; );
    std::unordered_map&lt;std::string,int&gt; counts;
    std::unordered_multimap&lt;std::string,std::string&gt; words;

    std::string s;
    while ( data &gt;&gt; s ) {
        std::string temp = s;
        std::sort(temp.begin(), temp.end() );
        counts[temp]++;
        words.insert( std::make_pair(temp,s) );
    }

    int max_count = -1;
    std::string max_string = &quot;&quot;;
    for ( std::unordered_map&lt;std::string,int&gt;::iterator ii = counts.begin();
          ii != counts.end();
          ii++ )
    {
        if ( ii-&gt;second &gt; max_count ) {
            max_count = ii-&gt;second;
            max_string = ii-&gt;first;
        }
    }
    std::cout &lt;&lt; &quot;The maximum anagram family has &quot; &lt;&lt; max_count &lt;&lt; &quot; members:\n&quot;;
    std::pair&lt; std::unordered_multimap&lt;std::string,std::string&gt;::iterator,
	       std::unordered_multimap&lt;std::string,std::string&gt;::iterator&gt; range;
    range = words.equal_range( max_string );
    for ( std::unordered_multimap&lt;std::string,std::string&gt;::iterator ii = range.first;
          ii != range.second;
          ii++ )
        std::cout &lt;&lt; ii-&gt;second &lt;&lt; &quot; &quot;;
    std::cout &lt;&lt; std::endl;
    return 0;
}
</pre>
<p><center>Anagram finder circa TR1</center><br />
Now let&#8217;s look at the two features that make major improvements to this program in C++11.</p>
<h4>The auto Type Specification</h4>
<p>The hard-working committee members who hammered out the standard last year clearly listened to the millions of C++ programmers out there. While they were charting new waters for the language with things like move semantics and rvalue references, they were also making a lot of small changes that simply make the language a lot easier to work with. Maybe even a little more fun. The two things I find at the top of my list are the auto type specifier and the range-based for statement.</p>
<p>The auto keyword can be used in a number of different contexts, but in general it means that you can declare variables without having to enter a complete type. This solves some tricky problems for template programming, and it provides a convenience for awkward variable declarations. Most notably, it allows you to replace these two wordy lines of code:</p>
<pre>
    std::pair&lt; std::unordered_multimap&lt;std::string,std::string&gt;::iterator,
	       std::unordered_multimap&lt;std::string,std::string&gt;::iterator&gt; range;
    range = words.equal_range( max_string );
</pre>
<p>with this much simpler single line:</p>
<pre>
    auto range = words.equal_range( max_string );
</pre>
<p>In both cases, the type of <code>range</code> is the same &#8211; but by using the auto type specifier, we let the compiler replace all that typing with a bit of simple hand waving.</p>
<p>Bjarne Stroustrup has a good, concise explanation of <a href="http://www2.research.att.com/~bs/C++0xFAQ.html#auto" class="newpage">auto</a> in his C++11 FAQ; I recommend you spend the time to read it.</p>
<h4>The Range-based for Statement</h4>
<p>When working with standard library containers, one of the most common things we do is iterate over some or all of the container. This generally is done using a for or while loop with an iterator loop variable.</p>
<p>C++11 makes this type of iteration easier with new syntax injected into the <code>for</code> statement that has been around since 1969. The range-based for looks like this:</p>
<pre>
    for ( declaration : expression ) statement
</pre>
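<p>In this new statement, <code>expression</code> can be a braced initializer list, an array, or an object that implements container semantics. This means that the object returns iterator-like objects from <code>begin()</code> and <code>end()</code> member functions, or that free <code>begin()</code> and <code>end()</code> functions can be called on it. The free functions are found by argument-dependent lookup, with <code>std</code> treated as an associated namespace, so helpers declared elsewhere for types that live in <code>std</code> are not guaranteed to be found by every compiler.</p>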
<p>The variable declaration is either a reference or value of the type of variable held in the container, array, or initializer list. The for loop is executed from the beginning of the container to the end, with <code>statement</code> executed once per value returned by the iterator.</p>
<p>Although this is a completely new language feature, I think most C++ programmers will be comfortable with it from the first time they are able to use it &#8211; it makes those iterations over containers clean and concise.</p>
<h4>Putting it to Use</h4>
<p>Although it didn&#8217;t really cut down on my code size in a big way, I first made use of the range-based for in the loop that reads in the data from the scrabble dictionary. My new version of the loop is shown here:</p>
<pre>
for ( const std::string &#038;s : std::istream_iterator&lt;std::string&gt;( data ) )
{
    std::string temp = s;
    std::sort(temp.begin(), temp.end() );
    counts[temp]++;
    words.insert( std::make_pair(temp,s) );
}
</pre>
<p>The only big improvement here is that I was able to declare my string variable on first use, which is always my preference.</p>
<p>However, looking at this code, you might be wondering how it compiles. After all, the <code>istream_iterator</code> doesn&#8217;t have <code>begin()</code> or <code>end()</code> member functions.</p>
<p>That&#8217;s correct, and the reason it works is that I added a couple of convenience functions to my program that enable the use of this iterator type with the range-based for:</p>
<pre>
template&lt;class T&gt;
std::istream_iterator&lt;T&gt; begin(std::istream_iterator&lt;T&gt; &amp;ii_stream)
{
    return ii_stream;
}

template&lt;class T&gt;
std::istream_iterator&lt;T&gt; end(std::istream_iterator&lt;T&gt; &amp;ii_stream)
{
    return std::istream_iterator&lt;T&gt;();
}
</pre>
<p>I made use of a similar set of template functions to enable the use of the new for statement in my final output statement. I now iterate over the discovered members of the anagram family with two easy-to-read lines:</p>
<pre>
for ( const auto &#038;map_entry : words.equal_range( ii-&gt;first ) )
    std::cout &lt;&lt; map_entry.second &lt;&lt; &quot; &quot;;
</pre>
<p>Compare this to the TR1 code that does the same thing, and I think you will see the real value of both auto and range-based for.</p>
<p>Iterating over the values returned from a multimap is a common task, and it is enabled by these convenient template functions:</p>
<pre>
template&lt;class ITERATOR&gt;
ITERATOR begin( std::pair&lt;ITERATOR,ITERATOR&gt; &amp;range )
{
    return range.first;
}

template&lt;class ITERATOR&gt;
ITERATOR end( std::pair&lt;ITERATOR,ITERATOR&gt; &amp;range )
{
    return range.second;
}
</pre>
<p>When I first implemented the functions for my C++11 program, I was halfway expecting to find that this functionality had already been added to the standard library &#8211; they really make a big improvement for a small investment. But no, I couldn&#8217;t find them, so we will be using our own versions for the time being.</p>
<h4>The Final Product</h4>
<p>My much improved anagram finder is shown below. In addition to the use of range-based for and auto type declarations, I changed the way I find the maximum element in the container. Now that lambdas are part of the language, there is no excuse for not using the standard library algorithms, and this code gives an illustration of how that works as well. </p>
<pre>
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;iterator&gt;
#include &lt;algorithm&gt;
#include &lt;unordered_map&gt;

template&lt;class ITERATOR&gt;
ITERATOR begin( std::pair&lt;ITERATOR,ITERATOR&gt; &amp;range )
{
    return range.first;
}

template&lt;class ITERATOR&gt;
ITERATOR end( std::pair&lt;ITERATOR,ITERATOR&gt; &amp;range )
{
    return range.second;
}

template&lt;class T&gt;
std::istream_iterator&lt;T&gt; begin(std::istream_iterator&lt;T&gt; &amp;ii_stream)
{
    return ii_stream;
}

template&lt;class T&gt;
std::istream_iterator&lt;T&gt; end(std::istream_iterator&lt;T&gt; &amp;ii_stream)
{
    return std::istream_iterator&lt;T&gt;();
}

int main(int argc, char* argv[])
{
    std::ifstream data( &quot;sowpods.txt&quot; );
    std::unordered_map&lt;std::string,int&gt; counts;
    std::unordered_multimap&lt;std::string,std::string&gt; words;

    for ( const std::string &amp;s : std::istream_iterator&lt;std::string&gt;( data ) )
    {
        std::string temp = s;
        std::sort(temp.begin(), temp.end() );
        counts[temp]++;
        words.insert( std::make_pair(temp,s) );
    }
    auto ii = std::max_element( counts.begin(),
                                counts.end(),
                                // The map's value_type has a const key; matching it avoids a copy per comparison.
                                [](const std::pair&lt;const std::string,int&gt; &amp;v1,
                                   const std::pair&lt;const std::string,int&gt; &amp;v2)
                                {
                                    return v1.second &lt; v2.second;
                                }
                              );
    std::cout &lt;&lt; &quot;The maximum anagram family has &quot; &lt;&lt; ii-&gt;second &lt;&lt; &quot; members:\n&quot;;
    for ( const auto &#038;map_entry : words.equal_range( ii-&gt;first ) )
        std::cout &lt;&lt; map_entry.second &lt;&lt; &quot; &quot;;
    std::cout &lt;&lt; std::endl;
    return 0;
}
</pre>
<p><center>Anagram finder in C++11</center><br />
If I move the four convenience functions into a utility header file, I think you&#8217;ll agree that the new version of the code implements my algorithm in a very clean and concise way. The new language improvements make a huge difference in readability and convenience.</p>
<p>Of course, these two features are just one small part of a huge new standard, but for right now, they are the ones I turn to the most. How about you? Let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2012/04/07/c11-range-based-for-and-auto/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visual Studio 11 and Modern C++</title>
		<link>http://marknelson.us/2012/03/13/visual-studio-11-and-modern-c/</link>
		<comments>http://marknelson.us/2012/03/13/visual-studio-11-and-modern-c/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 13:13:08 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Standards]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1467</guid>
		<description><![CDATA[Despite some harsh words about Visual Studio 11, I&#8217;m finding that it makes my heart go pitter-pat every time I use it. Why? Because this early release is finally incorporating a decent set of long-awaited C++11 features. In this article I&#8217;ll show you how a little thing like a lambda can make a big difference [...]]]></description>
			<content:encoded><![CDATA[<p>Despite some <a href="http://drdobbs.com/windows/232602205" class="newpage">harsh words</a> about Visual Studio 11, I&#8217;m finding that it makes my heart go pitter-pat every time I use it. Why?  Because this early release is finally incorporating a decent set of long-awaited C++11 features. In this article I&#8217;ll show you how a little thing like a lambda can make a big difference in your coding style.<br />
<span id="more-1467"></span></p>
<h4>Microsoft and C++ &#8211; We Have History</h4>
<p>Microsoft has a cyclic relationship with C++. In the early MFC days, the love was there big time &#8211; you had access to most of the system API using C++. However, around the turn of the millennium, Microsoft came under the Rasputin-like influence of Anders Hejlsberg and his beloved offspring, C#. Now it appears that maybe the pendulum is swinging back a bit, and C++ is no longer viewed as an afterthought. Great news.</p>
<p>Although Visual Studio 11 is a developer&#8217;s preview, Microsoft is <a href="http://herbsutter.com/2012/02/29/vc11-beta-on-feb-29/" class="newpage">saying</a> that it is production ready &#8211; you can use this to create programs that are ready for release. In addition to touting a complete implementation of the C++11 standard library, an <a href="http://blogs.msdn.com/b/vcblog/archive/2011/09/12/10209291.aspx" class="newpage">impressive list</a> of language features has been turned on as well. (N.B. the path ahead is still long and arduous.)</p>
<h4>Modern C++</h4>
<p>Before even using Visual C++ 11 to test a single line of code, I really appreciated reading <a href="http://msdn.microsoft.com/en-us/library/hh279654(v=vs.110).aspx" class="newpage">Welcome Back to C++ (Modern C++)</a>, a manifesto that includes the following bullet points:</p>
<blockquote><p>
Modern C++ emphasizes:</p>
<ul>
<li>Stack-based scope instead of heap or static global scope.</li>
<li>Auto type inference instead of explicit type names.</li>
<li>Smart pointers instead of raw pointers.</li>
<li>std::string and std::wstring types instead of raw char[] arrays.</li>
<li>Standard template library (STL) containers—for example, vector, list, and map—instead of raw arrays or custom containers.</li>
<li>STL algorithms instead of manually coded ones.</li>
<li>Exceptions, to report and handle error conditions.</li>
<li>Inline lambda functions instead of small functions implemented separately.</li>
</ul>
</blockquote>
<p>I feel that all of these changes result in safer code that is easier to read and maintain, without giving up the type-safety and efficiency that we love so much. Fully implementing these features either leans heavily on C++11 or requires it outright.</p>
<h4>A Simple Example Using Naive C++98</h4>
<p>It&#8217;s interesting to watch code evolve from C++98 to C++11. You&#8217;ll see that the transformation can make it look like you are literally using a new programming language.</p>
<p>In this simple program, I&#8217;m taking a Scrabble rack of tiles and whipping through the Scrabble dictionary to find matches. Since it is a one-time call, I&#8217;m not storing the words, just checking each one as it is read. In C++98, my code might have looked like this:</p>
<pre>
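#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;algorithm&gt;

void find_matches( std::string rack, const std::string &amp;filename )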
{
    std::sort( rack.begin(), rack.end() );
    std::ifstream sowpods( filename.c_str() );
    std::string word;
    while ( sowpods &gt;&gt; word ) {
        std::string sorted = word;
        std::sort( sorted.begin(), sorted.end() );
        if ( sorted == rack )
            std::cout &lt;&lt; word &lt;&lt; &quot; &quot;;
    }
}

int main(int argc, char* argv[])
{
    find_matches( &quot;etaionsr&quot;, &quot;sowpods.txt&quot; );
    return 0;
}
</pre>
<p>This works properly and I get what looks like correct output:</p>
<pre>
anoestri arsonite notaries notarise rosinate senorita
</pre>
<h4>Classes Good, Templates Better</h4>
<p>As people started to get more comfortable with templates and iterators, algorithms like this were commonly rewritten to take a range of iterators as input &#8211; much as the standard library algorithm functions do. This meant changing the function to a template function, but it did make it a lot more flexible. I can still call the function to operate on data from a file, just as before, but I can now also use any other container, or even an array, as input:</p>
<pre>
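#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;iterator&gt;
#include &lt;algorithm&gt;

template&lt;typename ITERATOR&gt;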
void find_matches( std::string rack, ITERATOR ii, ITERATOR jj )
{
    std::sort( rack.begin(), rack.end() );
    for ( ; ii != jj ; ii++ ) {
        std::string sorted = *ii;
        std::sort( sorted.begin(), sorted.end() );
        if ( sorted == rack )
            std::cout &lt;&lt; *ii &lt;&lt; &quot; &quot;;
    }
}

int main(int argc, char* argv[])
{
    std::ifstream sowpods( &quot;sowpods.txt&quot; );
    find_matches( &quot;etaionsr&quot;,
                  std::istream_iterator&lt;std::string&gt;( sowpods ),
                  std::istream_iterator&lt;std::string&gt;() );
    return 0;
}
</pre>
<p>More or less the same number of lines of code, but it is now generic.</p>
<p>Of course, just like with OOP, you need to take some care with template programming. Generic programming can&#8217;t be beat when it makes sense, but programmers have a particularly strong susceptibility to <a href="http://en.wikipedia.org/wiki/Pro-innovation_bias" class="newpage">pro-innovation bias</a>. </p>
<h4>Using the Algorithms Library</h4>
<p>Again, prodded by changing styles in the C++ world, my next step is to use a standard library algorithm to do the work. We&#8217;re told over and over that turning to the algorithms library allows you to use code that has been optimized to the n-th degree by the clever library teams. </p>
<p>In order to make this work, I have to call an algorithm with a predicate functor, seen below as class <code>sorted_not_equal</code>. Note also that I can&#8217;t use the logical function for this, which would be <code>copy_if()</code>. Why not? The committee forgot to put it in back in 1998, 2003, and 2005, a mistake that was fortunately remedied in C++11. So I have to use the inverse function, <code>remove_copy_if()</code>, and invert the logical sense of my functor:</p>
<pre>
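#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;iterator&gt;
#include &lt;algorithm&gt;

// Predicate that is true when a word is NOT an anagram of the test string.
class sorted_not_equal {
    std::string str;
public :
    sorted_not_equal( const std::string &amp; test )
    {
        str = test;
        std::sort( str.begin(), str.end() );
    }
    bool operator()( std::string test ) const
    {
        std::sort( test.begin(), test.end() );
        return ( str != test );
    }
};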

template&lt;typename INPUT_ITERATOR, typename OUTPUT_ITERATOR&gt;
void find_matches( std::string rack,
                   INPUT_ITERATOR ii,
                   INPUT_ITERATOR jj,
                   OUTPUT_ITERATOR kk )
{
    std::sort( rack.begin(), rack.end() );
    std::remove_copy_if( ii, jj, kk, sorted_not_equal( rack ) );
}

int main(int argc, char* argv[])
{
    std::ifstream sowpods( &quot;sowpods.txt&quot; );
    find_matches( &quot;etaionsr&quot;,
                  std::istream_iterator&lt;std::string&gt;( sowpods ),
                  std::istream_iterator&lt;std::string&gt;(),
                  std::ostream_iterator&lt;std::string&gt;(std::cout,  &quot;\n&quot; ) );
    return 0;
}
</pre>
<h4>Functors Not So Hot</h4>
<p>So this new approach is supposed to soup up my code by taking advantage of the algorithms that come with the standard library. But if you look around at the code people have been writing for the past 10 years, you&#8217;ll find that this style is pretty common in textbooks and magazine articles, but not so much in the real world.</p>
<p>Why not? Well, it&#8217;s pretty obvious. The generic algorithms in the library need lots of predicate glue to make them useful, and the work to create these predicates is just a pain. My code is almost twice as long, and the functionality that took two lines of code earlier is now bloated into a complete class definition. It pollutes my namespace, takes up a lot of space, and has to be defined somewhere distant from where it is actually used. Not a win.</p>
<p>This is obviously a problem when you look at the history of Linux, C, and C++. An entire family of technologies and infrastructure was developed with the implicit goal of reducing the number of keystrokes programmers had to enter. (I&#8217;m kidding, but only somewhat.) Functors are a step in the wrong direction.</p>
<h4>Lambda to the Rescue</h4>
<p>So it is with much relief that C++11 delivers lambdas, which allow us to write short sweet predicates exactly where we need them, as shown in this C++11 version of the example:</p>
<pre>
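#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;string&gt;
#include &lt;iterator&gt;
#include &lt;algorithm&gt;

template&lt;typename INPUT_ITERATOR, typename OUTPUT_ITERATOR&gt;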
void find_matches( std::string rack,
                   INPUT_ITERATOR ii,
                   INPUT_ITERATOR jj,
                   OUTPUT_ITERATOR kk )
{
    std::sort( rack.begin(), rack.end() );
    std::copy_if( ii, jj, kk,
                  [&amp;rack](std::string str) -&gt;bool
                  {
                      std::sort( str.begin(), str.end() );
                      return rack == str;
                  }
                );
}

int main(int argc, char* argv[])
{
    std::ifstream sowpods( &quot;sowpods.txt&quot; );
    find_matches( &quot;etaionsr&quot;,
                  std::istream_iterator&lt;std::string&gt;( sowpods ),
                  std::istream_iterator&lt;std::string&gt;(),
                  std::ostream_iterator&lt;std::string&gt;(std::cout,  &quot;\n&quot; ) );
    return 0;
}
</pre>
<p>Yes, I now have to get used to a new syntax for writing lambda functions &#8211; I think that was unavoidable. But my lambda function is short, it is quite easy to see exactly what it is doing, and it replaces a gangly and awkward functor class. </p>
<p>Best of all, I use the lambda exactly where I need it &#8211; as the predicate parameter to an algorithm used in the standard library. Locality rules.</p>
<h4>Using Lambdas</h4>
<p>Visual C++ 11 provides a great framework for experimenting with lambdas, as it supports the 1.1 definition that was ratified as part of the standard. If you want the gory details, I believe the <a href="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n2927.pdf" class="newpage">working group&#8217;s</a> proposal has essentially the same wording that went into the standard. For a detailed tutorial, <a href="http://herbsutter.com/2011/05/20/my-lambdas-talk-nwcpp-is-now-online/" class="newpage">Herb Sutter&#8217;s talk</a> can&#8217;t be beat.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2012/03/13/visual-studio-11-and-modern-c/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>C++ &#8211; Where&#8217;s the Hate?</title>
		<link>http://marknelson.us/2012/02/27/c-wheres-the-hate/</link>
		<comments>http://marknelson.us/2012/02/27/c-wheres-the-hate/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 05:23:09 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Complaining]]></category>
		<category><![CDATA[Culture]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1453</guid>
		<description><![CDATA[One thing I&#8217;ve become accustomed to over the years is that there are a lot of C++ haters. They have their reasons &#8211; some good, some bad &#8211; but they are never afraid of sharing their opinions. An article on Slashdot this week touting the release of the C++11 standard should have been a hotbed [...]]]></description>
			<content:encoded><![CDATA[<p>One thing I&#8217;ve become accustomed to over the years is that there are a lot of C++ haters. They have their reasons &#8211; some good, some bad &#8211; but they are never afraid of sharing their opinions. An <a href="http://developers.slashdot.org/story/12/02/24/1954225/stroustrup-reveals-whats-new-in-c-11" class="newpage">article on Slashdot</a> this week touting the release of the C++11 standard should have been a hotbed of language trash talk &#8211; instead, it was kind of a low-key discussion of both the new language features and some retrospection about the language itself. Where have the haters gone?<br />
<span id="more-1453"></span></p>
<h4>The Epitome of C++ Hate</h4>
<p>There really is no better example of C++ hate than the <a href="http://lwn.net/Articles/249460/" class="newpage">screed</a> arising from the poison keyboard of Linus Torvalds back in 2007. One paragraph sums it up well:</p>
<blockquote><p>
C++ is a horrible language. It&#8217;s made more horrible by the fact that a lot of substandard programmers use it, to the point where it&#8217;s much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C.
</p></blockquote>
<p>Linus asserts that C++ is actually sort of a honey-trap, pulling in the substandard programmers and keeping them out of decent C development.</p>
<p>Reading through the most coherent parts of his rant, it is my opinion that much of his ire is directed against a couple of real problems:</p>
<ul>
<li/>The early adopter&#8217;s fondness for Object Oriented Programming (OOP), which leads to objects everywhere &#8211; and particularly in places where they aren&#8217;t needed.
<li/>The more sophisticated user&#8217;s fondness for advanced library and language features found in places like boost.
</ul>
<p>My personal observations are obviously very subjective, but I feel like the excesses of OOP peaked back in the early part of the last decade. It&#8217;s been a while since I saw a program where a user did integer math using a type called Integer derived from something called Object.</p>
<p>These days I see OOP being heavily used for large libraries, and used internally in projects where it makes sense. Often &#8220;making sense&#8221; is as simple as using objects for RAII, which is really not OOP, just good resource management. Things like RAII are really just better C &#8211; providing some guarantees on resource usage that can&#8217;t be made when programming in straight C.</p>
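<p>As a rough illustration of that point, here is a minimal RAII sketch. The <code>FileHandle</code> class and the <code>count_bytes</code> function are hypothetical names invented for this example. The only idea on display is that the destructor releases the resource on every exit path, which is the guarantee straight C can't give you.</p>

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical wrapper: owns a FILE* and closes it in the destructor.
class FileHandle {
public:
    FileHandle(const char* name, const char* mode)
        : fp_(std::fopen(name, mode))
    {
        if (!fp_)
            throw std::runtime_error("could not open file");
    }
    ~FileHandle() { std::fclose(fp_); }
    std::FILE* get() const { return fp_; }
private:
    FileHandle(const FileHandle&);             // non-copyable
    FileHandle& operator=(const FileHandle&);  // non-assignable
    std::FILE* fp_;
};

// Counts the bytes in a file. The file is closed on every return path,
// including an exception thrown part-way through.
long count_bytes(const char* name)
{
    FileHandle f(name, "rb");
    long n = 0;
    while (std::fgetc(f.get()) != EOF)
        ++n;
    return n;
}
```

<p>Nothing here involves inheritance or virtual functions, which is why it reads more like disciplined C than like OOP.</p>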
<p>And best of all, the boost language features that Linus didn&#8217;t like in 2007 made their way into TR1, and finally into C++11. No longer will you need to worry about rebuilding boost for your compiler every time there is a new release of either product. From this point on, it should just work.</p>
<h4>The Slashdot Crowd</h4>
<p>So instead of seeing a bunch of comments in the vein of Linus&#8217;s, the recent Slashdot article contained virtually no blanket dismissals of C++ as something a sane programmer would use. The negative comments were much more constrained, maybe aimed at a pet peeve instead of the language as a whole:</p>
<blockquote><p>
printf() isn&#8217;t typesafe, but it&#8217;s a fuckton more readable than all that cout formatting stuff. Also, the fact that it&#8217;s not typesafe isn&#8217;t really an issue if you don&#8217;t suck &#8211; trivial unit testing will pretty much show any problems immediately. Besides, gcc/g++ is nice enough to warn you about egregious ones now.
</p></blockquote>
<blockquote><p>
I don&#8217;t think you&#8217;ll see a lot of people flaming C++, there just aren&#8217;t that many people that care one way or the other anymore.</p>
<p>I think some of the new features look nice but mainstream use has been shifting away from C++ for a while and I&#8217;m not sure I see these new features drawing many back&#8230;
</p></blockquote>
<blockquote><p>
Are the features useful? Yes, but they&#8217;re taking a complex language and slapping on yet more functionality. Some new C++ code syntax doesn&#8217;t even *look* like C++ anymore it&#8217;s so different. Not everyone is a C++ guru and the language is bad enough supporting so many different paths to the same implementation outcomes. This is just going to make staffing, testing, training, and code review that much worse trying to juggle yet another barrel-full of C++ &#8220;improvements&#8221;.
</p></blockquote>
<h4>Why the Dropoff?</h4>
<p>I think there are two reasons for the drop in C++ antagonism. </p>
<p>The first is the increasing diversity of languages in use today. Over the past 10 years, the most popular programming languages like Java, C, C++, and VB have all experienced drops in adoption. In 2002, the top four languages might have claimed almost 70% of the <a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html" class="newpage">Tiobe Language Index</a>. Today the top four represent less than 50%. Language selection has been democratized as the bottom tier of the top 10 gains representation in large markets where it makes sense.</p>
<p>In other words, it was a lot easier to focus your wrath on C++ when it represented 20% of the projects out there instead of the 8% of today.</p>
<p>The second is the inevitable rationalization of the early adopters. C++ programmers in 1998 saw objects behind every tree. In 2003 they were building insanely complicated template metaprogramming frameworks.</p>
<p>All this crazy stuff settles down over the course of a few years as the C++ mainstream settles into paradigms for things that actually work sensibly. Templates for container classes? Sure. Templates for mathematical computation? Uh, no thanks.</p>
<h4>Final Analysis</h4>
<p>There is a lot of nice stuff in C++11, and nothing that is going to take years to figure out. A lot of it will lead to more readable code, and that&#8217;s always good.</p>
<p>The pace of change is pretty comfortable for me. These language features will be rolling out over the next few years, and as they drop into our favorite compilers, we will pick them up quickly.</p>
<p>I, for one, welcome our new ISO overlords.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2012/02/27/c-wheres-the-hate/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Streambuf Iterators Are a Big Help</title>
		<link>http://marknelson.us/2012/02/05/streambuf-iterators-are-a-big-help/</link>
		<comments>http://marknelson.us/2012/02/05/streambuf-iterators-are-a-big-help/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 00:34:45 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1441</guid>
		<description><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2012/02/05/streambuf-iterators-are-a-big-help/' addthis:title='Streambuf Iterators Are a Big Help' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div>A few weeks back I was looking at the choice of whether to use iterators or streaming operations for I/O on my data compression code. I was bemoaning the fact that the C++ iterators that perform stream I/O use the insertion and extraction operators, making them unsuitable for binary data compression. It looks like I [...]]]></description>
			<content:encoded><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2012/02/05/streambuf-iterators-are-a-big-help/' addthis:title='Streambuf Iterators Are a Big Help' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div><p>A few weeks back I was <a href="http://marknelson.us/2011/12/24/streams-or-iterators/" class="newpage">looking at the choice</a> of whether to use iterators or streaming operations for I/O on my data compression code. I was bemoaning the fact that the C++ iterators that perform stream I/O  use the insertion and extraction operators, making them unsuitable for binary data compression.</p>
<p>It looks like I spoke too fast.<br />
<span id="more-1441"></span></p>
<h4>The Problem</h4>
<p>The canonical way to use iterators to go over an iostream is to use <code>istream_iterator</code> or its partner, <code>ostream_iterator</code>. In this short sample, I create a binary file with values 0 to 255, then try to read back the data using this iterator type:</p>
<pre>
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;iterator&gt;
using namespace std;

int main(int argc, char* argv[])
{
    ofstream temp_out("temp.dat", std::ios_base::binary );
    for ( int i = 0 ; i &lt; 256 ; i++ )
        temp_out.put( (char) i );
    temp_out.close();
    ifstream temp_in("temp.dat", std::ios_base::binary );
    char last = -1;
    int count = 0;
    istream_iterator&lt;char&gt; ii( temp_in );
    while ( ii != std::istream_iterator&lt;char&gt;() ) {
        char c = *ii++;
        count++;
        if ( c != char(last+1) )
            cout &lt;&lt; "Error on character number: " &lt;&lt; (int) last &lt;&lt; endl;
        last = c;
    }
    cout &lt;&lt; "Count: " &lt;&lt; count &lt;&lt; endl;
    return 0;
}
</pre>
<p>Running this program gives the following output:</p>
<pre>
Error on character number: 8
Error on character number: 31
Count: 250
</pre>
<p>Because the extraction operator treats whitespace as a delimiter, we get errors trying to read tab, line feed, vertical tab, form feed, carriage return, and space, which means we simply don't see six characters in the input file. This is not too bad when parsing ASCII text, but when trying to do compression on binary data, it just won't work.</p>
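<p>The same effect can be reproduced without a file. The sketch below uses a hypothetical helper, <code>count_with_istream_iterator</code> (an illustrative name, not part of the library), to walk an in-memory stream with <code>istream_iterator&lt;char&gt;</code>. Because <code>operator&gt;&gt;</code> skips whitespace by default, the whitespace bytes never reach the caller.</p>

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Counts the characters an istream_iterator<char> delivers from `data`.
// The iterator reads with operator>>, which skips whitespace by default,
// so tabs, newlines and spaces are silently dropped.
std::size_t count_with_istream_iterator(const std::string& data)
{
    std::istringstream in(data);
    std::istream_iterator<char> it(in), end;
    std::size_t count = 0;
    for (; it != end; ++it)
        ++count;
    return count;
}
```

<p>Feeding this function a six-byte string such as <code>"a b\tc\n"</code> should report only the three letters.</p>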
<p>I figured this was the end of it until reader Fred Jardon chimed in with:</p>
<blockquote><p>
Isn’t istreambuf_iterator the iterator you’re looking for ?</p>
<p><a href="http://cplusplus.com/reference/std/iterator/istreambuf_iterator/" class="newpage">http://cplusplus.com/reference/std/iterator/istreambuf_iterator/</a></p>
<p>It directly reads from the inner streambuf without using the extraction operator.</p>
<p>The opposite iterator exists: ostreambuf_iterator
</p></blockquote>
<p>Well, Fred is quite right, and it only serves to show my lack of depth when it comes to iostreams. The two classes, <code>istreambuf_iterator</code> and <code>ostreambuf_iterator</code>, work directly on the underlying stream buffer and don't use the extraction or insertion operators, which means they don't have the whitespace issues seen above. Changing just two lines fixes the problem:</p>
<pre>
    istreambuf_iterator&lt;char&gt; ii(temp_in.rdbuf());
    while ( ii != std::istreambuf_iterator&lt;char&gt;() ) {
</pre>
<p>Running that I get the pristine output I long for:</p>
<pre>
Count: 256
</pre>
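<p>For completeness, here is a sketch of a byte-for-byte copy that uses both streambuf iterator classes together with <code>std::copy</code>. The function names <code>copy_bytes</code> and <code>copy_through_streams</code> are made up for this example, and the in-memory string streams simply stand in for files opened in binary mode.</p>

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Copies every byte from `in` to `out` through the stream buffers,
// bypassing the formatted extraction and insertion operators.
void copy_bytes(std::istream& in, std::ostream& out)
{
    std::copy(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(out));
}

// Convenience wrapper for trying the copy on in-memory data.
std::string copy_through_streams(const std::string& data)
{
    std::istringstream in(data);
    std::ostringstream out;
    copy_bytes(in, out);
    return out.str();
}
```

<p>Passing a string that holds all 256 byte values should hand back an identical string, whitespace and nulls included.</p>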
<p>I'd like to tell you the ramifications of diving down into the object and working on the buffer, but I'm afraid these implementation details still elude me. There's a lot more going on in iostreams than in good old <code>&lt;stdio.h&gt;</code>, and half of what I learned about it was wiped out in the standardization process. So be it.</p>
<p>In any case, I think the proper use of <code>istreambuf_iterator&lt;char&gt;</code> tips the scale in favor of iterators for me, which reverses my previous thinking.</p>
<p>And many thanks to Fred for straightening out the mess.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2012/02/05/streambuf-iterators-are-a-big-help/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Streams or Iterators?</title>
		<link>http://marknelson.us/2011/12/24/streams-or-iterators/</link>
		<comments>http://marknelson.us/2011/12/24/streams-or-iterators/#comments</comments>
		<pubDate>Sat, 24 Dec 2011 18:21:11 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Data Compression]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1393</guid>
		<description><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/12/24/streams-or-iterators/' addthis:title='Streams or Iterators?' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div>When I updated my LZW reference code to use the latest C++ features, I abstracted my input and output functions using templates. Data was read and written using the iostreams paradigm, which requires simple classes that implement just a few functions. Would I have been better off using the iterator paradigm instead? The C++ algorithms [...]]]></description>
			<content:encoded><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/12/24/streams-or-iterators/' addthis:title='Streams or Iterators?' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div><p>When I updated my <a href="http://marknelson.us/2011/11/08/lzw-revisited/" class="newpage">LZW</a> reference code to use the latest C++ features, I abstracted my input and output functions using templates. Data was read and written using the iostreams paradigm, which requires simple classes that implement just a few functions. Would I have been better off using the iterator paradigm instead? The C++ algorithms library favors that method of processing data, and it can be both elegant and powerful. Which of the two paradigms is the right one for data compression?<br />
<span id="more-1393"></span></p>
<h4>The Conflict</h4>
<p>General purpose data compression routines tend to be used on binary streams of data, either from files or in-memory objects. So what is the best general paradigm for input and output when compressing data? </p>
<p>You might analyze this problem by imagining that you need to write a binary copy routine. </p>
<pre>
template&lt;class INPUT_ITERATOR, class OUTPUT_ITERATOR&gt;
void bcopy( INPUT_ITERATOR input, INPUT_ITERATOR eof, OUTPUT_ITERATOR output )
{
    while ( input != eof )
        *output++ = *input++;
}
</pre>
<p>This routine is particularly nice when you are performing a simple copy using pointers to memory &#8211; the generated code should be really efficient.</p>
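<p>A small usage sketch may help here. The template is repeated under the illustrative name <code>copy_range</code> so the example stands alone, and <code>copy_buffer</code> and <code>to_vector</code> are hypothetical helpers that show the same routine driven by raw pointers and by a <code>back_inserter</code>.</p>

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Same shape as the routine above, repeated under an illustrative name.
template<class InputIterator, class OutputIterator>
void copy_range(InputIterator input, InputIterator eof, OutputIterator output)
{
    while (input != eof)
        *output++ = *input++;
}

// Raw pointers are iterators too: copy `count` bytes between two buffers.
// The caller must make sure `dst` has room for `count` bytes.
void copy_buffer(const char* src, std::size_t count, char* dst)
{
    copy_range(src, src + count, dst);
}

// The same template also fills a container through a back_inserter.
std::vector<char> to_vector(const char* src, std::size_t count)
{
    std::vector<char> result;
    copy_range(src, src + count, std::back_inserter(result));
    return result;
}
```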
<p>However, the iterator paradigm doesn&#8217;t work quite as well when you want to perform a binary copy of data in a file. I can make use of iterators that almost do the job:</p>
<pre>
 std::ifstream in( &quot;input.txt&quot;, std::ios_base::binary );
 std::ofstream out(&quot;output.txt&quot;, std::ios_base::binary );
 bcopy( std::istream_iterator&lt;char&gt;(in),
        std::istream_iterator&lt;char&gt;(),
        std::ostream_iterator&lt;char&gt;(out) );
</pre>
<p>But the bad news is that both <code>istream_iterator</code> and <code>ostream_iterator</code> use the insertion and extraction operators, which are really meant for whitespace-delimited textual data, not binary data. The copy routine shown here will not make a binary byte-for-byte copy of the input file.</p>
<p>So when using files, the stream approach seems to be the way to go:</p>
<pre>
template&lt;class INPUT_STREAM, class OUTPUT_STREAM&gt;
void bcopy( INPUT_STREAM &amp;in, OUTPUT_STREAM &amp;out )
{
    char c;
    while ( in.get(c) )
        out.put(c);
}
</pre>
<p>If my files have been opened using the <code>iostream</code> classes, I can use this binary copy function without having to write any glue code &#8211; they already support the <code>get</code> and <code>put</code> methods, so this works right out of the box.</p>
<h4>My Choice</h4>
<p>If I&#8217;ve made up my mind that my data compression routine is going to use one of these two paradigms, it means I am going to have to write some glue code. If I choose the iterator-based approach, I need the equivalent of <code>istream_iterator</code> and <code>ostream_iterator</code> for binary files &#8211; and these aren&#8217;t in the standard library. If I choose the stream-based approach, I need efficient <code>put()</code> and <code>get()</code> members for blocks of memory. In some cases <code>basic_stringstream</code> might do the job, but not in all cases.</p>
<p>After dithering around with various solutions, I tentatively opted for the stream paradigm. I found the implementation for various sources of data to be fairly simple, and the interface is easy to understand. I don&#8217;t know if it&#8217;s the perfect choice, and I&#8217;ll keep experimenting, but for now it works for me. My abstraction of the LZW code still needs a lot of work, so it&#8217;s always possible I could rethink this at a later date.</p>
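<p>To give a sense of how little glue the stream approach needs for blocks of memory, here is a minimal sketch. <code>MemoryInput</code>, <code>MemoryOutput</code> and <code>stream_copy</code> are hypothetical names for this example and are not the classes from the LZW code. The only assumption is that the copy loop needs a <code>get()</code> whose result can be tested in a <code>while</code> condition and a <code>put()</code> that accepts a single character.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical input adapter: reads bytes from a block of memory.
// get() returns false at the end of the block, so `while ( in.get(c) )`
// terminates the same way it does with an ifstream.
class MemoryInput {
public:
    MemoryInput(const char* data, std::size_t size)
        : pos_(data), end_(data + size) {}
    bool get(char& c)
    {
        if (pos_ == end_)
            return false;
        c = *pos_++;
        return true;
    }
private:
    const char* pos_;
    const char* end_;
};

// Hypothetical output adapter: appends bytes to a growable buffer.
class MemoryOutput {
public:
    void put(char c) { data_.push_back(c); }
    const std::vector<char>& data() const { return data_; }
private:
    std::vector<char> data_;
};

// Stream-style copy, taking both objects by reference.
template<class InputStream, class OutputStream>
void stream_copy(InputStream& in, OutputStream& out)
{
    char c;
    while (in.get(c))
        out.put(c);
}
```

<p>Because the copy routine is a template, the same function should also accept a <code>std::ifstream</code> and a <code>std::ofstream</code> without change.</p>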
<p>I&#8217;d like to hear your thoughts &#8211; is there an obvious right answer to this question?</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/12/24/streams-or-iterators/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Automating Putty</title>
		<link>http://marknelson.us/2011/12/10/automating-putty/</link>
		<comments>http://marknelson.us/2011/12/10/automating-putty/#comments</comments>
		<pubDate>Sat, 10 Dec 2011 12:11:15 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Magazine Articles]]></category>
		<category><![CDATA[Serial Communications]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=776</guid>
		<description><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/12/10/automating-putty/' addthis:title='Automating Putty' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div>Windows users who need a command line connection to another system via telnet or SSH are big fans of PuTTY. It&#8217;s free, it has every feature you need, and it&#8217;s reliable. One thing many people would like to do is use PuTTY as a component in their program. Apparently this comes up often enough [...]]]></description>
			<content:encoded><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/12/10/automating-putty/' addthis:title='Automating Putty' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div><p>Windows users who need a command line connection to another system via telnet or SSH are big fans of <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/" class="newpage">PuTTY</a>. It's free, it has every feature you need, and it's reliable. </p>
<p>One thing many people would like to do is use PuTTY as a component in their program. Apparently this comes up often enough that there is a <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/faq.html#faq-embedding" class="newpage">FAQ entry</a> dedicated to the topic. Alas, PuTTY does not have any sort of automation interface, so this goal has always been out of reach.</p>
<p>In this article I will show you how to work around this minor shortcoming. Creating a version of PuTTY that can be driven from a Windows program turns out to be an easy task. I'll demonstrate this with a small C++ program that shows exactly how to get this versatile program to do your bidding. My solution works for C++, but the changes I make should work well with any Windows software that can properly process a few messages.<br />
<span id="more-776"></span></p>
<h4>Putting Together the Project</h4>
<p>I'm using Visual Studio 2010 to build both my program and the modified version of Putty. I created the basic outline as follows:</p>
<ol>
<li/>Use the <em>File|New|Project</em> menu item to bring up the list of available project wizards.
<li/>Select <em>MFC project</em>, and enter a project name (I used the uninspired name <em>PuttyDriver</em>.)
<li/>I don't want the default MFC settings, so in the MFC App Wizard, select the <em>Next</em> button.
<li/>On the <em>Application Type</em> page of the wizard, change the Application Type to <em>Dialog Based</em>.
<li/>The project is ready to go at this point, so you can click the <em>Finish</em> button and then build your initial project.
</ol>
<p>My driver program is only going to do one thing: direct PuTTY to connect to the host of my choice, then log in using canned credentials. The resulting UI is shown below, and I am going to leave the very minor details of creating it up to the reader.</p>
<table border="0" width="100%">
<tr>
<td><center><img src="/attachments/2011/putty/Figure01.png"></center></td>
</tr>
<tr>
<td><center>The driver program - a simple dialog-based MFC app</center></td>
</tr>
</table>
<h4>Adding Putty to the Project</h4>
<p>The next step in this process is to add the Putty components to the project. I downloaded version 0.61 of the PuTTY source from the <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html" class="newpage">download page</a> and extracted it to a separate folder. I then used Visual Studio's <em>File|Add|Existing Project</em> to add the compatible project file, <code>Putty.dsp</code>, found in <code>/Windows/MSVC/Putty</code>. Visual Studio has to convert this project to a version 10 project file, but it should do so with no problems.</p>
<p>I then right-clicked on the Putty project in Solution Explorer and renamed it to <em>AutoPutty</em>. Since this version of PuTTY will have some slightly different behavior, I don't want to confuse the executable I am creating with the real thing.</p>
<p>From Project|Project Dependencies, I set the PuttyDriver project to depend on AutoPutty - this ensures that both projects get built when I build the entire solution.</p>
<p>My final change to the project is to modify the output directory for both Debug and Release versions of AutoPutty. I set the project to build the executable in the root directory of my PuttyDriver project - this will make it easy to find the executable when I need to launch it. I had to make this change in two places: <em>Properties|Configuration Properties|General|Output Directory</em> and <em>Properties|Linker|Output File</em>.</p>
<p>When you finally build the project, you'll find that the current version of Microsoft's C++ compiler complains quite a bit about the use of functions like <code>strcpy</code> - Microsoft would like you to use safer replacement functions. You may choose to turn those warnings off by defining <code>_CRT_SECURE_NO_WARNINGS</code> in the project file. While you are there, you should define <code>SECURITY_WIN32</code> as well - it is required by the Windows header <code>sspi.h</code>.</p>
<p>After a successful build you should find a copy of <code>AutoPutty.exe</code> in the root directory of your project, and it should run on your system and behave just like PuTTY.</p>
<h4>Launching AutoPutty</h4>
<p>If I'm going to have a PuTTY component in my PuttyDriver program, one of the first things I need is to be able to start and stop AutoPutty. So my first step in this project is to create the code that launches the program from PuttyDriver. The code below is inserted into the handler for the Start button:</p>
<pre>
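UpdateData( TRUE );
// Build the command line: &lt;current dir&gt;\AutoPutty.exe -ssh &lt;host&gt;
char path[MAX_PATH];
GetCurrentDirectory(MAX_PATH, path);
if ( path[ strlen(path) - 1 ] != '\\' )
    strcat_s( path, MAX_PATH, &quot;\\&quot; );
strcat_s( path, MAX_PATH, &quot;AutoPutty.exe -ssh &quot; );
strcat_s( path, MAX_PATH, m_HostName.GetBuffer() );
PROCESS_INFORMATION pi;
ZeroMemory(&amp;pi, sizeof(pi) );
STARTUPINFO si;
ZeroMemory(&amp;si, sizeof(si) );
si.cb = sizeof(si);
// bInheritHandles is a BOOL and dwCreationFlags a DWORD, so pass FALSE and 0
if ( CreateProcess( NULL, path, NULL, NULL, FALSE, 0, NULL, NULL, &amp;si, &amp;pi ) )
{
    // The process and thread handles are not used again, so release them
    CloseHandle( pi.hThread );
    CloseHandle( pi.hProcess );
    Sleep( 1000 );
    BringWindowToTop();
}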
</pre>
<p>This code assumes that <code>AutoPutty.exe</code> is in the current directory, and launches it with a command line telling it to connect to the host named in the dialog using <a href="http://www.ietf.org/rfc/rfc4251.txt" class="newpage">ssh</a>. Assuming that you have the project set up properly, pushing the start button should now start an independent copy of AutoPutty, which will behave identically to classic PuTTY.</p>
<h4>Taking Ownership of AutoPutty</h4>
<p>At this point I can successfully launch AutoPutty, but I can't really start calling this an integrated part of my main program, PuttyDriver. All I have done is set up a launcher for a separate executable. </p>
<p>The next step in the integration process is to establish PuttyDriver as the owner of AutoPutty's main window. Most Windows programmers are familiar with the traditional parent/child relationship between windows. That relationship is well understood, but I can't use it here - it doesn't work for two top level windows.</p>
<p>Setting PuttyDriver to be the <em>owner</em> (as opposed to the parent) of AutoPutty has the following effects, as explained <a href="http://msdn.microsoft.com/en-us/library/ms632599(v=VS.85).aspx#owned_windows" class="newpage">here</a> by Microsoft:</p>
<ul>
<li/>The owned window will always be above its owner in the z-order.
<li/>The system automatically destroys the owned window when the owner is destroyed.
<li/>The owned window is hidden when the owner is minimized.
</ul>
<p>The most straightforward way to set ownership of the window is to pass the owner's handle in the call to <code>CreateWindow()</code>, which means I will now make my first modifications to the PuTTY source code. </p>
<p>There are a number of ways to pass the owner handle to AutoPutty for use in the call to <code>CreateWindow()</code>, with the most obvious being to pass it on the command line. In the interest of minimizing changes to the existing PuTTY code base, I elected to pass it by creating an environment variable that holds the owner window handle. Since a child process inherits the parent's environment, this is a no-fuss way to get the data to AutoPutty.</p>
<p>I added the following code to the end of <code>OnInitDialog()</code> in PuttyDriver:</p>
<pre>
CString hwnd_text;
hwnd_text.Format( &quot;%d&quot;, m_hWnd );
SetEnvironmentVariable(&quot;PUTTY_OWNER&quot;, hwnd_text );
</pre>
<p>This sets the environment variable for AutoPutty to find when it gets launched.</p>
<p>Now I come to the point where I am actually making changes to the PuTTY code. Fortunately, all of the changes needed for this program are confined to two files: <code>terminal.c</code> and <code>windows/window.c</code>. My first change is to <code>window.c</code>. This file contains the WndProc for the PuTTY window, and thus most of the rendering and control code for the GUI.</p>
<p>In order to establish the Owner/Owned relationship, I need to modify the code that creates the main window, which in PuTTY is a call to <code>CreateWindowEx()</code>. I hoisted the function call into a block, added code to get the owner window handle, and inserted the handle into the call to <code>CreateWindowEx()</code>:</p>
<pre>
{
    HWND owner_hwnd = 0;
    char buffer[ 132 ];
    if ( GetEnvironmentVariable( &quot;PUTTY_OWNER&quot;, buffer, 132 ) )
        sscanf( buffer, &quot;%d&quot;, &amp;owner_hwnd );
    if ( owner_hwnd == 0 )
        MessageBox( NULL,
                    &quot;AutoPutty did not find the handle for the &quot;
                    &quot;owner window, this is not going to work&quot;,
                    &quot;Fail&quot;,
                    MB_OK );
    hwnd = CreateWindowEx(exwinmode, appname, appname,
                          winmode, CW_USEDEFAULT, CW_USEDEFAULT,
                          guess_width, guess_height,
                          owner_hwnd, NULL, inst, NULL);
}
</pre>
<p>At this point I've only modified one small block of code in the PuTTY source, but I'm well on my way to having it behave more like a component of PuttyDriver and less like an independent program. The ownership status means that the two programs only appear once on the taskbar, and will only appear once when you are pressing ALT-TAB to select a new active process. And they only produce a single entry in the Applications Tab of Task Manager.</p>
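<p>One detail worth keeping in mind is that the handle travels between the two processes as decimal text, and an <code>HWND</code> is a pointer-sized value, so the <code>%d</code> conversions above rely on it fitting in an <code>int</code>. The following is a portable sketch of a round trip that does not make that assumption. The names <code>handle_to_text</code> and <code>text_to_handle</code> are hypothetical, the code uses C++11 library calls, and casting the result to and from a real <code>HWND</code> is left to the caller.</p>

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Formats a pointer-sized handle value as decimal text, suitable for
// storing in an environment variable.
std::string handle_to_text(std::uintptr_t handle)
{
    return std::to_string(static_cast<unsigned long long>(handle));
}

// Parses the text back. Returns 0 when the text is missing or malformed,
// which the caller can treat as "no owner window".
std::uintptr_t text_to_handle(const char* text)
{
    if (!text || !*text)
        return 0;
    char* end = 0;
    unsigned long long value = std::strtoull(text, &end, 10);
    if (*end != '\0')
        return 0;
    return static_cast<std::uintptr_t>(value);
}
```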
<h4>The Communications Link</h4>
<p>In order to achieve the automation that I am seeking, I also need to have two-way communication between AutoPutty and the driver program. Since this is Windows, a natural choice for communications is to use native Windows messages. In order to do this, both programs need the window handle of their opposite number.</p>
<p>I've already solved half of that problem through the ownership relationship established when I created the main window for AutoPutty. Now that it has set PuttyDriver as its owner window, I can get this window handle any place in the program through a simple function call:</p>
<pre>
HWND parent = GetWindow(hwnd, GW_OWNER);
</pre>
<p>But the reverse is not true - PuttyDriver does not have a copy of the window handle for AutoPutty.</p>
<p>To remedy this situation, I added code to <code>window.c</code> that notifies its owner when it is created, and when it is destroyed. First I add this statement immediately after the call to <code>CreateWindowEx()</code>:</p>
<pre>
if ( owner_hwnd )
   PostMessage( owner_hwnd, WM_APP, 0, (LPARAM) hwnd );
</pre>
<p>This tells PuttyDriver that the window is created, and gives it the handle to use for communications.</p>
<p>I also need to know when the window is closed, and I have to add that code in two places in <code>window.c</code> - because PuTTY can be shut down two different ways.</p>
<p>Normally AutoPutty will shut down in response to a Windows message. When this happens, I can count on a <code>WM_CLOSE</code> message being sent to the window procedure. I add this code to the existing handler for <code>WM_CLOSE</code>:</p>
<pre>
if (!cfg.warn_on_close || session_closed ||
    MessageBox(hwnd,
               &quot;Are you sure you want to close this session?&quot;,
               str, MB_ICONWARNING | MB_OKCANCEL | MB_DEFBUTTON1)
    == IDOK) {
    HWND parent = GetWindow(hwnd, GW_OWNER);
    if ( parent )
        SendMessage( parent, WM_APP, 0, 0 );
    DestroyWindow(hwnd);
}
</pre>
<p>This lets PuttyDriver know that the window has been destroyed.</p>
<p>The original PuTTY code has an alternative method of shutdown. When it receives one of several possible network events, such as a telnet connection being broken, it calls <code>PostQuitMessage()</code>. When a program shuts down this way, it doesn't issue messages to destroy its windows - it relies on the O/S to destroy the windows when the process exits. As a result, I have to make a change in <code>WinMain()</code>, the entry point for PuTTY, which contains the main message loop. This loop extracts the messages sent to it using <code>PeekMessage</code>, and I add some code to handle the processing when a <code>WM_QUIT</code> message is seen:</p>
<pre>
if (msg.message == WM_QUIT) {
    HWND parent = GetWindow(hwnd, GW_OWNER);
    if ( parent )
        SendMessage( parent, WM_APP, 0, 0 );
    goto finished;	       /* two-level break */
}
</pre>
<h4>Handling the AutoPutty Lifecycle Events</h4>
<p>To keep track of the state of AutoPutty, I have to add a handler for <code>WM_APP</code> to PuttyDriver. The handler does two things when processing the incoming <code>WM_APP</code> message.</p>
<p>First, the handler stores the handle of the AutoPutty window - or sets the value to 0 when the window has been destroyed.</p>
<p>Second, it either enables or disables the button used to start up AutoPutty. Since this program can only manage one window at a time, I don't want to allow any inadvertent button pushes:</p>
<pre>
afx_msg LRESULT CPuttyDriverDlg::OnWmApp(WPARAM wParam, LPARAM lParam)
{
    m_PuttyWindow = (HWND) lParam;
    m_StartButton.EnableWindow( !m_PuttyWindow );
    return 0;
}
</pre>
<p>One final piece of bookkeeping is to make sure that the AutoPutty window is shut down when PuttyDriver shuts down. (The Windows documentation claims this happens automatically to owned windows, but it doesn't seem to be the case.)</p>
<pre>
void CPuttyDriverDlg::OnDestroy()
{
    CDialogEx::OnDestroy();
    if ( m_PuttyWindow )
        ::SendMessage( m_PuttyWindow, WM_CLOSE, 0, 0 );
}
</pre>
<h4>Monitoring Input Traffic</h4>
<p>Now that I have control over the lifetime of my AutoPutty window, it's time to take the next step in automation. My driver program needs to watch all the data coming in from the remote end so that it can take action on various types of input.</p>
<p>Depending on how you set up your connection, PuTTY can receive input data from a serial port, a Telnet connection, or an SSH connection. Fortunately the Windows version of PuTTY uses a standard handle-based interface to all three types of connections. The routine <code>term_data()</code> in <code>terminal.c</code> is called as data arrives, regardless of the source.</p>
<p>Since we are using the Windows API to communicate between processes, it makes sense to use the <code>WM_COPYDATA</code> message to send data to the parent program as it arrives. <code>WM_COPYDATA</code> is a good choice, as it takes care of marshalling the data between the two processes, which can add some complication to other solutions. The modified routine is shown below:</p>
<pre>
int term_data(Terminal *term, int is_stderr, const char *data, int len)
{
    HWND parent = GetWindow(hwnd, GW_OWNER);
    if ( parent ) {
        COPYDATASTRUCT cd;
        cd.dwData = (ULONG_PTR) 0xDEADBEEF;
        cd.cbData = len;
        cd.lpData = (PVOID) data;
        SendMessage( parent, WM_COPYDATA, (WPARAM) hwnd, (LPARAM) &amp;cd );
    }
    /* ... the original body of term_data() follows ... */
</pre>
<h4>Receiving the Data</h4>
<p>To receive these messages in PuttyDriver, I simply create a handler for <code>WM_COPYDATA</code> and start grabbing the data as it arrives. One important thing to note is that because AutoPutty uses <code>SendMessage()</code> to send the data to its parent, it has to wait for PuttyDriver to finish processing the data before it can continue. This dictates a certain style of behavior on my part.</p>
<p>There are quite a few ways to skin this cat, and I'm keeping it very simple here. I'm using a <code>deque&lt;char&gt;</code> container to hold the last 64 characters I've received. After each <code>WM_COPYDATA</code> message I receive, I check to see if the current output snapshot ends in one of my trigger strings. If it does, I post the message number to myself for later processing, then return so that AutoPutty can continue its work.</p>
<p>The code I'm using here is doing something fairly simple: automating the login process by using the credentials that I've entered into the dialog box. That means the two strings I'm looking for are the login and password prompts. The resulting code is shown here:</p>
<pre>
BOOL CPuttyDriverDlg::OnCopyData(CWnd* pWnd, COPYDATASTRUCT* pCopyDataStruct)
{
    char *p = (char *) pCopyDataStruct-&gt;lpData;
    int len = pCopyDataStruct-&gt;cbData;
    // Only the last 64 characters matter, so skip anything older.
    if ( len &gt;= 64 ) {
        p += len - 64;
        len = 64;
        m_Snapshot.clear();
    }
    // The newest character always ends up at the front of the deque.
    while ( len-- )
        m_Snapshot.push_front(*p++);
    m_Snapshot.resize(64);
    static const char *needles[2] = { &quot;login as: &quot;, &quot;password: &quot; };
    for ( int i = 0 ; i &lt; 2 ; i++ ) {
        int needle_len = (int) strlen( needles[i] );
        int j;
        // The snapshot is stored newest-first, so walk it backwards.
        for ( j = 0 ; j &lt; needle_len ; j++ ) {
            if ( needles[i][j] != m_Snapshot[needle_len-1-j] )
                break;
        }
        if ( j == needle_len )
            PostMessage( WM_APP+1, i, 0 );
    }
    return TRUE;
}
</pre>
<p>There is plenty of room for improvement in this routine, much of it depending on what type of automation you are going to be using in your program. Some obvious items would include the ability to add and remove triggers as the program progresses, and regular expression matching for triggers. </p>
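<p>The suffix test at the heart of that handler can also be tried outside of MFC. What follows is only a minimal sketch in portable C++: the class name <code>TailMatcher</code> and its <code>feed()</code> method are made up for illustration and are not part of PuttyDriver. Unlike <code>m_Snapshot</code>, it stores the oldest character first, and it takes its triggers as a parameter instead of hard-coding them:</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for the matching half of OnCopyData(). It keeps the
// most recent `capacity` characters (oldest first, unlike m_Snapshot) and
// reports which trigger the buffered text currently ends with.
class TailMatcher {
public:
    explicit TailMatcher(const std::vector<std::string>& triggers,
                         std::size_t capacity = 64)
        : triggers_(triggers), capacity_(capacity) {}

    // Append a chunk of received data. Returns the index of the first
    // trigger that the buffered text now ends with, or -1 if none does.
    int feed(const std::string& chunk) {
        for (std::size_t i = 0; i < chunk.size(); ++i) {
            tail_.push_back(chunk[i]);
            if (tail_.size() > capacity_)
                tail_.pop_front();          // discard the oldest character
        }
        for (std::size_t i = 0; i < triggers_.size(); ++i) {
            if (ends_with(triggers_[i]))
                return static_cast<int>(i);
        }
        return -1;
    }

private:
    // True when the buffered text ends with `needle`.
    bool ends_with(const std::string& needle) const {
        if (needle.size() > tail_.size())
            return false;
        return std::equal(needle.rbegin(), needle.rend(), tail_.rbegin());
    }

    std::vector<std::string> triggers_;
    std::size_t capacity_;
    std::deque<char> tail_;
};
```

<p>In a handler like the one above, the value returned by <code>feed()</code> would play the role of the trigger number passed along with <code>WM_APP+1</code>.</p>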
<h4>Driving PuTTY</h4>
<p>This login program is now complete save for one detail: I need a way to send my responses back to AutoPutty. </p>
<p>The first part of this is pretty obvious - I just need to read the data from the dialog box and send it to AutoPutty with the handy <code>WM_COPYDATA</code> message. This happens in my <code>WM_APP+1</code> handler:</p>
<pre>
afx_msg LRESULT CPuttyDriverDlg::OnWmAppPlusOne(WPARAM wParam, LPARAM lParam)
{
    UpdateData(TRUE);
    CString msg;
    switch ( wParam ) {
    case 0 :
        msg = this-&gt;m_UserId + '\r'; break;
    case 1:
        msg = this-&gt;m_Password + '\r'; break;
    }
    if ( this-&gt;m_PuttyWindow ) {
        COPYDATASTRUCT cd;
        cd.dwData = (ULONG_PTR) 0xF00DFACE;
        cd.cbData = msg.GetLength();
        cd.lpData = (PVOID) (const char *) msg;
        ::SendMessage( this-&gt;m_PuttyWindow,
                       WM_COPYDATA,
                       (WPARAM) this-&gt;m_hWnd,
                       (LPARAM) &amp;cd );
    }
    return 0;
}
</pre>
<p>Sending this data to AutoPutty is fine, but right now the program doesn't do anything with that message. The final piece of work is to add a <code>WM_COPYDATA</code> handler to window.c. </p>
<p>Simply grabbing the data is easy enough - the data structure that accompanies the message contains a pointer to the data and a value indicating its length. However, I have two problems I have to solve before the data is actually sent out to whatever device AutoPutty is connected to.</p>
<p>First, I have to take into account the fact that PuTTY was written to use wide characters. My driver program was built using multibyte characters, so we have a mismatch. This means I have to do a conversion of the data from one domain to the other. This is a two-step process - I call <code>MultiByteToWideChar()</code> once to determine how much space I need, then I allocate a buffer and call it again.</p>
<p>The second thing I need to do is determine what to do with the data once I've converted it. PuTTY takes all terminal input and eventually passes it through a function called <code>luni_send()</code>. Calling this function directly from the Windows procedure seems to work just fine. </p>
<p>The <code>WM_COPYDATA</code> handler I created looks like this:</p>
<pre>
case WM_COPYDATA :
{
    COPYDATASTRUCT *cd = (COPYDATASTRUCT *) lParam;
    /* cbData holds the byte count; dwData is only the sender's tag value */
    int wsize = MultiByteToWideChar( CP_ACP,
                                     MB_PRECOMPOSED,
                                     (LPCSTR) cd-&gt;lpData,
                                     cd-&gt;cbData,
                                     NULL,
                                     0 );
    wchar_t *buf = (wchar_t *) calloc( wsize+1, sizeof(wchar_t) );
    MultiByteToWideChar( CP_ACP,
                         MB_PRECOMPOSED,
                         (LPCSTR) cd-&gt;lpData,
                         cd-&gt;cbData,
                         buf,
                         wsize + 1 );
    if (term-&gt;ldisc)
        luni_send(term-&gt;ldisc, buf, wsize, 0);
    free( buf );
}
</pre>
<p>At this point I have a working program - it connects to my designated <a href="http://www.webhostingsearch.com/" class="newpage">host</a>, and sends the username and password of my choice to the host, connecting me to the system.</p>
<p>I should add a note of caution here. Automating logins is a tempting time saver, but in general this is a really bad idea. Any time you hard-code credentials into a program, you open the door to all sorts of new attacks on your system.</p>
<p>In my demo program, the user has to enter a name and password, so nothing is hard-coded, but even this adds security holes to a system. I encourage you to think of this as a demonstration only.</p>
<p><center></p>
<table border="0">
<tr>
<td><center><iframe width="500" height="281" src="http://www.youtube.com/embed/3O4t9KzpKbo?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></center></td>
</tr>
<tr>
<td><center>Demo of the program in action<br/>For a better view, go to full screen and select 720p</center></td>
</tr>
</table>
<p></center></p>
<h4>Source Code</h4>
<p>I've included the complete source code for PuttyDriver, the MFC project that controls AutoPutty. It was built with Visual Studio 2010, so you may have a little work to do if you backport it to earlier versions. My use of language features and classes should be compatible with much earlier versions - this is all very simple code.</p>
<p>Because PuTTY is always changing, I am not redistributing a snapshot of the version I used. Instead, I'm including before and after copies of the two source files I modified: <code>window.c</code> and <code>terminal.c</code>. If you build with PuTTY 0.61, you should be able to drop these two files right on top of the files included with the distribution and be on your way. With later versions of PuTTY you will have to perform an intelligent merge of the changes, which I hope will be a fairly effortless process.</p>
<h4>Downloads</h4>
<ul>
<li><a href="/attachments/2011/putty/PuttyDriver.zip">PuttyDriver.zip</a>. The PuttyDriver source and project. You will need to add the PuTTY project to this solution as described in the article.
<li><a href="/attachments/2011/putty/putty.zip">putty.zip</a>. This contains the two PuTTY source files modified for this project. Both the original 0.61 source and my modified source are supplied. Executables are supplied as well, which may or may not work on your system.
</ul>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/12/10/automating-putty/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>VC++ 10 Hash Table Performance Problems</title>
		<link>http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/</link>
		<comments>http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 14:05:45 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Complaining]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1347</guid>
		<description><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/' addthis:title='VC++ 10 Hash Table Performance Problems' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div>Microsoft's implementation of <code>unordered_map</code> in Visual Studio 10 has performance issues so severe it may be unusable in your projects.]]></description>
			<content:encoded><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/' addthis:title='VC++ 10 Hash Table Performance Problems' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div><p>Microsoft has never been a slacker in the C++ department - they've always worked hard to provide a top-notch, compliant product. Visual Studio 10 supports their current incarnation, and for the most part it is up to their usual standards. It's a great development environment, and I am a dedicated user, but I have to give Microsoft a demerit in one area: their C++11 hash containers have some serious performance problems - so much so that the Debug versions of the containers may well be unusable in your application.<br />
<span id="more-1347"></span></p>
<h4>Background</h4>
<p>I first noticed the problem with <code>unordered_map</code> when I was working on the code for my updated <a href="http://marknelson.us/2011/11/08/lzw-revisited/" class="newpage">LZW article</a>. I found that when running in the debugger, my program would hang after exiting the compression routine. A little debugging showed that the destructor for my hash table was taking a long time to run. And by a long time, I mean it was approaching an <i>hour</i>!</p>
<p>This seemed pretty crazy. Destroying a hash table wouldn't seem to be a complicated task. I decided to see if I could come up with a reasonable benchmark. I wrote a test program that does a simple word frequency count. As a starter data set, I used the first one million whitespace-delimited words in the 2010 CIA factbook, as published by <a href="http://www.gutenberg.org/ebooks/35830.txt.utf8" class="newpage">Project Gutenberg</a>. This data set yields 74,208 unique tokens.</p>
<p>I wrote a simple test rig that I used to test the word count program using four different containers:</p>
<ul>
<li/><code>unordered_map</code> indexed by <code>std::string</code>
<li/><code>unordered_map</code> indexed by <code>std::string *</code>
<li/><code>map</code> indexed by <code>std::string</code>
<li/><code>map</code> indexed by <code>std::string *</code>
</ul>
<p>The reason for testing with <code>std::string *</code> was to reduce the cost of copying strings into the hash table as it was filled, and then to reduce the cost of destroying those strings when the table was destroyed.</p>
<p>I ran tests against <code>map</code> expecting to see a pretty big difference in performance. Because <code>map</code> is normally implemented using a balanced binary tree structure, it has O(log(N)) performance on insertions. A sparsely populated hash table can have O(1) performance. By using fairly large data sets, I expected to see a big difference between the two.</p>
<p>I tried to eliminate a few obvious sources for error in my test function - and I used a template function so that I could use the same code on all the different container types:</p>
<pre>
template&lt;class CONTAINER, class DATA&gt;
void test( const DATA &amp;data, const char *test_name )
{
  std::cout &lt;&lt; &quot;Testing container: &quot; &lt;&lt; test_name &lt;&lt; std::endl;

#ifdef _DEBUG
  const int passes = 2;
#else
  const int passes = 10;
#endif
  double fill_times = 0;
  double delete_times = 0;
  size_t entries;
  for ( int i = 0 ; i &lt; passes ; i++ ) {
    CONTAINER *container = new CONTAINER();
    std::cout &lt;&lt; &quot;Filling... &quot; &lt;&lt; std::flush;
    clock_t t0 = clock();
    for ( auto ii = data.begin() ; ii != data.end() ; ii++ )
      (*container)[*ii]++;
    double span = double(clock() - t0)/CLOCKS_PER_SEC;
    fill_times += span;
    entries = container-&gt;size();
    std::cout &lt;&lt; &quot; &quot; &lt;&lt; span &lt;&lt; &quot; Deleting... &quot; &lt;&lt; std::flush;
    t0 = clock();
    delete container;
    span = double(clock() - t0)/CLOCKS_PER_SEC;
    delete_times += span;
    std::cout &lt;&lt; span &lt;&lt; &quot; &quot; &lt;&lt; std::endl;
  }
  std::cout &lt;&lt; &quot;Entries: &quot; &lt;&lt; entries
            &lt;&lt; &quot;, Fill time: &quot; &lt;&lt; (fill_times/passes)
            &lt;&lt; &quot;, Delete time: &quot; &lt;&lt; (delete_times/passes)
            &lt;&lt; std::endl;
}
</pre>
<p>I didn't go overboard when it came to instrumenting this problem - I just used the timing functions built into the C++ library. On my Windows and Linux test systems, the values of <code>CLOCKS_PER_SEC</code> are both high enough that I'm not worried about granularity issues.</p>
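<p>The two ideas in the test rig, one fill loop shared by every container type and timing with <code>clock()</code>, can be separated out as shown below. This is only a minimal sketch: the helper names <code>seconds_taken</code> and <code>count_words</code> are invented for illustration and do not appear in HashTest.cpp.</p>

```cpp
#include <ctime>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: runs f() once and returns the elapsed time in
// seconds, measured with clock() just like the test rig.
template<class F>
double seconds_taken(F f)
{
    std::clock_t t0 = std::clock();
    f();
    return double(std::clock() - t0) / CLOCKS_PER_SEC;
}

// Counts word frequencies into any map-like container that is indexed by
// std::string, so map and unordered_map share the same fill loop.
template<class Container>
Container count_words(const std::vector<std::string>& words)
{
    Container counts;
    for (auto ii = words.begin(); ii != words.end(); ii++)
        counts[*ii]++;
    return counts;
}
```

<p>Wrapping a call such as <code>count_words&lt;std::map&lt;std::string,int&gt;&gt;(words)</code> in a lambda and passing it to <code>seconds_taken()</code> gives one fill measurement. Timing destruction separately still needs a heap-allocated container, as in the test function above.</p>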
<h4>The First Results</h4>
<p>I ran my test program in Visual C++ Release mode, using all the standard settings for a console application. For purposes of comparison, I ran the same program using g++ 4.6.1 on the same computer, booted up under Linux. For the set of 1,000,000 tokens, the results are shown below:</p>
<table border="1" cellpadding="5">
<thead>
<tr>
<th>Task</th>
<th>VC++ 10 Release</th>
<th>g++ 4.6.1 -O3</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fill <code>unordered_map&lt;string&gt;</code></td>
<td>0.41s</td>
<td>0.11s</td>
</tr>
<tr>
<td>Fill <code>unordered_map&lt;string const *&gt;</code></td>
<td>0.39s</td>
<td>0.14s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string&gt;</code></td>
<td>3.17s</td>
<td>0.01s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string const *&gt;</code></td>
<td>3.24s</td>
<td>0.004s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string&gt;</code></td>
<td>0.83s</td>
<td>0.53s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string const *&gt;</code></td>
<td>0.88s</td>
<td>0.66s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string&gt;</code></td>
<td>0.14s</td>
<td>0.01s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string const *&gt;</code></td>
<td>0.07s</td>
<td>0.002s</td>
</tr>
</tbody>
</table>
<p>There are a few interesting points to take away from these tests:</p>
<ul>
<li/>Microsoft's compiler is taking an exceptionally long time to destroy hashed containers - roughly one order of magnitude longer than it took to fill them, and more than two orders of magnitude longer than it takes g++ to do the same task.
<li/>It doesn't look like constructing and destroying the strings is a big factor. Both compilers have roughly the same performance with both <code>std::string</code> and <code>std::string *</code>. Microsoft's behavior is counterintuitive in places: destroying <code>unordered_map</code> and filling <code>map</code> both take slightly longer when using the pointer.
<li/>The GNU compiler appears to be able to run through this exercise notably faster.
</ul>
<p>The time it takes to destroy the table is the real problem - having a C++ program hang for over 3 seconds to destroy a modestly large data structure is a serious concern, particularly when the same task completes in a few milliseconds with g++.</p>
<h4>The Pathological Results</h4>
<p>These concerns are nothing compared to what I see when running in debug mode. Setting my Visual Studio project to Debug mode, then running the same test, yields the results shown here:</p>
<table border="1" cellpadding="5">
<thead>
<tr>
<th>Task</th>
<th>VC++ 10 Debug</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fill <code>unordered_map&lt;string&gt;</code></td>
<td>17.41s</td>
</tr>
<tr>
<td>Fill <code>unordered_map&lt;string const *&gt;</code></td>
<td>17.08s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string&gt;</code></td>
<td>505.36s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string const *&gt;</code></td>
<td>505.99s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string&gt;</code></td>
<td>13.29s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string const *&gt;</code></td>
<td>13.15s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string&gt;</code></td>
<td>0.94s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string const *&gt;</code></td>
<td>0.18s</td>
</tr>
</tbody>
</table>
<p>Those numbers are hard to believe. Destroying a hash table takes 10 milliseconds or less when using g++. In a VC++ 10 Debug build, it takes more than eight minutes!</p>
<p>Worse, we suddenly see that hashed containers are <i>slower</i> than the containers built on red-black trees. Again, this just doesn't make sense.</p>
<p>The big problem with these numbers is that they mean the debug mode of the compiler is effectively unusable for a lot of tasks. Regardless of how much checking it does, when it is this slow, it is just not useful.</p>
<h4>A Workaround</h4>
<p>I didn't invest the time to try debugging Microsoft's library, so I don't really know where the time is being spent. I did try a few things to speed things up, and I found one technique that helps a lot. Before including any Microsoft header files, try entering this single line in your source:</p>
<pre>
#define _ITERATOR_DEBUG_LEVEL 0
</pre>
<p>With this definition in place, the delete times return to the ballpark of the times seen when running in release mode. Of course, you give up some debugging checks. I believe that an explanation of what this macro does might be found <a href="http://blogs.msdn.com/b/vcblog/archive/2011/04/05/10150198.aspx" class="newpage">here</a>.</p>
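<p>For what it's worth, here is a sketch of how the placement might look in a source file that also has to build with other compilers. The Microsoft macro is spelled <code>_ITERATOR_DEBUG_LEVEL</code>, with a leading underscore. The guard on <code>_MSC_VER</code> and <code>_DEBUG</code> is my own assumption about how you would want to scope it, and <code>unique_words</code> is just a placeholder function:</p>

```cpp
// Assumption: this block sits at the very top of the source file, ahead of
// every standard library #include, and every translation unit and static
// library linked into the program is built with the same value.
#if defined(_MSC_VER) && defined(_DEBUG)
#define _ITERATOR_DEBUG_LEVEL 0
#endif

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Placeholder workload: builds a word-count table and returns the number
// of unique words. The table is destroyed when the function returns.
std::size_t unique_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (auto ii = words.begin(); ii != words.end(); ii++)
        counts[*ii]++;
    return counts.size();
}
```

<p>Setting the same value for the whole project in the preprocessor definitions avoids mixing object files that were built with different levels.</p>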
<p>In the final analysis, I think Microsoft has some serious work to do here. The performance of their hashed containers, and to some lesser extent, the pre-C++11 associative containers, needs some serious examination. If the library is going to run this much slower than the competition, I need a good explanation why.</p>
<h4>Source</h4>
<p><a href="/attachments/2011/msvc_hash/HashTest.cpp">HashTest.cpp</a></p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>LZW Revisited</title>
		<link>http://marknelson.us/2011/11/08/lzw-revisited/</link>
		<comments>http://marknelson.us/2011/11/08/lzw-revisited/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 15:21:41 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Data Compression]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1056</guid>
		<description><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/11/08/lzw-revisited/' addthis:title='LZW Revisited' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div>In this updated look at LZW, I will first give a description of how LZW works, then describe the core C++ code that I use to implement the algorithm. I'll then walk you through the use of the algorithm with a few varieties of I/O. Finally, I'll show you some benchmarks and go over the history of this well-known compression algorithm.]]></description>
			<content:encoded><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/11/08/lzw-revisited/' addthis:title='LZW Revisited' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div><p>One of the first articles I wrote for Dr. Dobb's Journal, <a href="http://marknelson.us/1989/10/01/lzw-data-compression/" class="newpage">LZW Data Compression</a>, turned out to be very popular, and still generates a fair amount of traffic and email over twenty years later.</p>
<p>One of the reasons for its popularity seems to be that LZW compression is a popular homework assignment for CS students around the world. And that audience sometimes found the article to be a bit of a struggle. My code was modeled on the UNIX <a href="http://en.wikipedia.org/wiki/Compress" class="newpage">compress program</a>, which was written in terse C for maximum efficiency. And sometimes optimization comes at the expense of comprehension.</p>
<p>By using C++ data structures I can model the algorithm in a much more straightforward way - the language doesn't get in the way of a clear implementation. And after 20 years of answering puzzled queries I think I can improve on the overall explanation of just how LZW works. </p>
<p>In this updated look at LZW, I will first give a description of how LZW works, then describe the core C++ code that I use to implement the algorithm. I'll then walk you through the use of the algorithm with a few varieties of I/O. Finally, I'll show you some benchmarks.<br />
<span id="more-1056"></span><br />
I'm hoping that this version of the article will be good enough to last for another 20 years.</p>
<h4>LZW Basics</h4>
<p>LZW compression works by reading a sequence of <em>symbols</em>, grouping the symbols into <em>strings</em>, and converting the strings into <em>codes</em>. Because the codes take up less space than the strings they replace, we get compression.</p>
<p>My implementation of LZW uses the C++ <code>char</code> as its symbol type, the C++ <code>std::string</code> as its string type, and <code>unsigned int</code> as its code type.  The tables of codes and strings are implemented using <code>unordered_map</code>, the C++ library's hash table data structure. By using the native types and standard library data structures the representation in the program is straightforward and easy to follow.</p>
<h4>Encoding/Decoding</h4>
<p>Rather than jumping directly into a full implementation, I'm going to work my way up to LZW one step at a time.</p>
<p>The first step is getting a clear understanding of how the encoding and decoding process works. As I said earlier, LZW compression converts strings of symbols into integer codes. Decompression converts codes back into strings, returning the same text that we started with.</p>
<p>LZW is a greedy algorithm - it tries to find the longest possible string that it has a code for, then outputs the code for that string. The code below is not quite LZW, but it shows you the basic idea of how a greedy encoder can work:</p>
<pre>
void encode( input_stream in, output_stream out )
{
  //
  // This hash table contains a list of codes, indexed
  // by the string that corresponds to the code.
  //
  std::unordered_map&lt;std::string,unsigned int&gt; codes;
  //
  // There is presumably some code here that initializes
  // the dictionary with a set of codes based on whatever
  // algorithm we are implementing.
  //
  ...initialize the dictionary
  //
  // With codes in the dictionary, encoding is
  // now ready to begin.
  //
  std::string current_string;
  char c;
  while ( in &gt;&gt; c ) {
    current_string = current_string + c;
    if ( codes.find(current_string) == codes.end() ) {
      current_string.erase(current_string.size()-1);
      out &lt;&lt; codes[current_string];
      current_string = c;
    }
  }
  out &lt;&lt; codes[current_string];
}
</pre>
<p>The greedy encoder reads characters in from the uncompressed stream, and appends them one by one to the variable <code>current_string</code>. Each time it lengthens the string by one character, it checks to see if it still has a valid code for that string in the dictionary.</p>
<p>This continues until we eventually add a character that forms a string that isn't in the dictionary. So we then erase the last character from that string, and issue the code for the resulting string - the string from the previous pass through the loop. </p>
<p>The value of <code>current_string</code> is then initialized with the character that broke the camel's back, and the algorithm continues in the loop, building new strings until it runs out of input characters. At that point it outputs the last remaining code and exits.</p>
<p>As an example of how this would work, imagine I have the input stream <code>ACABCA</code>, and my code dictionary looks like this:<br />
<center></p>
<table border="1">
<tr>
<td>String</td>
<td>Code</td>
</tr>
<tr>
<td>A</td>
<td>1</td>
</tr>
<tr>
<td>B</td>
<td>2</td>
</tr>
<tr>
<td>C</td>
<td>3</td>
</tr>
<tr>
<td>AB</td>
<td>4</td>
</tr>
<tr>
<td>ABC</td>
<td>5</td>
</tr>
</table>
<p>A sample dictionary<br />
</center><br />
If you follow the algorithm above, you'll see that the code output has to be <code>1 3 5 1</code>. If this wasn't a greedy algorithm, <code>1 3 4 3 1</code> would have been another valid output.</p>
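<p>To check that trace by machine, the greedy loop can be run against the sample dictionary. The sketch below swaps the article's <code>input_stream</code> and <code>output_stream</code> placeholders for a <code>std::string</code> and a <code>std::vector</code>. The names <code>greedy_encode</code> and <code>sample_dictionary</code> are invented for this example, and it assumes every character in the input has its own dictionary entry:</p>

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::unordered_map<std::string, unsigned int> CodeTable;

// The five-entry sample dictionary from the table above.
CodeTable sample_dictionary()
{
    CodeTable codes;
    codes["A"] = 1;
    codes["B"] = 2;
    codes["C"] = 3;
    codes["AB"] = 4;
    codes["ABC"] = 5;
    return codes;
}

// Greedy encoder over an in-memory string. Assumes every single character
// that appears in `input` has an entry in `codes`; at() throws otherwise.
std::vector<unsigned int> greedy_encode(const std::string& input,
                                        const CodeTable& codes)
{
    std::vector<unsigned int> out;
    std::string current_string;
    for (std::size_t i = 0; i < input.size(); ++i) {
        current_string += input[i];
        if (codes.find(current_string) == codes.end()) {
            // Too long: back up one character and emit the previous match.
            current_string.erase(current_string.size() - 1);
            out.push_back(codes.at(current_string));
            current_string = input[i];
        }
    }
    if (!current_string.empty())
        out.push_back(codes.at(current_string));
    return out;
}
```

<p>Calling <code>greedy_encode("ACABCA", sample_dictionary())</code> yields the codes 1, 3, 5, 1, matching the hand trace.</p>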
<p>Decoding the stream in a system like this is very straightforward:</p>
<pre>
void decode( input_stream in, output_stream out )
{
  std::unordered_map&lt;unsigned int,std::string&gt; strings;
  //
  // Initialize the code table with the same set of codes and strings
  // that the encoder used for your algorithm.
  //
  ...initialize the dictionary
  //
  // With codes in the dictionary, decoding is now
  // ready to begin.
  //
  unsigned int code;
  while ( in &gt;&gt; code )
    out &lt;&lt; strings[code];
}
</pre>
<p>Remember, the decoder shown above is just a hypothetical sample - we're still working our way up to the full LZW decoder.</p>
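<p>A runnable counterpart for the same sample dictionary might look like the sketch below. Again the stream types are replaced by standard containers, and <code>greedy_decode</code> and <code>sample_strings</code> are names made up for illustration:</p>

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::unordered_map<unsigned int, std::string> StringTable;

// The sample dictionary again, this time indexed by code.
StringTable sample_strings()
{
    StringTable strings;
    strings[1] = "A";
    strings[2] = "B";
    strings[3] = "C";
    strings[4] = "AB";
    strings[5] = "ABC";
    return strings;
}

// Looks up each code and appends its string to the output.
// at() throws std::out_of_range if a code is not in the table.
std::string greedy_decode(const std::vector<unsigned int>& codes,
                          const StringTable& strings)
{
    std::string out;
    for (std::size_t i = 0; i < codes.size(); ++i)
        out += strings.at(codes[i]);
    return out;
}
```

<p>Both <code>1 3 5 1</code> and the non-greedy <code>1 3 4 3 1</code> decode to <code>ACABCA</code>, which shows that the decoder doesn't care how the encoder chose its strings.</p>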
<h4>The LZW Encoder</h4>
<p>The encoder shown above works okay, but there is one missing ingredient: management of the code dictionary. If you think about it, you'll see that we only achieve reasonable compression when we are able to build up longer strings and find them in the dictionary. Building a useful dictionary is referred to in the data compression world as <em>modeling</em>.</p>
<p>But our management of the dictionary is constrained by an important requirement: the encoder and decoder both have to be working with the same copy of the dictionary. If they have different dictionaries, the encoder might send a string that the decoder can't resolve.</p>
<p>Some data compression algorithms solve this problem by using a predefined dictionary that both the encoder and the decoder know in advance. But LZW builds a dictionary on the fly, using an <em>adaptive</em> method that ensures both the encoder and decoder are in sync.</p>
<p>LZW manages this in an effective and provably correct fashion. First, both the encoder and decoder initialize the dictionary with all possible single-character strings. For the compressor, that looks like this:</p>
<pre>
for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    codes[std::string(1,(char)i)] = i;
</pre>
<p>This ensures that we can encode all possible streams. No matter what, we can always break a stream down into single characters and encode these, knowing that the decoder has the same strings in its dictionary with values 0-255.</p>
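<p>The decoder's side of this presumably mirrors the compressor's, with the table direction reversed. Here is a minimal sketch, assuming the code-to-string table type used in the hypothetical decoder earlier; <code>initial_strings</code> is a name invented for this example:</p>

```cpp
#include <string>
#include <unordered_map>

// Builds the decoder's starting table: codes 0-255 each map to the
// one-character string holding that byte value.
std::unordered_map<unsigned int, std::string> initial_strings()
{
    std::unordered_map<unsigned int, std::string> strings;
    for (unsigned int i = 0; i < 256; i++)
        strings[i] = std::string(1, (char)i);
    return strings;
}
```

<p>With both tables built from the same loop bounds, code 65 means <code>A</code> on both ends before a single symbol has been read.</p>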
<p>Then comes the key component of the LZW algorithm. If you go back to the greedy encoding loop above, you'll see that I keep adding input symbols to a string until I find a string that isn't in the dictionary. This string has the characteristic of being composed of a string that currently exists in the dictionary, with one additional character.</p>
<p>LZW then takes that new string and adds it to the dictionary, creating a new code. The strings are added to the table with code values that increment by one with each new entry.</p>
<p>The resulting code is just a slightly modified version of the encoder that I listed above. It still only outputs codes for values that are in the dictionary, but now the dictionary is being updated with a new string every time an existing code is sent:</p>
<pre>
void compress( input_stream in, output_stream out )
{
  std::unordered_map&lt;std::string,unsigned int&gt; codes;
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    codes[std::string(1,(char)i)] = i;
  unsigned int next_code = 257;
  std::string current_string;
  char c;
  while ( in &gt;&gt; c ) {
    current_string = current_string + c;
    if ( codes.find(current_string) == codes.end() ) {
      codes[ current_string ] = next_code++;
      current_string.erase(current_string.size()-1);
      out &lt;&lt; codes[current_string];
      current_string = c;
    }
  }
  out &lt;&lt; codes[current_string];
}
</pre>
<p>The code above constitutes a more or less complete LZW encoder. I've only made a couple of additions to the previous encoder:</p>
<ul>
<li/>The initialization of codes 0-255 with all possible single character strings.
<li/>The insertion of the newly discovered string into the string table, generating a new code.
</ul>
<p>(One item of note in this code: you might wonder why <code>next_code</code> is initialized to 257, when 256 is the first free code. This is because I reserve code 256 for an EOF marker. More on this in a later section.)</p>
<p>Just to make sure this all adds up, I'll walk through the steps the encoder takes as it processes a string from a simple two-letter alphabet: <code>ABBABBBABBA</code>. There are a lot of steps shown below, but working through the process in detail is a great way to be sure you understand it:<br />
<center><br />
<table border="1">
<tr>
<th>Input<br/>Symbol</th>
<th>Action(s)</th>
<th>New<br/>Code</th>
<th>Output<br/>Code</th>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'A'<br/>'A' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is not in the dictionary, add it with code 257<br/>output the code for 'A' - 65<br/>set current_string to 'B'</td>
<td valign="top">257 (AB)</td>
<td valign="top">65 (A)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'BB'<br/>'BB' is not in the dictionary, add it with code 258<br/>output the code for 'B' - 66<br/>set current_string to 'B'</td>
<td valign="top">258 (BB)</td>
<td valign="top">66 (B)</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'BA'<br/>'BA' is not in the dictionary - add it with code 259<br/>output the code for 'B' - 66<br/>set current_string to 'A'</td>
<td valign="top">259 (BA)</td>
<td valign="top">66 (B)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'ABB'<br/>'ABB' is not in the dictionary - add it with code 260<br/>output the code for 'AB' - 257<br/>set current_string to 'B'</td>
<td valign="top">260 (ABB)</td>
<td valign="top">257 (AB)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'BB'<br/>'BB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'BBA'<br/>'BBA' is not in the dictionary - add it with code 261<br/>output the code for 'BB' - 258<br/>set current_string to 'A'</td>
<td valign="top">261 (BBA)</td>
<td valign="top">258 (BB)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'ABB'<br/>'ABB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'ABBA'<br/>'ABBA' is not in the dictionary - add it with code 262<br/>output the code for 'ABB' - 260<br/>set current_string to 'A'</td>
<td valign="top">262 (ABBA)</td>
<td valign="top">260 (ABB)</td>
</tr>
<tr>
<td valign="top"><center>EOF</center></td>
<td>end of the input stream - exit loop<br/>current string is 'A'<br/>output the code for 'A' - 65</td>
<td>&nbsp;</td>
<td>65 (A)</td>
</tr>
</table>
<p></center><br />
After processing the string <code>ABBABBBABBA</code>, the output codes are <code>65,66,66,257,258,260,65</code>. The dictionary at this point is:<br />
<center></p>
<table border="1">
<tr>
<td>String</td>
<td>Code</td>
</tr>
<tr>
<td>AB</td>
<td>257</td>
</tr>
<tr>
<td>BB</td>
<td>258</td>
</tr>
<tr>
<td>BA</td>
<td>259</td>
</tr>
<tr>
<td>ABB</td>
<td>260</td>
</tr>
<tr>
<td>BBA</td>
<td>261</td>
</tr>
<tr>
<td>ABBA</td>
<td>262</td>
</tr>
</table>
<p>The dictionary generated for <code>ABBABBBABBA</code><br/>(Entries 0-255 not shown for brevity)<br />
</center><br />
Looking at the above table, you can see a few interesting things happening. First, every time the algorithm outputs a code, it also adds a new code to the dictionary.</p>
<p>More importantly, as the dictionary grows, it starts to hold longer and longer strings. And the longer the string, the more compression we can get. If the algorithm starts emitting integer codes for strings of length 10 or more, there is no doubt that we are going to get good compression.</p>
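<p>To check the walk-through mechanically, here is a minimal sketch of the encoder loop that collects its output codes in a vector instead of writing them to a stream. The function name <code>lzw_encode</code> and the vector return type are assumptions made for this illustration; the dictionary setup and the starting code of 257 follow the table above.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the encoder traced in the table above. Codes 0-255 are the
// single-character strings; new entries start at 257.
// lzw_encode is a hypothetical name used only for this example.
std::vector<unsigned int> lzw_encode( const std::string &input )
{
  std::unordered_map<std::string, unsigned int> codes;
  for ( unsigned int i = 0 ; i < 256 ; i++ )
    codes[std::string(1, static_cast<char>(i))] = i;
  unsigned int next_code = 257;
  std::vector<unsigned int> output;
  std::string current_string;
  for ( char c : input ) {
    current_string += c;
    if ( codes.find(current_string) == codes.end() ) {
      codes[current_string] = next_code++;        // add the unseen string
      current_string.pop_back();                  // drop back to the known root
      output.push_back( codes[current_string] );  // emit the code for the root
      current_string = c;                         // restart with the new character
    }
  }
  if ( !current_string.empty() )
    output.push_back( codes[current_string] );    // emit whatever is left at EOF
  return output;
}
```

<p>Calling <code>lzw_encode("ABBABBBABBA")</code> should produce the same seven codes listed in the Output Code column of the table.</p>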
<p>As an example of how this works on real data, here are some entries from the dictionary created when compressing <em>Alice's Adventures in Wonderland</em>:</p>
<pre>
34830 : 'even\n'
34831 : '\nwith t'
34832 : 'the dr'
34833 : 'ream '
34834 : ' of Wo'
34835 : 'onderl'
34836 : 'land'
34837 : 'd of l'
34838 : 'long ag'
34839 : 'go:'
</pre>
<p>These strings have an average length of almost six characters. If we are writing the integer codes to a file using 16 bit binary integers, these entries offer the possibility of 3:1 compression.</p>
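<p>The arithmetic behind that estimate can be sketched in a few lines. The helper below, with the hypothetical name <code>compression_ratio</code>, compares eight bits per input character against one fixed-width code per dictionary string. It is only a best-case figure for a stream made up entirely of the given entries, not a measurement of real compression.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Best-case ratio of input bits (8 per character) to output bits (one
// fixed-width code per string). compression_ratio is a hypothetical helper.
double compression_ratio( const std::vector<std::string> &entries,
                          unsigned int bits_per_code )
{
  std::size_t total_chars = 0;
  for ( const std::string &s : entries )
    total_chars += s.size();
  return ( 8.0 * total_chars ) /
         ( static_cast<double>( bits_per_code ) * entries.size() );
}
```

<p>For the ten entries listed above, the strings total 55 characters, so with 16-bit codes the ratio works out to 2.75:1, in line with the rough 3:1 figure.</p>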
<p>The word <em>adaptive</em> is used to describe a compression algorithm that adapts to the type of text it is processing. LZW does an excellent job of this. If a string is seen repeatedly in the text, it will show up in longer and longer entries in the dictionary. If a string is seen rarely, it will not be the foundation for a large batch of longer strings, and thus won't waste space in the dictionary.</p>
<h4>The LZW Decoder</h4>
<p>The change made to the basic encoder to accommodate the LZW algorithm was really very simple: one small batch of code that initializes the dictionary, and another few lines of code to add every new unseen string to the dictionary.</p>
<p>As you might suspect, the changes to the decoder will be fairly simple as well. The first change is that the dictionary must be initialized with the same 256 single-symbol strings that the encoder uses.</p>
<p>Once the decoder starts running, each time it reads in a code, it must add a new value to the dictionary. And what is that value? The entire content of the previously decoded string, plus the first letter of the currently decoded string. This is exactly what the encoder does to create a new string, and the decoder must follow the same steps:</p>
<pre>
void decompress( input_stream in, output_stream out )
{
  std::unordered_map&lt;unsigned int,std::string&gt; strings;
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    strings[i] = std::string(1,i);
  std::string previous_string;
  unsigned int code;
  unsigned int next_code = 257;
  while ( in &gt;&gt; code ) {
    out &lt;&lt; strings[code];
    if ( previous_string.size() )
      strings[next_code++] = previous_string + strings[code][0];
    previous_string = strings[code];
  }
}
</pre>
<p>I won't do a walk-through of the decoder - you should be able to take the codes output from the encoder, shown above, and run them through the decoder to see that the output stream is what we expect.</p>
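<p>For readers who would rather check this mechanically, here is a minimal sketch of the same decoder loop that takes its codes from a vector and returns the decoded text. The name <code>lzw_decode_basic</code> and the in-memory types are assumptions for this illustration. Like the listing above, it only handles codes that are already in the dictionary.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the basic decoder loop shown above, reading from a vector.
// lzw_decode_basic is a hypothetical name used only for this example.
std::string lzw_decode_basic( const std::vector<unsigned int> &input )
{
  std::unordered_map<unsigned int, std::string> strings;
  for ( unsigned int i = 0 ; i < 256 ; i++ )
    strings[i] = std::string(1, static_cast<char>(i));
  std::string previous_string;
  std::string output;
  unsigned int next_code = 257;
  for ( unsigned int code : input ) {
    output += strings[code];
    // New entry: previous string plus first character of the current one.
    if ( !previous_string.empty() )
      strings[next_code++] = previous_string + strings[code][0];
    previous_string = strings[code];
  }
  return output;
}
```

<p>Feeding it the codes <code>65,66,66,257,258,260,65</code> from the encoder walk-through should give back <code>ABBABBBABBA</code>.</p>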
<p>The important thing is to understand the logic behind the decoder. When the encoder encounters a string that isn't in the dictionary, it breaks it into two pieces: a root string and an appended character. It outputs the code for the root string, and adds the root string + appended character to the dictionary. It then starts building a new string that starts with the appended character.</p>
<p>So every time the decoder uses a code to extract a string from the dictionary, it knows that the first character in that string was the appended character of the string just added to the dictionary by the encoder. And the root of the string added to the dictionary? That was the <em>previously</em> decoded string. This line of code implements that logic:</p>
<pre>
    strings[next_code++] = previous_string + strings[code][0];
</pre>
<p>It adds a new string to the dictionary, composed of the previously seen string, and the first character of the current string. Thus, the decoder is adding strings to the dictionary just one step behind the encoder.</p>
<p>You might note one curious point in the decoder. Instead of always adding the string to the dictionary, it is only done conditionally:</p>
<pre>
if ( previous_string.size() )
  strings[next_code++] = previous_string + strings[code][0];
</pre>
<p>The only time that <code>previous_string.size()</code> is 0 is on the very first pass through the loop. And on the first pass through the loop, we don't have a previous string yet, so the decoder can't build a new dictionary entry. Again, the decoder is always one step behind the encoder, which is a key point in the next section, which puts the final touches on the algorithm.</p>
<h4>The Catch</h4>
<p>So far the LZW algorithm we've seen seems very elegant - that's a characteristic we associate with algorithms that can be expressed in just a few lines of code.</p>
<p>Unfortunately, there is one small catch in this perceived elegance - the algorithm as I've shown it to you has a bug.</p>
<p>The bug in the algorithm relates to the fact that the encoder is always one step ahead of the decoder. When the encoder adds a string with code <em>N</em> to the table, it sends enough information to the decoder to allow the decoder to figure out the value of the string denoted by code <em>N-1</em>. The decoder won't know what the value of the string corresponding to code <em>N</em> is until it receives code <em>N+1</em>.</p>
<p>This makes sense if you recall the key line of code from the decoder. It calculates the value of the string encoded by <em>N-1</em> by looking at the string it received on the previous iteration, plus the first character of the current string. And that current string is the one that was sent after encoding <em>N</em>.</p>
<p>So how can this get us in trouble? The encoder is always one entry ahead of the decoder - it has entry <em>N</em> in its dictionary, and the decoder has entry <em>N-1</em>. So if the encoder ever sends code <em>N</em>, the decoder will look in its table and come up empty-handed, unable to do its job of decoding.</p>
<p>A simple example will show you how this can happen. Let's look at the state of the encoder after it has sent the first five symbols in a stream: <code>ABABA</code>:</p>
<p><center><br />
<table border="1">
<tr>
<th>Input<br/>Symbol</th>
<th>Action(s)</th>
<th>New<br/>Code</th>
<th>Output<br/>Code</th>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'A'<br/>'A' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is not in the dictionary, add it with code 257<br/>output the code for 'A' - 65<br/>set current_string to 'B'</td>
<td valign="top">257 (AB)</td>
<td valign="top">65 (A)</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'BA'<br/>'BA' is not in the dictionary, add it with code 258<br/>output the code for 'B' - 66<br/>set current_string to 'A'</td>
<td valign="top">258 (BA)</td>
<td valign="top">66 (B)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'ABA'<br/>'ABA' is not in the dictionary, add it with code 259<br/>output the code for 'AB' - 257<br/>set current_string to 'A'</td>
<td valign="top">259 (ABA)</td>
<td valign="top">257 (AB)</td>
</tr>
</table>
<p></center><br />
Now we are set for trouble. The encoder has symbol 259 in its dictionary, while the decoder has only gotten to 258. If the encoder were to send a code of 259 for its next output, the decoder would not be able to find it in its dictionary. Can this happen?</p>
<p>Yes, if the next two characters in the stream are <code>BA</code>, the next code output by the encoder will be 259, and the decoder will be lost.</p>
<p>In general, this can happen when a dictionary entry exists that consists of a string plus a character, and the encoder encounters the sequence <code>string+character+string+character+string</code>. In the example above, the value of <em>string</em> is <code>A</code>, and the value of <em>character</em> is <code>B</code>. After the encoder encounters <code>AB</code>, it has <code>string+character</code> in the dictionary, so if the following sequence is <code>ABABA</code>, we will emit code <em>N</em>.</p>
<p>Whether this is likely to happen or not is not too important; what is important is that it most definitely can happen, and the decoder has to be aware of it. And it will happen repeatedly in the pathological case: a stream that consists of a single symbol, repeated on end.</p>
<p>The good news is that the problem is easily solved. When the decoder receives a code, and finds that this code is not present in its dictionary, it knows right away that the code must be the one that it will add next to its dictionary. And because this only happens when we are encoding the sequence discussed above, the decoder knows that instead of using this value for that code:</p>
<pre>
    strings[next_code++] = previous_string + strings[code][0];
</pre>
<p>it can instead use this value:</p>
<pre>
    strings[ code ] = previous_string + previous_string[0];
</pre>
<p>The result of this is the insertion of just two lines of code at the start of the decompress loop, giving a loop that now looks like this:</p>
<pre>

while ( in &gt;&gt; code ) {
  if ( strings.find( code ) == strings.end() )
    strings[ code ] = previous_string + previous_string[0];
  out &lt;&lt; strings[code];
  if ( previous_string.size() )
    strings[next_code++] = previous_string + strings[code][0];
  previous_string = strings[code];
}
</pre>
<p>And with that, you have a complete implementation of the LZW encoder and decoder.</p>
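<p>As a quick check of the fix, the sketch below runs the corrected loop over an in-memory vector of codes. The name <code>lzw_decode</code> and the vector and string types are assumptions for this illustration. The two interesting inputs are the codes for <code>ABABABA</code>, which end with the not-yet-defined code 259, and the codes for the single-symbol stream <code>AAAA</code>.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the corrected decoder loop, reading from a vector.
// lzw_decode is a hypothetical name used only for this example.
std::string lzw_decode( const std::vector<unsigned int> &input )
{
  std::unordered_map<unsigned int, std::string> strings;
  for ( unsigned int i = 0 ; i < 256 ; i++ )
    strings[i] = std::string(1, static_cast<char>(i));
  std::string previous_string;
  std::string output;
  unsigned int next_code = 257;
  for ( unsigned int code : input ) {
    // The special case: the code is the one the decoder is about to define.
    if ( strings.find( code ) == strings.end() )
      strings[code] = previous_string + previous_string[0];
    output += strings[code];
    if ( !previous_string.empty() )
      strings[next_code++] = previous_string + strings[code][0];
    previous_string = strings[code];
  }
  return output;
}
```

<p>Working the encoder by hand, <code>ABABABA</code> encodes to <code>65,66,257,259</code> and <code>AAAA</code> encodes to <code>65,257,65</code>. In both cases the decoder meets a code it has not defined yet, and the extra two lines let it rebuild the missing string.</p>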
<h4>Implementation</h4>
<p>Now that I've shown you the algorithm, the next step is to take that code and turn it into a working program. Without changing the algorithm itself, I'm going to take you through four different customizations that work as follows:</p>
<ul>
<li/>LZW-A reads and writes code values rendered in text mode, which is great for debugging. It means you can view the output of the encoder in a text editor.
<li/>LZW-B reads and writes code values as 16-bit binary integers. This is fast and efficient, and usually results in significant data compression.
<li/>LZW-C reads and writes code values as N-bit binary integers, where N is determined by the maximum code size. Performing I/O on codes that are not aligned on byte boundaries complicates the code somewhat, but allows for greater efficiency and better compression.
<li/>LZW-D reads and writes code values as variable-length binary integers, starting with 9-bit codes and gradually increasing as the dictionary grows. This gives the maximum compression.
</ul>
<p>Before launching into these implementations, the code I showed above needs some minor tweaking to solve a couple of problems.</p>
<p>The first problem we have to deal with is the ever-expanding dictionary. In the algorithm I've presented, we keep adding new codes to the dictionary without end. This needs to be changed for a couple of reasons.</p>
<p>First, we don't have unlimited memory, so the dictionary simply can't grow forever. Second, practical experience shows that compression ratios don't improve as dictionary sizes grow without bound. As the dictionary grows, code sizes get larger and larger, and so they take up more space in the compressed stream, which can reduce compression efficiency. </p>
<p>To resolve this problem, I just add an additional argument to the encoder and decoder that sets the maximum code value that will be added to the dictionary. The function signatures now look like this:</p>
<pre>
void compress( input_stream input,
               output_stream output,
               const unsigned int max_code = 32767 );
void decompress( input_stream input,
                 output_stream output,
                 const unsigned int max_code = 32767 );
</pre>
<p>Implementing it means one small change in the encoder:</p>
<pre>
if ( next_code &lt;= max_code )
  codes[ current_string ] = next_code++;
</pre>
<p>And a corresponding change in the decoder:</p>
<pre>
if ( previous_string.size() &amp;&amp; next_code &lt;= max_code )
  strings[next_code++] = previous_string + strings[code][0];
</pre>
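<p>To see the effect of the cap, here is a minimal sketch of the encoder loop with the <code>max_code</code> test in place, collecting codes in a vector. The name <code>lzw_encode_capped</code> and the vector return type are assumptions for this illustration. Once <code>next_code</code> passes <code>max_code</code>, unseen strings are still split and emitted, but they are no longer added to the dictionary.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the encoder with a limit on dictionary growth.
// lzw_encode_capped is a hypothetical name used only for this example.
std::vector<unsigned int> lzw_encode_capped( const std::string &input,
                                             const unsigned int max_code = 32767 )
{
  std::unordered_map<std::string, unsigned int> codes;
  for ( unsigned int i = 0 ; i < 256 ; i++ )
    codes[std::string(1, static_cast<char>(i))] = i;
  unsigned int next_code = 257;
  std::vector<unsigned int> output;
  std::string current_string;
  for ( char c : input ) {
    current_string += c;
    if ( codes.find(current_string) == codes.end() ) {
      if ( next_code <= max_code )                // dictionary still has room
        codes[current_string] = next_code++;
      current_string.pop_back();
      output.push_back( codes[current_string] );
      current_string = c;
    }
  }
  if ( !current_string.empty() )
    output.push_back( codes[current_string] );
  return output;
}
```

<p>With an artificially small limit such as <code>lzw_encode_capped("ABBABBBABBA", 258)</code>, only <code>AB</code> and <code>BB</code> make it into the dictionary, and no code larger than 258 appears in the output. The decoder has to apply the same limit so that its dictionary stops growing at the same point.</p>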
<h4>Input and Output</h4>
<p>Finally, I need to give the algorithm a decent way to perform input and output - and this is where C++ offers a huge amount of help.</p>
<p>When writing generic compression code that you intend to use in multiple contexts, one of the more difficult things to deal with is I/O. People using your code might want to compress data in memory, stored in files, or streaming in from sockets or other sources. Some input data sources might be of unknown length (data coming from a TCP socket, for example), while others will be of a prescribed length. Back in the days of C, it was particularly difficult to make your compression code both generic, so it would work with all types of data streams, and efficient, so that I/O doesn't take any more time than it has to.</p>
<p>With the advent of C++, we have a new tool that can help in this quest - templates. Templates are designed to solve this problem in an efficient way, and I take advantage of this in my sample code. The code below shows the final version of the compressor and decompressor that are used in all four versions of the implementation. There are two final changes made to the routines shown previously. First, both C++ functions are now function templates, parameterized on the types being used for input and output. Second, the actual input and output is done through four newly introduced template classes:</p>
<pre>
template&lt;class INPUT, class OUTPUT&gt;
void compress( INPUT &amp;input, OUTPUT &amp;output, const unsigned int max_code = 32767 )
{
  input_symbol_stream&lt;INPUT&gt; in( input );
  output_code_stream&lt;OUTPUT&gt; out( output, max_code );

  std::unordered_map&lt;std::string, unsigned int&gt; codes( (max_code * 11)/10 );
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    codes[std::string(1,i)] = i;
  unsigned int next_code = 257;
  std::string current_string;
  char c;
  while ( in &gt;&gt; c ) {
    current_string = current_string + c;
    if ( codes.find(current_string) == codes.end() ) {
      if ( next_code &lt;= max_code )
        codes[ current_string ] = next_code++;
      current_string.erase(current_string.size()-1);
      out &lt;&lt; codes[current_string];
      current_string = c;
    }
  }
  if ( current_string.size() )
    out &lt;&lt; codes[current_string];
}

template&lt;class INPUT, class OUTPUT&gt;
void decompress( INPUT &amp;input, OUTPUT &amp;output, const unsigned int max_code = 32767  )
{
  input_code_stream&lt;INPUT&gt; in( input, max_code );
  output_symbol_stream&lt;OUTPUT&gt; out( output );

  std::unordered_map&lt;unsigned int,std::string&gt; strings( (max_code * 11) / 10 );
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    strings[i] = std::string(1,i);
  std::string previous_string;
  unsigned int code;
  unsigned int next_code = 257;
  while ( in &gt;&gt; code ) {
    if ( strings.find( code ) == strings.end() )
      strings[ code ] = previous_string + previous_string[0];
    out &lt;&lt; strings[code];
    if ( previous_string.size() &amp;&amp; next_code &lt;= max_code )
      strings[next_code++] = previous_string + strings[code][0];
    previous_string = strings[code];
  }
}
</pre>
<p>What exactly is the effect of implementing this algorithm using a pair of <em>function templates</em>, parameterized on the types of the input and output objects? What this means is that you can call these compression routines with any type of I/O object you can throw at them. It can work with C++ iostreams, C FILE&nbsp;* objects, raw blocks of memory, whatever you want.</p>
<p>But there's a catch to that flexibility - you have to implement some basic I/O routines for whatever type you are using. Fortunately, this is not too hard.</p>
<p>The actual I/O that is done in the compression routines is defined by four template classes I created. These classes are defined in <code>lzw_streambase.h</code>. These classes don't have implementations, but they do define the methods you need to implement to work with the compressor and decompressor. The four classes are: </p>
<ul>
<li/><code>input_symbol_stream&lt;T&gt;</code>
<li/><code>output_symbol_stream&lt;T&gt;</code>
<li/><code>input_code_stream&lt;T&gt;</code>
<li/><code>output_code_stream&lt;T&gt;</code>
</ul>
<p>The first two classes are the symbol input and output classes. These are normally going to be very simple implementations, as they just have to read characters from and write characters to streams, while checking for errors or ends of streams. I use the same versions of these classes in all four implementations, so the code in <code>lzw-a.h</code> is unchanged in the other three header files.</p>
<p>The <code>input_symbol_stream&lt;T&gt;</code> class has one member function: the extraction operator, which reads a character from the stream and returns a boolean true or false. You'll see later in this section that the implementation of this for types such as <code>std::istream</code> is trivial.</p>
<pre>
template&lt;typename T&gt;
class input_symbol_stream
{
public :
    input_symbol_stream( T &amp; );
    bool operator&gt;&gt;( char &amp;c );
};
</pre>
<p>The <code>output_symbol_stream&lt;T&gt;</code> class uses the insertion operator to write strings instead of individual characters - because that is what is stored in the dictionary. The C++ <code>std::string</code> class makes a perfectly good container for any variety of symbols, including binary data, and unlike the alternative <code>vector&lt;char&gt;</code>, it comes with hash functions and <code>iostream</code> operators.</p>
<pre>
template&lt;typename T&gt;
class output_symbol_stream
{
public :
    output_symbol_stream( T &amp;  );
    void operator&lt;&lt;( const std::string &amp;s );
};
</pre>
<p>The <code>input_code_stream&lt;T&gt;</code> class reads codes, normally unsigned integers, from some type of stream. In my implementations, this class also returns false if it encounters the <code>EOF_CODE</code> in the stream of incoming codes. Removing the responsibility for EOF detection from the decompressor makes the code a bit simpler and more versatile.</p>
<p>The formatting of the integer is entirely up to the implementor, but the most common approach will probably be variable length codes ranging from 9 to 16 or so bits.</p>
<pre>
template&lt;typename T&gt;
class input_code_stream
{
public :
    input_code_stream( T &amp;, unsigned int );
    bool operator&gt;&gt;( unsigned int &amp;i );
};
</pre>
<p>The <code>output_code_stream&lt;T&gt;</code> class writes codes, usually unsigned integers, to some type of stream. Whatever class you implement for this function must agree with the implementation for <code>input_code_stream&lt;T&gt;</code>.</p>
<pre>
template&lt;typename T&gt;
class output_code_stream
{
public :
    output_code_stream( T &amp;, unsigned int );
    void operator&lt;&lt;( const unsigned int i );
};
</pre>
<p>You can see that at the top of the compressor and decompressor, I instantiate objects of these types, then use the standard insertion and extraction operators to read and write from these objects. </p>
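<p>To illustrate how a new I/O type plugs in, here is a sketch that compresses from a <code>std::string</code> in memory into a <code>std::vector&lt;unsigned int&gt;</code> of codes. The two specializations and the <code>code_vector</code> typedef are hypothetical and are not part of the article's header files. The primary templates are only forward declared here, which is a simplification, and the hash table sizing hint is left out. Because a vector knows its own length, this version writes no EOF code. The decompression side would need the matching <code>input_code_stream</code> and <code>output_symbol_stream</code> specializations, written along the same lines.</p>

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Primary templates, declared only; each I/O type supplies a specialization.
template<typename T> class input_symbol_stream;
template<typename T> class output_code_stream;

typedef std::vector<unsigned int> code_vector;  // hypothetical in-memory code stream

// Symbols are read from a std::string, one character at a time.
template<>
class input_symbol_stream<std::string> {
public :
    input_symbol_stream( std::string &input ) : m_input( input ), m_pos( 0 ) {}
    bool operator>>( char &c )
    {
        if ( m_pos >= m_input.size() )
            return false;
        c = m_input[m_pos++];
        return true;
    }
private :
    std::string &m_input;
    std::size_t m_pos;
};

// Codes are appended to a vector; max_code is accepted but not needed here.
template<>
class output_code_stream<code_vector> {
public :
    output_code_stream( code_vector &output, unsigned int ) : m_output( output ) {}
    void operator<<( unsigned int i ) { m_output.push_back( i ); }
private :
    code_vector &m_output;
};

// The compressor, following the listing above (without the bucket-count hint).
template<class INPUT, class OUTPUT>
void compress( INPUT &input, OUTPUT &output, const unsigned int max_code = 32767 )
{
  input_symbol_stream<INPUT> in( input );
  output_code_stream<OUTPUT> out( output, max_code );
  std::unordered_map<std::string, unsigned int> codes;
  for ( unsigned int i = 0 ; i < 256 ; i++ )
    codes[std::string(1, static_cast<char>(i))] = i;
  unsigned int next_code = 257;
  std::string current_string;
  char c;
  while ( in >> c ) {
    current_string = current_string + c;
    if ( codes.find(current_string) == codes.end() ) {
      if ( next_code <= max_code )
        codes[ current_string ] = next_code++;
      current_string.erase(current_string.size()-1);
      out << codes[current_string];
      current_string = c;
    }
  }
  if ( current_string.size() )
    out << codes[current_string];
}
```

<p>With these in place, a call such as <code>compress( text, codes )</code>, where <code>text</code> is a <code>std::string</code> and <code>codes</code> is a <code>code_vector</code>, picks up both specializations through template argument deduction, and the compressor itself does not change.</p>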
<h4>LZW-A</h4>
<p>In my sample Windows program, I include <code>lzw_streambase.h</code> and <code>lzw.h</code>, which accounts for all of the code you have seen so far. I have the following lines that perform compression and decompression:</p>
<pre>
std::ifstream in( name, std::ios_base::binary );
std::ofstream lzw_out( temp_name_lzw, std::ios_base::binary );
compress( (std::istream &amp;) in, (std::ostream&amp;) lzw_out, pDlg-&gt;m_MaxCodeSize );
.
.
.
std::ifstream lzw_in( temp_name_lzw, std::ios_base::binary );
std::fstream out( temp_name_out,
                  std::fstream::in    |
                  std::fstream::out   |
                  std::fstream::binary );
decompress( (std::istream &amp;) lzw_in, (std::ostream&amp;) out, pDlg-&gt;m_MaxCodeSize );
</pre>
<p>If I try to build this project as-is, I get a nasty list of eight linker errors:<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure01.png"/></td>
</tr>
<tr>
<td><center>Visual Studio 10 Error Messages</center></td>
</tr>
</table>
<p></center><br />
If you have the fortitude to crawl through those link errors, you will see that what is missing are the implementations of the four classes parameterized on <code>std::ostream</code> and <code>std::istream</code>. Each of the four classes needs the implementation of a constructor and either an insertion or extraction operator. And with no class definitions at all, that adds up to eight missing functions. To get us started on performing actual LZW compression, I've created the first implementation of these four classes in <code>lzw-a.h</code>. Let's take a look at each of these in turn.</p>
<p>It's tempting to try to read characters using the <code>ifstream</code> extraction operator, as in <code>m_impl &gt;&gt; c</code>, but that operator skips over whitespace, so we don't get an exact copy of the input stream. Using <code>get()</code> works around this problem. Below is the complete definition of <code>input_symbol_stream&lt;std::istream&gt;</code> used in all four LZW implementations in this article:</p>
<pre>
template&lt;&gt;
class input_symbol_stream&lt;std::istream&gt; {
public :
    input_symbol_stream( std::istream &amp;input )
        : m_input( input ) {}
    bool operator&gt;&gt;( char &amp;c )
    {
        if ( !m_input.get( c ) )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
};
</pre>
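<p>The difference between the two ways of reading is easy to demonstrate. The sketch below uses two hypothetical helper functions, <code>read_with_extraction</code> and <code>read_with_get</code>, that read the same text from a <code>std::istringstream</code>, once with the extraction operator and once with <code>get()</code>.</p>

```cpp
#include <sstream>
#include <string>

// Reads every character with operator>>, which skips whitespace by default.
// Both function names are hypothetical, used only for this example.
std::string read_with_extraction( const std::string &text )
{
    std::istringstream in( text );
    std::string result;
    char c;
    while ( in >> c )
        result += c;
    return result;
}

// Reads every character with get(), which passes whitespace through unchanged.
std::string read_with_get( const std::string &text )
{
    std::istringstream in( text );
    std::string result;
    char c;
    while ( in.get( c ) )
        result += c;
    return result;
}
```

<p>Given the input <code>"a b\nc"</code>, the first function returns <code>"abc"</code> and the second returns the input unchanged. A compressor has to reproduce its input exactly, so only the second behavior will do.</p>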
<p>Using the insertion operator to output strings seems to work properly, even when the strings contain binary data, so the implementation of the class used to output symbols is as simple as we could hope for. Again, this exact code is used in all four implementations in this article:</p>
<pre>
template&lt;&gt;
class output_symbol_stream&lt;std::ostream&gt; {
public :
    output_symbol_stream( std::ostream &amp;output )
        : m_output( output ) {}
    void operator&lt;&lt;( const std::string &amp;s )
    {
        m_output &lt;&lt; s;
    }
private :
    std::ostream &amp;m_output;
};
</pre>
<p>LZW-A prints the text values of integers to the output stream, and reads them back in that format. This is not efficient at all, but it is a great aid in debugging. If you are having a problem with the algorithm, this provides a nice way to examine your stream. The implementation of this is very simple - just use the <code>std::ostream</code> insertion operator, and follow each code by a newline so it can be properly parsed on input, as well as be easily loaded into a text editor.</p>
<p>One important thing to notice in this class: the presence of a destructor that prints the <code>EOF_CODE</code>. Since this object goes out of scope as the compressor exits, this ensures that every code stream will end with this special code. Putting the onus on the I/O routines to deal with EOF issues simplifies the algorithm itself. (It also means that you can implement versions of LZW that don't use an EOF in the code stream.)</p>
<pre>
template&lt;&gt;
class output_code_stream&lt;std::ostream&gt; {
public :
    output_code_stream( std::ostream &amp;output, const unsigned int )
        : m_output( output ) {}
    void operator&lt;&lt;( unsigned int i )
    {
        m_output &lt;&lt; i &lt;&lt; '\n';
    }
    ~output_code_stream()
    {
        *this &lt;&lt; EOF_CODE;
    }
private :
    std::ostream &amp;m_output;
};
</pre>
<p>The corresponding version of the input class just reads in the white-space separated codes. If there is an error or an <code>EOF_CODE</code> encountered in the stream, the extraction operator returns false, which allows the decompressor to know when it is time to stop processing.</p>
<pre>
template&lt;&gt;
class input_code_stream&lt;std::istream&gt; {
public :
    input_code_stream( std::istream &amp;input, unsigned int )
        : m_input( input ) {}
    bool operator&gt;&gt;( unsigned int &amp;i )
    {
        m_input &gt;&gt; i;
        if ( !m_input || i == EOF_CODE )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
};
</pre>
<p>By including <code>lzw-a.h</code> along with the other two header files, I can now create a program that compiles, links, and is able to test the algorithm. Using my UNIX test program, I compress the demo string from earlier in this article, and I see the output as it is sent directly to <code>stdout</code>:<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure02.png"/></td>
</tr>
<tr>
<td><center>Compressing <code>ABBABBBABBA</code></center></td>
</tr>
</table>
<p></center><br />
Fortunately, the output is identical to what was shown earlier, with the addition of the final <code>EOF_CODE</code> used to delimit the end of the code stream.</p>
<h4>LZW-B</h4>
<p>The header file <code>lzw-b.h</code> implements specialized classes that replace the text-mode output of the codes in <code>lzw-a.h</code> with binary codes stored in a short integer - two bytes. </p>
<p>The classes that read and write symbols are unchanged, but reading and writing codes has to change in order to do this new binary output.</p>
<p>Writing the codes to <code>std::ostream</code> as binary values requires breaking the integer code into two bytes and writing the bytes one at a time. There are more efficient ways to write the complete short integer in one function call, but they raise code portability problems, as we don't always know what order bytes will be written in.</p>
<p>Like the code stream output object in <code>lzw-a.h</code>, this version of the code output class has a destructor that outputs an <code>EOF_CODE</code> value:</p>
<pre>
template&lt;&gt;
class output_code_stream&lt;std::ostream&gt; {
public :
    output_code_stream( std::ostream &amp;output, const unsigned int )
        : m_output( output ) {}
    void operator&lt;&lt;( unsigned int i )
    {
        m_output.put( i &amp; 0xff );
        m_output.put( (i&gt;&gt;8) &amp; 0xff);
    }
    ~output_code_stream()
    {
        *this &lt;&lt; EOF_CODE;
    }
private :
    std::ostream &amp;m_output;
};
</pre>
<p>Reading the codes requires reading the two bytes that make up the short integer, then combining them. While reading, if the routine detects an <code>EOF_CODE</code>, it returns false, which tells the decompressor to stop processing. It also returns false if there is an error on the input code stream.</p>
<pre>
template&lt;&gt;
class input_code_stream&lt;std::istream&gt; {
public :
    input_code_stream( std::istream &amp;input, unsigned int )
        : m_input( input ) {}
    bool operator&gt;&gt;( unsigned int &amp;i )
    {
        char c;
        if ( !m_input.get(c) )
            return false;
        i = c &amp; 0xff;
        if ( !m_input.get(c) )
            return false;
        i |= (c &amp; 0xff) &lt;&lt; 8;
        if ( i == EOF_CODE )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
};
</pre>
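<p>To make the byte layout concrete, here is a sketch that pulls the two-byte write and read out into free functions so they can be exercised on a <code>std::stringstream</code>. The names <code>put_code16</code> and <code>get_code16</code> are hypothetical, and EOF handling is left out to keep the focus on byte order.</p>

```cpp
#include <sstream>
#include <string>

// Writes a code as two bytes, low byte first, matching the order used above.
// put_code16 and get_code16 are hypothetical names used only for this example.
void put_code16( std::ostream &out, unsigned int code )
{
    out.put( static_cast<char>( code & 0xff ) );
    out.put( static_cast<char>( (code >> 8) & 0xff ) );
}

// Reads two bytes back and recombines them. Masking with 0xff matters where
// char is signed: without it, a byte such as 0xE8 would sign-extend and
// corrupt the upper bits of the code.
bool get_code16( std::istream &in, unsigned int &code )
{
    char c;
    if ( !in.get( c ) )
        return false;
    code = c & 0xff;
    if ( !in.get( c ) )
        return false;
    code |= ( c & 0xff ) << 8;
    return true;
}
```

<p>Writing the code 1000 (hex 03E8) this way puts the byte E8 in the stream first, followed by 03, regardless of the byte order of the machine doing the writing. That is the portability benefit of writing the bytes one at a time.</p>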
<p>The most exciting thing about <code>lzw-b.h</code> is that you can now see data compression taking place. The figure below shows a sample run of this implementation against the <a href="http://corpus.canterbury.ac.nz/descriptions/" class="newpage">Canterbury Corpus</a>, a standard set of files used to test compression. A run with my Windows test program shows that  the files are compressing quite nicely:<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure03.png"/></td>
</tr>
<tr>
<td><center>Compressing the Canterbury Corpus with <code>lzw-b.h</code></center></td>
</tr>
</table>
<p></center></p>
<h4>LZW-C</h4>
<p>The third I/O implementation, defined in <code>lzw-c.h</code>, writes binary codes like <code>lzw-b.h</code>, but with one crucial difference. Instead of being hard coded to 16 bit codes, <code>lzw-c.h</code> determines the maximum code size needed based on the maximum code value passed as an argument to <code>compress()</code> and <code>decompress()</code>. It then writes codes based on that width, which will normally be something in the range of 9-18 bits wide.</p>
<p>Since these values are not aligned with byte boundaries, there are some issues writing them to streams that expect to read and write bytes. However, it is definitely worth all the bit shifting, ORing, and ANDing, because when the size is 12 bits, we are going to save four bits per code when compared to using <code>lzw-b.h</code>. But every read and write potentially starts somewhere in the middle of a byte, so the I/O classes have to do some extra work - mostly involved with shifting bits to the correct position in the output stream.</p>
<p>Note that the code to read and write symbols is unchanged from <code>lzw-a.h</code> and <code>lzw-b.h</code>.</p>
<p>Many of the CS students who read my earlier article on LZW ran into a brick wall when they started trying to understand the code that performs I/O on codes of variable bit lengths. Obviously, writing 11 bit codes when your file system is oriented around eight-bit bytes involves a lot of bit twiddling, and I'm afraid that many novices are woefully deficient in this department. Not just in understanding the bitwise operators in C, such as shifting, masking, etc., but in understanding binary arithmetic in general.</p>
<p>That's why I've structured the code and this article a bit differently this time around. If the I/O operations in <code>lzw-c.h</code> and <code>lzw-d.h</code> are bewildering, well, no worries. They have absolutely nothing to do with the LZW algorithm itself. You can investigate and explore the algorithm completely using <code>lzw-a.h</code> and <code>lzw-b.h</code>, and just forget about the last two I/O implementations. They provide additional efficiency, but as I have said, have nothing to do with the algorithm itself. </p>
<p>Further, once you use <code>lzw-a.h</code> to debug and understand the algorithm, you can certainly plug in <code>lzw-c.h</code> and <code>lzw-d.h</code> and take advantage of their improved compression, even if you don't follow all the code. </p>
<p>It might be appropriate to add a sidebar or another section to explain the variable bit length I/O in detail, but this article is quite long already, and there are numerous other resources for the interested reader to explore the details. (But if you find yourself deficient in this area, you owe it to yourself to hit the books and get to the point where these operations make sense. This won't be the last time you need to understand bitwise operators.)</p>
<p>For those who are ready to tackle this more complicated I/O procedure, we will look first at the <code>output_code_stream&lt;std::ostream&gt;</code> class. Here, the first thing to understand is that the constructor has to initialize the number of bits in the code. This value is calculated from the <code>max_code</code> parameter, and is stored in member <code>m_code_size</code>, where it is used frequently.</p>
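<p>As a quick illustration, the width calculation can be pulled out into a free function. This is only a sketch - the name <code>bits_for_max_code</code> is made up for this example and does not appear in the headers - but it uses the same loop as the constructor:</p>
<pre>
// Hypothetical helper, not part of the lzw headers: returns the number
// of bits needed to hold any code value from 0 through max_code.
int bits_for_max_code( unsigned int max_code )
{
    int code_size = 1;
    while ( max_code &gt;&gt;= 1 )
        code_size++;
    return code_size;
}
</pre>
<p>With this loop, a <code>max_code</code> of 4095 gives a code size of 12 bits, and 65535 gives 16 bits.</p>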
<p>Next, the insertion operator. Output of codes proceeds as follows. Member <code>m_pending_bits</code> tells us how many bits are pending output while sitting in member <code>m_pending_output</code>. These bits are right justified, and the count will always be less than eight. When the new code is written, it is inserted into <code>m_pending_output</code> after being left shifted so it will be laid down just past the pending bits. After doing that, we presumably have some bytes to output - the exact number depends on various factors. The <code>flush()</code> routine is called, and it flushes all complete bytes out. When it completes, there can be anywhere from zero to seven bits still waiting to be output, and they will be right justified in <code>m_pending_output</code>.</p>
<p>In the destructor, we output an <code>EOF_CODE</code>, and then do a flush as well. But in this case, we flush all possible bits, not just the complete bytes. There are two good reasons for this. First, we don't care if the last bits that are flushed out are only part of a code - the code will be <code>EOF_CODE</code>, and that is the last one. And second, if we don't flush those final bits out in the destructor, they will never be sent to the output stream. This means the decoder will not see those bits, and we will most likely break the decompress process.</p>
<pre>
template&lt;&gt;
class output_code_stream&lt;std::ostream&gt;
{
public :
    output_code_stream( std::ostream &amp;out, unsigned int max_code )
        : m_output( out ),
          m_pending_bits(0),
          m_pending_output(0),
          m_code_size(1)
    {
        while ( max_code &gt;&gt;= 1 )
            m_code_size++;
    }
    ~output_code_stream()
    {
        *this &lt;&lt; EOF_CODE;
        flush(0);
    }
    void operator&lt;&lt;( const unsigned int &amp;i )
    {
        m_pending_output |= i &lt;&lt; m_pending_bits;
        m_pending_bits += m_code_size;
        flush( 8 );
    }
private :
    void flush( const int val )
    {
        while ( m_pending_bits &gt;= val ) {
            m_output.put( m_pending_output &amp; 0xff );
            m_pending_output &gt;&gt;= 8;
            m_pending_bits -= 8;
        }
    }
    std::ostream &amp;m_output;
    int m_code_size;
    int m_pending_bits;
    unsigned int m_pending_output;
};
</pre>
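<p>If the shifting is hard to picture, it may help to run the same logic on a couple of known values. The function below is a stand-alone sketch, not code from the download: <code>pack_codes</code> is a hypothetical name, it writes to a <code>std::vector</code> instead of a stream, it emits no <code>EOF_CODE</code>, and its final flush only writes a byte when there are leftover bits.</p>
<pre>
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical sketch: packs fixed-width codes into bytes, least
// significant bits first, following the same steps as the insertion
// operator and flush() shown above.
std::vector&lt;unsigned char&gt; pack_codes( const std::vector&lt;unsigned int&gt; &amp;codes,
                                       int code_size )
{
    std::vector&lt;unsigned char&gt; bytes;
    unsigned int pending_output = 0;
    int pending_bits = 0;
    for ( std::size_t n = 0 ; n &lt; codes.size() ; n++ ) {
        // Lay the new code down just past the bits that are still pending
        pending_output |= codes[ n ] &lt;&lt; pending_bits;
        pending_bits += code_size;
        // Write out every complete byte
        while ( pending_bits &gt;= 8 ) {
            bytes.push_back( static_cast&lt;unsigned char&gt;( pending_output &amp; 0xff ) );
            pending_output &gt;&gt;= 8;
            pending_bits -= 8;
        }
    }
    // Final flush: write any partial byte that is left over
    if ( pending_bits &gt; 0 )
        bytes.push_back( static_cast&lt;unsigned char&gt;( pending_output &amp; 0xff ) );
    return bytes;
}
</pre>
<p>With a code size of 12, the two codes 0x123 and 0xABC come out as the three bytes 0x23, 0xC1, and 0xAB. The low eight bits of the first code fill the first byte, and its remaining four bits share the second byte with the low four bits of the second code.</p>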
<p>Like the output code class, the input code class has to calculate the code size for this decompression based on the <code>max_code</code> value passed in the function call. </p>
<p>When an attempt is made to read a code, there must be a minimum of <code>m_code_size</code> bits in member <code>m_pending_input</code>. If there aren't, new bytes are read in one at a time, and inserted into <code>m_pending_input</code> after having been shifted left the appropriate amount. Once <code>m_pending_input</code> contains at least <code>m_code_size</code> bits, the code is extracted from <code>m_pending_input</code> using the appropriate mask, the count in <code>m_available_bits</code> is reduced, and <code>m_pending_input</code> is shifted right by <code>m_code_size</code> bits.</p>
<pre>
template&lt;&gt;
class input_code_stream&lt;std::istream&gt;
{
public :
    input_code_stream( std::istream &amp;in, unsigned int max_code )
        : m_input( in ),
          m_available_bits(0),
          m_pending_input(0),
          m_code_size(1)
    {
        while ( max_code &gt;&gt;= 1 )
            m_code_size++;
    }
    bool operator&gt;&gt;( unsigned int &amp;i )
    {
        while ( m_available_bits &lt; m_code_size )
        {
            char c;
            if ( !m_input.get(c) )
                return false;
            m_pending_input |= (c &amp; 0xff) &lt;&lt; m_available_bits;
            m_available_bits += 8;
        }
        i = m_pending_input &amp; ~(~0u &lt;&lt; m_code_size);
        m_pending_input &gt;&gt;= m_code_size;
        m_available_bits -= m_code_size;
        if ( i == EOF_CODE )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
    int m_code_size;
    int m_available_bits;
    unsigned int m_pending_input;
};
</pre>
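<p>The read side can be exercised the same way. Again, this is a sketch under assumptions rather than code from the download: <code>unpack_codes</code> is a hypothetical name, it reads from a <code>std::vector</code> instead of a stream, it does no <code>EOF_CODE</code> handling, and it silently drops any leftover bits that don't make up a full code.</p>
<pre>
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical sketch: pulls fixed-width codes back out of a byte
// sequence that was packed least significant bits first.
// code_size is assumed to be less than 32.
std::vector&lt;unsigned int&gt; unpack_codes( const std::vector&lt;unsigned char&gt; &amp;bytes,
                                        int code_size )
{
    std::vector&lt;unsigned int&gt; codes;
    unsigned int pending_input = 0;
    int available_bits = 0;
    const unsigned int mask = ( 1u &lt;&lt; code_size ) - 1;
    for ( std::size_t n = 0 ; n &lt; bytes.size() ; n++ ) {
        // Insert the new byte just past the bits we already hold
        pending_input |= static_cast&lt;unsigned int&gt;( bytes[ n ] ) &lt;&lt; available_bits;
        available_bits += 8;
        // Extract every complete code
        while ( available_bits &gt;= code_size ) {
            codes.push_back( pending_input &amp; mask );
            pending_input &gt;&gt;= code_size;
            available_bits -= code_size;
        }
    }
    return codes;
}
</pre>
<p>Feeding it the bytes 0x23, 0xC1, and 0xAB with a code size of 12 returns the two codes 0x123 and 0xABC.</p>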
<p>The table below shows the results of a test run comparing LZW-B and LZW-C run with a maximum code of 4095. With this maximum value, all codes fit in a 12-bit integer. Since LZW-B will use a 16-bit integer to store the code values, and LZW-C will use 12 bits, there should be a 4:3 ratio between the file sizes when compressed using the two algorithms, and this looks to be the case:<br />
<center></p>
<table border="1">
<tr>
<th>File Name</th>
<th>Original<br/>Size</th>
<th>Compressed<br/>LZW-B</th>
<th>Compressed<br/>LZW-C</th>
<th>Ratio</th>
</tr>
<tr>
<td>alice29.txt</td>
<td>152089</td>
<td>96428</td>
<td>72322</td>
<td>0.750</td>
</tr>
<tr>
<td>alphabet.txt</td>
<td>100000</td>
<td>4538</td>
<td>3404</td>
<td>0.750</td>
</tr>
<tr>
<td>asyoulik.txt</td>
<td>125179</td>
<td>83966</td>
<td>62975</td>
<td>0.750</td>
</tr>
<tr>
<td>bib</td>
<td>111261</td>
<td>71792</td>
<td>53845</td>
<td>0.750</td>
</tr>
<tr>
<td>bible.txt</td>
<td>4047392</td>
<td>2468326</td>
<td>1851245</td>
<td>0.750</td>
</tr>
</table>
<p>Comparing 12-bit compression between LZW-B and LZW-C<br />
</center><br />
It looks like things are working as expected.</p>
<h4>LZW-D</h4>
<p>The code in <code>lzw-d.h</code> represents the final and most efficient version of I/O for the LZW code streams. It builds on the code in <code>lzw-c.h</code> - at its core it is a variable bit-length I/O stream. However, there is one crucial difference from <code>lzw-c.h</code>: the code I/O in <code>lzw-d.h</code> starts at the smallest possible code size, nine bits, and increases the code size as needed, until it reaches the maximum value for this compression session. The maximum value is the parameter passed in to the invocation of <code>compress()</code> or <code>decompress()</code>.</p>
<p>The logic behind this is pretty simple. Even if we are going to use 16-bit codes in an LZW program, when the program first starts, the maximum possible code the program can emit is 256, which only needs nine bits to encode. And each time we output a new symbol, that maximum possible code value only increases by one, which means that the first 256 codes output by the encoder can all fit in nine bits.</p>
<p>So the LZW-D encoder starts encoding using nine-bit code widths, and then bumps the value to ten as soon as the highest possible output code reaches 512. This process continues, incrementing the code size until the maximum code size is reached. At that point the code size stays fixed, as no new codes are being added to the dictionary.</p>
<p>The decoder follows exactly the same process - reading in the first code with a width of nine bits, then bumping to ten when the maximum possible input code reaches 512.</p>
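<p>The rule itself fits in a few lines. The function below is not the logic from <code>lzw-d.h</code>, which has to track the width as part of the stream classes. It is just a sketch of the rule as described, with a hypothetical name, <code>code_width</code>, and the assumption that the caller knows the highest code that could possibly appear next:</p>
<pre>
// Hypothetical sketch of the width rule: start at nine bits, and add
// a bit each time the highest possible code no longer fits, but never
// go past the width needed for max_code.
int code_width( unsigned int highest_possible_code, unsigned int max_code )
{
    if ( highest_possible_code &gt; max_code )
        highest_possible_code = max_code;
    int width = 9;
    while ( ( highest_possible_code &gt;&gt; width ) != 0 )
        width++;
    return width;
}
</pre>
<p>With a maximum code of 65535, this gives nine bits while the highest possible code is 511 or less, ten bits once it reaches 512, and 16 bits from 32768 on.</p>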
<p>The code for this class is built on that from <code>lzw-c.h</code>, with some added complexity. Due to its increasing length, and the fact that it doesn't add too much to the discussion of LZW, I've omitted the listing, and instead refer you to the download available at the end of the article.</p>
<h4>The Windows Test Program</h4>
<p>When you develop compression code, there are a few different common tasks you are likely to want to perform:</p>
<ul>
<li/>Check your code for correctness, often through bulk testing.
<li/>Check your compression ratios against standard benchmarks.
<li/>Analyze your program's performance so as to make it more efficient and locate bottlenecks.
</ul>
<p>My Windows app is designed to help with all of these tasks. It basically allows you to select a single directory, set a maximum code size, then perform compression and decompression of all the files in the directory. An optional checkbox lets you include files in all directories under the test directory as well.</p>
<p>The application was built using Visual Studio 10, and it is a simple MFC Dialog-based application.</p>
<p>Each file is compressed to a temporary location, then decompressed in a temporary location. The size of the compressed file is saved, and then a comparison is done to ensure that the original and expanded files are identical.</p>
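<p>The final comparison step doesn't need anything fancy. One way to do it, shown here as a sketch with a hypothetical function name rather than the code used in the test app, is to walk both streams a byte at a time:</p>
<pre>
#include &lt;istream&gt;
#include &lt;iterator&gt;

// Hypothetical helper: returns true only if both streams deliver
// exactly the same sequence of bytes and end at the same point.
bool streams_identical( std::istream &amp;a, std::istream &amp;b )
{
    std::istreambuf_iterator&lt;char&gt; ia( a ), ib( b ), end;
    while ( ia != end &amp;&amp; ib != end ) {
        if ( *ia != *ib )
            return false;
        ++ia;
        ++ib;
    }
    // If only one stream ran out, the files differ in length
    return ia == end &amp;&amp; ib == end;
}
</pre>
<p>When comparing files, both streams should be opened with <code>std::ios::binary</code> so that no newline translation gets in the way.</p>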
<p>To help with data collection, after running a test, you can press the copy button and get the results of the test stuffed into your clipboard. Although it isn't visible in the display, the data stored in your clipboard includes the full path name of the original file, not just the basename.</p>
<p>This Visual Studio project takes advantage of a number of C++11 features, and as a result it will need some modification to work with earlier versions. Any version that supports <code>unordered_map</code> can be made to build without too many changes. And if you are going way back in time, you could replace <code>unordered_map</code> with <code>map</code>.</p>
<p>As shipped, the test program uses <code>lzw-d.h</code>. To use any of the three other versions of I/O discussed in this article, just modify the include file selected at the top of LzwTestDlg.cpp. The figure below shows what the app looks like after running through some data:<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure04.png"/></td>
</tr>
<tr>
<td><center>The Windows test app after a test run</center></td>
</tr>
</table>
<p></center><br />
After pressing the copy button at the bottom of the dialog, you can paste the data into a spreadsheet and then crunch it to your heart's content:<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure05.png"/></td>
</tr>
<tr>
<td><center>Copying the data into a spreadsheet</center></td>
</tr>
</table>
<p></center></p>
<h4>The Linux Test Program</h4>
<p>The LZW code is platform independent, and will build and run just fine on UNIX or Linux systems. The Linux test program, <code>lzw.cpp</code>, allows you to compress or decompress files from the command line. It builds just fine with g++ 4.5, as long as you use the <code>-std=c++0x</code> switch to turn on the latest language features. Compiling with earlier versions will require a few minor modifications.</p>
<p>The command line interface to the test program is not too complicated, and is probably best documented by looking at the usage output:</p>
<pre>
mrn@ubuntu:~/LzwTest$ g++ -std=c++0x lzw.cpp -o lzw
mrn@ubuntu:~/LzwTest$ ./lzw
Usage:
lzw [-max max_code] -c input output #compress file input to file output
lzw [-max max_code] -c - output     #compress stdin to file otuput
lzw [-max max_code] -c input        #compress file input to stdout
lzw [-max max_code] -c              #compress stdin to stdout
lzw [-max max_code] -d input output #decompress file input to file output
lzw [-max max_code] -d - output     #decompress stdin to file otuput
lzw [-max max_code] -d input        #decompress file input to stdout
lzw [-max max_code] -d              #decompress stdin to stdout
mrn@ubuntu:~/LzwTest$
</pre>
<p>Like the Windows test program, the command line program is built by default with <code>lzw-d.h</code>. Replacing this algorithm with any of the three others requires a minor change to the source code.</p>
<p>With the default build, the program produces output nearly identical to UNIX compress. The one difference is that UNIX compress monitors the compression ratio after the dictionary is full, and clears the dictionary if the ratio starts to deteriorate (which it almost always does.) I include a benchmark program that tests UNIX compress against the command line test program, and the results show that for small files, the file size is almost identical:</p>
<pre>
mrn@ubuntu:~/LzwTest$ ./benchmark.sh 65535 16 canterbury | head -n 15 | column -t
Filename                 Original-size  LZW-size  Compress-size
--------                 -------------  --------  -------------
canterbury/aaa.txt       33406          320       321
canterbury/alice29.txt   152089         62247     62247
canterbury/alphabet.txt  100000         3052      3053
canterbury/asyoulik.txt  125179         54989     54990
canterbury/a.txt         1              3         5
canterbury/bib           111261         46527     46528
canterbury/bible.txt     4047392        1417735   1377093
canterbury/book1         768771         317133    317133
canterbury/book2         610856         247593    251289
canterbury/cp.html       24603          11315     11317
canterbury/E.coli        4638690        1213579   1218349
canterbury/fields.c      11150          4963      4964
canterbury/geo           102400         77777     77777
</pre>
<p>You can see in this test that LZW-D and UNIX compress perform nearly identically for all but the largest files in the test sample. If I modify UNIX compress to not monitor compression ratios, the difference seen with larger files goes away:</p>
<pre>
mrn@ubuntu:~/LzwTest$ ./benchmark.sh 65535 16 canterbury | head -n 15 | column -t
Filename                 Original-size  LZW-size  Compress-size
--------                 -------------  --------  -------------
canterbury/aaa.txt       33406          320       321
canterbury/alice29.txt   152089         62247     62247
canterbury/alphabet.txt  100000         3052      3053
canterbury/asyoulik.txt  125179         54989     54990
canterbury/a.txt         1              3         5
canterbury/bib           111261         46527     46528
canterbury/bible.txt     4047392        1417735   1417735
canterbury/book1         768771         317133    317133
canterbury/book2         610856         247593    247593
canterbury/cp.html       24603          11315     11317
canterbury/E.coli        4638690        1213579   1213579
canterbury/fields.c      11150          4963      4964
canterbury/geo           102400         77777     77777
</pre>
<p>That provides some support for the notion that the algorithm shown here behaves properly.</p>
<h4>Your Program</h4>
<p>If you want to build your own program and use these classes, all you need is a C++11 compiler, or an earlier version and a willingness to make a few changes. </p>
<p>To use the classes, include in order <code>lzw_streambase.h</code>, one of the four implementation files for <code>iostreams</code>, preferably <code>lzw-d.h</code>, and finally, <code>lzw.h</code>. Because the significant code in these files is all implemented as template functions or classes, there is no library to include in your project, and no C++ source you have to compile separately.</p>
<p>All of the code in these header files has been hoisted into the <code>lzw</code> namespace, so you will either have to explicitly use the namespace when you invoke <code>compress()</code> and <code>decompress()</code>, or insert this line into your program:</p>
<pre>
using namespace lzw;
</pre>
<p>One thing to note about the I/O routines I have defined: the template functions are specialized on <code>std::istream</code> and <code>std::ostream</code>. If you innocently pass in an object such as an <code>std::ifstream</code>, you will get compile time errors. This is because C++ template matching is done on a very strict basis - the compiler won't generally try to figure out that <code>std::ifstream</code> is derived from <code>std::istream</code>, and use the existing class. So instead, you will need to cast your arguments to the types defined in the header files. (Or write your own implementations.)</p>
<p>Your rights to use this code are covered by my <a href="http://marknelson.us/code-use-policy/" class="newpage">Liberal Code Use Policy</a>. As I have mentioned before, this is teaching code; if you decide to use it in a production system, there are many optimizations you might want to perform.</p>
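<p>The effect is easy to reproduce in isolation. The names in this sketch, <code>reader_name</code> and <code>which_reader</code>, are hypothetical and have nothing to do with the LZW headers. Unlike those headers, the sketch supplies a generic fallback so that it compiles either way and you can see which version gets picked:</p>
<pre>
#include &lt;istream&gt;
#include &lt;string&gt;

// Generic version, used for any type without a specialization
template&lt;typename T&gt;
struct reader_name {
    static std::string get() { return "generic"; }
};

// Specialization that matches std::istream and nothing else
template&lt;&gt;
struct reader_name&lt;std::istream&gt; {
    static std::string get() { return "std::istream"; }
};

// T is deduced from the static type of the argument, so a derived
// stream type does not select the std::istream specialization
template&lt;typename T&gt;
std::string which_reader( T &amp; )
{
    return reader_name&lt;T&gt;::get();
}
</pre>
<p>Passing an <code>std::istringstream</code> or <code>std::ifstream</code> object directly to <code>which_reader()</code> returns "generic". Binding it to a base class reference first, as in <code>std::istream &amp;in = my_stream;</code>, and then passing <code>in</code> returns "std::istream".</p>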
<h4>Benchmarks</h4>
<p>So how does LZW do when it comes to compression? LZW's original strength was its combination of good compression ratios with high speed compression. The UNIX compress program is still nice and fast, and Terry Welch's original application for LZW was in disk controllers. Because my program is a teaching program, it won't be nearly as fast as compress, but it's still useful to compare it to the de facto standard for lossless compression: the deflate algorithm.</p>
<p>We can compare LZW against deflate by a small modification of my benchmark script that uses gzip instead of compress. The table below shows the average compression ratios for the files in the canterbury corpus when compressed using maximum code widths of 15-18 bits. (The ratio is defined as 100*compressed_size/uncompressed_size, so 0% is perfect compression and 100% is no compression.)<br />
<center></p>
<table border="1">
<tr>
<th>gzip</th>
<th>LZW 15 bits</th>
<th>LZW 16 bits</th>
<th>LZW 17 bits</th>
<th>LZW 18 bits</th>
</tr>
<tr>
<td>32.7%</td>
<td>43.2%</td>
<td>42.6%</td>
<td>42.5%</td>
<td>42.3%</td>
</tr>
</table>
<p></center><br />
You can see that LZW does do a good job of compressing data, but the deflate algorithm used by gzip manages to squeeze roughly an additional 10 percentage points out of the files it compresses. The gap between LZW and deflate is larger on some types of files, and smaller on others, but deflate will almost always show a noticeable difference in compression ratios.</p>
<h4>Variations</h4>
<p>There are many variations on the code I've presented here that make sense. </p>
<p>One obvious change is to eliminate the special <code>EOF_CODE</code> used to delimit the end of the code stream. If the code stream is a file or other stream with an inherent EOF condition, there is no need for an <code>EOF_CODE</code> - simply reaching the end of the input stream will properly signal the end of the decoded material. Freeing up this one code will make a microscopically small improvement in the compression ratios of the product.</p>
<p>If you want to mimic the output of the compress program, you need to remove the <code>EOF_CODE</code>, and replace it with a <code>CLEAR_CODE</code> that has a value of 256. The compress program monitors the compression ratios it achieves after its dictionary is full, and when the ratio starts to decay, it issues the <code>CLEAR_CODE</code>. That code tells the decoder to clear its dictionary and make a fresh start with new nine-bit codes.</p>
<p>Once you get the hang of LZW, a good exercise to make sure you have it working properly is to create a GIF encoder and decoder. GIF uses LZW to losslessly compress images with a constrained palette, and after all these years is still somewhat of a standard on the web.</p>
<h4>History</h4>
<p>Usually the history lesson on an algorithm is at the start of the article, but this is a how-to piece, and I feel like the trip down memory lane is not as important as understanding how the algorithm works.</p>
<p>The roots of LZW were set down in 1978 when Jacob Ziv and Abraham Lempel published the second of their two seminal works on data compression, <a href="http://www.cs.duke.edu/courses/spring03/cps296.5/papers/ziv_lempel_1978_variable-rate.pdf" class="newpage">"Compression of Individual Sequences via Variable-Rate Coding"</a>. This paper described a general approach to data compression that involved building dictionaries of previously seen strings.</p>
<p>Ziv and Lempel's work was targeted at an academic audience, and it wasn't truly popularized until 1984 when Terry Welch published <a href="http://www.cs.duke.edu/courses/spring03/cps296.5/papers/welch_1984_technique_for.pdf" class="newpage">A Technique for High-Performance Data Compression</a>. Welch's paper took the somewhat abstract Information Theory work of Ziv and Lempel and reduced it to practice in such a way that others could easily implement it.</p>
<p>UNIX compress was probably the first popular program that used LZW compression, and it very quickly became a standard utility on UNIX systems. The freely available code for compress was incorporated into <a href="http://en.wikipedia.org/wiki/ARC_(file_format)" class="newpage">ARC</a>, one of the first archiving programs for PCs. In addition, the algorithm was used in the GIF file format, originally created by CompuServe in 1987.</p>
<p>LZW's popularity waned in the 1990s for two important reasons. First, Unisys began enforcing their patents that covered LZW compression, demanding and receiving royalties from various software companies. Not only did this make developers think twice about the liability they could incur while using LZW, it resulted in a general public relations backlash against using patented technology.</p>
<p>Secondly, the LZW algorithm was eclipsed on the desktop by deflate, as popularized by PKZIP. Not only did deflate outperform LZW, it was unencumbered by patents, and eventually had a very reliable and free open source implementation in <a href="http://zlib.net/" class="newpage">zlib</a>, a library written by a team led by Mark Adler and Jean-loup Gailly. I don't know if there is any way to actually quantify this, but I think one could speculate that zlib is currently installed on more computer systems than any other software package in existence.</p>
<p>So LZW has settled down to an existence out of the limelight. It is still an important algorithm, used in quite a few file formats, and as this article shows, its simplicity makes it an excellent learning tool. </p>
<h4>Downloads</h4>
<ul>
<li><a href="/attachments/2011/lzw/LzwTest.zip">LzwTest.zip</a> - source for the Windows test app.
<li><a href="/attachments/2011/lzw/LzwExe.zip">LzwExe.zip</a> - The Windows test app executable.
<li><a href="/attachments/2011/lzw/lzw.tgz">lzw.tgz</a> - source for the UNIX test app.
</ul>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/11/08/lzw-revisited/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>DNS Service Discovery On Windows</title>
		<link>http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/</link>
		<comments>http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 11:54:04 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=978</guid>
		<description><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/' addthis:title='DNS Service Discovery On Windows' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div>In a previous post I showed you how we use DNS Service Discovery in a product I work on for Cisco Systems. That project uses the Avahi browser, which does not have a Windows port. In this article, I'll show you how to perform service discovery on Windows using Apple's Bonjour SDK for Windows. DNS-SD [...]]]></description>
			<content:encoded><![CDATA[<div class="addthis_toolbox addthis_default_style" addthis:url='http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/' addthis:title='DNS Service Discovery On Windows' ><a class="addthis_button_twitter"></a><a class="addthis_button_favorites"></a><a class="addthis_button_print"></a><a class="addthis_button_facebook_like"></a><a class="addthis_button_google_plusone"></a><a class="addthis_button_compact"></a></div><p>In a <a href="http://marknelson.us/2011/09/30/dns-service-discovery/" class="newpage">previous post</a> I showed you how we use DNS Service Discovery in a product I work on for Cisco Systems. That project uses the <a href="http://avahi.org/" class="newpage">Avahi browser</a>, which does not have a Windows port. In this article, I'll show you how to perform service discovery on Windows using Apple's <a href="http://developer.apple.com/opensource/" class="newpage">Bonjour SDK for Windows</a>.<br />
<span id="more-978"></span></p>
<h4>DNS-SD On Windows</h4>
<p>Microsoft has been pushing us to use <a href="http://en.wikipedia.org/wiki/Universal_Plug_and_Play" class="newpage">UPnP</a> as our network discovery protocol, to the exclusion of all others. As a result, Windows ships with no support for DNS-SD - zip. And that might be the end of it, if it weren't for Apple's vested interest in having iTunes installed on every Windows machine in the world. </p>
<p>iTunes uses DNS-SD to share music catalogs across Local Area Networks - a natural choice with the native support in OS X for Zeroconf. Rather than isolate Windows users in a pocket UPnP universe, Apple chose to instead port the service discovery components of Bonjour to Windows, and install it with every copy of iTunes. Thus, Windows and OS X users can happily share their iTunes libraries with no unusual calisthenics required.</p>
<p>To sweeten the deal, Apple has released a Bonjour SDK for Windows, currently shipping as version 3.0 from their <a href="http://developer.apple.com/opensource/" class="newpage">developer support site</a>. In Apple's words:</p>
<blockquote><p>
The Bonjour SDK for Windows includes the latest version of Bonjour as well as header files, libraries, and sample code to assist developers in creating Bonjour enabled applications on Windows. The SDK has been updated with the Bonjour core that is bundled with iTunes 10.3.1. This release will bring all existing Bonjour functionality released in Mac OS X 10.7 into the Bonjour for Windows product.
</p></blockquote>
<p>It sounds pretty good, doesn't it?</p>
<h4>An Emphasis on Kit</h4>
<p>Installing the SDK is a decision-free breeze:<br />
<center></p>
<table border="0">
<tr>
<td><center><img src="/attachments/2011/bonjour-windows/Figure01.png"/></center></td>
</tr>
<tr>
<td><center>Installing the SDK</center></td>
</tr>
</table>
<p></center><br />
The installation may be easy, but the real dirty truth about the Bonjour SDK is that it is not much of an SDK at all. The developer interface to Bonjour services is packaged in a single DLL, and the SDK provides programmers with sample code that illustrates how to use some of the bindings in C, C#, Java, and VB. There is no documentation on the interfaces, no source for the Bonjour components, and the few samples don't begin to provide comprehensive coverage of the interface.</p>
<p>In other words, this is pretty much a code dump.</p>
<h4>An Overview</h4>
<p>For the most part, the Bonjour SDK interfaces follow a single pattern. Each request made to the API returns immediately, and gives you a reference that you can use to track the progress of your request. That reference can be converted to a file handle, and you can use the file handle to see when your request has some data to produce.</p>
<p>When your request has generated some data, and the Bonjour components are ready to deliver it to you, they do so by a callback mechanism - the Bonjour components make calls into your C or C++ program and provide the data you requested.</p>
<p>Most of the callbacks include a flag parameter. You can check the flag to see if there is any more data expected. If there isn't, you can delete the reference and you are then done with that particular call. Otherwise, you will have to wait for the request to be responded to when the Bonjour components get around to it.</p>
<h4>An Example - Discover Service Types</h4>
<p>I've written a demo program that browses the network for details on every service instance it can find, and presents the results in a tree format. The figure below shows the program running on my home network, where I have drilled down to get information about an instance of a print service.<br />
<center></p>
<table border="0">
<tr>
<td><center><img src="/attachments/2011/bonjour-windows/Figure02.png"/></center></td>
</tr>
<tr>
<td><center>The ServiceBrowser Sample Program</center></td>
</tr>
</table>
<p></center><br />
This program has to start by doing a top-level iteration of all the service types currently seen on my network. From my previous article, you might remember that I can use a special browse command to accomplish this. If you've installed the Bonjour SDK, you can run the dns-sd program and browse for service type <code>_services._dns-sd._udp</code>, which should give results something like this:</p>
<pre>
C:\Users\Mark>dns-sd -B _services._dns-sd._udp
Browsing for _services._dns-sd._udp
Timestamp     A/R Flags if Domain  Service Type  Instance Name
17:38:09.407  Add     3 13 .       _tcp.local.     _smb
17:38:09.407  Add     3 13 .       _tcp.local.     _printer
17:38:09.408  Add     3 13 .       _tcp.local.     _pdl-datastream
17:38:09.408  Add     3 13 .       _tcp.local.     _http
17:38:09.408  Add     3 13 .       _tcp.local.     _tivo-videos
17:38:09.409  Add     2 13 .       _tcp.local.     _csco-sb
</pre>
<p>Looking to do the same thing in my demo program, I turn to the header file <code>dns_sd.h</code>. This file ships with the Bonjour SDK and contains not only the API definition, but what passes for documentation. In that header file I see that there is a function called <code>DNSServiceBrowse</code>, and it looks like it does exactly what I want. In my sample program, my call to this routine is shown below:</p>
<pre>
DNSServiceRef client = NULL;
DNSServiceErrorType err;
err = DNSServiceBrowse( &amp;client,
                        0,
                        0,
                        &quot;_services._dns-sd._udp&quot;,
                        &quot;&quot;,
                        IterateServiceTypes,
                        this );
</pre>
<p>You can get some exposition on each of the arguments I pass to the function from the header file; my brief comments on each are given below:</p>
<table cellspacing="10">
<tr>
<td valign="top">sdRef</td>
<td>Every call to the Bonjour API creates a new <code>DNSServiceRef</code> handle. It is initialized by the function call, and used later to retrieve the results.</td>
</tr>
<tr>
<td valign="top">flags</td>
<td>This parameter is not used in this version of the SDK.</td>
</tr>
<tr>
<td valign="top">interfaceIndex</td>
<td>This argument is used to select a specific network interface. For this particular function I want to browse on all available interfaces, so a value of 0 is used.</td>
</tr>
<tr>
<td valign="top">regType</td>
<td>The type of service being browsed for. Normally when you call <code>DNSServiceBrowse</code> you will use this parameter to specify a specific service type that you are interested in, such as <code>_http._tcp</code>. I'm using the special type of <code>_services._dns-sd._udp</code> in order to get a list of all published service types, not specific instances.</td>
</tr>
<tr>
<td valign="top">domain</td>
<td>By passing an empty string I tell the service that I want to see advertisements from all domains.</td>
</tr>
<tr>
<td valign="top">callback</td>
<td>The pointer to a callback function that will receive the responses to this request. The argument I pass in, <code>IterateServiceTypes</code>, is a static member of my MFC Dialog class.</td>
</tr>
<tr>
<td valign="top">context</td>
<td>The context variable is an opaque pointer type that is passed in to the Bonjour service. When it performs a callback, it will include a copy of the context pointer for the use of the callback function. I always pass in a pointer to my MFC Dialog class, so my callback functions have full access to the class members.</td>
</tr>
</table>
<p>The important thing to note here is that the call to the API function doesn't return any important data. All I get back is an error code indicating that the request is being processed, and a <code>DNSServiceRef</code> handle that I use to track that progress.</p>
<p>The real action comes when my callback function is invoked. <code>IterateServiceTypes</code> is a static member of my Dialog class. Apple kept things simple by having one callback type for C and C++, which means no member functions. You could easily build shims to make it appear as though the DLL was calling member functions - it would just take a small modification to the code I'll show you here.</p>
<p>The function definition has to follow exactly the declaration given in <code>dns_sd.h</code>. My implementation starts like this:</p>
<pre>
void DNSSD_API CServiceBrowserDlg::IterateServiceTypes( DNSServiceRef sdRef,
                                                        DNSServiceFlags flags,
                                                        uint32_t interfaceIndex,
                                                        DNSServiceErrorType errorCode,
                                                        const char *serviceName,
                                                        const char *regtype,
                                                        const char *domain,
                                                        void *context )
{
	CServiceBrowserDlg *p = (CServiceBrowserDlg *) context;
</pre>
<p>It's worth walking through each of the parameters in the callback:</p>
<table border="0" cellspacing="10">
<tr>
<td valign="top">sdRef</td>
<td>This is the same <code>DNSServiceRef</code> value that was created when the call was made to <code>DNSServiceBrowse</code>. You don't need to make any use of it in the callback, but it does provide a good way to correlate results with function calls, particularly if you are using a single callback function to process many different results.</td>
</tr>
<tr>
<td valign="top">flags</td>
<td>There are two important flag bits to check in this value. The first, <code>kDNSServiceFlagsMoreComing</code>, indicates that more callbacks are definitely coming. If that bit is clear, there is no more pending data. The second, <code>kDNSServiceFlagsAdd</code>, indicates whether this service is being added or removed. When you first start browsing, all the callbacks will be for services added. As services come and go on the network, additional callbacks will arrive with this bit either set or cleared.</td>
</tr>
<tr>
<td valign="top">interfaceIndex</td>
<td>In the callback, this index will be set to the index of the network interface where the advertisement was found. When it comes time to resolve this service, you need to pass in the correct index.</td>
</tr>
<tr>
<td valign="top">errorCode</td>
<td>If this value is not zero, the callback is indicating an error. As long as it is zero your code can process the input safely.</td>
</tr>
<tr>
<td valign="top">serviceName</td>
<td>This value contains the name of a discovered service - it is the whole point of the callback. Normally this will contain the name of an instance of a service. However, when browsing for the special name <code>_services._dns-sd._udp</code>, the instance name is actually a service type.</td>
</tr>
<tr>
<td valign="top">regtype</td>
<td>The type of the service - you may already know this information by the time you reach the callback, but if the callback is handling the results from multiple queries, it can be helpful.</td>
</tr>
<tr>
<td valign="top">domain</td>
<td>The domain of the discovered service. Like the interface index, you need to use the domain when you are attempting to resolve the service.</td>
</tr>
<tr>
<td valign="top">context</td>
<td>A copy of the context variable passed in when the browse call was made.</td>
</tr>
</table>
<p>If you look at the first line of code above, the first thing I do is cast the context pointer to its correct type, a pointer to my MFC Dialog class. Now I can make full use of all the members of the class, albeit via a call through a pointer instead of directly.</p>
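<p>To make the shim idea concrete, here is a minimal sketch of a static callback that forwards to an ordinary member function. It does not use the Bonjour headers: <code>BrowseCallback</code>, <code>ServiceList</code>, <code>kFlagMoreComing</code> and <code>kFlagAdd</code> are stand-in names invented for this illustration, and the callback signature is much shorter than the real one in <code>dns_sd.h</code>.</p>

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for the two flag bits discussed above. They are redefined here,
// under different names, so the sketch compiles without the Bonjour SDK.
const std::uint32_t kFlagMoreComing = 0x1;
const std::uint32_t kFlagAdd        = 0x2;

// Hypothetical C-style callback type, simplified from the real signature.
typedef void (*BrowseCallback)(std::uint32_t flags,
                               int errorCode,
                               const char* serviceName,
                               void* context);

class ServiceList {
public:
    // Static shim: this is what gets handed to the C API. It recovers the
    // object from the opaque context pointer and forwards to a member.
    static void OnBrowse(std::uint32_t flags, int errorCode,
                         const char* serviceName, void* context)
    {
        static_cast<ServiceList*>(context)->HandleBrowse(flags, errorCode, serviceName);
    }

    std::vector<std::string> names;   // services currently known
    bool more_coming = false;         // true while more callbacks are queued

private:
    void HandleBrowse(std::uint32_t flags, int errorCode, const char* serviceName)
    {
        if (errorCode != 0)
            return;                   // nonzero means error: ignore the payload
        more_coming = (flags & kFlagMoreComing) != 0;
        if (flags & kFlagAdd) {
            names.push_back(serviceName);
        } else {
            // The add bit is clear, so the service went away: drop one entry.
            for (auto it = names.begin(); it != names.end(); ++it) {
                if (*it == serviceName) {
                    names.erase(it);
                    break;
                }
            }
        }
    }
};
```

<p>The caller passes <code>&amp;ServiceList::OnBrowse</code> as the callback and the address of a <code>ServiceList</code> object as the context. Everything after the cast is plain member code, which is the main appeal of the pattern.</p>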
<p>So what do I do with these services once I receive them? Well, for each service type that I find, I kick off a new browse process, looking for specific instances of the service. Just as an example, in my callback <code>IterateServiceTypes</code>, one of the callbacks returns a service type of <code>_printer._tcp</code>. In order to find all instances of this service, I have to call <code>DNSServiceBrowse</code> again, with that service name and the correct interface and domain. After inserting the service type into the tree, I make that call so I can start adding those instances:</p>
<pre>
HTREEITEM item = p-&gt;m_Tree.InsertItem( CA2T(service_type.c_str()), TVI_ROOT, TVI_SORT );
DNSServiceRef client = NULL;
DNSServiceErrorType err;
err = DNSServiceBrowse( &amp;client,
                        0,
                        0,
                        service_type.c_str(),
                        &quot;&quot;,
                        IterateServiceInstances,
                        context );
</pre>
<p>The key point to note about this call is that the callback function, <code>IterateServiceInstances</code>, is a different member function - one that expects to get the results of my browsing for instances of a specific service.</p>
<h4>Driving the Callbacks</h4>
<p>One thing I've skipped over so far - how do these callbacks actually get generated? Does the DLL asynchronously make calls into my code whenever events occur?</p>
<p>The Bonjour SDK lets your program control when callbacks occur by giving you the handle to the message pump. When you call <code>DNSServiceProcessResult()</code> with a single argument of a <code>DNSServiceRef</code>, you will generate a single callback message for the given reference. The callback will occur within the context of the call to <code>DNSServiceProcessResult()</code>.</p>
<p>When you call <code>DNSServiceProcessResult()</code>, the Bonjour DLL will block if there are no messages ready to process. So how do you know when there are messages ready? </p>
<p>The indicator that messages are ready is given by a file descriptor associated with the <code>DNSServiceRef</code>. You can get a copy of the file descriptor by calling <code>DNSServiceRefSockFD()</code>, passing in a copy of the reference. When the file descriptor has data ready to read, you have callbacks pending. The easiest way to check this condition is to use the <code>select()</code> function, which can check multiple references in one fell swoop.</p>
<p>In my implementation of the callback message pump, I rely on an <code>unordered_map</code> called <code>m_ClientToFdMap</code> that contains a copy of all the <code>DNSServiceRef</code> references currently waiting for responses. I create the necessary data structure used by <code>select()</code>, then call it to get a list of all references that have callbacks pending. The core of this code looks like this:</p>
<pre>
int result = select(0, &amp;readfds, (fd_set*)NULL, (fd_set*)NULL, &amp;tv);
if ( result &gt; 0 ) {
//
// While iterating through the loop, the callback functions might delete
// the client pointed to by the current iterator, so I have to increment
// it BEFORE calling DNSServiceProcessResult
//
    for ( auto ii = m_ClientToFdMap.cbegin() ; ii != m_ClientToFdMap.cend() ; ) {
        auto jj = ii++;
        if (FD_ISSET(jj-&gt;second, &amp;readfds) )
            DNSServiceErrorType err = DNSServiceProcessResult(jj-&gt;first);
    }
}
</pre>
<p>This generates my callbacks efficiently, and because they are in the context of my main program's UI thread, I avoid a lot of unpleasant issues.</p>
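<p>The snippet above skips the step that fills in <code>readfds</code>. A minimal sketch of that setup follows, written against POSIX <code>select()</code> so it can be tried outside of Windows. The function name <code>ready_descriptors</code> and the use of a plain vector in place of <code>m_ClientToFdMap</code> are assumptions made for the illustration.</p>

```cpp
#include <sys/select.h>   // POSIX; on Windows select() and fd_set come from winsock2.h
#include <vector>

// Returns the descriptors from 'fds' that become readable within timeout_ms.
std::vector<int> ready_descriptors(const std::vector<int>& fds, int timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    int max_fd = -1;
    for (int fd : fds) {
        FD_SET(fd, &readfds);          // register each descriptor of interest
        if (fd > max_fd)
            max_fd = fd;
    }

    timeval tv;
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    // POSIX wants the highest descriptor plus one as the first argument;
    // Winsock ignores that argument, which is why the code above passes 0.
    int result = select(max_fd + 1, &readfds, nullptr, nullptr, &tv);
    if (result > 0) {
        for (int fd : fds) {
            if (FD_ISSET(fd, &readfds))
                ready.push_back(fd);
        }
    }
    return ready;
}
```

<p>Each descriptor that comes back would then be matched to its <code>DNSServiceRef</code> and handed to <code>DNSServiceProcessResult()</code>, as in the loop shown earlier.</p>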
<h4>Threading Issues</h4>
<p>My program manages the Bonjour callbacks in a fairly ugly fashion. When my browsing activity starts, I create a timer that fires once every 250 milliseconds. I process up to 10 callbacks in that timer call, then exit. This continues until there are no pending browser or resolution requests, at which time I kill the timer.</p>
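<p>The shape of that timer handler can be sketched independently of MFC and Bonjour. In the sketch below, <code>pump_tick</code>, <code>TickResult</code> and the two hooks are hypothetical names: <code>dispatch_one</code> stands for "process one pending callback if there is one" and <code>requests_outstanding</code> stands for "are any browse or resolve requests still open".</p>

```cpp
#include <functional>

// Outcome of one timer tick.
struct TickResult {
    int  processed;    // callbacks dispatched during this tick
    bool keep_timer;   // false once no requests remain outstanding
};

// Dispatch at most 'budget' callbacks, then report whether the timer
// should keep running.
TickResult pump_tick(const std::function<bool()>& dispatch_one,
                     const std::function<bool()>& requests_outstanding,
                     int budget = 10)
{
    TickResult r = { 0, true };
    while (r.processed < budget && dispatch_one())
        ++r.processed;
    r.keep_timer = requests_outstanding();
    return r;
}
```

<p>Capping the work per tick keeps the UI responsive during a burst of results, at the cost of up to one timer period of latency per batch.</p>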
<p>Depending on your use of DNS-SD, you may find that this is not as efficient as you like. If this is the case, you might find it useful to move your message pump code to a separate thread.</p>
<p>Once you do that, you can wait on all your callbacks by calling <code>select</code> with a long or infinite timeout. This has the effect of blocking your callback thread until it has actual work to do - resulting in a better use of CPU time.</p>
<p>There are some obvious downsides to this approach. Clearly you have to use some sort of locking mechanism on the data structures that are shared between your callback thread and the rest of your program. And the use of <code>select()</code> with an infinite timeout is complicated by the possibility that you may be making or canceling browsing or resolution calls while your program runs.</p>
<p>A good way to deal with both of these problems is to use a socket-based message passing protocol between the callback thread and the other components of your program. If you restrict your interface to messages, you don't have to worry about locking access to shared data. And because you are using a socket for communications, your <code>select()</code> call will wake the thread when new messages arrive.</p>
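<p>A rough sketch of the wake-up half of that idea is shown below. It uses a POSIX pipe rather than a socket, purely so the example is short and self-contained. On Windows, <code>select()</code> only works with sockets, so a loopback socket pair would take the pipe's place. The class name <code>WakeChannel</code> and its members are invented for this illustration.</p>

```cpp
#include <sys/select.h>
#include <unistd.h>

// One-way wake-up channel: any thread calls notify(), and the pump thread
// includes read_fd() in the descriptor set it hands to select().
class WakeChannel {
public:
    WakeChannel()
    {
        if (pipe(fds_) != 0)
            fds_[0] = fds_[1] = -1;    // creation failed; the channel is inert
    }
    ~WakeChannel()
    {
        if (fds_[0] >= 0) {
            close(fds_[0]);
            close(fds_[1]);
        }
    }

    int read_fd() const { return fds_[0]; }

    // Makes read_fd() readable, which wakes a select() blocked on it.
    void notify()
    {
        char b = 1;
        ssize_t n = write(fds_[1], &b, 1);
        (void)n;                       // best effort in this sketch
    }

    // Waits up to timeout_ms and consumes one wake-up byte if one arrived.
    bool wait(int timeout_ms)
    {
        if (fds_[0] < 0)
            return false;
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fds_[0], &readfds);
        timeval tv;
        tv.tv_sec  = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        if (select(fds_[0] + 1, &readfds, nullptr, nullptr, &tv) <= 0)
            return false;
        char b;
        return read(fds_[0], &b, 1) == 1;
    }

private:
    int fds_[2];
    WakeChannel(const WakeChannel&) = delete;
    WakeChannel& operator=(const WakeChannel&) = delete;
};
```

<p>In a real pump thread the read descriptor would sit in the same <code>fd_set</code> as the Bonjour descriptors, so a single blocking <code>select()</code> wakes for either a pending callback or a message from the rest of the program. The byte written here carries no data; a real protocol would send a command the thread can act on.</p>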
<h4>Character Sets</h4>
<p>The days when DNS was limited to seven-bit ASCII characters are long gone. Service instances are encoded as UTF-8, and can use whatever Unicode characters they like. In the figure shown below, you can see the effects of that when I browse for instances of iTunes:</p>
<center>
<table border="0">
<tr>
<td><center><img src="/attachments/2011/bonjour-windows/Figure03.png"/></center></td>
</tr>
<tr>
<td><center>Character set problems in service instance names</center></td>
</tr>
</table>
</center>
<p>You can see that OS X users have so-called curly quotes in their library instance names, and curly quotes are definitely outside the range of seven-bit ASCII. DNS-SD collects the names as UTF-8 encoded strings, and sends them to the console in that format.</p>
<p>By default, the Windows cmd.exe window doesn't render UTF-8 properly, but changing the code page to 65001 results in the correct rendering. </p>
<p>In my sample program, I deal with this with a two-step approach. First, my program is built using the Unicode libraries, ensuring that I am able to render Unicode output properly. To conform with Microsoft's C++ paradigms, I use <code>CString</code> for all my Unicode strings, and wrap all my string literals in the <code>_T()</code> wrapper.</p>
<p>This works fine for my UI, but I can't use strings built of <code>wchar_t</code> to communicate with the Bonjour SDK - it expects eight-bit characters with UTF-8 encoding. In my program I use the C++ <code>std::string</code> class everywhere I am working with 8-bit characters that might be encoded in UTF-8. When it comes time to render one of those strings in my Unicode context, all I have to do is use the handy <code>CA2T</code> macro with the <code>CP_UTF8</code> parameter, and things work properly.</p>
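<p>To see what that conversion has to do under the hood, here is a small portable decoder that turns UTF-8 bytes into Unicode code points. This is not what <code>CA2T</code> does internally, and <code>decode_utf8</code> is a name made up for this sketch. It assumes well-formed input and does no validation.</p>

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into Unicode code points. Minimal sketch: it assumes
// well-formed input and does not validate continuation bytes.
std::u32string decode_utf8(const std::string& bytes)
{
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size(); ) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;             // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }   // plain ASCII
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }   // 2-byte form
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }   // 3-byte form
        else                  { cp = lead & 0x07; extra = 3; }   // 4-byte form
        ++i;
        for (std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

<p>The right single curly quote arrives from Bonjour as the three bytes <code>E2 80 99</code>, and decodes to the single code point U+2019. Displaying those three bytes as if each were its own character is what produces the garbage in the figure.</p>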
<h4>Library Issues</h4>
<p>The design of the Bonjour SDK imposes some uncomfortable restrictions on you when it comes to building your C or C++ program. Because you are linking directly to code found in the library <code>dnssd.lib</code>, you have to ensure that your program and that library link against the same version of the C run time library. And for the Bonjour SDK under Windows, this means you must link with the static, multithreaded, release version of the library.</p>
<p>You'll see the problem in this right away when you create an MFC project and try to build with <code>dnssd.lib</code>. By default, the project generator will probably have you using MFC in a shared DLL, and using the Multithreaded Debug DLL version of the C libraries. When you try to build like this, you will get some unpleasant error messages:</p>
<pre>
1>LINK : warning LNK4098:
         defaultlib 'msvcrtd.lib' conflicts with use of other libs;
         use /NODEFAULTLIB:library
1>LINK : warning LNK4098:
         defaultlib 'LIBCMT' conflicts with use of other libs;
         use /NODEFAULTLIB:library
</pre>
<p>A full featured SDK would provide libraries built for multiple scenarios, and you would pick the one of your choice depending on your build parameters. But with the Bonjour SDK, you don't get this choice, so you need to ensure that your project follows a few guidelines:</p>
<ul>
<li/>Under <i>Configuration Properties/General</i>, field <i>Use of MFC</i> needs to be set to <i>Use MFC in a Static Library</i> for both debug and release builds.
<li/>Under  <i>Configuration Properties/C++/Code Generation</i>, field <i>Runtime Library</i> needs to be set to <i>Multi-threaded (/MT)</i> for both debug and release builds.
<li/>Under  <i>Configuration Properties/C++/Preprocessor</i>, field <i>Preprocessor Definitions</i> the constant <i>_DEBUG</i> needs to be changed to <i>NDEBUG</i> for Debug configurations.
</ul>
<p>To build a project that uses the SDK, you will also need to add <code>dnssd.lib</code> to your list of linker inputs, include <code>dns_sd.h</code> in your source, and add the appropriate directories in the configuration: under <i>Configuration Properties/C++/General/</i> in field <i>Additional Include Directories</i>, and under <i>Configuration Properties/Linker/General/</i> in field <i>Additional Library Directories</i>.</p>
<h4>Overview Of the Demo Program</h4>
<p>I've included the full source for a project that will build with Visual Studio 10, as long as you have the Bonjour SDK installed. It browses all available services on the network and displays the information about them in a tree form. </p>
<p>The program starts by kicking off a browser for <code>_services._dns-sd._udp</code>. The results are processed in member function <code>IterateServiceTypes()</code>. As each new service type is discovered, it is added to the tree, and a call to <code>DNSServiceBrowse()</code> is made to discover all instances of that service type. The callback for that browse call is member function <code>IterateServiceInstances()</code>.</p>
<p>In <code>IterateServiceInstances()</code> I add the instance to the tree, then call <code>DNSServiceResolve()</code>. This function operates much like the browse function, but it actually gets the DNS record for the service. This record contains the host name, service port and a list of name/value pairs that a service can advertise as part of its record. You can see those values put to good work with service types like <code>_ipp._tcp</code>, in which printer parameters are exposed as part of service discovery.</p>
<p><code>ResolveInstance()</code> is the callback routine that receives the information about the service instance. The host name, port, and name/value pairs are added to the tree, and then one final call is made to a Bonjour SDK entry called <code>DNSServiceGetAddrInfo()</code>. This function resolves the IP address for the given host name. The address is stuffed into the tree in callback function <code>GetAddress()</code>.</p>
<h4>Conclusion</h4>
<p>DNS service discovery is a powerful tool, but Windows programmers might be put off by the lack of a nicely packaged SDK. Working through this simple example program might be a good way to get comfortable with an SDK that provides a good multi-platform alternative to UPnP.</p>
<table border="0" cellspacing="10">
<tr>
<td>Sample program source:</td>
<td><a href="/attachments/2011/bonjour-windows/ServiceBrowser.zip">ServiceBrowser.zip</a></td>
</tr>
<tr>
<td>Sample program executable:</td>
<td><a href="/attachments/2011/bonjour-windows/ServiceBrowserExe.zip">ServiceBrowserExe.zip</a></td>
</tr>
</table>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Hash Functions for C++ Unordered Containers</title>
		<link>http://marknelson.us/2011/09/03/hash-functions-for-c-unordered-containers/</link>
		<comments>http://marknelson.us/2011/09/03/hash-functions-for-c-unordered-containers/#comments</comments>
		<pubDate>Sat, 03 Sep 2011 20:00:24 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=727</guid>
		<description><![CDATA[The container classes included in the C++ standard library serve as good illustrations of both the strengths and the weaknesses of the language. The strengths are obvious: efficient, type-safe containers with performance guarantees suitable for a huge variety of applications. And the weaknesses? Compiler error messages that redefine the term useless, and documentation that makes [...]]]></description>
			<content:encoded><![CDATA[<p>The container classes included in the C++ standard library serve as good illustrations of both the strengths and the weaknesses of the language. The strengths are obvious: efficient, type-safe containers with performance guarantees suitable for a huge variety of applications. And the weaknesses? Compiler error messages that redefine the term <i>useless</i>, and documentation that makes a mockery of the word.</p>
<p>In this article I'll illustrate how you might bump into these problems using the <code>unordered_map</code> container, as well as showing you how to work past the problems. By rights this basic hash map should be the first- or second-most used container in your arsenal, but if you are less than a C++ savant, you might find yourself ditching it out of frustration.<br />
<span id="more-727"></span></p>
<h4>Hash Tables</h4>
<p>It's a little bit of an embarrassment to the C++ community that it didn't have a hash table in the standard library until <a href="http://en.wikipedia.org/wiki/C%2B%2B_Technical_Report_1" class="newpage">TR1</a> was published in 2005. In a perfect world the original standard should have contained hash map and hash set containers. But Alexander Stepanov didn't include these containers in the original Standard Template Library, and the standardization committee was reluctant to bless containers that didn't have a decent amount of mileage in the real world.</p>
<p>By 2005 there were enough non-standard implementations to allow the TR1 extension to confidently add four new template classes:</p>
<ul>
<li/><code>unordered_map</code>
<li/><code>unordered_multimap</code>
<li/><code>unordered_set</code>
<li/><code>unordered_multiset</code>
</ul>
<p>With basically the same semantics as their ordered counterparts (<code>map</code>, <code>multimap</code>, etc.) and the ideal O(1) performance afforded by hashed indexing, C++ finally had a feature that was basically table stakes for any high-level language created since the mid-1980s. </p>
<p>The container classes in general, and <code>unordered_map</code> in particular, are such a useful part of the language that we generally want to introduce them as early as possible to people learning C++. Admittedly, the underpinnings of template programming are out of the beginner's depth, but using the containers doesn't require much knowledge about templates beyond an understanding of a few syntax rules.</p>
<p>A simple piece of sample code that you might use to teach beginners how to use these containers is shown here:</p>
<pre>
//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include &lt;iostream&gt;
#include &lt;unordered_map&gt;
#include &lt;string&gt;

using namespace std;

typedef string Name;

int main(int argc, char* argv[])
{
    unordered_map&lt;Name,int&gt; ids;
    ids[&quot;Mark Nelson&quot;] = 40561;
    ids[&quot;Andrew Binstock&quot;] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout &lt;&lt; ii-&gt;first &lt;&lt; &quot; : &quot; &lt;&lt; ii-&gt;second &lt;&lt; endl;
    return 0;
}
</pre>
<p>This example works great in the classroom or in the documentation page for <code>unordered_map</code>. But the new C++ programmer runs into trouble as soon as he or she steps outside the classroom and tries to implement a real-world example. A very common change to this program on its path to the real world will be in the use of a simple class or structure to hold the person's name. To keep it simple, I'll just assume we want to keep first and last names separate, and use the built-in <code>pair</code> class to hold the name:</p>
<pre>
//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include &lt;iostream&gt;
#include &lt;unordered_map&gt;
#include &lt;string&gt;

using namespace std;

typedef pair&lt;string,string&gt; Name;

int main(int argc, char* argv[])
{
    unordered_map&lt;Name,int&gt; ids;
    ids[Name(&quot;Mark&quot;, &quot;Nelson&quot;)] = 40561;
    ids[Name(&quot;Andrew&quot;,&quot;Binstock&quot;)] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout &lt;&lt; ii-&gt;first.first
             &lt;&lt; &quot; &quot;
             &lt;&lt; ii-&gt;first.second
             &lt;&lt; &quot; : &quot;
             &lt;&lt; ii-&gt;second
             &lt;&lt; endl;
    return 0;
}
</pre>
<p>This seemingly small change generates seven errors in Visual C++, five in g++, and none of the errors point the user to the actual problem.</p>
<p>And what is the problem? It's actually a simple one: <code>unordered_map</code> doesn't know how to create a hash for the given key type of <code>std::pair&lt;std::string,std::string&gt;</code>. Instead, the user is left to puzzle over things like this:</p>
<pre>
c:\program files\microsoft visual studio 10.0\vc\include\xfunctional(768):
 error C2440: 'type cast' :
 cannot convert from 'const Name' to 'size_t'
</pre>
<p>Or worse yet from g++:</p>
<pre>
/tmp/cc0B9FPH.o: In function `std::__detail::_Hash_code_base&lt;std::pair&lt;std::basic_string&lt;char, std::char_traits&lt;char&gt;,
std::allocator&lt;char&gt; &gt;, std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;,
std::pair&lt;std::pair&lt;std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;,
std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; const, int&gt;,
std::_Select1st&lt;std::pair&lt;std::pair&lt;std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;,
std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; const, int&gt; &gt;,
std::equal_to&lt;std::pair&lt;std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;,
std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt;,
std::hash&lt;std::pair&lt;std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;,
std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt;,
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false&gt;::_M_hash_code(
std::pair&lt;std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;,
std::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; const&amp;) const'
</pre>
<p>The g++ compiler doesn't seem to actually produce a sentence describing the error. I think it figures no human could parse it anyway.</p>
<p>The inscrutability of compiler errors in template classes is hardly something new - I was <a href="http://marknelson.us/about/stl/" class="newpage">complaining about it</a> all the way back in 1995. And the C++0x committee tried to do something about it. Had the <a href="http://en.wikipedia.org/wiki/Concepts_(C%2B%2B)" class="newpage">C++0x Concepts</a> feature been accepted, the compiler might instead have issued an error message that looked more like this:</p>
<pre>
hash_test.cpp(15): type 'Name' does not have a hash function
</pre>
<p>Unfortunately, Concepts did not make it, and we are stuck with error messages that are of no help at all.</p>
<h4>RTFM</h4>
<p>There are a couple of obvious places to try to figure out what these errors mean. Google would be one, and Visual Studio's class documentation pages would be another. </p>
<p>Just as an experiment, try putting yourself in the place of a novice, and execute a search on:</p>
<pre>
error C2440: 'type cast' : cannot convert from 'const Name' to 'size_t'
</pre>
<p>or show some sophistication and change your search to:</p>
<pre>
&quot;error C2440: type cast : cannot convert from&quot; to size_t
</pre>
<p>You will find some good clues, but probably no solutions to your problem. Much of the published code deals with pre-standard hash tables using boost implementations, or early g++ hash tables, which are not going to help.</p>
<p>But let's say you eventually figure out that you need to write a hash function for your <code>Name</code> class. All you need to know are the mechanics. Where do you find out what they are?</p>
<p>The logical place to do this would be in the Visual Studio <a href="http://msdn.microsoft.com/en-us/library/bb982522.aspx" class="newpage">unordered_map documentation page</a>. This page has some good information in it, but nowhere does it address an undoubtedly common problem: <em>how do I create a user-defined hash function for unordered_map?</em> And don't even think about getting anything useful from the <a href="http://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.2/classstd_1_1tr1_1_1unordered__map.html" class="newpage">g++ documentation page</a>.</p>
<p>These stumbling blocks are the kind of thing that give C++ a reputation as one of the most difficult languages to learn. To see the contrast, you might compare it to Java. The Java programmer won't ever run into this problem, because the language defines a default hash function in the base class <code>Object</code>. Any Java object may be used as a key in Java's <code>HashMap</code> generic class. While the base class definition may be far from optimal for many cases, it will at least work, and won't prevent the beginner from using the container in real programs.</p>
<p>C++ could have done the same thing when implementing the unordered containers, but it was constrained by both philosophy and language limitations. It would have been a real accomplishment to have overcome both, and to be honest, the people on the standards committee have to work hard enough as it is. Insisting on X-Ray vision and a cape for each member might be pushing it.</p>
<h4>One Point of Light</h4>
<p>So what is to be done? The language has structural problems that make it hard to issue good errors. Documentation is never as good as we like it. Are we stuck at the status quo?</p>
<p>While I'm not likely to change the Visual Studio documentation or the C++ compiler, I have found that one small article, like this one, that has good SEO terms to describe the problem, can help a lot of people. As an example, my <a href="http://marknelson.us/2002/03/01/next-permutation/" class="newpage">next_permutation()</a> article from 2002 still gets hundreds of readers every week, and I hope most of them walk away understanding how to use this function.</p>
<p>The same thing could end up being true for this article. I'll spend the rest of it showing you four good ways to define a hash function for use in <code>unordered_map</code> under C++0x, and with Google's help, it may end up providing the missing manual for this particular problem.</p>
<h4>Method 1 - A simple function</h4>
<p>You're used to seeing <code>unordered_map</code> declared with two template parameters. But a look at the help page shows that it actually takes five - the last three usually accept default values:</p>
<pre>
template&lt;class Key,
    class Ty,
    class Hash = std::hash&lt;Key&gt;,
    class Pred = std::equal_to&lt;Key&gt;,
    class Alloc = std::allocator&lt;std::pair&lt;const Key, Ty&gt; &gt; &gt;
    class unordered_map;
</pre>
<p>The third parameter to the definition is a function object type that is used by the class to convert a key into a hash code. By default it is set to <code>std::hash&lt;Key&gt;</code>. Internally the <code>unordered_map</code> class calls <code>operator()</code> on an object of that type in order to get a hash code for a given key.</p>
<p>Note also that several of the constructors for <code>unordered_map</code> take a default parameter which is an instance of this function object type. </p>
<p>So there are two places where we can provide some information about how to hash the key used in the <code>unordered_map</code>, but we normally don't fill in these items. The reason that we often don't is that the C++ standard library already defines specializations of <code>std::hash&lt;T&gt;</code> for commonly used types. So I can write a program that contains a line like this:</p>
<pre>
int main(int argc, char* argv[])
{
    cout &lt;&lt; std::hash&lt;const char *&gt;()(&quot;Mark Nelson&quot;) &lt;&lt; endl;
    return 0;
}
</pre>
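<p>Which, when run, produces a value like the one below. Note that the specialization for <code>const char *</code> hashes the pointer rather than the characters it points to, so your number will differ:</p>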
<pre>
134514544
</pre>
<p>The problem, of course, is that the standard library did not implement a version of <code>hash&lt;pair&lt;string,string&gt;&gt;</code> - or any of the other infinite varieties of user-defined classes. However, since my new <code>Name</code> class is composed of two <code>string</code> objects, and the standard library knows how to hash <code>string</code> objects, I can create a pretty good hash function of my own that looks like this:</p>
<pre>
size_t name_hash( const Name &amp; name )
{
    return hash&lt;string&gt;()(name.first) ^ hash&lt;string&gt;()(name.second);
}
</pre>
<p>As a general rule of thumb, if I have two hashes for independent variables, and I combine them using XOR, I can expect that the resulting hash is probably just as good as the input hashes.</p>
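<p>One caveat worth spelling out: XOR is symmetric, so with this approach <code>Name("Mark","Nelson")</code> and <code>Name("Nelson","Mark")</code> land on the same hash, and any name whose two halves are identical hashes to zero. The sketch below shows that behavior next to an order-sensitive alternative. The names <code>name_hash_xor</code> and <code>name_hash_mixed</code> are made up for this comparison, and the mixing step follows the recipe popularized by Boost's <code>hash_combine</code>. It is one reasonable option, not the only one.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> Name;

// Plain XOR, as in the text: simple, but blind to the order of the fields.
std::size_t name_hash_xor(const Name& name)
{
    return std::hash<std::string>()(name.first) ^
           std::hash<std::string>()(name.second);
}

// Order-sensitive variant: fold the second hash into the first using a
// constant and shifted copies of the running seed.
std::size_t name_hash_mixed(const Name& name)
{
    std::size_t seed = std::hash<std::string>()(name.first);
    seed ^= std::hash<std::string>()(name.second)
            + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
}
```

<p>Collisions like these don't break <code>unordered_map</code>, since colliding keys just share a bucket, so the XOR version is fine for a small demo. The rest of this article sticks with the simpler <code>name_hash</code>.</p>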
<p>Building this function is easy enough, but how do I actually use it with <code>unordered_map</code>? Although the solution is fairly simple, if you find templates confusing already, you aren't likely to fumble your way through to the answer without an enormous amount of effort.</p>
<p>Basically, I have to modify my use of the map class in two places. First, I have to pass in a pointer to the hash function in the constructor of the map. The standard defines a constructor that takes an initial number of buckets and a hashing object as inputs. So the first step is to modify the declaration code to look like this:</p>
<pre>
unordered_map&lt;Name,int&gt; ids(100, name_hash );
</pre>
<p>This is only half the battle, however. The default implementation of <code>unordered_map</code> expects to be using a function object of type <code>std::hash&lt;Key&gt;</code> to calculate hashes, and that is not what I passed in to the constructor. So I also have to add a third template parameter to my declaration - a template parameter which matches the type of the function object I am passing in to the constructor.</p>
<p>Creating the proper template parameter to match this simple hashing function requires more than your usual level of library-fu. One way I've done this in the past is to wrap the function type inside the <code>std::function</code> template, which is defined in header <code>&lt;functional&gt;</code>. When you do this, your map will have a hasher object that is an instance of <code>function</code>. Upon initialization of the map, the function object will be assigned a pointer to <code>name_hash</code>, which can then be called via the interface to <code>function</code>.</p>
<p>The resulting declaration will then look like this:</p>
<pre>
unordered_map&lt;Name,int,function&lt;size_t( const Name &amp; name )&gt;&gt; ids(100, name_hash );
</pre>
<p>C++0x gives me a slightly easier way to do this, and the code below shows this simpler alternative. By using the <code>decltype</code> keyword, I can take the type of my hash function and pass it in as a template parameter. Not only is the code simpler this way, but I avoid defining one thing in two different places.</p>
<pre>
//
// This program uses a simple user-defined function
// to provide a hash function for use in unordered_map
//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include &lt;iostream&gt;
#include &lt;unordered_map&gt;
#include &lt;string&gt;
#include &lt;functional&gt;

using namespace std;

typedef pair&lt;string,string&gt; Name;

size_t name_hash( const Name &amp; name )
{
    return hash&lt;string&gt;()(name.first) ^ hash&lt;string&gt;()(name.second);
}

int main(int argc, char* argv[])
{
    unordered_map&lt;Name,int,decltype(&amp;name_hash)&gt; ids(100, name_hash );
    ids[Name(&quot;Mark&quot;, &quot;Nelson&quot;)] = 40561;
    ids[Name(&quot;Andrew&quot;,&quot;Binstock&quot;)] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout &lt;&lt; ii-&gt;first.first
             &lt;&lt; &quot; &quot;
             &lt;&lt; ii-&gt;first.second
             &lt;&lt; &quot; : &quot;
             &lt;&lt; ii-&gt;second
             &lt;&lt; endl;
    return 0;
}
</pre>
<h4>Method 2 - A simple function defined inline</h4>
<p>In some cases where your hash function is short and sweet, you might want to take advantage of the new C++0x support for lambda expressions. Using a lambda lets you define the hasher right where you use it - which may or may not provide clarity for the program. For the example I've been using, the finished result looks like this:</p>
<pre>
//
// This program uses a simple user-defined function
// to provide a hash function for use in unordered_map.
// The function is passed in as a lambda expression to
// the unordered_map constructor.
//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include &lt;iostream&gt;
#include &lt;unordered_map&gt;
#include &lt;string&gt;
#include &lt;functional&gt;

using namespace std;

typedef pair&lt;string,string&gt; Name;

int main(int argc, char* argv[])
{
    unordered_map&lt;Name,int,function&lt;size_t ( const Name &amp; name )&gt;&gt;
    ids(100, []( const Name &amp; name )
             {
                 return hash&lt;string&gt;()(name.first) ^ hash&lt;string&gt;()(name.second);
             } );
    ids[Name(&quot;Mark&quot;, &quot;Nelson&quot;)] = 40561;
    ids[Name(&quot;Andrew&quot;,&quot;Binstock&quot;)] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout &lt;&lt; ii-&gt;first.first
             &lt;&lt; &quot; &quot;
             &lt;&lt; ii-&gt;first.second
             &lt;&lt; &quot; : &quot;
             &lt;&lt; ii-&gt;second
             &lt;&lt; endl;
    return 0;
}
</pre>
<p>As far as the compiler is concerned, this program will probably generate nearly identical code to the previous one, so it is unlikely that you should prefer one over the other for reasons of efficiency. </p>
<p>I don't use the lambda expression method for two reasons:</p>
<ol>
<li>When the lambda expression is written inline in the constructor call, I can't use <code>decltype</code> in the template class definition to get the type of the hash function object. This means I have to manually enter it, which means I'm manually syncing one definition between two places in my code - always an opportunity for trouble.</li>
<li>As this is being written in 2011, lambda expressions are still unfamiliar to many people, and aren't supported by a lot of compilers currently in use on production systems, so I save their use for places where they provide a marked improvement in either program structure or readability.</li>
</ol>
<p>Your opinions may well differ.</p>
<h4>Method 3 - A function object</h4>
<p>Function objects are a way to package up functions so they can be called in a way that is often convenient to a library writer. So even though you might not have ever dreamed up the idea of creating a function object to provide a hash function to the library, this is the way <code>unordered_map</code> prefers things. My definition of a function object to use with this definition of <code>Name</code> is shown here:</p>
<pre>
struct hash_name {
    size_t operator()(const Name &amp;name ) const
    {
        return hash&lt;string&gt;()(name.first) ^ hash&lt;string&gt;()(name.second);
    }
};
</pre>
<p>All I have done here is wrap up the hash function in a class. And while that might not seem like much of a big deal, it allows a class to use this function when all it has is the class definition. When I define <code>unordered_map</code> using this function object, my code is a bit simpler:</p>
<pre>
unordered_map&lt;Name,int,hash_name&gt; ids;
</pre>
<p>You can see that I still have to include a third template class parameter, but I don't have to pass in a reference to the function object in the constructor. This is because <code>unordered_map</code> keeps track of the class definition, and when it comes time to actually get a hash, it can simply construct the object on the fly and pass in the data. A sample (hypothetical) class that does this might have code like this:</p>
<pre>
template &lt;class HashKey,
         class HashValue,
         class HashObject &gt;
class HashMap {
...
void insert( HashKey &amp;key, HashValue &amp;val )
{
    size_t index = HashObject()( key );
...
</pre>
<p>Putting this together for the sample program I've been using throughout, you get this code:</p>
<pre>
//
// This program uses a function object to define
// a hash function for use in unordered_map.
//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include &lt;iostream&gt;
#include &lt;unordered_map&gt;
#include &lt;string&gt;
#include &lt;functional&gt;

using namespace std;

typedef pair&lt;string,string&gt; Name;

struct hash_name {
    size_t operator()(const Name &amp;name ) const
    {
        return hash&lt;string&gt;()(name.first) ^ hash&lt;string&gt;()(name.second);
    }
};

int main(int argc, char* argv[])
{
    unordered_map&lt;Name,int,hash_name&gt; ids;
    ids[Name(&quot;Mark&quot;, &quot;Nelson&quot;)] = 40561;
    ids[Name(&quot;Andrew&quot;,&quot;Binstock&quot;)] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout &lt;&lt; ii-&gt;first.first
             &lt;&lt; &quot; &quot;
             &lt;&lt; ii-&gt;first.second
             &lt;&lt; &quot; : &quot;
             &lt;&lt; ii-&gt;second
             &lt;&lt; endl;
    return 0;
}
</pre>
<p>This method clearly has some nice features. In return for having to package your hash function inside a structure definition (and use the unfamiliar <code>operator()</code>), you get the advantage of having an <code>unordered_map</code> declaration that is considerably simpler than the ones used in the previous two examples. </p>
<p>In terms of the generated code, this example is again probably going to generate nearly identical code to the previous examples, meaning that your preference towards using it should be based strictly on ease of coding, readability, and maintenance issues.</p>
<h4>Method 4 - Specializing std::hash</h4>
<p>When you use <code>unordered_map</code> with all the default class parameters, it tries to use a function object of type <code>std::hash&lt;Key&gt;</code> to create hash codes for your keys. As discussed way back at the start of this article, this doesn't work in many cases because nobody bothered to create a specialization of the hash object for your specific class. In many cases, this is because your specific class had not been invented yet.</p>
<p>It stands to reason that if you are providing a class that will be used by other people, it might be nice to actually provide a specialization of the hash class for your class. When you do that, and include the definition in the header file that defines your class, people will be able to use your class as a key for any of the unordered containers without any additional work. </p>
<p>Defining a specialization of <code>std::hash&lt;T&gt;</code> for your class is really no different from Method 3, shown immediately above. The only differences are that you have to use the name <code>hash</code> for your object, you have to define it as a specialization of that template class, and you have to hoist the whole thing into the std namespace.</p>
<pre>
//
// This program uses a specialization of
// std::hash&lt;T&gt; to provide the function
// object needed by unordered_map.
//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include &lt;iostream&gt;
#include &lt;unordered_map&gt;
#include &lt;string&gt;
#include &lt;functional&gt;

using namespace std;

typedef pair&lt;string,string&gt; Name;

namespace std {
    template &lt;&gt;
    struct hash&lt;Name&gt; {
        size_t operator()(const Name &amp;name ) const
        {
            return hash&lt;string&gt;()(name.first) ^ hash&lt;string&gt;()(name.second);
        }
    };
}

int main(int argc, char* argv[])
{
    unordered_map&lt;Name,int&gt; ids;
    ids[Name(&quot;Mark&quot;, &quot;Nelson&quot;)] = 40561;
    ids[Name(&quot;Andrew&quot;,&quot;Binstock&quot;)] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout &lt;&lt; ii-&gt;first.first
             &lt;&lt; &quot; &quot;
             &lt;&lt; ii-&gt;first.second
             &lt;&lt; &quot; : &quot;
             &lt;&lt; ii-&gt;second
             &lt;&lt; endl;
    return 0;
}
</pre>
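<p>As you can see, for the user of your class, this is the simplest of all possible solutions - no changes are needed to get the default versions of <code>unordered_map</code> to work properly. If you are going to be using this class in multiple places, this is probably the best solution. However, you need to be sure you aren't going to pollute the std namespace with a hash function that might conflict with those used by other classes. Note too that the standard only permits adding a specialization to namespace std when it depends on a user-defined type, so strictly speaking this technique should be reserved for classes you define yourself rather than a typedef of <code>pair&lt;string,string&gt;</code>.</p>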
<h4>One Last Source of Trouble</h4>
<p>In addition to requiring a hash function, the unordered containers also need to be able to test two keys for equality. By default they do this through <code>std::equal_to&lt;Key&gt;</code>, which simply calls <code>operator==()</code> on the two keys. This is typically a function you are used to having to construct when creating new classes, but if you overlook it, you will be up against the same raft of incomprehensible compiler errors seen earlier in this article. </p>
<p>I didn't have to deal with it in this article because the standard library already defines this operator for <code>std::pair&lt;T1,T2&gt;</code>. Of course, when using <code>std::pair</code> you also have to make sure you have an equality operator for T1 and T2.</p>
<h4>Wrap-Up</h4>
<p>This article showed you four different ways to define a hash function for use with a user-defined class and the unordered C++0x containers. Different people will choose one of the four methods for personal reasons. When it comes to performance, there is probably no reason to prefer one over another. </p>
<p>None of the techniques described here are particularly difficult to implement, but it is always a surprise and a disappointment to see how hard it is to get this information out of the standard resources.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/09/03/hash-functions-for-c-unordered-containers/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

