<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mark Nelson</title>
	<atom:link href="http://marknelson.us/feed/" rel="self" type="application/rss+xml" />
	<link>http://marknelson.us</link>
	<description>Programming, mostly.</description>
	<lastBuildDate>Wed, 01 Feb 2012 16:36:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>I&#8217;m In the Money</title>
		<link>http://marknelson.us/2012/02/01/im-in-the-money/</link>
		<comments>http://marknelson.us/2012/02/01/im-in-the-money/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 16:36:28 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[Data Compression]]></category>
		<category><![CDATA[Humor]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1437</guid>
		<description><![CDATA[It looks like all my long years of studying data compression might be ready to pay off: Hello Good Day, This is Troop Emonds With regards to your Company i am sending this email Regards to order some( Compression Machine )I will like to know the type and sizes you have in stock and get [...]]]></description>
			<content:encoded><![CDATA[<p>It looks like all my long years of studying data compression might be ready to pay off:</p>
<blockquote><p>Hello Good Day,</p>
<p>This is Troop Emonds With regards to your Company i am sending this email Regards to order some( Compression Machine )I will like to know the type and sizes you have in stock and get me the sales price of one so that i will tell you the quantity i will be ordering, and if you accept credit card as a form of payment..</p>
<p>Hope to read from you soon about my order request&#8230;&#8230;<br />
With Kind Regards.<br />
Troop</p></blockquote>
<p>I just need to put together some compression machines, and then I&#8217;m set.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2012/02/01/im-in-the-money/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mark&#8217;s Travel Guide to New Zealand</title>
		<link>http://marknelson.us/2012/01/28/marks-travel-guide-to-new-zealand/</link>
		<comments>http://marknelson.us/2012/01/28/marks-travel-guide-to-new-zealand/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 22:37:16 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1428</guid>
		<description><![CDATA[I recently spent a little over two weeks touring New Zealand. It was a self-driving trip, which meant we got to cover a lot of ground, although certainly the coverage was very shallow. Before this trip, I had not set foot on foreign soil more than one mile from the US border, so the experience [...]]]></description>
			<content:encoded><![CDATA[<p>I recently spent a little over two weeks touring New Zealand. It was a self-driving trip, which meant we got to cover a lot of ground, although certainly the coverage was very shallow.</p>
<p>Before this trip, I had not set foot on foreign soil more than one mile from the US border, so the experience of going to a foreign country was in itself new. This means I am compelled to share it with you.</p>
<p>Overall New Zealand made me feel very welcome. I would like to move to New Zealand. Barring that, visiting New Zealand as a tourist was a great experience.</p>
<p>I could write a detailed photoblog of our eighteen-day journey, but this would be a lot like going to your uncle&#8217;s house and watching his 90-minute DVD compilation of his trip to Norway &#8211; a bit tedious.</p>
<p>One thing I noticed as a tourist is that it is kind of hard to notice things that are there, but very easy to notice the things that are missing. So my detailed summary of the trip will give you the list of things that I noticed were <i>not</i> in New Zealand &#8211; at least not on my self-driven tour. (I&#8217;d give a link to the fine touring company if they would work out some sort of affiliate program; then you could do it yourself.)</p>
<p><b>Things that don&#8217;t appear to exist in New Zealand:</b></p>
<ul>
<li>Stop signs</li>
<li>Iced tea</li>
<li>Pennies</li>
<li>Nickels</li>
<li>Dollar bills</li>
<li>Insane airport security (domestic only)</li>
<li>Pickup trucks</li>
<li>The purported monoculture of sheep</li>
<li>Moas</li>
<li>Cosmopolitans, or more generally, grown-up cocktails</li>
<li>Hot weather</li>
<li>Air-Conditioned hotel rooms</li>
<li>Free Wireless</li>
<li>Reasonably priced Wireless</li>
<li>Horrifying public restrooms</li>
<li>4 lane highways</li>
<li>Any sense of realism about building a country in a volcano/earthquake/tsunami free-fire zone</li>
</ul>
<h4>Pictures and Movies</h4>
<p>A bunch of photos <a href="http://www.flickr.com/photos/snorkel58/sets/72157628844439217/" class="newpage">here</a>.<br/><br />
Some very short videos <a href="http://www.flickr.com/photos/snorkel58/sets/72157628843998801/" class="newpage">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2012/01/28/marks-travel-guide-to-new-zealand/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Visit With Tim Bell</title>
		<link>http://marknelson.us/2012/01/21/a-visit-with-tim-bell/</link>
		<comments>http://marknelson.us/2012/01/21/a-visit-with-tim-bell/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 02:22:50 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Data Compression]]></category>
		<category><![CDATA[People]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1407</guid>
		<description><![CDATA[I was in Christchurch, New Zealand, recently and had a chance to meet Tim for the first time in person. Tim teaches at the <a href="http://www.canterbury.ac.nz/" class="newpage">University of Canterbury in Christchurch</a>, and is <a href="http://www.cosc.canterbury.ac.nz/tim.bell/" class="newpage">Deputy Head of the Computer Science and Software Engineering</a> department. I got a chance to ask him about his work in data compression as well as one of his new areas of interest, Computer Science education.]]></description>
			<content:encoded><![CDATA[<p><img src="/attachments/2012/bell/TimBell2.jpg" alt="Dr. Timothy Bell" align="right" style="margin-left:15px;border-style:solid;border-width:2px"><br />
In my early years of learning about data compression, the book <a href="http://books.google.com/books/about/Text_compression.html?id=sdZQAAAAMAAJ" class="newpage">Text Compression</a> by Timothy Bell, John Cleary, and Ian Witten was my resource of first resort. I was in Christchurch, New Zealand, recently and had a chance to meet Tim for the first time in person. Tim teaches at the <a href="http://www.canterbury.ac.nz/" class="newpage">University of Canterbury in Christchurch</a>, and is <a href="http://www.cosc.canterbury.ac.nz/tim.bell/" class="newpage">Deputy Head of the Computer Science and Software Engineering</a> department. I got a chance to ask him about his work in data compression as well as one of his new areas of interest, Computer Science education.<br />
<span id="more-1407"></span></p>
<hr/>
<p>MN: Tim, it seems like there has been a lot of interest in data compression in the Antipodes. Names that come to mind include you, John Cleary, and Peter Fenwick in New Zealand, and Ross Williams in Australia. Is this just coincidence, or is compression in the air down there?</p>
<p>TB: I’ve sometimes wondered about this myself&#8230; during the early days of computing and especially personal computers, it took some time for the latest technology to reach us “down under”, so perhaps we were motivated to get more out of what we had rather than wait some months for a larger disk or new memory to arrive from overseas. When the Internet arrived we started with a very small pipe, so a good compression algorithm could do the equivalent of laying a second cable from NZ to the US – who can resist getting something for free?</p>
<p>MN: Since you wrote Text Compression back in the early 90s, I&#8217;d say the biggest development in lossless compression has been the Burrows-Wheeler transform. Is lossless text compression basically done? Are we left with just incremental improvements as processor resources increase?</p>
<p>TB: That seems to be the case; the only big improvements we’ve seen have turned out to be frauds &#8212; we even had one in NZ recently, where a Nelson man raised NZ$5.3 million for an impressive sounding method; he was <a href="http://www.stuff.co.nz/nelson-mail/news/3892853/Whitley-found-guilty-of-fraud" class="newpage">convicted of fraud</a> last year. The main indicator we have that we’re running out of steam (apart from a lack of new discoveries) is Shannon’s experiments on predicting text which gave a bound in the order of 1 bit per character for English text, and current methods are approaching this. Of course, there’s plenty of room for dealing with new kinds of data (for example, bioinformatics deals with massive amounts of data that we’re still trying to understand) and for finding better data structures and algorithms for performing the compression and decompression. Lossy compression is a whole different story&#8230;</p>
<h4>A Change In Focus</h4>
<p>MN: It looks like you are now dedicating a large amount of your time to establishing computer science as part of the basic curriculum in high school education, for students in the 15-18 age range. In many ways, this is as much a bureaucratic problem as an academic one. What motivated you to take it on?</p>
<p>TB: It’s been a problem that we’ve complained about for decades, and it’s been getting worse and worse as computing in schools has focussed increasingly on using computers and not preparing students to be developers. A lot of this can be attributed to bureaucracy – it’s hard to explain to government officials that putting word processors in every classroom isn’t the same as building a computationally literate society. As a result of some strategic lobbying done by others, a small window of opportunity opened for me to be on a group to advise our Ministry of Education, just over 3 years ago. The group managed to convince the officials that something useful could be done, and then we had to work very quickly to come up with a concrete proposal before the enthusiasm died down.  This has happened rapidly; the advisory group first met in November 2008, and Computer Science started being taught in schools in February 2011.</p>
<p>MN: What have you been able to accomplish in New Zealand so far?</p>
<p>TB: Computer science (including programming, but also topics that involve understanding the importance of things like algorithms, HCI, programming languages and even compression) is currently available as part of computing courses for two of the three final years of our main high school graduation qualification, with all three years being covered from 2013. After that we would expect some of the introductory material to start filtering down to earlier classes, and for wider offerings as teachers become more confident in the subject. One of the biggest challenges has been preparing teachers, few of whom have significant experience in Computer Science. Many have embraced it enthusiastically, and the universities and others have done a lot of work to help them get up to speed. It’s been a wild ride doing it so quickly, but there have been some very pleasing outcomes.</p>
<p>MN: And how do things look in the rest of the world? Are there any obvious winners and losers at this point? Do you have any concise advice for the world?</p>
<p>TB: Computing in schools is a hot topic around the world; the UK have just announced a strong drive to introduce this sort of material to schools, and the US has people working hard to make it available to students. Israel and Korea have had computer science in schools for some time. We’re learning a lot about what is worth teaching, and what the best pedagogy is for the general classroom (most of our experience is for specialist students who have chosen the subject!) The New Zealand path of getting something going quickly with grass-roots support seems to be more effective than waiting for a top-down approach which could take years to develop and prepare teachers for, although it does make for a bumpy ride as problems are ironed out as we go along!</p>
<p>MN: This might be straying out of your area a bit, but do you see CS in a K-12 education setting having an effect on the representation of women in the STEM fields?</p>
<p>TB: Attitudes that affect representation definitely start at school, and to me the biggest goal of teaching CS in high school is not so much to prepare students for further study, but to enable them to find out what the subject is! School students rarely know what CS is, and even worse, it’s common for them to assume that it must be advanced word processing or some other dull area, and hence they avoid it. It’s particularly important for female students to have the opportunity to find out if it’s something that they might be good at, as the stereotypes associated with computing can make them assume that they shouldn’t consider it as a career.</p>
<p>MN: One final question, Tim. The whole world has seen the devastating damage Christchurch has suffered from the earthquakes in the last year. How has the University of Canterbury held up? Have you managed to maintain continuity in your academic calendar?</p>
<p>TB: It’s been quite a year! Thankfully our university has escaped the brunt of the earthquakes (most of the damage is some distance from the university), and we’ve managed to keep a full programme going despite being closed for three weeks for safety checks. Many students joined the “student volunteer army”, who helped with the cleanup in the damaged parts of town, and that was probably one of the most valuable experiences of their career! It hasn’t been without disruption as buildings need to be checked carefully, and some are still under repair, but with a bit of resourcefulness we managed to keep going (for a while I even delivered my classes in a restaurant while lecture theatres were being inspected). The city is now going through a massive program of redevelopment with some pretty creative ideas, and it’s an exciting time to be part of these changes.</p>
<hr/>
<p>
<img src="/attachments/2012/bell/New_Zealand.png" alt="New Zealand" align="left" style="margin-right:15px;border-style:solid;border-width:2px">Thanks to Dr. Bell for taking the time to share all this with us. My visit to his amazing homeland was a real treat, and the short time I got to spend with Tim in Christchurch was worth the trip all in itself.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2012/01/21/a-visit-with-tim-bell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Streams or Iterators?</title>
		<link>http://marknelson.us/2011/12/24/streams-or-iterators/</link>
		<comments>http://marknelson.us/2011/12/24/streams-or-iterators/#comments</comments>
		<pubDate>Sat, 24 Dec 2011 18:21:11 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Data Compression]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1393</guid>
		<description><![CDATA[When I updated my LZW reference code to use the latest C++ features, I abstracted my input and output functions using templates. Data was read and written using the iostreams paradigm, which requires simple classes that implement just a few functions. Would I have been better off using the iterator paradigm instead? The C++ algorithms [...]]]></description>
			<content:encoded><![CDATA[<p>When I updated my <a href="http://marknelson.us/2011/11/08/lzw-revisited/" class="newpage">LZW</a> reference code to use the latest C++ features, I abstracted my input and output functions using templates. Data was read and written using the iostreams paradigm, which requires simple classes that implement just a few functions. Would I have been better off using the iterator paradigm instead? The C++ algorithms library favors that method of processing data, and it can be both elegant and powerful. Which of the two paradigms is the right one for data compression?<br />
<span id="more-1393"></span></p>
<h4>The Conflict</h4>
<p>General purpose data compression routines tend to be used on binary streams of data, either from files or in-memory objects. So what is the best general paradigm for input and output when compressing data? </p>
<p>You might analyze this problem by imagining that you need to write a binary copy routine. </p>
<pre>
template&lt;class INPUT_ITERATOR, class OUTPUT_ITERATOR&gt;
void bcopy( INPUT_ITERATOR input, INPUT_ITERATOR eof, OUTPUT_ITERATOR output )
{
    while ( input != eof )
        *output++ = *input++;
}
</pre>
<p>This routine is particularly nice when you are performing a simple copy using pointers to memory &#8211; the generated code should be really efficient.</p>
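<p>To make the pointer case concrete, here is a minimal sketch of the iterator routine driven by plain pointers. The routine is renamed to the hypothetical <code>binary_copy</code> only to avoid colliding with the old BSD <code>bcopy()</code> that some C libraries still declare, and <code>copy_block</code> is an illustrative helper, not part of the LZW code.</p>

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// The iterator-based copy routine from above, renamed so it cannot collide
// with the old BSD bcopy() that some C libraries still declare.
template<class INPUT_ITERATOR, class OUTPUT_ITERATOR>
void binary_copy( INPUT_ITERATOR input, INPUT_ITERATOR eof, OUTPUT_ITERATOR output )
{
    while ( input != eof )
        *output++ = *input++;
}

// Hypothetical helper: plain pointers act as the input iterators, and
// std::back_inserter appends each byte to the destination vector.
std::vector<unsigned char> copy_block( const unsigned char *data, std::size_t size )
{
    std::vector<unsigned char> result;
    result.reserve( size );
    binary_copy( data, data + size, std::back_inserter( result ) );
    return result;
}
```

<p>Because the routine is a template, the destination can just as easily be a raw pointer into a fixed buffer, which is the fully pointer-based case described above.</p>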
<p>However, the iterator paradigm doesn&#8217;t work quite as well when you want to perform a binary copy of data in a file. I can make use of iterators that almost do the job:</p>
<pre>
std::ifstream in( &quot;input.txt&quot;, std::ios_base::binary );
std::ofstream out( &quot;output.txt&quot;, std::ios_base::binary );
bcopy( std::istream_iterator&lt;char&gt;( in ),
       std::istream_iterator&lt;char&gt;(),
       std::ostream_iterator&lt;char&gt;( out ) );
</pre>
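<p>But the bad news is that <code>istream_iterator</code> reads with the extraction operator and <code>ostream_iterator</code> writes with the insertion operator, and these formatted operators are really meant for whitespace-delimited textual data, not binary data. In particular, extracting a <code>char</code> skips whitespace by default, so the copy routine shown here silently drops spaces, tabs, and newlines and will not make a byte-for-byte copy of the input file.</p>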
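<p>The whitespace problem is easy to reproduce without touching the file system. This sketch runs the same iterator routine (renamed to the hypothetical <code>binary_copy</code>) over in-memory string streams; <code>copy_with_stream_iterators</code> is an illustrative helper name.</p>

```cpp
#include <iterator>
#include <sstream>
#include <string>

// The iterator-based copy routine from above, renamed.
template<class INPUT_ITERATOR, class OUTPUT_ITERATOR>
void binary_copy( INPUT_ITERATOR input, INPUT_ITERATOR eof, OUTPUT_ITERATOR output )
{
    while ( input != eof )
        *output++ = *input++;
}

// istream_iterator<char> reads with operator>>, which skips whitespace
// because the skipws flag is set by default on every stream.
std::string copy_with_stream_iterators( const std::string &data )
{
    std::istringstream in( data );
    std::ostringstream out;
    binary_copy( std::istream_iterator<char>( in ),
                 std::istream_iterator<char>(),
                 std::ostream_iterator<char>( out ) );
    return out.str();
}
```

<p>Calling <code>copy_with_stream_iterators( "a b\nc\td" )</code> returns <code>"abcd"</code>: the space, newline, and tab never make it to the output.</p>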
<p>So when using files, the stream approach seems to be the way to go:</p>
<pre>
template&lt;class INPUT_STREAM, class OUTPUT_STREAM&gt;
void bcopy( INPUT_STREAM &amp;in, OUTPUT_STREAM &amp;out )
{
    char c;
    while ( in.get(c) )
        out.put(c);
}
</pre>
<p>If your files have been opened using the <code>iostream</code> classes, you can use this binary copy function without having to write any glue code &#8211; those classes already support the <code>get</code> and <code>put</code> methods, so this works right out of the box.</p>
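<p>For contrast, here is a sketch of the stream-based routine run over the same kind of input. It takes its arguments by reference, since standard stream objects cannot be copied, and relies on the unformatted <code>get()</code> and <code>put()</code> members, which do not skip whitespace. As before, <code>binary_copy</code> and <code>copy_with_streams</code> are hypothetical names used only for illustration.</p>

```cpp
#include <sstream>
#include <string>

// Stream-based copy; parameters are references because standard stream
// objects are not copyable.
template<class INPUT_STREAM, class OUTPUT_STREAM>
void binary_copy( INPUT_STREAM &in, OUTPUT_STREAM &out )
{
    char c;
    while ( in.get( c ) )   // unformatted read: whitespace is not skipped
        out.put( c );       // unformatted write
}

// Hypothetical helper: round-trips a string through the stream-based copy.
std::string copy_with_streams( const std::string &data )
{
    std::istringstream in( data );
    std::ostringstream out;
    binary_copy( in, out );
    return out.str();
}
```

<p>The same template accepts <code>std::ifstream</code> and <code>std::ofstream</code> objects opened with <code>std::ios_base::binary</code>, which is the file case described above.</p>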
<h4>My Choice</h4>
<p>If I&#8217;ve made up my mind that my data compression routine is going to use one of these two paradigms, it means I am going to have to write some glue code. If I choose the iterator-based approach, I need the equivalent of <code>istream_iterator</code> and <code>ostream_iterator</code> for binary files &#8211; and these aren&#8217;t in the standard library. If I choose the stream-based approach, I need efficient <code>put()</code> and <code>get()</code> members for blocks of memory. In some cases <code>basic_stringstream</code> might do the job, but not in all cases.</p>
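<p>As an illustration of the memory-side glue, here is a minimal sketch of two adapter classes that supply just the <code>get()</code> and <code>put()</code> members the stream-based routine calls. The class names <code>memory_input</code> and <code>memory_output</code> are hypothetical, and they only mimic the part of the <code>iostream</code> interface the copy loop uses; they are not drop-in replacements for real streams.</p>

```cpp
#include <cstddef>
#include <vector>

// Stream-based copy routine, taking its arguments by reference.
template<class INPUT_STREAM, class OUTPUT_STREAM>
void binary_copy( INPUT_STREAM &in, OUTPUT_STREAM &out )
{
    char c;
    while ( in.get( c ) )
        out.put( c );
}

// Hypothetical adapter: reads bytes from a caller-owned block of memory.
// get() returns false once the block is exhausted, which is all the
// while loop in binary_copy needs.
class memory_input {
public:
    memory_input( const char *data, std::size_t size )
        : m_current( data ), m_end( data + size ) {}
    bool get( char &c )
    {
        if ( m_current == m_end )
            return false;
        c = *m_current++;
        return true;
    }
private:
    const char *m_current;
    const char *m_end;
};

// Hypothetical adapter: appends each byte to a growable buffer.
class memory_output {
public:
    void put( char c ) { m_data.push_back( c ); }
    const std::vector<char> &data() const { return m_data; }
private:
    std::vector<char> m_data;
};
```

<p>Because <code>binary_copy</code> is a template, the same loop also works when only one side is a real stream, for example reading from an <code>std::ifstream</code> into a <code>memory_output</code>.</p>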
<p>After dithering around with various solutions, I tentatively opted for the stream paradigm. I found the implementation for various sources of data to be fairly simple, and the interface is easy to understand. I don&#8217;t know if it&#8217;s the perfect choice, and I&#8217;ll keep experimenting, but for now it works for me. My abstraction of the LZW code still needs a lot of work, so it&#8217;s always possible I could rethink this at a later date.</p>
<p>I&#8217;d like to hear your thoughts &#8211; is there an obvious right answer to this question?</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/12/24/streams-or-iterators/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Automating Putty</title>
		<link>http://marknelson.us/2011/12/10/automating-putty/</link>
		<comments>http://marknelson.us/2011/12/10/automating-putty/#comments</comments>
		<pubDate>Sat, 10 Dec 2011 12:11:15 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Magazine Articles]]></category>
		<category><![CDATA[Serial Communications]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=776</guid>
		<description><![CDATA[Windows users who need a command line connection to another system via telnet or SSH are big fans of PuTTY. It&#8217;s free, it has every feature you need, and it&#8217;s reliable. One thing many people would like to do is use PuTTY as a component in their program. Apparently this comes up so often [...]]]></description>
			<content:encoded><![CDATA[<p>Windows users who need a command line connection to another system via telnet or SSH are big fans of <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/" class="newpage">PuTTY</a>. It's free, it has every feature you need, and it's reliable. </p>
<p>One thing many people would like to do is use PuTTY as a component in their program. Apparently this comes up so often that there is a <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/faq.html#faq-embedding" class="newpage">FAQ entry</a> dedicated to the topic. Alas, PuTTY does not have any sort of automation interface, so this goal has always been out of reach.</p>
<p>In this article I will show you how to work around this minor shortcoming. Creating a version of PuTTY that can be driven from a Windows program turns out to be an easy task. I'll demonstrate this with a small C++ program that shows exactly how to get this versatile program to do your bidding. My solution works for C++, but the changes I make should work well with any Windows software that can properly process a few messages.<br />
<span id="more-776"></span></p>
<h4>Putting Together the Project</h4>
<p>I'm using Visual Studio 2010 to build both my program and the modified version of Putty. I created the basic outline as follows:</p>
<ol>
<li>Use the <em>File|New|Project</em> menu item to bring up the list of available project wizards.</li>
<li>Select <em>MFC project</em>, and enter a project name (I used the uninspired name <em>PuttyDriver</em>).</li>
<li>I don't want the default MFC settings, so in the MFC App Wizard, select the <em>Next</em> button.</li>
<li>On the <em>Application Type</em> page of the wizard, change the Application Type to <em>Dialog Based</em>.</li>
<li>The project is ready to go at this point; you can click the <em>Finish</em> button and then build your initial project.</li>
</ol>
<p>My driver program is only going to do one thing: direct PuTTY to connect to the host of my choice, then log in using canned credentials. The resulting UI is shown below; I am going to leave the very minor details of creating it up to the reader.</p>
<table border="0" width="100%">
<tr>
<td><center><img src="/attachments/2011/putty/Figure01.png"></center></td>
</tr>
<tr>
<td><center>The driver program - a simple dialog-based MFC app</center></td>
</tr>
</table>
<h4>Adding Putty to the Project</h4>
<p>The next step in this process is to add the Putty components to the project. I downloaded version 0.61 of the PuTTY source from the <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html" class="newpage">download page</a> and extracted it to a separate folder. I then used Visual Studio's <em>File|Add|Existing Project</em> to add the compatible project file, <code>Putty.dsp</code>, found in <code>/Windows/MSVC/Putty</code>. Visual Studio has to convert this project to a version 10 project file, but it should do so with no problems.</p>
<p>I then right-clicked on the Putty project in Solution Explorer and renamed it to <em>AutoPutty</em>. Since this version of PuTTY will have some slightly different behavior, I don't want to confuse the executable I am creating with the real thing.</p>
<p>From <em>Project|Project Dependencies</em>, I set the PuttyDriver project to depend on AutoPutty - this ensures that both projects get built when I build the entire solution.</p>
<p>My final change to the project is to modify the output directory for both Debug and Release versions of AutoPutty. I set the project to build the executable in the root directory of my PuttyDriver project - this will make it easy to find the executable when I need to launch it. I had to make this change in two places: <em>Properties|Configuration Properties|General|Output Directory</em> and <em>Properties|Linker|General|Output File</em>.</p>
<p>When you finally build the project, you'll find that the current version of Microsoft's C++ compiler complains quite a bit about the use of functions like <code>strcpy</code> - Microsoft would like you to use safer replacement functions. You may choose to turn those warnings off by defining <code>_CRT_SECURE_NO_WARNINGS</code> in the project's preprocessor definitions. While you are there, you should define <code>SECURITY_WIN32</code> as well - the Windows header <code>sspi.h</code> requires it.</p>
<p>After a successful build you should find a copy of <code>AutoPutty.exe</code> in the root directory of your project, and it should run on your system and behave just like PuTTY.</p>
<h4>Launching AutoPutty</h4>
<p>If I'm going to have a PuTTY component in my PuttyDriver program, one of the first things I need is to be able to start and stop AutoPutty. So my first step in this project is to create the code that launches the program from PuttyDriver. The code below is inserted into the handler for the Start button:</p>
<pre>
UpdateData( true );
char path[MAX_PATH];
GetCurrentDirectory( MAX_PATH, path );
if ( path[ strlen(path) - 1 ] != '\\' )
    strcat_s( path, MAX_PATH, &quot;\\&quot; );
strcat_s( path, MAX_PATH, &quot;AutoPutty.exe -ssh &quot; );
strcat_s( path, MAX_PATH, m_HostName.GetString() );
PROCESS_INFORMATION pi;
ZeroMemory( &amp;pi, sizeof(pi) );
STARTUPINFO si;
ZeroMemory( &amp;si, sizeof(si) );
si.cb = sizeof(si);
if ( CreateProcess( NULL, path, NULL, NULL, FALSE, 0, NULL, NULL, &amp;si, &amp;pi ) )
{
    // PuttyDriver doesn't wait on the new process, so release its handles
    CloseHandle( pi.hThread );
    CloseHandle( pi.hProcess );
    Sleep( 1000 );
    BringWindowToTop();
}
</pre>
<p>This code assumes that <code>AutoPutty.exe</code> is in the current directory, and launches it with a command line telling it to connect to the host named in the dialog using <a href="http://www.ietf.org/rfc/rfc4251.txt" class="newpage">ssh</a>. Assuming that you have the project set up properly, pushing the start button should now start an independent copy of AutoPutty, which will behave identically to classic PuTTY.</p>
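<p>One caveat with this approach: when the first argument to <code>CreateProcess()</code> is <code>NULL</code>, the program name is taken from the first whitespace-delimited token of the command line, so a working directory under <code>C:\Program Files</code> can be misparsed. Below is a sketch of a helper that quotes the executable path. <code>build_putty_command</code> is a hypothetical name, and it uses <code>std::string</code> so it can be tested away from Windows.</p>

```cpp
#include <string>

// Hypothetical helper: builds the command line passed to CreateProcess.
// With a NULL lpApplicationName, CreateProcess takes the program name from
// the first whitespace-delimited token, so the executable path is quoted in
// case the working directory contains spaces.
std::string build_putty_command( std::string directory, const std::string &host )
{
    // Add a separator only when one is needed; an empty directory
    // leaves a relative path to AutoPutty.exe.
    if ( !directory.empty() && directory.back() != '\\' )
        directory += '\\';
    return "\"" + directory + "AutoPutty.exe\" -ssh " + host;
}
```

<p>Before calling <code>CreateProcess()</code>, copy the result into a writable buffer such as the original <code>path</code> array, because the Unicode version of the function may modify the command-line string in place.</p>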
<h4>Taking Ownership of AutoPutty</h4>
<p>At this point I can successfully launch AutoPutty, but I can't really start calling this an integrated part of my main program, PuttyDriver. All I have done is set up a launcher for a separate executable. </p>
<p>The next step in the integration process is to establish PuttyDriver as the owner of AutoPutty's main window. Most Windows programmers are familiar with the traditional parent/child relationship between windows. That relationship is well understood, but I can't use it here - it doesn't work for two top-level windows.</p>
<p>Setting PuttyDriver to be the <em>owner</em> (as opposed to the parent) of AutoPutty has the following effects, as explained <a href="http://msdn.microsoft.com/en-us/library/ms632599(v=VS.85).aspx#owned_windows" class="newpage">here</a> by Microsoft:</p>
<ul>
<li>The owned window will always be above its owner in the z-order.</li>
<li>The system automatically destroys the owned window when the owner is destroyed.</li>
<li>The owned window is hidden when the owner is minimized.</li>
</ul>
<p>The most straightforward way to set ownership of the window is to pass the owner's handle in the call to <code>CreateWindow()</code>, which means I will now make my first modifications to the PuTTY source code. </p>
<p>There are a number of ways to pass the owner handle to AutoPutty for use in the call to <code>CreateWindow()</code>, with the most obvious being to pass it on the command line. In the interest of minimizing changes to the existing PuTTY code base, I elected to pass it by creating an environment variable that holds the owner window handle. Since a child process inherits the parent's environment, this is a no-fuss way to get the data to AutoPutty.</p>
<p>I added the following code to the end of <code>OnInitDialog()</code> in PuttyDriver:</p>
<pre>
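CString hwnd_text;
// Only the low 32 bits of a window handle are significant, so pass this
// dialog's handle as a decimal LONG; AutoPutty inherits the environment.
hwnd_text.Format( &quot;%ld&quot;, HandleToLong( m_hWnd ) );
SetEnvironmentVariable( &quot;PUTTY_OWNER&quot;, hwnd_text );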
</pre>
<p>This sets the environment variable for AutoPutty to find when it gets launched.</p>
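<p>Whatever text format is used, the value PuttyDriver writes has to parse back to the identical handle in AutoPutty, including in a 64-bit build. Here is a portable sketch of that round trip, assuming decimal text and using <code>void*</code> as a stand-in for <code>HWND</code> so it compiles without <code>windows.h</code>. The helper names <code>handle_to_text</code> and <code>text_to_handle</code> are hypothetical.</p>

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: void* stands in for HWND so this builds anywhere.
std::string handle_to_text( void *handle )
{
    // uintptr_t is wide enough to hold any object pointer value
    return std::to_string( reinterpret_cast<std::uintptr_t>( handle ) );
}

// Hypothetical helper: parses the text back into a handle.
void *text_to_handle( const std::string &text )
{
    try {
        return reinterpret_cast<void *>(
            static_cast<std::uintptr_t>( std::stoull( text ) ) );
    } catch ( const std::exception & ) {
        // Empty or non-numeric text (or a missing variable): no owner
        return nullptr;
    }
}
```

<p>Decimal text was chosen here only because <code>std::to_string</code> and <code>std::stoull</code> make the round trip easy to check; any format works as long as both programs agree on it.</p>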
<p>Now I come to the point where I am actually making changes to the PuTTY code. Fortunately, all of the changes needed for this program are confined to two files: <code>terminal.c</code> and <code>windows/window.c</code>. My first change is to <code>window.c</code>. This file contains the WndProc for the PuTTY window, and thus most of the rendering and control code for the GUI.</p>
<p>In order to establish the owner/owned relationship, I need to modify the code that calls <code>CreateWindowEx()</code>. I hoisted the function call into a block, added code to read the owner window handle from the environment, and passed that handle as the parent/owner argument to <code>CreateWindowEx()</code>:</p>
<pre>
{
    HWND owner_hwnd = 0;
    long owner_value = 0;
    char buffer[ 132 ];
    /* PuttyDriver stores its window handle as decimal text in PUTTY_OWNER */
    if ( GetEnvironmentVariable( &quot;PUTTY_OWNER&quot;, buffer, sizeof(buffer) ) &amp;&amp;
         sscanf( buffer, &quot;%ld&quot;, &amp;owner_value ) == 1 )
        owner_hwnd = (HWND) LongToHandle( owner_value );
    if ( owner_hwnd == 0 )
        MessageBox( NULL,
                    &quot;AutoPutty did not find the handle for the &quot;
                    &quot;owner window, this is not going to work&quot;,
                    &quot;Fail&quot;,
                    MB_OK );
    hwnd = CreateWindowEx(exwinmode, appname, appname,
                          winmode, CW_USEDEFAULT, CW_USEDEFAULT,
                          guess_width, guess_height,
                          owner_hwnd, NULL, inst, NULL);
}
</pre>
<p>At this point I've only modified one small block of code in the PuTTY source, but I'm well on my way to having it behave more like a component of PuttyDriver and less like an independent program. Because of the ownership relationship, the two programs appear only once on the taskbar, and only once when you press ALT-TAB to select a new active process. They also produce just a single entry in the Applications tab of Task Manager.</p>
<h4>The Communications Link</h4>
<p>In order to achieve the automation that I am seeking, I also need two-way communication between AutoPutty and the driver program. Since this is Windows, a natural choice is to use native Windows messages. In order to do this, each program needs the window handle of its opposite number.</p>
<p>I've already solved half of that problem through the ownership relationship established when I created the main window for AutoPutty. Now that it has set PuttyDriver as its owner window, I can get this window handle any place in the program through a simple function call:</p>
<pre>
HWND parent = GetWindow(hwnd, GW_OWNER);
</pre>
<p>But the reverse is not true - PuttyDriver does not have a copy of the window handle for AutoPutty.</p>
<p>To remedy this situation, I added code to <code>window.c</code> that notifies its owner when it is created, and when it is destroyed. First I add this statement immediately after the call to <code>CreateWindowEx()</code>, inside the same block so that <code>owner_hwnd</code> is still in scope:</p>
<pre>
if ( owner_hwnd )
   PostMessage( owner_hwnd, WM_APP, 0, (LPARAM) hwnd );
</pre>
<p>This tells PuttyDriver that the window is created, and gives it the handle to use for communications.</p>
<p>I also need to know when the window is closed, and I have to add that code in two places in <code>window.c</code> - because PuTTY can be shut down in two different ways.</p>
<p>Normally AutoPutty will shut down in response to a Windows message. When this happens, I can count on a <code>WM_CLOSE</code> message being sent to the window procedure. I add this code to the existing handler for <code>WM_CLOSE</code>:</p>
<pre>
if (!cfg.warn_on_close || session_closed ||
    MessageBox(hwnd,
               &quot;Are you sure you want to close this session?&quot;,
               str, MB_ICONWARNING | MB_OKCANCEL | MB_DEFBUTTON1)
    == IDOK) {
    HWND parent = GetWindow(hwnd, GW_OWNER);
    if ( parent )
        SendMessage( parent, WM_APP, 0, 0 );
    DestroyWindow(hwnd);
}
</pre>
<p>This lets PuttyDriver know that the window has been destroyed.</p>
<p>The original PuTTY code has an alternative method of shutdown. When it receives one of several possible network events, such as a telnet connection being broken, it calls <code>PostQuitMessage()</code>. When a program shuts down this way, it doesn't send messages to destroy its windows; it relies on the O/S to destroy the windows when the process exits. As a result, I have to make a change in <code>WinMain()</code>, PuTTY's entry point, which contains the main message loop. That loop retrieves incoming messages using <code>PeekMessage()</code>, and I add some code to handle the processing when a <code>WM_QUIT</code> message arrives:</p>
<pre>
if (msg.message == WM_QUIT) {
    HWND parent = GetWindow(hwnd, GW_OWNER);
    if ( parent )
        SendMessage( parent, WM_APP, 0, 0 );
    goto finished;	       /* two-level break */
}
</pre>
<h4>Handling the AutoPutty Lifecycle Events</h4>
<p>To keep track of the state of AutoPutty, I have to add a handler for <code>WM_APP</code> to PuttyDriver. It does two things when handling the incoming <code>WM_APP</code> message.</p>
<p>First, the handler stores the handle of the AutoPutty window, or sets the value to 0 when the window has been destroyed.</p>
<p>Second, it either enables or disables the button used to start up AutoPutty. Since this program can only manage one window at a time, I don't want to allow any inadvertent button pushes:</p>
<pre>
afx_msg LRESULT CPuttyDriverDlg::OnWmApp(WPARAM wParam, LPARAM lParam)
{
    m_PuttyWindow = (HWND) lParam;
    m_StartButton.EnableWindow( !m_PuttyWindow );
    return 0;
}
</pre>
<p>One final piece of bookkeeping is to make sure that the AutoPutty window is shut down when PuttyDriver shuts down. (The Windows documentation claims this happens automatically to owned windows, but it doesn't seem to be the case.)</p>
<pre>
void CPuttyDriverDlg::OnDestroy()
{
    CDialogEx::OnDestroy();
    if ( m_PuttyWindow )
        ::SendMessage( m_PuttyWindow, WM_CLOSE, 0, 0 );
}
</pre>
<h4>Monitoring Input Traffic</h4>
<p>Now that I have control over the lifetime of my AutoPutty window, it's time to take the next step in automation. My driver program needs to watch all the data coming in from the remote end so that it can take action on various types of input.</p>
<p>Depending on how you set up your connection, PuTTY can receive input data from a serial port, a Telnet connection, or an SSH connection. Fortunately the Windows version of PuTTY uses a standard handle-based interface to all three types of connections. The routine <code>term_data()</code> in <code>terminal.c</code> is called as data arrives, regardless of the source.</p>
<p>Since we are already using Windows messages to communicate between the processes, it makes sense to use the <code>WM_COPYDATA</code> message to send data to the parent program as it arrives. <code>WM_COPYDATA</code> is a good choice because Windows takes care of marshalling the data between the two processes, which would complicate other solutions. The modified routine is shown below:</p>
<pre>
int term_data(Terminal *term, int is_stderr, const char *data, int len)
{
    HWND parent = GetWindow(hwnd, GW_OWNER);
    if ( parent ) {
        COPYDATASTRUCT cd;
        cd.dwData = (ULONG_PTR) 0xDEADBEEF;
        cd.cbData = len;
        cd.lpData = (PVOID) data;
        SendMessage( parent, WM_COPYDATA, (WPARAM) hwnd, (LPARAM) &amp;cd );
    }
    /* ... remainder of the original term_data() body, unchanged ... */
}
</pre>
<h4>Receiving the Data</h4>
<p>To receive these messages in PuttyDriver, I simply create a handler for <code>WM_COPYDATA</code> and start grabbing the data as it arrives. One important thing to note is that because AutoPutty uses <code>SendMessage()</code> to send the data to its parent, it has to wait for PuttyDriver to finish processing the data before it can continue. This dictates a certain style on my part: the handler needs to return quickly.</p>
<p>There are quite a few ways to skin this cat, and I'm keeping it very simple here. I'm using a <code>deque&lt;char&gt;</code> container to hold the last 64 characters I've received. After each <code>WM_COPYDATA</code> message I receive, I check to see if the current output snapshot ends in one of my trigger strings. If it does, I post the trigger number to myself for later processing, then return so that AutoPutty can continue its work.</p>
<p>The code I'm using here is doing something fairly simple: automating the login process by using the credentials that I've entered into the dialog box. That means the two strings I'm looking for are the login and password prompts. The resulting code is shown here:</p>
<pre>
BOOL CPuttyDriverDlg::OnCopyData(CWnd* pWnd, COPYDATASTRUCT* pCopyDataStruct)
{
    char *p = (char *) pCopyDataStruct-&gt;lpData;
    int len = pCopyDataStruct-&gt;cbData;
    if ( len &gt;= 64 ) {
        p += len - 64;
        len = 64;
        m_Snapshot.clear();
    }
    while ( len-- )
        m_Snapshot.push_front(*p++);
    m_Snapshot.resize(64);
    static const char *needles[2] = { &quot;login as: &quot;, &quot;password: &quot; };
    for ( int i = 0 ; i &lt; 2 ; i++ ) {
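        int needle_len = (int) strlen( needles[i] );
        int j;
        // m_Snapshot[0] is the newest character, so the last character of
        // the needle is compared against it, the next-to-last against [1], etc.
        for ( j = 0 ; j &lt; needle_len ; j++ ) {
            if ( needles[i][j] != m_Snapshot[needle_len-1-j] )
                break;
        }
        if ( j == needle_len )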
            PostMessage( WM_APP+1, i, 0 );
    }
    return TRUE;
}
</pre>
<p>There is plenty of room for improvement in this routine, much of it depending on what type of automation you are going to be using in your program. Some obvious items would include the ability to add and remove triggers as the program progresses, and regular expression matching for triggers. </p>
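<p>One way to approach the first of those improvements is a small, portable trigger matcher whose triggers can be added and removed while the program runs. This is a sketch under stated assumptions, not the PuttyDriver code: <code>TriggerWatcher</code> and its methods are hypothetical names, and unlike <code>OnCopyData()</code> it appends new characters at the back of the <code>deque</code> and compares each trigger against the tail using reverse iterators.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical helper: keeps the most recent `capacity` characters of
// terminal output and reports which triggers that output currently ends with.
class TriggerWatcher {
public:
    explicit TriggerWatcher(std::size_t capacity = 64) : capacity_(capacity) {}

    void add_trigger(const std::string& t) { triggers_.push_back(t); }

    void remove_trigger(const std::string& t) {
        triggers_.erase(std::remove(triggers_.begin(), triggers_.end(), t),
                        triggers_.end());
    }

    // Append a chunk of output, then return every trigger that the
    // retained window of recent output now ends with.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            window_.push_back(data[i]);      // newest character at the back
            if (window_.size() > capacity_)
                window_.pop_front();         // discard the oldest character
        }
        std::vector<std::string> hits;
        for (const std::string& t : triggers_) {
            // Compare trigger and window tail from the last character backwards.
            if (t.size() <= window_.size() &&
                std::equal(t.rbegin(), t.rend(), window_.rbegin()))
                hits.push_back(t);
        }
        return hits;
    }

private:
    std::size_t capacity_;
    std::deque<char> window_;
    std::vector<std::string> triggers_;
};
```

<p>In a handler like <code>OnCopyData()</code>, something along these lines could be fed <code>lpData</code> and <code>cbData</code>, with a message posted for each trigger it reports. Triggers longer than the window capacity can never match, so the capacity should be at least as long as the longest trigger.</p>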
<h4>Driving PuTTY</h4>
<p>This login program is now complete save for one detail: I need a way to send my responses back to AutoPutty. </p>
<p>The first part of this is pretty obvious: I just need to read the data from the dialog box and send it to AutoPutty with my useful <code>WM_COPYDATA</code> message. This happens in my <code>WM_APP+1</code> handler:</p>
<pre>
afx_msg LRESULT CPuttyDriverDlg::OnWmAppPlusOne(WPARAM wParam, LPARAM lParam)
{
    UpdateData(TRUE);
    CString msg;
    switch ( wParam ) {
    case 0 :
        msg = this-&gt;m_UserId + '\r'; break;
    case 1:
        msg = this-&gt;m_Password + '\r'; break;
    }
    if ( this-&gt;m_PuttyWindow ) {
        COPYDATASTRUCT cd;
        cd.dwData = (ULONG_PTR) 0xF00DFACE;
        cd.cbData = msg.GetLength();
        cd.lpData = (PVOID) (const char *) msg;
        ::SendMessage( this-&gt;m_PuttyWindow,
                       WM_COPYDATA,
                       (WPARAM) this-&gt;m_hWnd,
                       (LPARAM) &amp;cd );
    }
    return 0;
}
</pre>
<p>Sending this data to AutoPutty is fine, but right now the program doesn't do anything with that message. The final piece of work is to add a <code>WM_COPYDATA</code> handler to <code>window.c</code>.</p>
<p>Simply grabbing the data is easy enough: the data structure that accompanies the message contains a pointer to the data and a value indicating its length. However, I have two problems to solve before the data is actually sent out to whatever device AutoPutty is connected to.</p>
<p>First, I have to take into account the fact that PuTTY was written to use wide characters. My driver program was built using the multibyte character set, so we have a mismatch, and I have to convert the data from one form to the other. This is a two-step process: I call <code>MultiByteToWideChar()</code> once to determine how much space I need, then I allocate a buffer and call it again.</p>
<p>The second thing I need to do is decide what to do with the data once I've converted it. PuTTY eventually passes all terminal input through a function called <code>luni_send()</code>. Calling this function directly from the window procedure seems to work just fine.</p>
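<p>The two-pass pattern is not specific to Windows. As a rough portable analogue (a swapped-in technique, not what <code>window.c</code> does), standard C++ can do the same thing with <code>std::mbsrtowcs()</code>: a call with a null destination reports the required length, and a second call fills a buffer of that size. The helper name <code>to_wide</code> is hypothetical, and the conversion uses the current C locale rather than the Windows ANSI code page.</p>

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: convert `len` bytes of multibyte text to a wide
// string in two passes, mirroring the two MultiByteToWideChar() calls.
// Returns an empty string if the bytes are invalid in the current C locale.
std::wstring to_wide(const char* data, std::size_t len) {
    std::string narrow(data, len);   // the payload need not be NUL-terminated
    const char* src = narrow.c_str();
    std::mbstate_t state{};

    // Pass 1: a null destination only asks how many wide characters are needed.
    std::size_t needed = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        return std::wstring();

    // Pass 2: allocate room for the result plus a terminator, then convert.
    std::vector<wchar_t> buf(needed + 1);
    src = narrow.c_str();
    state = std::mbstate_t{};
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    return std::wstring(buf.data(), needed);
}
```

<p>Because the helper copies the bytes into a <code>std::string</code> first, it copes with a buffer that is not NUL-terminated, such as a <code>WM_COPYDATA</code> payload; an embedded NUL byte would still end the conversion early.</p>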
<p>The <code>WM_COPYDATA</code> handler I created looks like this:</p>
<pre>
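case WM_COPYDATA :
{
    COPYDATASTRUCT *cd = (COPYDATASTRUCT *) lParam;
    /* The byte count is in cbData; dwData only carries the sender's tag. */
    int len = (int) cd-&gt;cbData;
    /* First call: ask how many wide characters the conversion needs. */
    int wsize = MultiByteToWideChar( CP_ACP,
                                     MB_PRECOMPOSED,
                                     (LPCSTR) cd-&gt;lpData,
                                     len,
                                     NULL,
                                     0 );
    wchar_t *buf = (wchar_t *) calloc( wsize+1, sizeof(wchar_t) );
    if ( wsize &gt; 0 &amp;&amp; buf ) {
        /* Second call: perform the conversion into the new buffer. */
        MultiByteToWideChar( CP_ACP,
                             MB_PRECOMPOSED,
                             (LPCSTR) cd-&gt;lpData,
                             len,
                             buf,
                             wsize + 1 );
        if (term-&gt;ldisc)
            luni_send(term-&gt;ldisc, buf, wsize, 0);
    }
    free( buf );
    return TRUE;   /* tell the sender the data was handled */
}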
</pre>
<p>At this point I have a working program - it connects to my designated <a href="http://www.webhostingsearch.com/" class="newpage">host</a>, and sends the username and password of my choice to the host, connecting me to the system.</p>
<p>I should add a note of caution here. Automating logins is a tempting time saver, but in general this is a really bad idea. Any time you hard-code credentials into a program, you open the door to all sorts of new attacks on your system.</p>
<p>In my demo program, the user has to enter a name and password, so nothing is hardcoded, but even this adds security holes to a system. I encourage you to think of this as a demonstration only.</p>
<p><center></p>
<table border="0">
<tr>
<td><center><iframe width="500" height="281" src="http://www.youtube.com/embed/3O4t9KzpKbo?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></center></td>
</tr>
<tr>
<td><center>Demo of the program in action<br/>For a better view, go to full screen and select 720p</center></td>
</tr>
</table>
<p></center></p>
<h4>Source Code</h4>
<p>I've included the complete source code for PuttyDriver, the MFC project that controls AutoPutty. It was built with Visual Studio 2010, so you may have a little work to do if you backport it to earlier versions. My use of language features and classes should be compatible with much earlier versions - this is all very simple code.</p>
<p>Because PuTTY is always changing, I am not redistributing a snapshot of the version I used. Instead, I'm including before and after copies of the two source files I modified: <code>window.c</code> and <code>terminal.c</code>. If you build with PuTTY 0.61, you should be able to drop these two files right on top of the files included with the distribution and be on your way. With later versions of PuTTY you will have to perform an intelligent merge of the changes, which I hope will be a fairly effortless process.</p>
<h4>Downloads</h4>
<ul>
<li><a href="/attachments/2011/putty/PuttyDriver.zip">PuttyDriver.zip</a>. The PuttyDriver source and project. You will need to add the PuTTY project to this solution as described in the article.
<li><a href="/attachments/2011/putty/putty.zip">putty.zip</a>. This contains the two PuTTY source files modified for this project. Both the original 0.61 source and my modified source are supplied. Executables are supplied as well, which may or may not work on your system.
</ul>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/12/10/automating-putty/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Sendmail on Linux &#8211; the Easy Way</title>
		<link>http://marknelson.us/2011/12/09/sendmail-on-linux-the-easy-way/</link>
		<comments>http://marknelson.us/2011/12/09/sendmail-on-linux-the-easy-way/#comments</comments>
		<pubDate>Fri, 09 Dec 2011 16:11:05 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=465</guid>
		<description><![CDATA[This summer I'm teaching a graduate class, Principles of UNIX, which is more or less a crash course in the Mother of All Operating Systems. One of our early topics is email on UNIX, in which I try to impart to the class just how transformative email was back in the day. For early Internet [...]]]></description>
			<content:encoded><![CDATA[<p>This summer I'm teaching a graduate class, Principles of UNIX, which is more or less a crash course in the Mother of All Operating Systems. One of our early topics is email on UNIX, in which I try to impart to the class just how transformative email was back in the day. For early Internet users (mostly UNIX users), this was an incredibly big deal.</p>
<p>Unfortunately, setting up email on a Linux or UNIX system is not quite as automatic as it once was. In our class we use mailx and sendmail as tools to send files from background processes or cron jobs - but mailx will typically not work out of the box. In this post I'll discuss how to get it working on an Ubuntu 11 system.<br />
<span id="more-465"></span></p>
<h4>Things Have Changed</h4>
<p>Back in the day if you wanted to send mail, you simply found a handy <a href="http://www.webhostingsearch.com/dedicated-server.php" class="newpage">dedicated server</a> that was accepting incoming SMTP connections. There were thousands, and they were undiscriminating, so this was no big deal.</p>
<p>The invention of <a href="http://en.wikipedia.org/wiki/Spam_(electronic)" class="newpage">spam</a> ruined that. </p>
<p>Now any SMTP server you find is going to require you to go through a bit of a dance in order to authenticate and prove you are not a spammer. I started off this post with the intention of showing you how to use your Gmail account to access Google's SMTP servers. The process was fairly arduous, as it involved creating a certificate authority, your own certificates, and then setting up the mail server to use this authentication.</p>
<p>While working on this, my son <a href="http://wlrs.net/" class="newpage">Joey</a> recommended that I just set up a free account on one of several email gateway providers, such as <a href="http://sendgrid.com" class="newage">SendGrid</a> or <a href="http://mailjet.com" class="newpage">MailJet</a>. Both services will let you access their servers and send up to 200 emails a day for free.</p>
<p>I took him up on it and found the process to be much simpler than using Gmail, so I'll pass along the setup procedure here.</p>
<h4>Getting an Account</h4>
<p>Obviously, SendGrid is in business to get you to purchase a commercial account so you can send thousands of emails a day from your web site. Accordingly, they don't go out of their way to advertise their free plan. If you go to their <a href="https://Sendgrid.com/pricing.html" class="newpage">pricing page</a>, you will find a little tiny link to the <a href="https://sendgrid.com/user/signup" class="newpage">free plan</a> hidden at the bottom.</p>
<p>Setting up an account is easy, but SendGrid insists that you have a web site. For automatic verification they will need to find your email address on the site. I opted for an alternate provisioning plan in which I created a page on my site with the phrase "Sendgrid". </p>
<p>Once you have an account, you have free access to the SendGrid SMTP servers for up to 200 outbound messages a day, and you are ready to configure your UNIX system to take advantage of it.</p>
<h4>Ubuntu Configuration</h4>
<p>Configuring Ubuntu 11 to send email is fairly painless. Using the Ubuntu Software Center, you can locate and install two packages: postfix and bsd-mailx. During the install of postfix, you will get dropped into a debconf window asking you some basic configuration questions:<br />
<center></p>
<table border="0" width="100%">
<tr>
<td><center><img src="/attachments/2011/smtp/PostfixConfig.png"  width="90%"><center></td>
</tr>
<tr>
<td><center>Figure 1 - The initial configuration screen</center></td>
</tr>
</table>
<p></center><br />
I entered the following answers to the two questions I got hit up with:</p>
<ul>
<li/>General configuration: Internet
<li/>System mail name: dogma.net
</ul>
<p>That seemed to be all I needed for basic configuration.</p>
<h4>Postfix Configuration</h4>
<p>Configuring postfix to use SendGrid was just a matter of adding a few lines to <code>/etc/postfix/main.cf</code>, using your SendGrid user name and password. Note that the file probably already has a <code>relayhost</code> line; this one should replace it:</p>
<pre>
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = static:username:password
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = may
relayhost = [smtp.sendgrid.net]:587
</pre>
<p>After making the changes you should restart postfix so it reads the new config options. I also start watching the mail log file so I can see if there are any problems on first use:</p>
<pre>
sudo /etc/init.d/postfix restart
sudo tail -f /var/log/mail.log
</pre>
<p>A test message sent to my cell phone arrived as a text message in just one or two seconds, with the following log messages:</p>
<pre>
Jun 26 17:02:08 ubuntu postfix/pickup[21145]: 51A5E5E1DA6: uid=1000 from=&lt;mark&gt;
Jun 26 17:02:08 ubuntu postfix/cleanup[21336]: 51A5E5E1DA6: message-id=&lt;20110627000208.51A5E5E1DA6@ubuntu&gt;
Jun 26 17:02:08 ubuntu postfix/qmgr[21146]: 51A5E5E1DA6: from=&lt;mark@dogma.net&gt;, size=273, nrcpt=1 (queue active)
Jun 26 17:02:08 ubuntu postfix/smtp[21338]: 51A5E5E1DA6: to=&lt;xxxxxxxx@txt.att.net&gt;, relay=smtp.sendgrid.net[174.36.32.204]:587, delay=0.33, delays=0.04/0.02/0.23/0.04, dsn=2.0.0, status=sent (250 Delivery in progress)
Jun 26 17:02:08 ubuntu postfix/qmgr[21146]: 51A5E5E1DA6: removed
</pre>
<h4>Moving On to Better Things</h4>
<p>Now that postfix is properly configured, I can really start taking advantage of the mail infrastructure on my system. The next obvious step is to create a <code>.forward</code> file in my home directory, and give it my external Gmail address. That external address will now be the recipient of output from <code>cron</code> jobs, or from <code>at</code> or <code>batch</code>. It's nice to have the mail set up as an integral part of the O/S, and if you can just make it through a little bit of setup, it's all yours.</p>
<p>With a limit of 200 messages a day you can still make extensive use of outbound email for system monitoring - whether it is via text to your phone or huge messages being sent to an account used for storing log files. Either way, integral email is still a great feature, almost forty years after it first showed up in UNIX.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/12/09/sendmail-on-linux-the-easy-way/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>VC++ 10 Hash Table Performance Problems</title>
		<link>http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/</link>
		<comments>http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 14:05:45 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Complaining]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1347</guid>
		<description><![CDATA[Microsoft's implementation of <code>unordered_map</code> in Visual Studio 10 has performance issues so severe it may be unusable in your projects.]]></description>
			<content:encoded><![CDATA[<p>Microsoft has never been a slacker in the C++ department - they've always worked hard to provide a top-notch, compliant product. Visual Studio 10 supports their current incarnation, and for the most part it is up to their usual standards. It's a great development environment, and I am a dedicated user, but I have to give Microsoft a demerit in one area: their C++11 hash containers have some serious performance problems - so much that the Debug versions of the containers may well be unusable in your application.<br />
<span id="more-1347"></span></p>
<h4>Background</h4>
<p>I first noticed the problem with <code>unordered_map</code> when I was working on the code for my updated <a href="http://marknelson.us/2011/11/08/lzw-revisited/" class="newpage">LZW article</a>. I found that when running in the debugger, my program would hang after exiting the compression routine. A little debugging showed that the destructor for my hash table was taking a long time to run. And by a long time, I mean it was approaching an <i>hour</i>!</p>
<p>This seemed pretty crazy. Destroying a hash table wouldn't seem to be a complicated task. I decided to see if I could come up with a reasonable benchmark. I wrote a test program that does a simple word frequency count. As a starter data set, I used the first one million whitespace-delimited words in the 2010 CIA factbook, as published by <a href="http://www.gutenberg.org/ebooks/35830.txt.utf8" class="newpage">Project Gutenberg</a>. This data set yields 74,208 unique tokens.</p>
<p>I wrote a simple test rig that I used to test the word count program using four different containers:</p>
<ul>
<li/><code>unordered_map</code> indexed by <code>std::string</code>
<li/><code>unordered_map</code> indexed by <code>std::string *</code>
<li/><code>map</code> indexed by <code>std::string</code>
<li/><code>map</code> indexed by <code>std::string *</code>
</ul>
<p>The reason for testing with <code>std::string *</code> was to reduce the cost of copying strings into the hash table as it was filled, and then to reduce the cost of destroying those strings when the table was destroyed.</p>
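<p>The list above doesn't say how the pointer-keyed containers compare their keys. With the default hash and ordering, an <code>unordered_map</code> or <code>map</code> keyed on <code>std::string *</code> works with addresses, so two copies of the same word would be counted as different keys. Here is a minimal sketch of one way to give pointer keys string semantics, assuming hypothetical functors <code>DerefHash</code>, <code>DerefEqual</code> and <code>DerefLess</code>; it is an illustration, not necessarily how HashTest.cpp does it.</p>

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical functors: hash, compare, and order pointer keys by the
// string they point at instead of by address.
struct DerefHash {
    std::size_t operator()(const std::string* s) const {
        return std::hash<std::string>()(*s);
    }
};
struct DerefEqual {
    bool operator()(const std::string* a, const std::string* b) const {
        return *a == *b;
    }
};
struct DerefLess {
    bool operator()(const std::string* a, const std::string* b) const {
        return *a < *b;
    }
};

typedef std::unordered_map<const std::string*, int, DerefHash, DerefEqual> PtrHashCount;
typedef std::map<const std::string*, int, DerefLess> PtrTreeCount;

// Word count keyed on pointers into `words`: no string is copied into the
// container, so `words` must outlive the returned container.
template <class CONTAINER>
CONTAINER count_words(const std::vector<std::string>& words) {
    CONTAINER counts;
    for (const std::string& w : words)
        counts[&w]++;
    return counts;
}
```

<p>With these functors, the pointer-keyed containers should report the same number of unique entries as the string-keyed ones; without them, every token in the input would become its own key.</p>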
<p>I ran tests against <code>map</code> expecting to see a pretty big difference in performance. Because <code>map</code> is normally implemented using a balanced binary tree structure, it has O(log(N)) performance on insertions. A sparsely populated hash table can have O(1) performance. By using fairly large data sets, I expected to see a big difference between the two.</p>
<p>I tried to eliminate a few obvious sources of error in my test function, and I used a template function so that I could use the same code on all the different container types:</p>
<pre>
#include &lt;ctime&gt;
#include &lt;iostream&gt;

template&lt;class CONTAINER, class DATA&gt;
void test( const DATA &amp;data, const char *test_name )
{
  std::cout &lt;&lt; &quot;Testing container: &quot; &lt;&lt; test_name &lt;&lt; std::endl;

#ifdef _DEBUG
  const int passes = 2;
#else
  const int passes = 10;
#endif
  double fill_times = 0;
  double delete_times = 0;
  size_t entries;
  for ( int i = 0 ; i &lt; passes ; i++ ) {
    CONTAINER *container = new CONTAINER();
    std::cout &lt;&lt; &quot;Filling... &quot; &lt;&lt; std::flush;
    clock_t t0 = clock();
    for ( auto ii = data.begin() ; ii != data.end() ; ii++ )
      (*container)[*ii]++;
    double span = double(clock() - t0)/CLOCKS_PER_SEC;
    fill_times += span;
    entries = container-&gt;size();
    std::cout &lt;&lt; &quot; &quot; &lt;&lt; span &lt;&lt; &quot; Deleting... &quot; &lt;&lt; std::flush;
    t0 = clock();
    delete container;
    span = double(clock() - t0)/CLOCKS_PER_SEC;
    delete_times += span;
    std::cout &lt;&lt; span &lt;&lt; &quot; &quot; &lt;&lt; std::endl;
  }
  std::cout &lt;&lt; &quot;Entries: &quot; &lt;&lt; entries
            &lt;&lt; &quot;, Fill time: &quot; &lt;&lt; (fill_times/passes)
            &lt;&lt; &quot;, Delete time: &quot; &lt;&lt; (delete_times/passes)
            &lt;&lt; std::endl;
}
</pre>
<p>I didn't go overboard when it came to instrumenting this problem, I just used the timing functions built into the C++ library. On my Windows and Linux test systems, the values of CLOCKS_PER_SEC are both high enough that I'm not worried about granularity issues.</p>
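<p>One caveat about <code>clock()</code>: the Microsoft C runtime documents it as returning elapsed wall-clock time, while on Linux it returns the processor time used by the program, so the two columns are not measuring quite the same thing. If you rebuild the test with a newer compiler (VC++ 10 does not provide <code>&lt;chrono&gt;</code>), a sketch like the following measures wall-clock time the same way on both platforms. The helper <code>time_call</code> is a hypothetical name, not part of HashTest.cpp.</p>

```cpp
#include <chrono>

// Hypothetical helper: run `f` once and return the elapsed wall-clock
// time in seconds, measured with the monotonic std::chrono::steady_clock.
template <class F>
double time_call(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    std::chrono::duration<double> span = std::chrono::steady_clock::now() - t0;
    return span.count();
}
```

<p>In the test rig, the fill loop and the <code>delete</code> could each be wrapped in a lambda and passed to <code>time_call()</code> in place of the paired <code>clock()</code> calls.</p>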
<h4>The First Results</h4>
<p>I ran my test program in Visual C++ Release mode, using all the standard settings for a console application. For purposes of comparison, I ran the same program using g++ 4.6.1 on the same computer, booted up under Linux. For the set of 1,000,000 tokens, the results are shown below:</p>
<table border="1" cellpadding="5">
<thead>
<tr>
<th>Task</th>
<th>VC++ 10 Release</th>
<th>g++ 4.6.1 -O3</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fill <code>unordered_map&lt;string&gt;</code></td>
<td>0.41s</td>
<td>0.11s</td>
</tr>
<tr>
<td>Fill <code>unordered_map&lt;string const *&gt;</code></td>
<td>0.39s</td>
<td>0.14s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string&gt;</code></td>
<td>3.17s</td>
<td>0.01s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string const *&gt;</code></td>
<td>3.24s</td>
<td>0.004s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string&gt;</code></td>
<td>0.83s</td>
<td>0.53s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string const *&gt;</code></td>
<td>0.88s</td>
<td>0.66s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string&gt;</code></td>
<td>0.14s</td>
<td>0.01s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string const *&gt;</code></td>
<td>0.07s</td>
<td>0.002s</td>
</tr>
</tbody>
</table>
<p>There are a few interesting points to take away from these tests:</p>
<ul>
<li/>Microsoft's compiler is taking an exceptionally long time to destroy hashed containers: about an order of magnitude longer than it took to fill them, and more than two orders of magnitude longer than it takes g++ to do the same task.
<li/>It doesn't look like constructing and destroying the strings is a big factor. Both compilers have roughly the same performance with both <code>std::string</code> and <code>std::string *</code>. Counterintuitively, the pointer versions are sometimes a bit slower: filling a <code>map</code> keyed on pointers takes longer with both compilers, and VC++ takes slightly longer to destroy the pointer-keyed <code>unordered_map</code>.
<li/>The GNU compiler appears to be able to run through this exercise notably faster.
</ul>
<p>The time it takes to destroy the table is troubling. Having a C++ program hang for over 3 seconds to destroy a modestly large data structure is a serious concern, particularly when the same task completes in a few milliseconds with g++.</p>
<h4>The Pathological Results</h4>
<p>These concerns are nothing compared to what I see when running in debug mode. Setting my Visual Studio project to Debug mode, then running the same test, yields the results shown here:</p>
<table border="1" cellpadding="5">
<thead>
<tr>
<th>Task</th>
<th>VC++ 10 Debug</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fill <code>unordered_map&lt;string&gt;</code></td>
<td>17.41s</td>
</tr>
<tr>
<td>Fill <code>unordered_map&lt;string const *&gt;</code></td>
<td>17.08s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string&gt;</code></td>
<td>505.36s</td>
</tr>
<tr>
<td>Destroy <code>unordered_map&lt;string const *&gt;</code></td>
<td>505.99s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string&gt;</code></td>
<td>13.29s</td>
</tr>
<tr>
<td>Fill <code>map&lt;string const *&gt;</code></td>
<td>13.15s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string&gt;</code></td>
<td>0.94s</td>
</tr>
<tr>
<td>Destroy <code>map&lt;string const *&gt;</code></td>
<td>0.18s</td>
</tr>
</tbody>
</table>
<p>Those numbers are hard to believe. Destroying a hash table takes a few milliseconds when using g++. In VC++ 10, it takes more than eight minutes!</p>
<p>Worse, we suddenly see that hashed containers are <i>slower</i> than the containers built on red-black trees. Again, this just doesn't make sense.</p>
<p>The big problem with these numbers is that they make the debug mode of the compiler effectively unusable for a lot of tasks. Regardless of how much testing it does, when it is this slow, it is just not useful.</p>
<h4>A Workaround</h4>
<p>I didn't invest the time to try debugging Microsoft's library, so I don't really know where the time is being spent. I did try a few things to speed things up, and I found one technique that helps a lot. Before including any Microsoft header files, try entering this single line in your source:</p>
<pre>
#define _ITERATOR_DEBUG_LEVEL 0
</pre>
<p>With this definition in place, the delete times return to the ballpark of the times seen when running in release mode. Of course, you give up some debugging checks. Be aware that the setting must be the same in every object file and static library linked into the program; Visual C++ 2010 reports a mismatch error at link time otherwise. I believe that an explanation of what this macro does might be found <a href="http://blogs.msdn.com/b/vcblog/archive/2011/04/05/10150198.aspx" class="newpage">here</a>.</p>
<p>In the final analysis, I think Microsoft has some serious work to do here. The performance of their hashed containers, and to some lesser extent, the pre-C++11 associative containers, needs some serious examination. If the library is going to run this much slower than the competition, I need a good explanation why.</p>
<h4>Source</h4>
<p><a href="/attachments/2011/msvc_hash/HashTest.cpp">HashTest.cpp</a></p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>LZW Revisited</title>
		<link>http://marknelson.us/2011/11/08/lzw-revisited/</link>
		<comments>http://marknelson.us/2011/11/08/lzw-revisited/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 15:21:41 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Data Compression]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=1056</guid>
		<description><![CDATA[In this updated look at LZW, I will first give a description of how LZW works, then describe the core C++ code that I use to implement the algorithm. I'll then walk you through the use of the algorithm with a few varieties of I/O. Finally, I'll show you some benchmarks and go over the history of this well-known compression algorithm.]]></description>
			<content:encoded><![CDATA[<p>One of the first articles I wrote for Dr. Dobb's Journal, <a href="http://marknelson.us/1989/10/01/lzw-data-compression/" class="newpage">LZW Data Compression</a>, turned out to be very popular, and still generates a fair amount of traffic and email over twenty years later.</p>
<p>One of the reasons for its popularity seems to be that LZW compression is a popular homework assignment for CS students around the world. And that audience sometimes found the article to be a bit of a struggle. My code was modeled on the UNIX <a href="http://en.wikipedia.org/wiki/Compress" class="newpage">compress program</a>, which was written in terse C for maximum efficiency. And sometimes optimization comes at the expense of comprehension.</p>
<p>By using C++ data structures I can model the algorithm in a much more straightforward way - the language doesn't get in the way of a clear implementation. And after 20 years of answering puzzled queries I think I can improve on the overall explanation of just how LZW works. </p>
<p>In this updated look at LZW, I will first give a description of how LZW works, then describe the core C++ code that I use to implement the algorithm. I'll then walk you through the use of the algorithm with a few varieties of I/O. Finally, I'll show you some benchmarks.<br />
<span id="more-1056"></span><br />
I'm hoping that this version of the article will be good enough to last for another 20 years.</p>
<h4>LZW Basics</h4>
<p>LZW compression works by reading a sequence of <em>symbols</em>, grouping the symbols into <em>strings</em>, and converting the strings into <em>codes</em>. Because the codes take up less space than the strings they replace, we get compression.</p>
<p>My implementation of LZW uses the C++ <code>char</code> as its symbol type, the C++ <code>std::string</code> as its string type, and <code>unsigned int</code> as its code type.  The tables of codes and strings are implemented using <code>unordered_map</code>, the C++ library's hash table data structure. By using the native types and standard library data structures the representation in the program is straightforward and easy to follow.</p>
<h4>Encoding/Decoding</h4>
<p>Rather than jumping directly into a full implementation, I'm going to work my way up to LZW one step at a time.</p>
<p>The first step is getting a clear understanding of how the encoding and decoding process works. As I said earlier, LZW compression converts strings of symbols into integer codes. Decompression converts codes back into strings, returning the same text that we started with.</p>
<p>LZW is a greedy algorithm - it tries to find the longest possible string that it has a code for, then outputs that string. The code below is not quite LZW, but it shows you the basic idea of how a greedy encoder can work:</p>
<pre>
void encode( input_stream in, output_stream out )
{
  //
  // This hash table contains a list of codes, indexed
  // by the string that corresponds to the code.
  //
  std::unordered_map&lt;std::string,unsigned int&gt; codes;
  //
  // There is presumably some code here that initializes
  // the dictionary with a set of codes based on whatever
  // algorithm we are implementing.
  //
  ...initialize the dictionary
  //
  // With codes in the dictionary, encoding is
  // now ready to begin.
  //
  std::string current_string;
  char c;
  while ( in &gt;&gt; c ) {
    current_string = current_string + c;
    if ( codes.find(current_string) == codes.end() ) {
      current_string.erase(current_string.size()-1);
      out &lt;&lt; codes[current_string];
      current_string = c;
    }
  }
  out &lt;&lt; codes[current_string];
}
</pre>
<p>The greedy encoder reads characters in from the uncompressed stream, and appends them one by one to the variable <code>current_string</code>. Each time it lengthens the string by one character, it checks to see if it still has a valid code for that string in the dictionary.</p>
<p>This continues until we eventually add a character that forms a string that isn't in the dictionary. So we then erase the last character from that string, and issue the code for the resulting string - the string from the previous pass through the loop. </p>
<p>The value of <code>current_string</code> is then initialized with the character that broke the camel's back, and the algorithm continues in the loop, building new strings until it runs out of input characters. At that point it outputs the last remaining code and exits.</p>
<p>As an example of how this would work, imagine I have the input stream <code>ACABCA</code>, and my code dictionary looks like this:<br />
<center></p>
<table border="1">
<tr>
<td>String</td>
<td>Code</td>
</tr>
<tr>
<td>A</td>
<td>1</td>
</tr>
<tr>
<td>B</td>
<td>2</td>
</tr>
<tr>
<td>C</td>
<td>3</td>
</tr>
<tr>
<td>AB</td>
<td>4</td>
</tr>
<tr>
<td>ABC</td>
<td>5</td>
</tr>
</table>
<p>A sample dictionary<br />
</center><br />
If you follow the algorithm above, you'll see that the code output has to be <code>1 3 5 1</code>. If this weren't a greedy algorithm, <code>1 3 4 3 1</code> would have been another valid output, since it decodes to the same string.</p>
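<p>To run the greedy loop for real, the sketch below replaces the hypothetical <code>input_stream</code> and <code>output_stream</code> types with a <code>std::string</code> input and a <code>std::vector&lt;unsigned int&gt;</code> of output codes, and hard-wires the sample dictionary from the table. The helper names <code>greedy_encode</code> and <code>sample_dictionary</code> are made up for this illustration, and the code assumes every input character has a single-character entry in the dictionary.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: greedy (not yet LZW) encoding against a fixed dictionary.
// Assumes every single character of the input has an entry in `codes`;
// codes.at() throws std::out_of_range otherwise.
std::vector<unsigned int> greedy_encode(
    const std::string &in,
    const std::unordered_map<std::string, unsigned int> &codes)
{
    std::vector<unsigned int> out;
    std::string current_string;
    for (char c : in) {
        current_string += c;
        if (codes.find(current_string) == codes.end()) {
            // The string just stopped matching: back off one character,
            // emit the code for the longest match, and restart from c.
            current_string.erase(current_string.size() - 1);
            out.push_back(codes.at(current_string));
            current_string = c;
        }
    }
    if (!current_string.empty())          // flush the final match
        out.push_back(codes.at(current_string));
    return out;
}

// The sample dictionary from the table above.
std::unordered_map<std::string, unsigned int> sample_dictionary()
{
    return { {"A", 1}, {"B", 2}, {"C", 3}, {"AB", 4}, {"ABC", 5} };
}
```

<p>With that dictionary, <code>greedy_encode("ACABCA", sample_dictionary())</code> produces <code>1 3 5 1</code>, matching the hand trace. Looking codes up with <code>at()</code> rather than <code>operator[]</code> means a missing entry throws <code>std::out_of_range</code> instead of quietly inserting a code of 0.</p>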
<p>Decoding the stream in a system like this is very straightforward:</p>
<pre>
void decode( input_stream in, output_stream out )
{
  std::unordered_map&lt;unsigned int,std::string&gt; strings;
  //
  // Initialize the code table with the same set of codes and strings
  // that the encoder used for your algorithm.
  //
  ...initialize the dictionary
  //
  // With codes in the dictionary, decoding is now
  // ready to begin.
  //
  unsigned int code;
  while ( in &gt;&gt; code )
    out &lt;&lt; strings[code];
}
</pre>
<p>Remember, the decoder shown above is just a hypothetical sample - we're still working our way up to the full LZW decoder.</p>
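<p>The decoder is just as easy to make concrete. This sketch, again using made-up helper names (<code>greedy_decode</code>, <code>sample_strings</code>) and the sample dictionary from the table, expands each code and concatenates the results:</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: decode a list of codes against a fixed dictionary.
// strings.at() throws std::out_of_range if a code has no entry.
std::string greedy_decode(
    const std::vector<unsigned int> &in,
    const std::unordered_map<unsigned int, std::string> &strings)
{
    std::string out;
    for (unsigned int code : in)
        out += strings.at(code);   // each code expands to its string
    return out;
}

// The sample dictionary from the table above, keyed by code.
std::unordered_map<unsigned int, std::string> sample_strings()
{
    return { {1, "A"}, {2, "B"}, {3, "C"}, {4, "AB"}, {5, "ABC"} };
}
```

<p>Both the greedy output <code>1 3 5 1</code> and the non-greedy <code>1 3 4 3 1</code> decode to <code>ACABCA</code>: the decoder does not care how the encoder chose its strings, only that every code it receives is in its dictionary.</p>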
<h4>The LZW Encoder</h4>
<p>The encoder shown above works okay, but there is one missing ingredient: management of the code dictionary. If you think about it, you'll see that we only achieve reasonable compression when we are able to build up longer strings and find them in the dictionary. Building a useful dictionary is referred to in the data compression world as <em>modeling</em>.</p>
<p>But our management of the dictionary is constrained by an important requirement: the encoder and decoder both have to be working with the same copy of the dictionary. If they have different dictionaries, the encoder might send a string that the decoder can't resolve.</p>
<p>Some data compression algorithms solve this problem by using a predefined dictionary that both the encoder and the decoder know in advance. But LZW builds a dictionary on the fly, using an <em>adaptive</em> method that ensures both the encoder and decoder are in sync.</p>
<p>LZW manages this in an effective and provably correct fashion. First, both the encoder and decoder initialize the dictionary with all possible single-character strings. For the compressor, that looks like this:</p>
<pre>
for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    codes[std::string(1,(char)i)] = i;
</pre>
<p>This ensures that we can encode all possible streams. No matter what, we can always break a stream down into single characters and encode these, knowing that the decoder has the same strings in its dictionary with values 0-255.</p>
<p>Then comes the key component of the LZW algorithm. If you go back to the greedy encoding loop above, you'll see that I keep adding input symbols to a string until I find a string that isn't in the dictionary. This string has the characteristic of being composed of a string that currently exists in the dictionary, with one additional character.</p>
<p>LZW then takes that new string and adds it to the dictionary, creating a new code. The strings are added to the table with code values that increment by one with each new entry.</p>
<p>The resulting code is just a slightly modified version of the encoder that I listed above. It still only outputs codes for values that are in the dictionary, but now the dictionary is being updated with a new string every time an existing code is sent:</p>
<pre>
void compress( input_stream in, output_stream out )
{
  std::unordered_map&lt;std::string,unsigned int&gt; codes;
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    codes[std::string(1,(char)i)] = i;
  unsigned int next_code = 257;
  std::string current_string;
  char c;
  while ( in &gt;&gt; c ) {
    current_string = current_string + c;
    if ( codes.find(current_string) == codes.end() ) {
      codes[ current_string ] = next_code++;
      current_string.erase(current_string.size()-1);
      out &lt;&lt; codes[current_string];
      current_string = c;
    }
  }
  out &lt;&lt; codes[current_string];
}
</pre>
<p>The code above constitutes a more or less complete LZW encoder. I've only made a couple of additions to the previous encoder:</p>
<ul>
<li/>The initialization of codes 0-255 with all possible single character strings.
<li/>The insertion of the newly discovered string into the string table, generating a new code.
</ul>
<p>(One item of note in this code: you might wonder why <code>next_code</code> is initialized to 257, when 256 is the first free code. This is because I reserve code 256 for an EOF marker. More on this in a later section.)</p>
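<p>To check the walk-through that follows mechanically, here is the same loop as a minimal in-memory sketch: a <code>std::string</code> goes in and a <code>std::vector&lt;unsigned int&gt;</code> of codes comes out. The name <code>lzw_compress</code> is hypothetical, the dictionary grows without limit exactly as in the listing above, and code 256 is simply skipped, with no EOF code written.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory version of the compress() loop above.
// Code 256 is left unused (reserved for EOF), so new entries start at 257.
std::vector<unsigned int> lzw_compress(const std::string &in)
{
    std::unordered_map<std::string, unsigned int> codes;
    for (unsigned int i = 0; i < 256; i++)
        codes[std::string(1, static_cast<char>(i))] = i;
    unsigned int next_code = 257;

    std::vector<unsigned int> out;
    std::string current_string;
    for (char c : in) {
        current_string += c;
        if (codes.find(current_string) == codes.end()) {
            codes[current_string] = next_code++;        // grow the dictionary
            current_string.erase(current_string.size() - 1);
            out.push_back(codes[current_string]);       // emit the longest match
            current_string = c;
        }
    }
    if (!current_string.empty())
        out.push_back(codes[current_string]);
    return out;
}
```

<p>For the input <code>ABBABBBABBA</code> this returns <code>65, 66, 66, 257, 258, 260, 65</code>, the same codes the table below arrives at by hand.</p>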
<p>Just to make sure this all adds up, I'll walk through the steps the encoder takes as it processes a string from a simple two letter alphabet: <code>ABBABBBABBA</code>. There are a lot of steps shown below, but working through the process in detail is a great way to be sure you understand it:<br />
<center><br />
<table border="1">
<tr>
<th>Input<br/>Symbol</th>
<th>Action(s)</th>
<th>New<br/>Code</th>
<th>Output<br/>Code</th>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'A'<br/>'A' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is not in the dictionary, add it with code 257<br/>output the code for 'A' - 65<br/>set current_string to 'B'</td>
<td valign="top">257 (AB)</td>
<td valign="top">65 (A)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'BB'<br/>'BB' is not in the dictionary, add it with code 258<br/>output the code for 'B' - 66<br/>set current_string to 'B'</td>
<td valign="top">258 (BB)</td>
<td valign="top">66 (B)</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'BA'<br/>'BA' is not in the dictionary - add it with code 259<br/>output the code for 'B' - 66<br/>set current_string to 'A'</td>
<td valign="top">259 (BA)</td>
<td valign="top">66 (B)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'ABB'<br/>'ABB' is not in the dictionary - add it with code 260<br/>output the code for 'AB' - 257<br/>set current_string to 'B'</td>
<td valign="top">260 (ABB)</td>
<td valign="top">257 (AB)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'BB'<br/>'BB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'BBA'<br/>'BBA' is not in the dictionary - add it with code 261<br/>output the code for 'BB' - 258<br/>set current_string to 'A'</td>
<td valign="top">261 (BBA)</td>
<td valign="top">258 (BB)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'ABB'<br/>'ABB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'ABBA'<br/>'ABBA' is not in the dictionary - add it with code 262<br/>output the code for 'ABB' - 260<br/>set current_string to 'A'</td>
<td valign="top">262 (ABBA)</td>
<td valign="top">260 (ABB)</td>
</tr>
<tr>
<td valign="top"><center>EOF</center></td>
<td>end of the input stream - exit loop<br/>current string is 'A'<br/>output the code for 'A' - 65</td>
<td>&nbsp;</td>
<td>65 (A)</td>
</tr>
</table>
<p></center><br />
After processing string <code>ABBABBBABBA</code>, the output codes are <code>65,66,66,257,258,260,65</code>. The dictionary at this point is:<br />
<center></p>
<table border="1">
<tr>
<td>String</td>
<td>Code</td>
</tr>
<tr>
<td>AB</td>
<td>257</td>
</tr>
<tr>
<td>BB</td>
<td>258</td>
</tr>
<tr>
<td>BA</td>
<td>259</td>
</tr>
<tr>
<td>ABB</td>
<td>260</td>
</tr>
<tr>
<td>BBA</td>
<td>261</td>
</tr>
<tr>
<td>ABBA</td>
<td>262</td>
</tr>
</table>
<p>The dictionary generated for <code>ABBABBBABBA</code><br/>(Entries 0-255 not shown for brevity)<br />
</center><br />
Looking at the above table, you can see a few interesting things happening. First, every time the algorithm outputs a code, it also adds a new code to the dictionary.</p>
<p>More importantly, as the dictionary grows, it starts to hold longer and longer strings. And the longer the string, the more compression we can get. If the algorithm starts emitting integer codes for strings of length 10 or more, there is no doubt that we are going to get good compression.</p>
<p>As an example of how this works on real data, here are some entries from the dictionary created when compressing <em>Alice's Adventures in Wonderland</em>:</p>
<pre>
34830 : 'even\n'
34831 : '\nwith t'
34832 : 'the dr'
34833 : 'ream '
34834 : ' of Wo'
34835 : 'onderl'
34836 : 'land'
34837 : 'd of l'
34838 : 'long ag'
34839 : 'go:'
</pre>
<p>These strings have an average length of five and a half characters. If we are writing the integer codes to a file using 16 bit binary integers, each two-byte code stands in for about five and a half bytes of text, so these entries offer the possibility of nearly 3:1 compression.</p>
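<p>The arithmetic behind that estimate is easy to check. The sketch below (the helper names are hypothetical) adds up the lengths of the ten entries, counting each <code>\n</code> as one character, and computes the best-case ratio when each string of 8-bit symbols is replaced by a single fixed-width code:</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Average length of a set of dictionary strings ('\n' counts as one character).
double average_length(const std::vector<std::string> &entries)
{
    std::size_t total = 0;
    for (const std::string &s : entries)
        total += s.size();
    return entries.empty() ? 0.0 : static_cast<double>(total) / entries.size();
}

// Best-case ratio when each string is replaced by one fixed-width code.
// Assumes 8-bit symbols, so a string of n characters occupies n bytes.
double compression_ratio(double avg_symbols, unsigned int code_bits)
{
    return avg_symbols * 8.0 / code_bits;
}

// The ten Alice entries listed above.
const std::vector<std::string> alice_entries = {
    "even\n", "\nwith t", "the dr", "ream ", " of Wo",
    "onderl", "land", "d of l", "long ag", "go:"
};
```

<p>The ten strings total 55 characters, for an average of 5.5, and replacing 5.5 bytes with a 2-byte code works out to 2.75:1, close to the 3:1 figure.</p>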
<p>The word <em>adaptive</em> is used to describe a compression algorithm that adapts to the type of text it is processing. LZW does an excellent job of this. If a string is seen repeatedly in the text, it will show up in longer and longer entries in the dictionary. If a string is seen rarely, it will not be the foundation for a large batch of longer strings, and thus won't waste space in the dictionary.</p>
<h4>The LZW Decoder</h4>
<p>The change made to the basic encoder to accommodate the LZW algorithm was really very simple: one small batch of code that initializes the dictionary, and a few more lines that add every new unseen string to the dictionary.</p>
<p>As you might suspect, the changes to the decoder will be fairly simple as well. The first change is that the dictionary must be initialized with the same 256 single-symbol strings that the encoder uses.</p>
<p>Once the decoder starts running, each time it reads in a code, it must add a new value to the dictionary. And what is that value? The entire content of the previously decoded string, plus the first letter of the currently decoded string. This is exactly what the encoder does to create a new string, and the decoder must follow the same steps:</p>
<pre>
void decompress( input_stream in, output_stream out )
{
  std::unordered_map&lt;unsigned int,std::string&gt; strings;
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    strings[i] = std::string(1,i);
  std::string previous_string;
  unsigned int code;
  unsigned int next_code = 257;
  while ( in &gt;&gt; code ) {
    out &lt;&lt; strings[code];
    if ( previous_string.size() )
      strings[next_code++] = previous_string + strings[code][0];
    previous_string = strings[code];
  }
}
</pre>
<p>I won't do a walk-through of the decoder - you should be able to take the codes output from the encoder, shown above, and run them through the decoder to see that the output stream is what we expect.</p>
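<p>If you would rather let the computer do the walk-through, here is the decoder loop as a minimal in-memory sketch (<code>lzw_decompress_naive</code> is a hypothetical name). It is deliberately the version shown above, without the fix described in the next section, so it uses <code>at()</code> to throw <code>std::out_of_range</code> if it ever receives a code it hasn't built yet:</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory version of the decompress() loop above, before the
// fix in the next section: it assumes every incoming code is already in the
// dictionary (strings.at() throws std::out_of_range if it is not).
std::string lzw_decompress_naive(const std::vector<unsigned int> &in)
{
    std::unordered_map<unsigned int, std::string> strings;
    for (unsigned int i = 0; i < 256; i++)
        strings[i] = std::string(1, static_cast<char>(i));
    unsigned int next_code = 257;

    std::string out;
    std::string previous_string;
    for (unsigned int code : in) {
        const std::string current = strings.at(code);
        out += current;
        // One step behind the encoder: previous string + first char of this one.
        if (!previous_string.empty())
            strings[next_code++] = previous_string + current[0];
        previous_string = current;
    }
    return out;
}
```

<p>Feeding it the encoder's output, <code>65, 66, 66, 257, 258, 260, 65</code>, rebuilds <code>ABBABBBABBA</code>, and along the way it adds entries 257 through 262 with the same strings the encoder created.</p>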
<p>The important thing is to understand the logic behind the decoder. When the encoder encounters a string that isn't in the dictionary, it breaks it into two pieces: a root string and an appended character. It outputs the code for the root string, and adds the root string + appended character to the dictionary. It then starts building a new string that starts with the appended character.</p>
<p>So every time the decoder uses a code to extract a string from the dictionary, it knows that the first character in that string was the appended character of the string just added to the dictionary by the encoder. And the root of the string added to the dictionary? That was the <em>previously</em> decoded string. This line of code implements that logic:</p>
<pre>
    strings[next_code++] = previous_string + strings[code][0];
</pre>
<p>It adds a new string to the dictionary, composed of the previously seen string, and the first character of the current string. Thus, the decoder is adding strings to the dictionary just one step behind the encoder.</p>
<p>You might note one curious point in the decoder. Instead of always adding the string to the dictionary, it is only done conditionally:</p>
<pre>
if ( previous_string.size() )
  strings[next_code++] = previous_string + strings[code][0];
</pre>
<p>The only time that <code>previous_string.size()</code> is 0 is on the very first pass through the loop. And on the first pass through the loop, we don't have a previous string yet, so the decoder can't build a new dictionary entry. Again, the decoder is always one step behind the encoder. That lag is the key point of the next section, which puts the final touches on the algorithm.</p>
<h4>The Catch</h4>
<p>So far the LZW algorithm we've seen seems very elegant - that's a characteristic we associate with algorithms that can be expressed in just a few lines of code.</p>
<p>Unfortunately, there is one small catch in this perceived elegance - the algorithm as I've shown it to you has a bug.</p>
<p>The bug in the algorithm relates to the fact that the encoder is always one step ahead of the decoder. When the encoder adds a string with code <em>N</em> to the table, it sends enough information to the decoder to allow the decoder to figure out the value of the string denoted by code <em>N-1</em>. The decoder won't know what the value of the string corresponding to code <em>N</em> is until it receives code <em>N+1</em>.</p>
<p>This makes sense if you recall the key line of code from the decoder. It calculates the value of the string encoded by <em>N-1</em> by looking at the string it received on the previous iteration, plus the first character of the current string. And that current string is the one that was sent after encoding <em>N</em>.</p>
<p>So how can this get us in trouble? The encoder is always one entry ahead of the decoder - it has entry <em>N</em> in its dictionary, and the decoder has entry <em>N-1</em>. So if the encoder ever sends code <em>N</em>, the decoder will look in its table and come up empty-handed, unable to do its job of decoding.</p>
<p>A simple example will show you how this can happen. Let's look at the state of the encoder after it has read the first five symbols of a stream, <code>ABABA</code>:</p>
<p><center><br />
<table border="1">
<tr>
<th>Input<br/>Symbol</th>
<th>Action(s)</th>
<th>New<br/>Code</th>
<th>Output<br/>Code</th>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'A'<br/>'A' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is not in the dictionary, add it with code 257<br/>output the code for 'A' - 65<br/>set current_string to 'B'</td>
<td valign="top">257 (AB)</td>
<td valign="top">65 (A)</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'BA'<br/>'BA' is not in the dictionary, add it with code 258<br/>output the code for 'B' - 66<br/>set current_string to 'A'</td>
<td valign="top">258 (BA)</td>
<td valign="top">66 (B)</td>
</tr>
<tr>
<td valign="top"><center>B</center></td>
<td>read 'B' - set current_string to 'AB'<br/>'AB' is in the dictionary, so continue</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top"><center>A</center></td>
<td>read 'A' - set current_string to 'ABA'<br/>'ABA' is not in the dictionary, add it with code 259<br/>output the code for 'AB' - 257<br/>set current_string to 'A'</td>
<td valign="top">259 (ABA)</td>
<td valign="top">257 (AB)</td>
</tr>
</table>
<p></center><br />
Now we are set for trouble. The encoder has code 259 in its dictionary, while the decoder has only gotten to 258. If the encoder were to send a code of 259 for its next output, the decoder would not be able to find it in its dictionary. Can this happen?</p>
<p>Yes, if the next two characters in the stream are <code>BA</code>, the next code output by the encoder will be 259, and the decoder will be lost.</p>
<p>In general, this can happen when a dictionary entry exists that consists of a string plus a character, and the encoder encounters the sequence <code>string+character+string+character+string</code>. In the example above, the value of <em>string</em> is <code>A</code>, and the value of <em>character</em> is <code>B</code>. After the encoder encounters <code>AB</code>, it has <code>string+character</code> in the dictionary, so if the following sequence is <code>ABABA</code>, we will emit code <em>N</em>.</p>
<p>Whether this is likely to happen is not too important. What is important is that it most definitely can happen, and the decoder has to be aware of it. And it will happen repeatedly in the pathological case: a stream that consists of a single symbol, repeated over and over.</p>
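<p>Both cases are easy to reproduce. The sketch below pairs a minimal in-memory compressor with a helper that tracks only how many entries the unmodified decoder would have built when each code arrives. Both names, <code>lzw_compress</code> and <code>first_unknown_code</code>, are hypothetical; the second relies on the fact that the decoder starts with codes 0-255, skips 256, and adds one entry for every code after the first.</p>

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal in-memory LZW compressor (hypothetical name), unlimited dictionary.
std::vector<unsigned int> lzw_compress(const std::string &in)
{
    std::unordered_map<std::string, unsigned int> codes;
    for (unsigned int i = 0; i < 256; i++)
        codes[std::string(1, static_cast<char>(i))] = i;
    unsigned int next_code = 257;

    std::vector<unsigned int> out;
    std::string current_string;
    for (char c : in) {
        current_string += c;
        if (codes.find(current_string) == codes.end()) {
            codes[current_string] = next_code++;
            current_string.erase(current_string.size() - 1);
            out.push_back(codes[current_string]);
            current_string = c;
        }
    }
    if (!current_string.empty())
        out.push_back(codes[current_string]);
    return out;
}

// Hypothetical helper: position of the first code the unmodified decoder
// could not find in its dictionary, or -1 if it could decode them all.
int first_unknown_code(const std::vector<unsigned int> &codes)
{
    unsigned int next_code = 257;   // first code the decoder has not built yet
    for (std::size_t i = 0; i < codes.size(); i++) {
        if (codes[i] >= next_code)
            return static_cast<int>(i);
        if (i > 0)
            next_code++;            // entry built while decoding codes[i]
    }
    return -1;
}
```

<p>For <code>ABABABA</code> the encoder emits <code>65, 66, 257, 259</code>, and the fourth code (position 3) is 259, the entry the decoder was about to build. A run of identical symbols fails even sooner: <code>AAAAAA</code> compresses to <code>65, 257, 258</code>, and the decoder is stuck on the second code.</p>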
<p>The good news is that the problem is easily solved. When the decoder receives a code and finds that this code is not present in its dictionary, it knows right away that the code must be the one it is about to add to its dictionary. And because this only happens when we are encoding the sequence discussed above, the decoder knows that instead of using this value for that code:</p>
<pre>
    strings[next_code++] = previous_string + strings[code][0];
</pre>
<p>it can instead use this value:</p>
<pre>
    strings[ code ] = previous_string + previous_string[0];
</pre>
<p>The result of this is the insertion of just two lines of code at the start of the decompress loop, giving a loop that now looks like this:</p>
<pre>

while ( in &gt;&gt; code ) {
  if ( strings.find( code ) == strings.end() )
    strings[ code ] = previous_string + previous_string[0];
  out &lt;&lt; strings[code];
  if ( previous_string.size() )
    strings[next_code++] = previous_string + strings[code][0];
  previous_string = strings[code];
}
</pre>
<p>And with that, you have a complete implementation of the LZW encoder and decoder.</p>
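<p>As a final check, here is the complete pair as a minimal in-memory sketch, with the special case in place. The names <code>lzw_compress</code> and <code>lzw_decompress</code> are hypothetical, and neither function limits the dictionary size or writes an EOF code; those details come next.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Minimal in-memory LZW compressor (hypothetical name), unlimited dictionary.
std::vector<unsigned int> lzw_compress(const std::string &in)
{
    std::unordered_map<std::string, unsigned int> codes;
    for (unsigned int i = 0; i < 256; i++)
        codes[std::string(1, static_cast<char>(i))] = i;
    unsigned int next_code = 257;

    std::vector<unsigned int> out;
    std::string current_string;
    for (char c : in) {
        current_string += c;
        if (codes.find(current_string) == codes.end()) {
            codes[current_string] = next_code++;
            current_string.erase(current_string.size() - 1);
            out.push_back(codes[current_string]);
            current_string = c;
        }
    }
    if (!current_string.empty())
        out.push_back(codes[current_string]);
    return out;
}

// Matching decoder (hypothetical name) with the special case for a code
// that the decoder has not built yet.
std::string lzw_decompress(const std::vector<unsigned int> &in)
{
    std::unordered_map<unsigned int, std::string> strings;
    for (unsigned int i = 0; i < 256; i++)
        strings[i] = std::string(1, static_cast<char>(i));
    unsigned int next_code = 257;

    std::string out;
    std::string previous_string;
    for (unsigned int code : in) {
        // Unknown code: it must be the entry about to be built,
        // previous_string plus its own first character.
        if (strings.find(code) == strings.end())
            strings[code] = previous_string + previous_string[0];
        const std::string current = strings[code];
        out += current;
        if (!previous_string.empty())
            strings[next_code++] = previous_string + current[0];
        previous_string = current;
    }
    return out;
}
```

<p>A round trip through these two functions should reproduce any input, including the troublesome <code>ABABABA</code> and long runs of a single symbol.</p>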
<h4>Implementation</h4>
<p>Now that I've shown you the algorithm, the next step is to take that code and turn it into a working program. Without changing the algorithm itself, I'm going to take you through four different customizations that work as follows:</p>
<ul>
<li/>LZW-A reads and writes code values rendered in text mode, which is great for debugging. It means you can view the output of the encoder in a text editor.
<li/>LZW-B reads and writes code values as 16-bit binary integers. This is fast and efficient, and usually results in significant data compression.
<li/>LZW-C reads and writes code values as N-bit binary integers, where N is determined by the maximum code size. Performing I/O on codes that are not aligned on byte boundaries complicates the code somewhat, but allows for greater efficiency and better compression.
<li/>LZW-D reads and writes code values as variable-length binary integers, starting with 9-bit codes and gradually increasing as the dictionary grows. This gives the best compression of the four.
</ul>
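<p>For LZW-C, "determined by the maximum code size" boils down to finding the smallest N such that every code from 0 to <code>max_code</code> fits in N bits. A sketch, with <code>bits_required</code> as a hypothetical helper name:</p>

```cpp
// Hypothetical helper: the smallest number of bits N such that every
// code from 0 through max_code fits in an N-bit unsigned field.
// The bits < 32 guard assumes a 32-bit unsigned int and keeps the shift defined.
unsigned int bits_required(unsigned int max_code)
{
    unsigned int bits = 1;
    while (bits < 32 && (max_code >> bits) != 0)
        bits++;
    return bits;
}
```

<p>With a maximum code of 32767, for example, this gives 15-bit codes, one bit less than LZW-B's 16, which saves one sixteenth of the output. It also shows why a variable-length scheme like LZW-D cannot start below 9 bits: codes 0-255 plus the EOF code 256 already need 9.</p>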
<p>Before launching into these implementations, the code I showed above needs some minor tweaking to solve a couple of problems.</p>
<p>The first problem we have to deal with is the ever-expanding dictionary. In the algorithm I've presented, we keep adding new codes to the dictionary without end. This needs to be changed for a couple of reasons.</p>
<p>First, we don't have unlimited memory, so the dictionary simply can't grow forever. Second, practical experience shows that compression ratios don't improve as dictionary sizes grow without bound. As the dictionary grows, code sizes get larger and larger, and so they take up more space in the compressed stream, which can reduce compression efficiency. </p>
<p>To resolve this problem, I just add an additional argument to the encoder and decoder that sets the maximum code value that will be added to the dictionary. The function signatures now look like this:</p>
<pre>
void compress( input_stream input,
               output_stream output,
               const unsigned int max_code = 32767 );
void decompress( input_stream input,
                 output_stream output,
                 const unsigned int max_code = 32767 );
</pre>
<p>Implementing it means one small change in the encoder:</p>
<pre>
if ( next_code &lt;= max_code )
  codes[ current_string ] = next_code++;
</pre>
<p>And a corresponding change in the decoder:</p>
<pre>
if ( previous_string.size() &amp;&amp; next_code &lt;= max_code )
  strings[next_code++] = previous_string + strings[code][0];
</pre>
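<p>Here are both halves with the limit applied, as a minimal in-memory sketch (the names <code>lzw_compress</code> and <code>lzw_decompress</code> are hypothetical). Once <code>next_code</code> passes <code>max_code</code>, both sides simply stop adding entries and keep using the dictionary they have, so they stay in sync:</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory compressor with the max_code cap.
std::vector<unsigned int> lzw_compress(const std::string &in,
                                       const unsigned int max_code = 32767)
{
    std::unordered_map<std::string, unsigned int> codes;
    for (unsigned int i = 0; i < 256; i++)
        codes[std::string(1, static_cast<char>(i))] = i;
    unsigned int next_code = 257;

    std::vector<unsigned int> out;
    std::string current_string;
    for (char c : in) {
        current_string += c;
        if (codes.find(current_string) == codes.end()) {
            if (next_code <= max_code)              // stop growing when full
                codes[current_string] = next_code++;
            current_string.erase(current_string.size() - 1);
            out.push_back(codes[current_string]);
            current_string = c;
        }
    }
    if (!current_string.empty())
        out.push_back(codes[current_string]);
    return out;
}

// Hypothetical in-memory decoder with the same cap.
std::string lzw_decompress(const std::vector<unsigned int> &in,
                           const unsigned int max_code = 32767)
{
    std::unordered_map<unsigned int, std::string> strings;
    for (unsigned int i = 0; i < 256; i++)
        strings[i] = std::string(1, static_cast<char>(i));
    unsigned int next_code = 257;

    std::string out;
    std::string previous_string;
    for (unsigned int code : in) {
        if (strings.find(code) == strings.end())    // code == next_code case
            strings[code] = previous_string + previous_string[0];
        const std::string current = strings[code];
        out += current;
        if (!previous_string.empty() && next_code <= max_code)
            strings[next_code++] = previous_string + current[0];
        previous_string = current;
    }
    return out;
}
```

<p>Even with a tiny limit like 257, which leaves room for just one new string, the round trip still works; compression just suffers because the dictionary stops learning almost immediately.</p>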
<h4>Input and Output</h4>
<p>Finally, I need to give the algorithm a decent way to perform input and output - and this is where C++ offers a huge amount of help.</p>
<p>When writing generic compression code that you intend to use in multiple contexts, one of the more difficult things to deal with is I/O. People using your code might want to compress data in memory, stored in files, or streaming in from sockets or other sources. Some input data sources might be of unknown length (data coming from a TCP socket, for example), while others will be of a prescribed length. Back in the days of C, it was particularly difficult to make your compression code both generic, so it would work with all types of data streams, and efficient, so that I/O doesn't take any more time than it has to.</p>
<p>With the advent of C++, we have a new tool that can help in this quest - templates. Templates are designed to solve this problem in an efficient way, and I take advantage of this in my sample code. The code below shows the final version of the compressor and decompressor that are used in all four versions of the implementation. There are two final changes made to the routines shown previously, plus a small guard in the compressor that skips the final output when the input is empty. First, both C++ functions are now function templates, parameterized on the types being used for input and output. Second, the actual input and output is done through four newly introduced template classes:</p>
<pre>
template&lt;class INPUT, class OUTPUT&gt;
void compress( INPUT &amp;input, OUTPUT &amp;output, const unsigned int max_code = 32767 )
{
  input_symbol_stream&lt;INPUT&gt; in( input );
  output_code_stream&lt;OUTPUT&gt; out( output, max_code );

  std::unordered_map&lt;std::string, unsigned int&gt; codes( (max_code * 11)/10 );
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    codes[std::string(1,i)] = i;
  unsigned int next_code = 257;
  std::string current_string;
  char c;
  while ( in &gt;&gt; c ) {
    current_string = current_string + c;
    if ( codes.find(current_string) == codes.end() ) {
      if ( next_code &lt;= max_code )
        codes[ current_string ] = next_code++;
      current_string.erase(current_string.size()-1);
      out &lt;&lt; codes[current_string];
      current_string = c;
    }
  }
  if ( current_string.size() )
    out &lt;&lt; codes[current_string];
}

template&lt;class INPUT, class OUTPUT&gt;
void decompress( INPUT &amp;input, OUTPUT &amp;output, const unsigned int max_code = 32767  )
{
  input_code_stream&lt;INPUT&gt; in( input, max_code );
  output_symbol_stream&lt;OUTPUT&gt; out( output );

  std::unordered_map&lt;unsigned int,std::string&gt; strings( (max_code * 11) / 10 );
  for ( unsigned int i = 0 ; i &lt; 256 ; i++ )
    strings[i] = std::string(1,i);
  std::string previous_string;
  unsigned int code;
  unsigned int next_code = 257;
  while ( in &gt;&gt; code ) {
    if ( strings.find( code ) == strings.end() )
      strings[ code ] = previous_string + previous_string[0];
    out &lt;&lt; strings[code];
    if ( previous_string.size() &amp;&amp; next_code &lt;= max_code )
      strings[next_code++] = previous_string + strings[code][0];
    previous_string = strings[code];
  }
}
</pre>
<p>What exactly is the effect of implementing this algorithm using a pair of <em>function templates</em>, parameterized on the types of the input and output objects? What this means is that you can call these compression routines with any type of I/O object you can throw at them. It can work with C++ iostreams, C FILE&nbsp;* objects, raw blocks of memory, whatever you want.</p>
<p>But there's a catch to that flexibility - you have to implement some basic I/O routines for whatever type you are using. Fortunately, this is not too hard.</p>
<p>The actual I/O that is done in the compression routines is defined by four template classes I created. These classes are defined in <code>lzw_streambase.h</code>. These classes don't have implementations, but they do define the methods you need to implement to work with the compressor and decompressor. The four classes are: </p>
<ul>
<li/><code>input_symbol_stream&lt;T&gt;</code>
<li/><code>output_symbol_stream&lt;T&gt;</code>
<li/><code>input_code_stream&lt;T&gt;</code>
<li/><code>output_code_stream&lt;T&gt;</code>
</ul>
<p>The first two classes are the symbol input and output classes. These are normally going to be very simple implementations, as they just have to read characters from a stream and write strings back out, while checking for errors or the end of the stream. I use the same versions of these classes in all four implementations, so their code in <code>lzw-a.h</code> is repeated unchanged in the other three header files.</p>
<p>The <code>input_symbol_stream&lt;T&gt;</code> class has one member function besides its constructor: the extraction operator, which reads a character from the stream and returns <code>true</code> if it succeeded, or <code>false</code> on an error or at the end of the stream. You'll see later in this section that the implementation of this for types such as <code>std::istream</code> is trivial.</p>
<pre>
template&lt;typename T&gt;
class input_symbol_stream
{
public :
    input_symbol_stream( T &amp; );
    bool operator&gt;&gt;( char &amp;c );
};
</pre>
<p>The <code>output_symbol_stream&lt;T&gt;</code> class uses the insertion operator to write strings instead of individual characters - because that is what is stored in the dictionary. The C++ <code>std::string</code> class makes a perfectly good container for any variety of symbols, including binary data, and unlike the alternative <code>vector&lt;char&gt;</code>, it comes with hash functions and <code>iostream</code> operators.</p>
<pre>
template&lt;typename T&gt;
class output_symbol_stream
{
public :
    output_symbol_stream( T &amp;  );
    void operator&lt;&lt;( const std::string &amp;s );
};
</pre>
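<p>The binary-data claim is worth a quick demonstration, because there is one trap: a <code>std::string</code> built from a C string literal stops at the first <code>'\0'</code>, while one built with an explicit length keeps it. A small sketch (the names are illustrative only):</p>

```cpp
#include <string>
#include <unordered_map>

// Built with an explicit length: three symbols, 'A', '\0', 'B'.
const std::string with_null("A\0B", 3);
// Built from a C string: construction stops at the '\0', leaving just "A".
const std::string c_string_prefix("A\0B");

// std::hash<std::string> is provided by the library, so both strings
// can be used directly as keys in the same table.
std::unordered_map<std::string, unsigned int> make_table()
{
    return { {c_string_prefix, 65}, {with_null, 300} };
}
```

<p>The two strings are distinct keys, so a dictionary entry that happens to contain a zero byte never collides with its shorter prefix.</p>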
<p>The <code>input_code_stream&lt;T&gt;</code> class reads codes, normally unsigned integers, from some type of stream. In my implementations, this class also returns false if it encounters the <code>EOF_CODE</code> in the stream of incoming codes. Removing the responsibility for EOF detection from the decompressor makes the code a bit simpler and more versatile.</p>
<p>The formatting of the integer is entirely up to the implementor, but the most common approach will probably be variable length codes ranging from 9 to 16 or so bits.</p>
<pre>
template&lt;typename T&gt;
class input_code_stream
{
public :
    input_code_stream( T &amp;, unsigned int );
    bool operator&gt;&gt;( unsigned int &amp;i );
};
</pre>
<p>The <code>output_code_stream&lt;T&gt;</code> class writes codes, usually unsigned integers, to some type of stream. Whatever class you implement for this function must agree with the implementation for <code>input_code_stream&lt;T&gt;</code>.</p>
<pre>
template&lt;typename T&gt;
class output_code_stream
{
public :
    output_code_stream( T &amp;, unsigned int );
    void operator&lt;&lt;( const unsigned int i );
};
</pre>
<p>You can see that at the top of the compressor and decompressor, I instantiate objects of these types, then use the standard insertion and extraction operators to read and write from these objects. </p>
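<p>Because <code>compress()</code> only touches its I/O through these classes, supporting a new kind of data source comes down to writing specializations. As a sketch, here are the two that <code>compress()</code> needs for in-memory data: <code>std::string</code> as the symbol source and <code>std::vector&lt;unsigned int&gt;</code> as the code sink, with the <code>compress()</code> template from above repeated so the block compiles on its own. These specializations are illustrations, not code from <code>lzw_streambase.h</code> or <code>lzw-a.h</code>; the value 256 for <code>EOF_CODE</code> is an assumption based on the code reserved earlier, and the primary templates are only forward-declared here.</p>

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed value: the text reserves code 256 as the EOF marker.
const unsigned int EOF_CODE = 256;

// Primary templates, forward-declared only: every I/O type used with
// compress() needs its own specialization.
template<typename T> class input_symbol_stream;
template<typename T> class output_code_stream;

// Hypothetical specialization: read symbols from a std::string in memory.
template<>
class input_symbol_stream<std::string> {
public:
    input_symbol_stream(std::string &input) : m_input(input) {}
    bool operator>>(char &c)
    {
        if (m_pos == m_input.size())
            return false;               // end of the data
        c = m_input[m_pos++];
        return true;
    }
private:
    const std::string &m_input;
    std::size_t m_pos = 0;
};

// Hypothetical specialization: append codes to a std::vector in memory.
// The destructor terminates the code stream with EOF_CODE.
template<>
class output_code_stream<std::vector<unsigned int>> {
public:
    output_code_stream(std::vector<unsigned int> &output, unsigned int)
        : m_output(output) {}
    ~output_code_stream() { m_output.push_back(EOF_CODE); }
    void operator<<(const unsigned int i) { m_output.push_back(i); }
private:
    std::vector<unsigned int> &m_output;
};

// The compress() template from above, unchanged apart from an explicit cast.
template<class INPUT, class OUTPUT>
void compress(INPUT &input, OUTPUT &output, const unsigned int max_code = 32767)
{
    input_symbol_stream<INPUT> in(input);
    output_code_stream<OUTPUT> out(output, max_code);

    std::unordered_map<std::string, unsigned int> codes((max_code * 11) / 10);
    for (unsigned int i = 0; i < 256; i++)
        codes[std::string(1, static_cast<char>(i))] = i;
    unsigned int next_code = 257;
    std::string current_string;
    char c;
    while (in >> c) {
        current_string = current_string + c;
        if (codes.find(current_string) == codes.end()) {
            if (next_code <= max_code)
                codes[current_string] = next_code++;
            current_string.erase(current_string.size() - 1);
            out << codes[current_string];
            current_string = c;
        }
    }
    if (current_string.size())
        out << codes[current_string];
}
```

<p>Compressing <code>ABBABBBABBA</code> into an empty vector this way yields <code>65, 66, 66, 257, 258, 260, 65</code> followed by 256, which the <code>output_code_stream</code> destructor writes as <code>compress()</code> returns.</p>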
<h4>LZW-A</h4>
<p>In my sample Windows program, I include <code>lzw_streambase.h</code> and <code>lzw.h</code>, which accounts for all of the code you have seen so far. I have the following lines that perform compression and decompression:</p>
<pre>
std::ifstream in( name, std::ios_base::binary );
std::ofstream lzw_out( temp_name_lzw, std::ios_base::binary );
compress( (std::istream &amp;) in, (std::ostream&amp;) lzw_out, pDlg-&gt;m_MaxCodeSize );
.
.
.
std::ifstream lzw_in( temp_name_lzw, std::ios_base::binary );
std::fstream out( temp_name_out,
                  std::fstream::in    |
                  std::fstream::out   |
                  std::fstream::binary );
decompress( (std::istream &amp;) lzw_in, (std::ostream&amp;) out, pDlg-&gt;m_MaxCodeSize );
</pre>
<p>If I try to build this project as-is, I get a nasty list of eight linker errors:<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure01.png"/></td>
</tr>
<tr>
<td><center>Visual Studio 10 Error Messages</center></td>
</tr>
</table>
<p></center><br />
If you have the fortitude to crawl through those link errors, you will see that what is missing are the implementations of the four classes parameterized on <code>std::ostream</code> and <code>std::istream</code>. Each of the four classes needs the implementation of a constructor and either an insertion or extraction operator. And with no class definitions at all, that adds up to eight missing functions. To get us started on performing actual LZW compression, I've created the first implementation of these four classes in <code>lzw-a.h</code>. Let's take a look at each of these in turn.</p>
<p>It's tempting to try to read characters using the <code>std::istream</code> extraction operator, as in <code>m_input &gt;&gt; c</code>, but by default that operator skips over whitespace, so we don't get an exact copy of the input stream. Using <code>get()</code> works around this problem. Below is the complete definition of <code>input_symbol_stream&lt;std::istream&gt;</code> used in all four LZW implementations in this article:</p>
<pre>
template&lt;&gt;
class input_symbol_stream&lt;std::istream&gt; {
public :
    input_symbol_stream( std::istream &amp;input )
        : m_input( input ) {}
    bool operator&gt;&gt;( char &amp;c )
    {
        if ( !m_input.get( c ) )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
};
</pre>
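<p>Here is a small self-contained sketch that makes the whitespace problem concrete. It reads the same text once with <code>operator&gt;&gt;</code> and once with <code>get()</code>. The helper names are hypothetical and are not part of the article's headers.</p>

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads every character with formatted extraction
// (operator>>), which skips whitespace by default.
std::string read_with_extraction(std::istream &in)
{
    std::string result;
    char c;
    while (in >> c)
        result += c;
    return result;
}

// Hypothetical helper: reads every character with unformatted get(),
// which returns each character exactly as it appears in the stream.
std::string read_with_get(std::istream &in)
{
    std::string result;
    char c;
    while (in.get(c))
        result += c;
    return result;
}
```

<p>With the input <code>"AB BA\n"</code>, the first helper returns <code>"ABBA"</code>. The second returns all six characters, which is the behavior the symbol reader needs.</p>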
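<p>Using the insertion operator to output strings works properly, even when the strings contain binary data. <code>operator&lt;&lt;</code> for <code>std::string</code> writes every character in the string, embedded nulls included, and with the default field width of zero it adds no padding. So the implementation of the class used to output symbols is as simple as we could hope for. Again, this exact code is used in all four implementations in this article:</p>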
<pre>
template&lt;&gt;
class output_symbol_stream&lt;std::ostream&gt; {
public :
    output_symbol_stream( std::ostream &amp;output )
        : m_output( output ) {}
    void operator&lt;&lt;( const std::string &amp;s )
    {
        m_output &lt;&lt; s;
    }
private :
    std::ostream &amp;m_output;
};
</pre>
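<p>A quick way to convince yourself of this is the sketch below. It uses a hypothetical helper and an <code>std::ostringstream</code> in place of a file opened in binary mode. A string containing an embedded null passes through <code>operator&lt;&lt;</code> byte for byte.</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: writes a string with operator<< and returns exactly
// the characters that reached the stream.
std::string write_with_insertion(const std::string &s)
{
    std::ostringstream out;
    out << s;  // inserts s.size() characters, embedded '\0' included
    return out.str();
}
```

<p>Because the inserter uses the string's length rather than stopping at the first null, the five-character string <code>"A\0B\nC"</code> comes out as the same five characters.</p>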
<p>LZW-A prints the text values of integers to the output stream, and reads them back in that format. This is not efficient at all, but it is a great aid in debugging. If you are having a problem with the algorithm, this provides a nice way to examine your stream. The implementation is very simple: just use the <code>std::ostream</code> insertion operator, and follow each code with a newline so it can be properly parsed on input, as well as easily loaded into a text editor.</p>
<p>One important thing to notice in this class: the presence of a destructor that prints the <code>EOF_CODE</code>. Since this object goes out of scope as the compressor exits, this ensures that every code stream will end with this special code. Putting the onus on the I/O routines to deal with EOF issues simplifies the algorithm itself. (It also means that you can implement versions of LZW that don't use an EOF in the code stream.)</p>
<pre>
template&lt;&gt;
class output_code_stream&lt;std::ostream&gt; {
public :
    output_code_stream( std::ostream &amp;output, const unsigned int )
        : m_output( output ) {}
    void operator&lt;&lt;( unsigned int i )
    {
        m_output &lt;&lt; i &lt;&lt; '\n';
    }
    ~output_code_stream()
    {
        *this &lt;&lt; EOF_CODE;
    }
private :
    std::ostream &amp;m_output;
};
</pre>
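<p>The destructor trick is easy to see in isolation. This sketch is a simplified stand-in for the LZW-A writer, not the class above. The name <code>text_code_writer</code> is hypothetical, and <code>DEMO_EOF_CODE = 256</code> is an assumed value for <code>EOF_CODE</code>. The marker appears in the output as soon as the writer leaves its scope.</p>

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Assumed stand-in for the article's EOF_CODE; the value 256 is an assumption.
const unsigned int DEMO_EOF_CODE = 256;

// Hypothetical simplified LZW-A style writer: one decimal code per line,
// with the end-of-stream marker written by the destructor.
class text_code_writer {
public:
    explicit text_code_writer(std::ostream &output) : m_output(output) {}
    ~text_code_writer() { *this << DEMO_EOF_CODE; }
    text_code_writer(const text_code_writer &) = delete;
    text_code_writer &operator=(const text_code_writer &) = delete;
    void operator<<(unsigned int code) { m_output << code << '\n'; }
private:
    std::ostream &m_output;
};

std::string encode_codes_as_text()
{
    std::ostringstream out;
    {
        text_code_writer codes(out);
        codes << 65;
        codes << 66;
        codes << 257;
    }  // the writer is destroyed here, which appends the EOF marker
    return out.str();
}
```

<p>Calling <code>encode_codes_as_text()</code> produces <code>"65\n66\n257\n256\n"</code>: the three codes followed by the end-of-stream marker. Copying is disabled because a copy would write a second marker when it was destroyed.</p>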
<p>The corresponding version of the input class just reads in the whitespace-separated codes. If a read error occurs or an <code>EOF_CODE</code> is encountered in the stream, the extraction operator returns false, which lets the decompressor know when it is time to stop processing.</p>
<pre>
template&lt;&gt;
class input_code_stream&lt;std::istream&gt; {
public :
    input_code_stream( std::istream &amp;input, unsigned int )
        : m_input( input ) {}
    bool operator&gt;&gt;( unsigned int &amp;i )
    {
        m_input &gt;&gt; i;
        if ( !m_input || i == EOF_CODE )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
};
</pre>
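<p>The reading side can be exercised the same way. In this sketch, <code>read_text_code()</code> is a hypothetical free-function version of the extraction operator above. <code>DEMO_EOF_CODE</code> again assumes that <code>EOF_CODE</code> is 256. Reading stops at the marker, or at the end of the data if the marker is missing.</p>

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

const unsigned int DEMO_EOF_CODE = 256;  // assumed value of EOF_CODE

// Hypothetical mirror of the LZW-A extraction logic: returns false on a
// read error or when the end-of-stream marker is read.
bool read_text_code(std::istream &in, unsigned int &code)
{
    in >> code;
    return in && code != DEMO_EOF_CODE;
}

// Collects codes until read_text_code() reports the end of the stream.
std::vector<unsigned int> read_all_text_codes(std::istream &in)
{
    std::vector<unsigned int> codes;
    unsigned int code;
    while (read_text_code(in, code))
        codes.push_back(code);
    return codes;
}
```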
<p>By including <code>lzw-a.h</code> along with the other two header files, I can now create a program that compiles, links, and is able to test the algorithm. Using my UNIX test program, I compress the demo string from earlier in this article, and I see the output as it is sent directly to <code>stdout</code>:</p>
<center>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure02.png"/></td>
</tr>
<tr>
<td><center>Compressing <code>ABBABBBABBA</code></center></td>
</tr>
</table>
</center>
<p>Fortunately, the output is identical to what was shown earlier, with the addition of the final <code>EOF_CODE</code> used to delimit the end of the code stream.</p>
<h4>LZW-B</h4>
<p>The header file <code>lzw-b.h</code> implements specialized classes that replace the text-mode output of the codes in <code>lzw-a.h</code> with binary codes stored in a short integer - two bytes. </p>
<p>The classes that read and write symbols are unchanged, but reading and writing codes has to change in order to do this new binary output.</p>
<p>Writing the codes to <code>std::ostream</code> as binary values requires breaking the integer code into two bytes and writing the bytes one at a time, low byte first. There are more efficient ways to write the complete short integer in one function call, such as writing its bytes directly from memory. However, they raise portability problems, because the order of those bytes in memory (the platform's endianness) varies from one machine to another.</p>
<p>Like the code stream output object in <code>lzw-a.h</code>, this version of the code output class has a destructor that outputs an <code>EOF_CODE</code> value:</p>
<pre>
template&lt;&gt;
class output_code_stream&lt;std::ostream&gt; {
public :
    output_code_stream( std::ostream &amp;output, const unsigned int )
        : m_output( output ) {}
    void operator&lt;&lt;( unsigned int i )
    {
        m_output.put( i &amp; 0xff );
        m_output.put( (i&gt;&gt;8) &amp; 0xff);
    }
    ~output_code_stream()
    {
        *this &lt;&lt; EOF_CODE;
    }
private :
    std::ostream &amp;m_output;
};
</pre>
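<p>The byte-splitting approach is what makes the format portable. This sketch (the function name is hypothetical) writes the same two bytes, low byte first, as the insertion operator above. It produces them in that order regardless of the host's byte order, because the bytes are computed with shifts and masks rather than copied out of memory.</p>

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a code as two bytes, low byte first, the same
// way as the LZW-B insertion operator. The bytes come from shifts and masks,
// so the output does not depend on the host's byte order.
void put_code16(std::ostream &out, unsigned int code)
{
    out.put(static_cast<char>(code & 0xff));         // low byte
    out.put(static_cast<char>((code >> 8) & 0xff));  // high byte
}
```

<p>Writing the code 0x1234 this way always yields the byte 0x34 followed by 0x12.</p>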
<p>Reading the codes requires reading the two bytes that make up the short integer, then combining them. While reading, if the routine detects an <code>EOF_CODE</code>, it returns false, which tells the decompressor to stop processing. It also returns false if there is an error on the input code stream.</p>
<pre>
template&lt;&gt;
class input_code_stream&lt;std::istream&gt; {
public :
    input_code_stream( std::istream &amp;input, unsigned int )
        : m_input( input ) {}
    bool operator&gt;&gt;( unsigned int &amp;i )
    {
        char c;
        if ( !m_input.get(c) )
            return false;
        i = c &amp; 0xff;
        if ( !m_input.get(c) )
            return false;
        i |= (c &amp; 0xff) &lt;&lt; 8;
        if ( i == EOF_CODE )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
};
</pre>
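<p>The <code>&amp; 0xff</code> masks in the extraction operator are not decoration. Whether plain <code>char</code> is signed is implementation-defined. When it is, a byte such as 0xF0 read with <code>get()</code> becomes a negative value whose sign bits spill into the high bits of the code. The hypothetical helpers below show the masked combination and the unmasked version it protects against.</p>

```cpp
#include <cassert>

// Hypothetical helper: combines two bytes, low byte first, as the LZW-B
// extraction operator does. Masking with 0xff keeps only the byte's eight
// bits, even when plain char is signed and the byte is above 0x7F.
unsigned int combine_bytes(char low, char high)
{
    unsigned int code = low & 0xff;
    code |= (high & 0xff) << 8;
    return code;
}

// The same combination without masks, shown only to illustrate the problem:
// a negative char converts to an unsigned int with all the high bits set.
unsigned int combine_bytes_unmasked(char low, char high)
{
    return static_cast<unsigned int>(low) |
           (static_cast<unsigned int>(high) << 8);
}
```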
<p>The most exciting thing about <code>lzw-b.h</code> is that you can now see data compression taking place. The figure below shows a run of my Windows test program against the <a href="http://corpus.canterbury.ac.nz/descriptions/" class="newpage">Canterbury Corpus</a>, a standard set of files used to test compression, and the files are compressing quite nicely:</p>
<center>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure03.png"/></td>
</tr>
<tr>
<td><center>Compressing the Canterbury Corpus with <code>lzw-b.h</code></center></td>
</tr>
</table>
</center>
<h4>LZW-C</h4>
<p>The third I/O implementation, defined in <code>lzw-c.h</code>, writes binary codes like <code>lzw-b.h</code>, but with one crucial difference. Instead of being hard-coded to 16-bit codes, <code>lzw-c.h</code> determines the maximum code size needed based on the maximum code value passed as an argument to <code>compress()</code> and <code>decompress()</code>. It then writes codes based on that width, which will normally be something in the range of 9-18 bits wide.</p>
<p>Since these values are not aligned with byte boundaries, there are some issues writing them to streams that expect to read and write bytes. However, it is definitely worth all the bit shifting, ORing, and ANDing, because when the size is 12 bits, we save four bits per code compared to <code>lzw-b.h</code>. But every read and write potentially starts somewhere in the middle of a byte, so the I/O classes have to do some extra work, mostly shifting bits to the correct position in the output stream.</p>
<p>Note that the code to read and write symbols is unchanged from <code>lzw-a.h</code> and <code>lzw-b.h</code>.</p>
<p>Many of the CS students who read my earlier article on LZW ran into a brick wall when they started trying to understand the code that performs I/O on codes of variable bit lengths. Obviously, writing 11 bit codes when your file system is oriented around eight-bit bytes involves a lot of bit twiddling, and I'm afraid that many novices are woefully deficient in this department. Not just in understanding the bitwise operators in C, such as shifting, masking, etc., but in understanding binary arithmetic in general.</p>
<p>That's why I've structured the code and this article a bit differently this time around. If the I/O operations in <code>lzw-c.h</code> and <code>lzw-d.h</code> are bewildering, well, no worries. They have absolutely nothing to do with the LZW algorithm itself. You can investigate and explore the algorithm completely using <code>lzw-a.h</code> and <code>lzw-b.h</code>, and just forget about the last two I/O implementations, which add efficiency but nothing more.</p>
<p>Further, once you use <code>lzw-a.h</code> to debug and understand the algorithm, you can certainly plug in <code>lzw-c.h</code> and <code>lzw-d.h</code> and take advantage of their improved compression, even if you don't follow all the code. </p>
<p>It might be appropriate to add a sidebar or another section to explain the variable bit length I/O in detail, but this article is quite long already, and there are numerous other resources for the interested reader to explore the details. (But if you find yourself deficient in this area, you owe it to yourself to hit the books and get to the point where these operations make sense. This won't be the last time you need to understand bitwise operators.)</p>
<p>For those who are ready to tackle this more complicated I/O procedure, we will look first at the <code>output_code_stream&lt;std::ostream&gt;</code> class. Here, the first thing to understand is that the constructor has to initialize the number of bits in the code. This value is calculated from the <code>max_code</code> parameter, and is stored in member <code>m_code_size</code>, where it is used frequently.</p>
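<p>The constructor's loop counts how many bits <code>max_code</code> occupies. Here is the calculation pulled out into a hypothetical stand-alone function. For a maximum code of 4095 it gives 12 bits, and for 65535 it gives 16.</p>

```cpp
#include <cassert>

// Hypothetical helper: the number of bits needed to hold max_code, computed
// as in the lzw-c.h constructors: start at one bit and add one more for
// every right shift that leaves max_code nonzero.
unsigned int code_size_for(unsigned int max_code)
{
    unsigned int code_size = 1;
    while (max_code >>= 1)
        code_size++;
    return code_size;
}
```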
<p>Next, the insertion operator. Output of codes proceeds as follows. Member <code>m_pending_bits</code> tells us how many bits are pending output while sitting in member <code>m_pending_output</code>. These bits are right justified, and the count will always be less than eight. When a new code is written, it is left shifted by <code>m_pending_bits</code> and ORed into <code>m_pending_output</code>, so that it is laid down just past the pending bits. After doing that, we have at least one complete byte to output; exactly how many depends on the code size and on how many bits were already pending. The <code>flush()</code> routine is called, and it writes out all complete bytes. When it completes, there can be anywhere from zero to seven bits still waiting to be output, and they will be right justified in <code>m_pending_output</code>.</p>
<p>In the destructor, we output an <code>EOF_CODE</code>, and then do a flush as well. But in this case, we flush all possible bits, not just the complete bytes. There are two good reasons for this. First, we don't care if the last bits that are flushed out are only part of a code: the code will be <code>EOF_CODE</code>, and that is the last one. And second, if we don't flush those final bits out in the destructor, they will never be sent to the output stream. The decoder will then never see them, and decompression will most likely fail.</p>
<pre>
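template&lt;&gt;
class output_code_stream&lt;std::ostream&gt;
{
public :
    output_code_stream( std::ostream &amp;out, unsigned int max_code )
        : m_output( out ),
          m_code_size(1),
          m_pending_bits(0),
          m_pending_output(0)
    {
        // One bit, plus one more for every shift that leaves max_code nonzero
        while ( max_code &gt;&gt;= 1 )
            m_code_size++;
    }
    ~output_code_stream()
    {
        *this &lt;&lt; EOF_CODE;
        // flush(0) writes every remaining bit, padded to a whole byte. If no
        // bits are pending it still writes one zero byte, which the decoder
        // never reads because it stops at EOF_CODE.
        flush(0);
    }
    void operator&lt;&lt;( const unsigned int &amp;i )
    {
        // Lay the new code down just past the bits already pending
        m_pending_output |= i &lt;&lt; m_pending_bits;
        m_pending_bits += m_code_size;
        flush( 8 );  // write out every complete byte
    }
private :
    // Write bytes, low bits first, while at least val bits are pending
    void flush( const int val )
    {
        while ( m_pending_bits &gt;= val ) {
            m_output.put( m_pending_output &amp; 0xff );
            m_pending_output &gt;&gt;= 8;
            m_pending_bits -= 8;
        }
    }
    std::ostream &amp;m_output;
    int m_code_size;
    int m_pending_bits;
    unsigned int m_pending_output;
};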
</pre>
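<p>To watch the pending-bits scheme work on real numbers, here is a sketch that runs the same shifting and masking as the class above. Instead of writing to a stream, it collects the bytes in a vector. The function name is hypothetical. Unlike <code>flush(0)</code> in the listing, it writes a final byte only when bits are actually left over. Packing the 12-bit codes 0xABC and 0x123 gives three bytes:</p>
<ul>
<li>0xBC</li>
<li>0x3A, whose low four bits are the top of the first code and whose high four bits are the bottom of the second</li>
<li>0x12</li>
</ul>

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: packs fixed-width codes into bytes, least significant
// bits first, using the pending-bits scheme of the lzw-c.h output class.
// With a 32-bit unsigned int, code sizes up to 24 bits cannot overflow.
std::vector<unsigned char> pack_codes(const std::vector<unsigned int> &codes,
                                      int code_size)
{
    std::vector<unsigned char> bytes;
    unsigned int pending_output = 0;
    int pending_bits = 0;
    for (unsigned int code : codes) {
        pending_output |= code << pending_bits;  // lay code past pending bits
        pending_bits += code_size;
        while (pending_bits >= 8) {              // emit every complete byte
            bytes.push_back(static_cast<unsigned char>(pending_output & 0xff));
            pending_output >>= 8;
            pending_bits -= 8;
        }
    }
    if (pending_bits > 0)                        // final partial byte, zero padded
        bytes.push_back(static_cast<unsigned char>(pending_output & 0xff));
    return bytes;
}
```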
<p>Like the output code class, the input code class has to calculate the code size for this decompression based on the <code>max_code</code> value passed in the function call. </p>
<p>When an attempt is made to read a code, there must be a minimum of <code>m_code_size</code> bits in member <code>m_pending_input</code>, as counted by member <code>m_available_bits</code>. If there aren't, new bytes are read in one at a time, and inserted into <code>m_pending_input</code> after having been shifted left by the number of bits already available. Once <code>m_pending_input</code> contains at least <code>m_code_size</code> bits, the code is extracted from <code>m_pending_input</code> using a mask of the low <code>m_code_size</code> bits, the count in <code>m_available_bits</code> is reduced, and <code>m_pending_input</code> is shifted right by <code>m_code_size</code> bits.</p>
<pre>
template&lt;&gt;
class input_code_stream&lt;std::istream&gt;
{
public :
    input_code_stream( std::istream &amp;in, unsigned int max_code )
        : m_input( in ),
          m_code_size(1),
          m_available_bits(0),
          m_pending_input(0)
    {
        while ( max_code &gt;&gt;= 1 )
            m_code_size++;
    }
    bool operator&gt;&gt;( unsigned int &amp;i )
    {
        // Read whole bytes until at least one complete code is available
        while ( m_available_bits &lt; m_code_size )
        {
            char c;
            if ( !m_input.get(c) )
                return false;
            m_pending_input |= (c &amp; 0xff) &lt;&lt; m_available_bits;
            m_available_bits += 8;
        }
        // Keep the low m_code_size bits. The mask is built from an unsigned
        // value, since left-shifting a negative int such as ~0 is undefined
        // behavior before C++20.
        i = m_pending_input &amp; ~(~0u &lt;&lt; m_code_size);
        m_pending_input &gt;&gt;= m_code_size;
        m_available_bits -= m_code_size;
        if ( i == EOF_CODE )
            return false;
        else
            return true;
    }
private :
    std::istream &amp;m_input;
    int m_code_size;
    int m_available_bits;
    unsigned int m_pending_input;
};
</pre>
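<p>The inverse operation can be sketched the same way. This hypothetical function follows the input class above: it buffers bytes until a whole code is available, then masks it off. From the bytes 0xBC, 0x3A and 0x12 it recovers the 12-bit codes 0xABC and 0x123. It builds the same low-bits mask as the listing, but from an unsigned value.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: unpacks fixed-width codes from bytes, least
// significant bits first, using the available-bits scheme of the lzw-c.h
// input class. Leftover padding bits at the end are ignored.
std::vector<unsigned int> unpack_codes(const std::vector<unsigned char> &bytes,
                                       int code_size)
{
    std::vector<unsigned int> codes;
    unsigned int pending_input = 0;
    int available_bits = 0;
    const unsigned int mask = (1u << code_size) - 1;  // low code_size bits
    for (unsigned char byte : bytes) {
        pending_input |= static_cast<unsigned int>(byte) << available_bits;
        available_bits += 8;
        while (available_bits >= code_size) {  // a whole code is available
            codes.push_back(pending_input & mask);
            pending_input >>= code_size;
            available_bits -= code_size;
        }
    }
    return codes;
}
```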
<p>The table below shows the results of a test run comparing LZW-B and LZW-C, both run with a maximum code of 4095. With this maximum value, all codes fit in a 12-bit integer. LZW-B uses a 16-bit integer to store each code value and LZW-C uses 12 bits, so the compressed file sizes from the two should be in a 4:3 ratio. In other words, each LZW-C file should be 0.750 times the size of its LZW-B counterpart, and this looks to be the case:</p>
<center>
<table border="1">
<tr>
<th>File Name</th>
<th>Original<br/>Size (bytes)</th>
<th>Compressed<br/>LZW-B</th>
<th>Compressed<br/>LZW-C</th>
<th>Ratio<br/>LZW-C/LZW-B</th>
</tr>
<tr>
<td>alice29.txt</td>
<td>152089</td>
<td>96428</td>
<td>72322</td>
<td>0.750</td>
</tr>
<tr>
<td>alphabet.txt</td>
<td>100000</td>
<td>4538</td>
<td>3404</td>
<td>0.750</td>
</tr>
<tr>
<td>asyoulik.txt</td>
<td>125179</td>
<td>83966</td>
<td>62975</td>
<td>0.750</td>
</tr>
<tr>
<td>bib</td>
<td>111261</td>
<td>71792</td>
<td>53845</td>
<td>0.750</td>
</tr>
<tr>
<td>bible.txt</td>
<td>4047392</td>
<td>2468326</td>
<td>1851245</td>
<td>0.750</td>
</tr>
</table>
<p>Comparing 12-bit compression between LZW-B and LZW-C</p>
</center>
<p>It looks like things are working as expected.</p>
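<p>The 0.750 column follows directly from the code widths. The hypothetical helper below predicts the LZW-C size from the LZW-B size. LZW-B stores one code per two bytes, and LZW-C packs each code into 12 bits, rounded up to a whole byte at the end. For alphabet.txt, asyoulik.txt and bible.txt the prediction matches the table exactly. For alice29.txt and bib, the final code ends exactly on a byte boundary, and the table shows one byte more than predicted. That is consistent with the zero byte that <code>flush(0)</code> writes when no bits are pending.</p>

```cpp
#include <cassert>

// Hypothetical helper: predicts the LZW-C size from the LZW-B size for the
// same input. LZW-B spends 16 bits per code, so it holds lzw_b_bytes / 2
// codes; LZW-C spends code_size bits per code, rounded up to a whole byte.
unsigned long long predicted_lzw_c_size(unsigned long long lzw_b_bytes,
                                        unsigned int code_size)
{
    const unsigned long long codes = lzw_b_bytes / 2;
    const unsigned long long bits = codes * code_size;
    return (bits + 7) / 8;
}
```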
<h4>LZW-D</h4>
<p>The code in <code>lzw-d.h</code> represents the final and most efficient version of I/O for the LZW code streams. It builds on the code in <code>lzw-c.h</code> - at its core it is a variable bit-length I/O stream. However, there is one crucial difference from <code>lzw-c.h</code>: the code I/O in <code>lzw-d.h</code> starts at the smallest possible code size, nine bits, and increases the code size as needed, until it reaches the maximum value for this compression session. The maximum value is the parameter passed in to the invocation of <code>compress()</code> or <code>decompress()</code>.</p>
<p>The logic behind this is pretty simple. Even if we are going to use 16-bit codes in an LZW program, when the program first starts, the maximum possible code the program can emit is 256, which needs only nine bits to encode. And each time we output a code, that maximum possible code value increases by only one, which means that the first 256 codes output by the encoder can all fit in nine bits.</p>
<p>So the LZW-D encoder starts encoding using nine-bit code widths, and then bumps the value to ten as soon as the highest possible output code reaches 512. This process continues, incrementing the code size until the maximum code size is reached. At that point the code size stays fixed, as no new codes are being added to the dictionary.</p>
<p>The decoder follows exactly the same process - reading in the first code with a width of nine bits, then bumping to ten when the maximum possible input code reaches 512.</p>
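<p>The width rule can be written as a small function. This is a hypothetical helper written for this discussion, not code from <code>lzw-d.h</code>. It takes the highest code that could appear next as its input, starts at nine bits, and grows until that code fits or the maximum code size is reached.</p>

```cpp
#include <cassert>

// Hypothetical helper, not taken from lzw-d.h: the code width needed when
// max_possible_code is the highest code that can appear next. Widths start
// at nine bits and never exceed max_code_size.
unsigned int current_code_width(unsigned int max_possible_code,
                                unsigned int max_code_size)
{
    unsigned int width = 9;
    while (width < max_code_size && max_possible_code >= (1u << width))
        width++;
    return width;
}
```

<p>A highest possible code of 511 still fits in nine bits and 512 needs ten. With a maximum code size of 16, the width stops growing at 16 bits.</p>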
<p>The code for this class is built on that from <code>lzw-c.h</code>, with some added complexity. Because the listing is long, and it doesn't add much to the discussion of LZW, I've omitted it, and instead refer you to the download available at the end of the article.</p>
<h4>The Windows Test Program</h4>
<p>When you develop compression code, there are a few different common tasks you are likely to want to perform:</p>
<ul>
<li>Check your code for correctness, often through bulk testing.</li>
<li>Check your compression ratios against standard benchmarks.</li>
<li>Analyze your program's performance to locate bottlenecks and make it more efficient.</li>
</ul>
<p>My Windows app is designed to help with all of these tasks. It basically allows you to select a single directory, set a maximum code size, then perform compression and decompression of all the files in the directory. An optional checkbox lets you include files in all directories under the test directory as well.</p>
<p>The application was built using Visual Studio 10, and it is a simple MFC dialog-based application.</p>
<p>Each file is compressed to a temporary file, then decompressed to a second temporary file. The size of the compressed file is saved, and then a comparison is done to ensure that the original and expanded files are identical.</p>
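<p>The comparison step is simple to express in standard C++. The test program's own comparison code isn't shown here, so this is only a sketch with a hypothetical function name. For real files you would pass two <code>std::ifstream</code> objects opened with <code>std::ios_base::binary</code>.</p>

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>

// Hypothetical helper: returns true if two streams contain exactly the same
// bytes, including having the same length.
bool streams_identical(std::istream &a, std::istream &b)
{
    std::istreambuf_iterator<char> ia(a), ib(b), end;
    for (; ia != end && ib != end; ++ia, ++ib)
        if (*ia != *ib)
            return false;
    return ia == end && ib == end;  // both must run out at the same time
}
```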
<p>To help with data collection, after running a test, you can press the copy button and get the results of the test stuffed into your clipboard. Although it isn't visible in the display, the data stored in your clipboard includes the full path name of the original file, not just the basename.</p>
<p>This Visual Studio project takes advantage of a number of C++11 features, and as a result it will need some modification to work with earlier versions. Any version that supports <code>unordered_map</code> can be made to build without too many changes. And if you are going way back in time, you could replace <code>unordered_map</code> with <code>map</code>.</p>
<p>As shipped, the test program uses <code>lzw-d.h</code>. To use any of the other three versions of I/O discussed in this article, just modify the include file selected at the top of <code>LzwTestDlg.cpp</code>. The figure below shows what the app looks like after running through some data:</p>
<center>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure04.png"/></td>
</tr>
<tr>
<td><center>The Windows test app after a test run</center></td>
</tr>
</table>
</center>
<p>After pressing the copy button at the bottom of the dialog, you can paste the data into a spreadsheet and then crunch it to your heart's content:</p>
<center>
<table border="0">
<tr>
<td><img src="/attachments/2011/lzw/Figure05.png"/></td>
</tr>
<tr>
<td><center>Copying the data into a spreadsheet</center></td>
</tr>
</table>
</center>
<h4>The Linux Test Program</h4>
<p>The LZW code is platform independent, and will build and run just fine on UNIX or Linux systems. The Linux test program, <code>lzw.cpp</code>, allows you to compress or decompress files from the command line. It builds just fine with g++ 4.5, as long as you use the <code>-std=c++0x</code> switch to turn on the latest language features. Compiling with earlier versions will require a few minor modifications.</p>
<p>The command line interface to the test program is not too complicated, and is probably best documented by looking at the usage output:</p>
<pre>
mrn@ubuntu:~/LzwTest$ g++ -std=c++0x lzw.cpp -o lzw
mrn@ubuntu:~/LzwTest$ ./lzw
Usage:
lzw [-max max_code] -c input output #compress file input to file output
lzw [-max max_code] -c - output     #compress stdin to file otuput
lzw [-max max_code] -c input        #compress file input to stdout
lzw [-max max_code] -c              #compress stdin to stdout
lzw [-max max_code] -d input output #decompress file input to file output
lzw [-max max_code] -d - output     #decompress stdin to file otuput
lzw [-max max_code] -d input        #decompress file input to stdout
lzw [-max max_code] -d              #decompress stdin to stdout
mrn@ubuntu:~/LzwTest$
</pre>
<p>Like the Windows test program, the command line program is built by default with <code>lzw-d.h</code>. Replacing this algorithm with any of the three others requires a minor change to the source code.</p>
<p>With the default build, the program produces output nearly identical to UNIX compress. The one difference is that UNIX compress monitors the compression ratio after the dictionary is full, and clears the dictionary if the ratio starts to deteriorate (which it almost always does). I include a benchmark script that tests UNIX compress against the command line test program, and the results show that for small files, the file size is almost identical:</p>
<pre>
mrn@ubuntu:~/LzwTest$ ./benchmark.sh 65535 16 canterbury | head -n 15 | column -t
Filename                 Original-size  LZW-size  Compress-size
--------                 -------------  --------  -------------
canterbury/aaa.txt       33406          320       321
canterbury/alice29.txt   152089         62247     62247
canterbury/alphabet.txt  100000         3052      3053
canterbury/asyoulik.txt  125179         54989     54990
canterbury/a.txt         1              3         5
canterbury/bib           111261         46527     46528
canterbury/bible.txt     4047392        1417735   1377093
canterbury/book1         768771         317133    317133
canterbury/book2         610856         247593    251289
canterbury/cp.html       24603          11315     11317
canterbury/E.coli        4638690        1213579   1218349
canterbury/fields.c      11150          4963      4964
canterbury/geo           102400         77777     77777
</pre>
<p>You can see in this test that LZW-D and UNIX compress perform nearly identically for all but the largest files in the test sample. If I modify UNIX compress to not monitor compression ratios, the difference seen with larger files goes away:</p>
<pre>
mrn@ubuntu:~/LzwTest$ ./benchmark.sh 65535 16 canterbury | head -n 15 | column -t
Filename                 Original-size  LZW-size  Compress-size
--------                 -------------  --------  -------------
canterbury/aaa.txt       33406          320       321
canterbury/alice29.txt   152089         62247     62247
canterbury/alphabet.txt  100000         3052      3053
canterbury/asyoulik.txt  125179         54989     54990
canterbury/a.txt         1              3         5
canterbury/bib           111261         46527     46528
canterbury/bible.txt     4047392        1417735   1417735
canterbury/book1         768771         317133    317133
canterbury/book2         610856         247593    247593
canterbury/cp.html       24603          11315     11317
canterbury/E.coli        4638690        1213579   1213579
canterbury/fields.c      11150          4963      4964
canterbury/geo           102400         77777     77777
</pre>
<p>That provides some support for the notion that the algorithm shown here behaves properly.</p>
<h4>Your Program</h4>
<p>If you want to build your own program and use these classes, all you need is a C++11 compiler, or an earlier version and a willingness to make a few changes. </p>
<p>To use the classes, include in order <code>lzw_streambase.h</code>, one of the four implementation files for <code>iostreams</code> (preferably <code>lzw-d.h</code>), and finally, <code>lzw.h</code>. Because the significant code in these files is all implemented as template functions or classes, there is no library to link into your project, and no C++ source you have to compile separately.</p>
<p>All of the code in these header files has been hoisted into the <code>lzw</code> namespace, so you will either have to explicitly use the namespace when you invoke <code>compress()</code> and <code>decompress()</code>, or insert this line into your program:</p>
<pre>
using namespace lzw;
</pre>
<p>One thing to note about the I/O routines I have defined: the template classes are specialized on <code>std::istream</code> and <code>std::ostream</code>. If you innocently pass in an object such as an <code>std::ifstream</code>, you will get build errors. This is because C++ template argument deduction uses the exact type of the argument. The compiler deduces <code>std::ifstream</code>, and it won't consider that <code>std::ifstream</code> is derived from <code>std::istream</code> and use the existing specialization. So instead, you will need to cast your arguments to the types defined in the header files. (Or write your own implementations.)</p>
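<p>The deduction rule is easy to demonstrate with a toy template. In this sketch, <code>stream_kind</code> and <code>describe()</code> are hypothetical names. The primary template is left undefined and only <code>std::istream</code> has a specialization, so passing an <code>std::istringstream</code> directly would not compile. Binding it to an <code>std::istream &amp;</code> first, or casting it, makes the compiler deduce the base type.</p>

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical primary template, left undefined so unsupported types fail.
template<typename T>
struct stream_kind;

// Specialization for std::istream, the only type this example supports.
template<>
struct stream_kind<std::istream> {
    static std::string name() { return "std::istream"; }
};

// T is deduced from the argument's exact static type, with no derived-to-base
// adjustment, so describe(some_istringstream) would need
// stream_kind<std::istringstream>.
template<typename T>
std::string describe(T &)
{
    return stream_kind<T>::name();
}
```

<p>A named base-class reference does the same job as the C-style casts used in the <code>compress()</code> and <code>decompress()</code> calls earlier, and it reads a little more clearly.</p>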
<p>Your rights to use this code are covered by my <a href="http://marknelson.us/code-use-policy/" class="newpage">Liberal Code Use Policy</a>. As I have mentioned before, this is teaching code; if you decide to use it in a production system, there are many optimizations you might want to perform.</p>
<h4>Benchmarks</h4>
<p>So how does LZW do when it comes to compression? LZW's original strength was its combination of good compression ratios with high-speed compression. The UNIX compress program is still nice and fast, and Terry Welch's original application for LZW was in disk controllers. Because my program is a teaching program, it won't be nearly as fast as compress, but it's still useful to compare it to the de facto standard for lossless compression: the deflate algorithm.</p>
<p>We can compare LZW against deflate by making a small modification to my benchmark script so that it uses gzip instead of compress. The table below shows the average compression ratios for the files in the Canterbury Corpus when compressed using maximum code widths of 15-18 bits. (The ratio is defined as 100*compressed_size/uncompressed_size, so 0% is perfect compression and 100% is no compression.)</p>
<center>
<table border="1">
<tr>
<th>gzip</th>
<th>LZW 15 bits</th>
<th>LZW 16 bits</th>
<th>LZW 17 bits</th>
<th>LZW 18 bits</th>
</tr>
<tr>
<td>32.7%</td>
<td>43.2%</td>
<td>42.6%</td>
<td>42.5%</td>
<td>42.3%</td>
</tr>
</table>
</center>
<p>You can see that LZW does a good job of compressing data, but the deflate algorithm used by gzip squeezes roughly another 10 percentage points out of the files it compresses (32.7% versus 42.3% to 43.2%). The gap between LZW and deflate is larger on some types of files, and smaller on others, but deflate will almost always show a noticeable difference in compression ratios.</p>
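<p>For reference, the ratio used in the table is simple to compute. In this sketch the function names are hypothetical. <code>average_ratio()</code> takes the mean of the per-file ratios, assuming that is what "average compression ratios" means above. A ratio computed from total sizes would instead weight large files more heavily.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Ratio as defined above: 100 * compressed / uncompressed, so lower is better.
double compression_ratio(double compressed_size, double uncompressed_size)
{
    return 100.0 * compressed_size / uncompressed_size;
}

// Hypothetical helper: mean of the per-file ratios. The two vectors hold
// matching compressed and uncompressed sizes for each file.
double average_ratio(const std::vector<double> &compressed,
                     const std::vector<double> &uncompressed)
{
    if (compressed.empty())
        return 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < compressed.size(); i++)
        total += compression_ratio(compressed[i], uncompressed[i]);
    return total / compressed.size();
}
```

<p>For example, the 62247-byte LZW output for alice29.txt in the benchmark above works out to a ratio of about 40.9%.</p>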
<h4>Variations</h4>
<p>There are many variations on the code I've presented here that make sense. </p>
<p>One obvious change is to eliminate the special <code>EOF_CODE</code> used to delimit the end of the code stream. If the code stream is a file or other stream with an inherent EOF condition, there is no need for an <code>EOF_CODE</code>: simply reaching the end of the input stream will properly signal the end of the decoded material. Freeing up this one code will make a microscopically small improvement in compression ratios.</p>
<p>If you want to mimic the output of the compress program, you need to remove the <code>EOF_CODE</code>, and replace it with a <code>CLEAR_CODE</code> that has a value of 256. The compress program monitors the compression ratios it achieves after its dictionary is full, and when the ratio starts to decay, it issues the <code>CLEAR_CODE</code>. That code tells the decoder to clear its dictionary and make a fresh start with new nine-bit codes.</p>
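<p>On the decoding side, handling a <code>CLEAR_CODE</code> amounts to putting the dictionary back in its starting state. The structure below is a hypothetical sketch, not code from compress or from this article's headers. It assumes, as in compress, that the first code available for new strings after a clear is 257.</p>

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical decoder state showing what a CLEAR_CODE resets: the
// dictionary returns to the 256 single-byte strings, the next free code
// returns to 257, and the code width drops back to nine bits.
struct decoder_state {
    static const unsigned int CLEAR_CODE = 256;
    std::unordered_map<unsigned int, std::string> strings;
    unsigned int next_code;
    unsigned int code_width;

    decoder_state() { reset(); }

    void reset()
    {
        strings.clear();
        for (unsigned int i = 0; i < 256; i++)
            strings[i] = std::string(1, static_cast<char>(i));
        next_code = CLEAR_CODE + 1;  // first code available for new strings
        code_width = 9;
    }
};
```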
<p>Once you get the hang of LZW, a good exercise to make sure you have it working properly is to create a GIF encoder and decoder. GIF uses LZW to losslessly compress images with a constrained palette, and after all these years is still somewhat of a standard on the web.</p>
<h4>History</h4>
<p>Usually the history lesson on an algorithm is at the start of the article, but this is a how-to piece, and I feel like the trip down memory lane is not as important as understanding how the algorithm works.</p>
<p>The roots of LZW were set down in 1978 when Jacob Ziv and Abraham Lempel published the second of their two seminal works on data compression, <a href="http://www.cs.duke.edu/courses/spring03/cps296.5/papers/ziv_lempel_1978_variable-rate.pdf" class="newpage">"Compression of Individual Sequences via Variable-Rate Coding"</a>. This paper described a general approach to data compression that involved building dictionaries of previously seen strings.</p>
<p>Ziv and Lempel's work was targeted at an academic audience, and it wasn't truly popularized until 1984 when Terry Welch published <a href="http://www.cs.duke.edu/courses/spring03/cps296.5/papers/welch_1984_technique_for.pdf" class="newpage">A Technique for High-Performance Data Compression</a>. Welch's paper took the somewhat abstract Information Theory work of Ziv and Lempel and reduced it to practice in such a way that others could easily implement it.</p>
<p>UNIX compress was probably the first popular program that used LZW compression, and it very quickly became a standard utility on UNIX systems. The freely available code for compress was incorporated into <a href="http://en.wikipedia.org/wiki/ARC_(file_format)" class="newpage">ARC</a>, one of the first archiving programs for PCs. In addition, the algorithm was used in the GIF file format, originally created by CompuServe in 1987.</p>
<p>LZW's popularity waned in the 1990s for two important reasons. First, Unisys began enforcing their patents that covered LZW compression, demanding and receiving royalties from various software companies. Not only did this make developers think twice about the liability they could incur while using LZW, it resulted in a general public relations backlash against using patented technology.</p>
<p>Secondly, the LZW algorithm was eclipsed on the desktop by deflate, as popularized by PKZIP. Not only did deflate outperform LZW, it was unencumbered by patents, and eventually had a very reliable and free open source implementation in <a href="http://zlib.net/" class="newpage">zlib</a>, a library written by a team led by Jean-loup Gailly and Mark Adler. I don't know if there is any way to actually quantify this, but I think one could speculate that zlib is currently installed on more computer systems than any other software package in existence.</p>
<p>So LZW has settled down to an existence out of the limelight. It is still an important algorithm, used in quite a few file formats, and as this article shows, its simplicity makes it an excellent learning tool. </p>
<h4>Downloads</h4>
<ul>
<li><a href="/attachments/2011/lzw/LzwTest.zip">LzwTest.zip</a> - source for the Windows test app.
<li><a href="/attachments/2011/lzw/LzwExe.zip">LzwExe.zip</a> - The Windows test app executable.
<li><a href="/attachments/2011/lzw/lzw.tgz">lzw.tgz</a> - source for the UNIX test app.
</ul>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/11/08/lzw-revisited/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>DNS Service Discovery On Windows</title>
		<link>http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/</link>
		<comments>http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 11:54:04 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[C/C++]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=978</guid>
		<description><![CDATA[In a previous post I showed you how we use DNS Service Discovery in a product I work on for Cisco Systems. That project uses the Avahi browser, which does not have a Windows port. In this article, I'll show you how to perform service discovery on Windows using Apple's Bonjour SDK for Windows. DNS-SD [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://marknelson.us/2011/09/30/dns-service-discovery/" class="newpage">previous post</a> I showed you how we use DNS Service Discovery in a product I work on for Cisco Systems. That project uses the <a href="http://avahi.org/" class="newpage">Avahi browser</a>, which does not have a Windows port. In this article, I'll show you how to perform service discovery on Windows using Apple's <a href="http://developer.apple.com/opensource/" class="newpage">Bonjour SDK for Windows</a>.<br />
<span id="more-978"></span></p>
<h4>DNS-SD On Windows</h4>
<p>Microsoft has been pushing us to use <a href="http://en.wikipedia.org/wiki/Universal_Plug_and_Play" class="newpage">UPnP</a> as our network discovery protocol, to the exclusion of all others. As a result, Windows ships with no support for DNS-SD - zip. And that might be the end of it, if it weren't for Apple's vested interest in having iTunes installed on every Windows machine in the world. </p>
<p>iTunes uses DNS-SD to share music catalogs across Local Area Networks - a natural choice given the native support for Zeroconf in OS X. Rather than isolate Windows users in a pocket UPnP universe, Apple chose instead to port the service discovery components of Bonjour to Windows and install them with every copy of iTunes. Thus, Windows and OS X users can happily share their iTunes libraries with no unusual calisthenics required.</p>
<p>To sweeten the deal, Apple has released a Bonjour SDK for Windows, which is currently shipping version 3.0 from their <a href="http://developer.apple.com/opensource/" class="newpage">developer support site</a>. In Apple's words:</p>
<blockquote><p>
The Bonjour SDK for Windows includes the latest version of Bonjour as well as header files, libraries, and sample code to assist developers in creating Bonjour enabled applications on Windows. The SDK has been updated with the Bonjour core that is bundled with iTunes 10.3.1. This release will bring all existing Bonjour functionality released in Mac OS X 10.7 into the Bonjour for Windows product.
</p></blockquote>
<p>It sounds pretty good, doesn't it?</p>
<h4>An Emphasis on Kit</h4>
<p>Installing the SDK is a decision-free breeze:<br />
<center></p>
<table border="0">
<tr>
<td><center><img src="/attachments/2011/bonjour-windows/Figure01.png"/></center></td>
</tr>
<tr>
<td><center>Installing the SDK</center></td>
</tr>
</table>
<p></center><br />
The installation may be easy, but the real dirty truth about the Bonjour SDK is that it is not much of an SDK at all. The developer interface to Bonjour services is packaged in a single DLL, and the SDK provides programmers with sample code that illustrates how to use some of the bindings in C, C#, Java, and VB. There is no documentation on the interfaces, no source for the Bonjour components, and the few samples don't begin to provide comprehensive coverage of the interface.</p>
<p>In other words, this is pretty much a code dump.</p>
<h4>An Overview</h4>
<p>For the most part, the Bonjour SDK interfaces follow a single pattern. Each request made to the API returns immediately, and gives you a reference that you can use to track the progress of your request. That reference can be converted to a file handle, and you can use the file handle to see when your request has some data to produce.</p>
<p>When your request has generated some data, and the Bonjour components are ready to deliver it to you, they do so by a callback mechanism - the Bonjour components make calls into your C or C++ program and provide the data you requested.</p>
<p>Most of the callbacks include a flags parameter. You can check it to see whether more results are already queued. If none are, and you have everything you need, you can deallocate the reference and be done with that particular call. Otherwise, you will have to wait for the Bonjour components to deliver the rest of the data when they get around to it.</p>
<h4>An Example - Discover Service Types</h4>
<p>I've written a demo program that browses the network for details on every service instance it can find, and presents the results in a tree format. The figure below shows the program running on my home network, where I have drilled down to get information about an instance of a print service.<br />
<center></p>
<table border="0">
<tr>
<td><center><img src="/attachments/2011/bonjour-windows/Figure02.png"/></center></td>
</tr>
<tr>
<td><center>The ServiceBrowser Sample Program</center></td>
</tr>
</table>
<p></center><br />
This program has to start by doing a top-level iteration of all the service types currently seen on my network. From my previous article, you might remember that I can use a special browse command to accomplish this. If you've installed the Bonjour SDK, you can run the dns-sd program and browse for service type <code>_services._dns-sd._udp</code>, which should give results something like this:</p>
<pre>
C:\Users\Mark>dns-sd -B _services._dns-sd._udp
Browsing for _services._dns-sd._udp
Timestamp     A/R Flags if Domain  Service Type  Instance Name
17:38:09.407  Add     3 13 .       _tcp.local.     _smb
17:38:09.407  Add     3 13 .       _tcp.local.     _printer
17:38:09.408  Add     3 13 .       _tcp.local.     _pdl-datastream
17:38:09.408  Add     3 13 .       _tcp.local.     _http
17:38:09.408  Add     3 13 .       _tcp.local.     _tivo-videos
17:38:09.409  Add     2 13 .       _tcp.local.     _csco-sb
</pre>
<p>Looking to do the same thing in my demo program, I turn to the header file <code>dns_sd.h</code>. This file ships with the Bonjour SDK and contains not only the API definition, but what passes for documentation. In that header file I see that there is a function called <code>DNSServiceBrowse</code>, and it looks like it does exactly what I want. In my sample program, my call to this routine is shown below:</p>
<pre>
DNSServiceRef client = NULL;
DNSServiceErrorType err;
err = DNSServiceBrowse( &amp;client,
                        0,
                        0,
                        &quot;_services._dns-sd._udp&quot;,
                        &quot;&quot;,
                        IterateServiceTypes,
                        this );
</pre>
<p>The header file gives some exposition on each of the arguments I pass to the function; my brief comments on each are given below:</p>
<table cellspacing="10">
<tr>
<td valign="top">sdRef</td>
<td>Every call to the Bonjour API creates a new <code>DNSServiceRef</code> handle. It is initialized by the function call, and used later to retrieve the results.</td>
</tr>
<tr>
<td valign="top">flags</td>
<td>This parameter is not used in this version of the SDK.</td>
</tr>
<tr>
<td valign="top">interfaceIndex</td>
<td>This argument is used to select a specific network interface. For this particular function I want to browse on all available interfaces, so a value of 0 is used.</td>
</tr>
<tr>
<td valign="top">regType</td>
<td>The type of service being browsed for. Normally when you call <code>DNSServiceBrowse</code> you will use this parameter to name the service type you are interested in, such as <code>_http._tcp</code>. I'm using the special type <code>_services._dns-sd._udp</code> in order to get a list of all published service types, not specific instances.</td>
</tr>
<tr>
<td valign="top">domain</td>
<td>By passing an empty string I ask the service to browse the default domains, which on a typical home network means <code>local.</code>.</td>
</tr>
<tr>
<td valign="top">callback</td>
<td>The pointer to a callback function that will receive the responses to this request. The argument I pass in, <code>IterateServiceTypes</code>, is a static member of my MFC Dialog class.</td>
</tr>
<tr>
<td valign="top">context</td>
<td>The context variable is an opaque pointer type that is passed in to the Bonjour service. When it performs a callback, it will include a copy of the context pointer for the use of the callback function. I always pass in a pointer to my MFC Dialog class, so my callback functions have full access to the class members.</td>
</tr>
</table>
<p>The important thing to note here is that the call to the API function doesn't return any important data. All I get back is an error code indicating that the request is being processed, and a <code>DNSServiceRef</code> handle that I use to track that progress.</p>
<p>The real action comes when my callback function is invoked. <code>IterateServiceTypes</code> is a static member of my Dialog class. Apple kept things simple by defining a single C-style callback type for both C and C++, which means the callback can't be a non-static member function. You could easily build shims to make it appear as though the DLL was calling member functions - it would just take a small modification to the code I'll show you here.</p>
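<p>Here is a minimal sketch of such a shim. It uses a made-up callback type so that it compiles without the SDK. <code>BrowseCallback</code>, <code>BrowserDialog</code>, and <code>DeliverResult</code> are hypothetical names. The real browse reply carries many more parameters, and on Windows the static function would also need the <code>DNSSD_API</code> calling convention used in the declaration below. The shim is a static function with a C-compatible signature. It recovers the object from the context pointer and forwards to an ordinary member function.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a C-style callback type from dns_sd.h.
// The real browse reply also receives the DNSServiceRef, flags, interface
// index, error code, regtype and domain.
typedef void (*BrowseCallback)(const char *serviceName, void *context);

class BrowserDialog {
public:
    // The shim: a static function whose address can be handed to a C API.
    // It turns the opaque context back into the object and forwards.
    static void OnBrowseReplyShim(const char *serviceName, void *context)
    {
        BrowserDialog *self = static_cast<BrowserDialog *>(context);
        self->OnBrowseReply(serviceName);
    }

    const std::vector<std::string> &Names() const { return m_names; }

private:
    // An ordinary member function with direct access to the object's state.
    void OnBrowseReply(const char *serviceName)
    {
        m_names.push_back(serviceName);
    }

    std::vector<std::string> m_names;
};

// Hypothetical helper that plays the part of the library invoking a
// callback it was given earlier, along with the caller's context pointer.
void DeliverResult(BrowseCallback cb, void *context, const char *name)
{
    cb(name, context);
}
```

<p>In the real program you would pass the shim as the callback argument and <code>this</code> as the context, and the rest of the handler could then be written as a normal member function.</p>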
<p>The function definition has to follow exactly the declaration given in <code>dns_sd.h</code>. My implementation starts like this:</p>
<pre>
void DNSSD_API CServiceBrowserDlg::IterateServiceTypes( DNSServiceRef sdRef,
                                                        DNSServiceFlags flags,
                                                        uint32_t interfaceIndex,
                                                        DNSServiceErrorType errorCode,
                                                        const char *serviceName,
                                                        const char *regtype,
                                                        const char *domain,
                                                        void *context )
{
	CServiceBrowserDlg *p = (CServiceBrowserDlg *) context;
</pre>
<p>It's worth walking through each of the parameters in the callback:</p>
<table border="0" cellspacing="10">
<tr>
<td valign="top">sdRef</td>
<td>This is the same <code>DNSServiceRef</code> value that was created when the call was made to <code>DNSServiceBrowse</code>. You don't need to make any use of it in the callback, but it does provide a good way to correlate results with function calls, particularly if you are using a single callback function to process many different results.</td>
</tr>
<tr>
<td valign="top">flags</td>
<td>There are two important flag bits to check in this value. The first, <code>kDNSServiceFlagsMoreComing</code>, indicates that at least one more callback is already queued and will follow immediately. If that bit is cleared, no more data is pending for the moment, although a browse stays active and may report further changes later. The second flag bit, <code>kDNSServiceFlagsAdd</code>, indicates whether this service is being added or removed. When you first start browsing, all the callbacks will be for services added. As services are added to and removed from the network, additional callbacks will be generated with this bit set or cleared accordingly.</td>
</tr>
<tr>
<td valign="top">interfaceIndex</td>
<td>In the callback, this index will be set to the index of the network interface where the advertisement was found. When it comes time to resolve this service, you need to pass in the correct index.</td>
</tr>
<tr>
<td valign="top">errorCode</td>
<td>If this value is not zero, the callback is indicating an error. As long as it is zero your code can process the input safely.</td>
</tr>
<tr>
<td valign="top">serviceName</td>
<td>This value contains the name of a discovered service - it is the whole point of the callback. Normally this will contain the name of an instance of a service. However, when browsing for the special name <code>_services._dns-sd._udp</code>, the instance name is actually a service type.</td>
</tr>
<tr>
<td valign="top">regtype</td>
<td>The type of the service - you may already know this information by the time you reach the callback, but if the callback is handling the results from multiple queries, it can be helpful.</td>
</tr>
<tr>
<td valign="top">domain</td>
<td>The domain of the discovered service. Like the interface index, you need to use the domain when you are attempting to resolve the service.</td>
</tr>
<tr>
<td valign="top">context</td>
<td>A copy of the context variable passed in when the browse call was made.</td>
</tr>
</table>
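<p>The flag tests described above can be wrapped in a small helper. This sketch defines its own constants instead of including <code>dns_sd.h</code>. The values 0x1 for <code>kDNSServiceFlagsMoreComing</code> and 0x2 for <code>kDNSServiceFlagsAdd</code> are my reading of the header and should be checked against your copy. They are consistent with the Flags column in the <code>dns-sd</code> output earlier, where every result but the last shows 3 (add, more coming) and the last shows 2 (add). <code>DecodeBrowseFlags</code> and <code>Describe</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed to match the kDNSServiceFlags values in dns_sd.h; verify them.
const uint32_t kMoreComing = 0x1;  // kDNSServiceFlagsMoreComing
const uint32_t kAdd        = 0x2;  // kDNSServiceFlagsAdd

struct BrowseEvent {
    bool added;       // true: service appeared; false: service went away
    bool moreQueued;  // true: another callback follows immediately
};

BrowseEvent DecodeBrowseFlags(uint32_t flags)
{
    BrowseEvent e;
    e.added = (flags & kAdd) != 0;
    e.moreQueued = (flags & kMoreComing) != 0;
    return e;
}

// A callback could use moreQueued to batch UI work: collect results while
// more are queued, and refresh the tree only when the batch is complete.
std::string Describe(uint32_t flags)
{
    BrowseEvent e = DecodeBrowseFlags(flags);
    std::string s = e.added ? "add" : "remove";
    if (e.moreQueued)
        s += ", more coming";
    return s;
}
```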
<p>If you look at the first line of code above, the first thing I do is cast the context pointer to its correct type, a pointer to my MFC Dialog class. Now I can make full use of all the members of the class, albeit via a call through a pointer instead of directly.</p>
<p>So what do I do with these services once I receive them? Well, for each service type that I find, I kick off a new browse process, looking for specific instances of the service. Just as an example, in my callback <code>IterateServiceTypes</code>, one of the callbacks returns a service type of <code>_printer._tcp</code>. In order to find all instances of this service, I have to call <code>DNSServiceBrowse</code> again, with that service name and the correct interface and domain. After inserting the service type into the tree, I make that call so I can start adding those instances:</p>
<pre>
HTREEITEM item = p-&gt;m_Tree.InsertItem( CA2T(service_type.c_str()), TVI_ROOT, TVI_SORT );
DNSServiceRef client = NULL;
DNSServiceErrorType err;
err = DNSServiceBrowse( &amp;client,
                        0,
                        0,
                        service_type.c_str(),
                        &quot;&quot;,
                        IterateServiceInstances,
                        context );
</pre>
<p>The key point to note about this call is that the callback function, <code>IterateServiceInstances</code>, is a different member function - one that expects to get the results of my browsing for instances of a specific service.</p>
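<p>The code above doesn't show where <code>service_type</code> comes from. Judging by the <code>dns-sd</code> output earlier, a reply to the <code>_services._dns-sd._udp</code> query delivers the type's first label as <code>serviceName</code> (for example <code>_printer</code>) and the protocol plus domain as <code>regtype</code> (for example <code>_tcp.local.</code>). Assuming that layout, a hypothetical helper like <code>MakeServiceType</code> below could assemble the string for the next <code>DNSServiceBrowse</code> call.</p>

```cpp
#include <cassert>
#include <string>

// Builds a browse type such as "_printer._tcp" from one reply to the
// _services._dns-sd._udp meta-query, assuming the reply splits it as
// serviceName = "_printer" and regtype = "_tcp.local.".
std::string MakeServiceType(const std::string &serviceName,
                            const std::string &regtype)
{
    // Keep only the protocol label ("_tcp" or "_udp") from regtype.
    // If there is no dot, substr(0, npos) keeps the whole string.
    std::string::size_type dot = regtype.find('.');
    std::string protocol = regtype.substr(0, dot);
    return serviceName + "." + protocol;
}
```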
<h4>Driving the Callbacks</h4>
<p>One thing I've skipped over so far - how do these callbacks actually get generated? Does the DLL asynchronously make calls into my code whenever events occur?</p>
<p>The Bonjour SDK lets your program control when callbacks occur by letting you drive the message pump yourself. When you call <code>DNSServiceProcessResult()</code> with a single argument of a <code>DNSServiceRef</code>, the DLL processes one pending reply for that reference and invokes its callback. The callback occurs within the context of the call to <code>DNSServiceProcessResult()</code>.</p>
<p>When you call <code>DNSServiceProcessResult()</code>, the Bonjour DLL will block if there are no messages ready to process. So how do you know when there are messages ready? </p>
<p>The indicator that messages are ready is given by a file descriptor associated with the <code>DNSServiceRef</code>. You can get a copy of the file descriptor by calling <code>DNSServiceRefSockFD()</code>, passing in a copy of the reference. When the file descriptor has data ready to read, you have callbacks pending. The easiest way to check this condition is to use the <code>select()</code> function, which can check the descriptors for many references in one fell swoop.</p>
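<p>The readiness test itself is ordinary socket code. As a portable sketch, the example below uses a POSIX pipe in place of the descriptor returned by <code>DNSServiceRefSockFD()</code>, with a zero timeout so that <code>select()</code> polls rather than blocks. <code>ReadReady</code> is a hypothetical name. On Windows you would pass the SDK's socket rather than a pipe, since Winsock's <code>select()</code> only accepts sockets.</p>

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Returns true if fd has data waiting to be read, without blocking.
bool ReadReady(int fd)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    timeval tv = {0, 0};  // zero timeout: poll instead of waiting
    // POSIX wants the highest descriptor plus one as the first argument;
    // Winsock ignores that argument, which is why Windows code can pass 0.
    int result = select(fd + 1, &readfds, nullptr, nullptr, &tv);
    return result > 0 && FD_ISSET(fd, &readfds);
}
```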
<p>In my implementation of the callback message pump, I rely on an <code>unordered_map</code> called <code>m_ClientToFdMap</code> that contains a copy of all the <code>DNSServiceRef</code> references currently waiting for responses. I create the necessary data structure used by <code>select()</code>, then call it to get a list of all references that have callbacks pending. The core of this code looks like this:</p>
<pre>
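// Winsock ignores the first argument to select(), so 0 is acceptable here
int result = select(0, &amp;readfds, (fd_set*)NULL, (fd_set*)NULL, &amp;tv);
if ( result &gt; 0 ) {
    //
    // The callback functions might delete the client being processed, and
    // they may also start new requests. Inserting into an unordered_map can
    // rehash it and invalidate every iterator, so I copy the ready
    // references first and only then call DNSServiceProcessResult.
    //
    std::vector&lt;DNSServiceRef&gt; ready;
    for ( auto ii = m_ClientToFdMap.cbegin() ; ii != m_ClientToFdMap.cend() ; ++ii ) {
        if ( FD_ISSET(ii-&gt;second, &amp;readfds) )
            ready.push_back(ii-&gt;first);
    }
    for ( auto client : ready ) {
        // Skip any reference that an earlier callback has already removed
        if ( m_ClientToFdMap.count(client) )
            DNSServiceProcessResult(client);
    }
}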
</pre>
<p>This generates my callbacks efficiently, and because they are in the context of my main program's UI thread, I avoid a lot of unpleasant issues.</p>
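<p>The loop over <code>m_ClientToFdMap</code> needs care because callbacks can both remove references and add new ones, and an insert that causes an <code>unordered_map</code> to rehash invalidates all of its iterators. The self-contained model below shows one defensive approach: snapshot the ready references first, then look each one up again before processing it. An <code>int</code> stands in for a <code>DNSServiceRef</code> and a <code>std::function</code> stands in for the Bonjour callback. <code>Pump</code> and <code>ProcessReady</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical model of the callback pump. Handlers may erase entries or add
// new ones, as real callbacks do when a request finishes or a browse starts.
struct Pump {
    using Handler = std::function<void(Pump &, int)>;
    std::unordered_map<int, Handler> clients;
    std::vector<int> processed;

    void ProcessReady(const std::function<bool(int)> &isReady)
    {
        // Pass 1: snapshot the ready references. No handler runs here,
        // so the map cannot change while it is being iterated.
        std::vector<int> ready;
        for (const auto &entry : clients)
            if (isReady(entry.first))
                ready.push_back(entry.first);

        // Pass 2: run handlers, looking each reference up again because
        // an earlier handler may already have removed it.
        for (int ref : ready) {
            auto it = clients.find(ref);
            if (it == clients.end())
                continue;
            Handler handler = it->second;  // copy: the handler may erase itself
            processed.push_back(ref);
            handler(*this, ref);
        }
    }
};
```

<p>References added during a pass are not in the snapshot, so they wait for the next call. That matches the timer-driven loop, where a new request's descriptor can't be ready yet anyway.</p>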
<h4>Threading Issues</h4>
<p>My program manages the Bonjour callbacks in a fairly ugly fashion. When my browsing activity starts, I create a timer that fires once every 250 milliseconds. I process up to 10 callbacks in that timer call, then exit. This continues until there are no pending browser or resolution requests, at which time I kill the timer.</p>
<p>Depending on your use of DNS-SD, you may find that this is not as efficient as you'd like. If this is the case, you might find it useful to move your message pump code to a separate thread.</p>
<p>Once you do that, you can wait on all your callbacks by calling <code>select</code> with a long or infinite timeout. This has the effect of blocking your callback thread until it has actual work to do - resulting in a better use of CPU time.</p>
<p>There are some obvious downsides to this approach. Clearly you have to use some sort of locking mechanism on the data structures that are shared between your callback thread and the rest of your program. And the use of the <code>select()</code> statement with an infinite timeout is complicated by the possibility that you may be making or canceling browsing or resolution calls while your program runs.</p>
<p>A good way to deal with both of these problems is to use a socket-based message passing protocol between the callback thread and the other components of your program. If you restrict your interface to messages, you don't have to worry about locking access to shared data. And because Winsock's <code>select()</code> works only with sockets, using a socket for communications lets the same <code>select()</code> call wake the thread when new messages arrive.</p>
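<p>Here is a sketch of that wake-up mechanism, under a few assumptions. It uses a POSIX pipe and single-byte commands, where a Windows version would use a connected loopback socket so that Winsock's <code>select()</code> can watch it. The worker blocks with no timeout until a command arrives. In a real pump the same <code>fd_set</code> would also contain the descriptors from <code>DNSServiceRefSockFD()</code>. <code>RunPump</code> and the <code>'q'</code> quit command are hypothetical.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <sys/select.h>
#include <thread>
#include <unistd.h>

// Reads one-byte commands from cmdFd until it sees 'q' or end of file.
// Only this thread writes to log; other threads read it after join().
void RunPump(int cmdFd, std::string &log)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(cmdFd, &readfds);
        // A null timeout blocks until something is readable.
        int result = select(cmdFd + 1, &readfds, nullptr, nullptr, nullptr);
        if (result < 0)
            return;  // a production pump would retry on EINTR
        if (FD_ISSET(cmdFd, &readfds)) {
            char cmd;
            if (read(cmdFd, &cmd, 1) != 1 || cmd == 'q')
                return;
            log += cmd;  // stand-in for handling a real request message
        }
    }
}
```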
<h4>Character Sets</h4>
<p>The days when DNS was limited to seven-bit ASCII characters are long gone. Service instances are encoded as UTF-8, and can use whatever Unicode characters they like. In the figure shown below, you can see the effects of that when I browse for instances of iTunes:<br />
<center></p>
<table border="0">
<tr>
<td><center><img src="/attachments/2011/bonjour-windows/Figure03.png"/></center></td>
</tr>
<tr>
<td><center>Character set problems in service instance names</center></td>
</tr>
</table>
<p></center><br />
You can see that OS X users have so-called curly quotes in their library instance names, and curly quotes are definitely outside the range of seven-bit ASCII. DNS-SD collects the names as UTF-8 encoded strings, and sends them to the console in that format.</p>
<p>By default, the Windows cmd.exe window doesn't render UTF-8 properly, but changing the code page to 65001 (for example, with <code>chcp 65001</code>) results in the correct rendering.</p>
<p>In my sample program, I handle this with a two-step approach. First, my program is built using the Unicode libraries, ensuring that I am able to render Unicode output properly. To conform with Microsoft's C++ conventions, I use <code>CString</code> for all my Unicode strings, and wrap all my string literals in the <code>_T()</code> macro.</p>
<p>This works fine for my UI, but I can't use strings built of <code>wchar_t</code> to communicate with the Bonjour SDK - it expects eight-bit characters with UTF-8 encoding. In my program I use the C++ <code>std::string</code> class wherever I am working with 8-bit characters that might be encoded in UTF-8. When it comes time to render one of those strings in my Unicode context, all I have to do is use the handy <code>CA2T</code> macro with the <code>CP_UTF8</code> parameter, and things work properly.</p>
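<p>To show what that conversion involves, here is a portable sketch of a UTF-8 to UTF-16 decoder. It plays roughly the role of <code>CA2T</code> with <code>CP_UTF8</code> in a Unicode build. <code>Utf8ToUtf16</code> is a hypothetical name, and the sketch is deliberately lenient. On Windows, the macro or <code>MultiByteToWideChar</code> is the better choice.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts UTF-8 (what the Bonjour SDK hands back) to UTF-16 (what a Windows
// Unicode build stores in wchar_t). Bad or truncated sequences produce
// U+FFFD. For brevity it does not reject overlong forms or encoded
// surrogates, so it is not a strict validator.
std::u16string Utf8ToUtf16(const std::string &in)
{
    const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else { out += kReplacement; ++i; continue; }  // stray continuation byte

        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (cont & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out += kReplacement; ++i; continue; }
        i += len;

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

<p>A curly apostrophe in a library name such as Mark&#8217;s Library arrives as the three bytes E2 80 99 and decodes to the single UTF-16 unit U+2019.</p>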
<h4>Library Issues</h4>
<p>The design of the Bonjour SDK imposes some uncomfortable restrictions on you when it comes to building your C or C++ program. Because you are linking directly to code found in the library <code>dnssd.lib</code>, you have to ensure that your program and that library link against the same version of the C run time library. And for the Bonjour SDK under Windows, this means you must link with the static, multithreaded, release version of the library.</p>
<p>You'll see this problem right away when you create an MFC project and try to build with <code>dnssd.lib</code>. By default, the project generator will probably have you using MFC in a shared DLL, and using the Multithreaded Debug DLL version of the C libraries. When you try to build like this, you will get some unpleasant error messages:</p>
<pre>
1>LINK : warning LNK4098:
         defaultlib 'msvcrtd.lib' conflicts with use of other libs;
         use /NODEFAULTLIB:library
1>LINK : warning LNK4098:
         defaultlib 'LIBCMT' conflicts with use of other libs;
         use /NODEFAULTLIB:library
</pre>
<p>A full featured SDK would provide libraries built for multiple scenarios, and you would pick the one of your choice depending on your build parameters. But with the Bonjour SDK, you don't get this choice, so you need to ensure that your project follows a few guidelines:</p>
<ul>
<li/>Under <i>Configuration Properties/General</i>, field <i>Use of MFC</i> needs to be set to <i>Use MFC in a Static Library</i> for both debug and release builds.
<li/>Under <i>Configuration Properties/C/C++/Code Generation</i>, field <i>Runtime Library</i> needs to be set to <i>Multi-threaded (/MT)</i> for both debug and release builds.
<li/>Under <i>Configuration Properties/C/C++/Preprocessor</i>, field <i>Preprocessor Definitions</i>, the constant <i>_DEBUG</i> needs to be changed to <i>NDEBUG</i> for Debug configurations.
</ul>
<p>To build a project that uses the SDK, you will also need to add <code>dnssd.lib</code> to your list of linker inputs, include <code>dns_sd.h</code> in your source, and add the appropriate SDK directories under <i>Configuration Properties/C/C++/General</i> in field <i>Additional Include Directories</i>, and under <i>Configuration Properties/Linker/General</i> in field <i>Additional Library Directories</i>.</p>
<h4>Overview Of the Demo Program</h4>
<p>I've included the full source for a project that will build with Visual Studio 10, as long as you have the Bonjour SDK installed. It browses all available services on the network and displays the information about them in a tree form. </p>
<p>The program starts by kicking off a browser for <code>_services._dns-sd._udp</code>. The results are processed in member function <code>IterateServiceTypes()</code>. As each new service type is discovered, it is added to the tree, and a call to <code>DNSServiceBrowse()</code> is made to discover all instances of that service type. The callback for that browse call is member function <code>IterateServiceInstances()</code>.</p>
<p>In <code>IterateServiceInstances()</code> I add the instance to the tree, then call <code>DNSServiceResolve()</code>. This function operates much like the browse function, but it actually retrieves the service's DNS records: the host name and service port, plus a list of name/value pairs that a service can advertise in its TXT record. You can see those values put to good work with service types like <code>_ipp._tcp</code>, in which printer parameters are exposed as part of service discovery.</p>
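<p>In the resolve callback the TXT data arrives as a raw byte buffer plus a length. The format, defined for DNS-SD in RFC 6763, is a sequence of strings. Each string is preceded by a one-byte length and is normally of the form <code>key=value</code>. <code>dns_sd.h</code> declares helpers such as <code>TXTRecordGetValuePtr()</code> for this job. The hypothetical <code>ParseTxtRecord</code> below is a self-contained sketch of the same parsing, meant to show what those helpers deal with.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Parses raw DNS-SD TXT record data: a sequence of strings, each preceded by
// a one-byte length, each normally "key=value". A string with no '=' is a
// boolean attribute and gets an empty value here.
std::map<std::string, std::string> ParseTxtRecord(const unsigned char *txt,
                                                  std::size_t len)
{
    std::map<std::string, std::string> pairs;
    std::size_t pos = 0;
    while (pos < len) {
        std::size_t itemLen = txt[pos++];
        if (itemLen > len - pos)
            break;  // truncated record: stop rather than read past the end
        std::string item(reinterpret_cast<const char *>(txt + pos), itemLen);
        pos += itemLen;
        if (item.empty())
            continue;
        std::string::size_type eq = item.find('=');
        // emplace keeps the first occurrence of a repeated key, as RFC 6763 asks
        if (eq == std::string::npos)
            pairs.emplace(item, "");
        else
            pairs.emplace(item.substr(0, eq), item.substr(eq + 1));
    }
    return pairs;
}
```

<p>RFC 6763 treats keys as case-insensitive; this sketch compares them exactly, which is one more reason to prefer the SDK's own helpers in production code.</p>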
<p><code>ResolveInstance()</code> is the callback routine that receives the information about the service instance. The host name, port, and name/value pairs are added to the tree, and then one final call is made to a Bonjour SDK entry called <code>DNSServiceGetAddrInfo()</code>. This function resolves the IP address for the given host name. The address is stuffed into the tree in callback function <code>GetAddress()</code>.</p>
<h4>Conclusion</h4>
<p>DNS service discovery is a powerful tool, but Windows programmers might be put off by the lack of a nicely packaged SDK. Working through this simple example program might be a good way to get comfortable with the Bonjour SDK, which provides a good multi-platform alternative to UPnP.</p>
<table border="0" cellspacing="10">
<tr>
<td>Sample program source:</td>
<td><a href="/attachments/2011/bonjour-windows/ServiceBrowser.zip">ServiceBrowser.zip</a></td>
</tr>
<tr>
<td>Sample program executable:</td>
<td><a href="/attachments/2011/bonjour-windows/ServiceBrowserExe.zip">ServiceBrowserExe.zip</a></td>
</tr>
</table>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/10/25/dns-service-discovery-on-windows/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DNS Service Discovery</title>
		<link>http://marknelson.us/2011/09/30/dns-service-discovery/</link>
		<comments>http://marknelson.us/2011/09/30/dns-service-discovery/#comments</comments>
		<pubDate>Sat, 01 Oct 2011 02:18:41 +0000</pubDate>
		<dc:creator>Mark Nelson</dc:creator>
				<category><![CDATA[Cisco]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://marknelson.us/?p=892</guid>
		<description><![CDATA[For most of this year I've been working on a new product called Cisco OnPlus, a network management service for small business. In order to do its job effectively, OnPlus needs to know what devices are present on the network, and one of the key tools we use to accomplish this is DNS Service Discovery. [...]]]></description>
			<content:encoded><![CDATA[<p>For most of this year I've been working on a new product called <a href="http://www.cisco.com/en/US/products/ps11792/index.html" class="newpage">Cisco OnPlus</a>, a network management service for small business. In order to do its job effectively, OnPlus needs to know what devices are present on the network, and one of the key tools we use to accomplish this is <a href="http://www.dns-sd.org/" class="newpage">DNS Service Discovery</a>. In this article I will show you a little bit about how we use DNS-SD, and show you how you can put it to work effectively in your networks.<br />
<span id="more-892"></span></p>
<h4>OnPlus</h4>
<p>Cisco OnPlus is a cloud-based network management tool that helps resellers support their customers. The figure below shows you a typical view of a customer network from an OnPlus browser screen. (The customer in this case being my Dad.) OnPlus not only identifies the devices on the customer's network, it also performs configuration backups, firmware updates, and monitors network performance.<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/dns-sd/OnPlusView.png"/></td>
</tr>
<tr>
<td><center>The OnPlus Topology View</center></td>
</tr>
</table>
<p></center><br />
In order to get this information about the customer's network, OnPlus relies on the OnPlus Network Agent - an ARM-based Linux PC about the size of a paperback book. This computer is a close relative of the <a href="http://en.wikipedia.org/wiki/SheevaPlug" class="newpage">Sheeva Plug</a>, and despite its small size it runs a fairly complete distribution of Linux.<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/dns-sd/OnPlusAppliance.png"/></td>
</tr>
<tr>
<td><center>The OnPlus Network Agent</center></td>
</tr>
</table>
<p></center><br />
The agent regularly runs a complete inventory of the network, attempting to learn as much as possible about all of the devices it can find. The inventory process uses a huge list of protocols when scanning the network, including:</p>
<ul>
<li/>DNS Service Discovery
<li/>DHCP Packet Inspection
<li/>DNS Packet Inspection
<li/>Windows Management Instrumentation
<li/>Cisco Discovery Protocol, or CDP
<li/>NETBIOS/SMB
<li/>UPnP
<li/>SLP
<li/>Traceroute
<li/>ARP
</ul>
<p>When it comes to locating devices made by my business unit at Cisco, the most useful protocols are DNS Service Discovery and Cisco Discovery Protocol. DNS Service Discovery provides all the information the inventory process needs to fully identify a device: its specific Product ID (more or less a model number), the version number of both the hardware and the firmware, its MAC address, and its IP address. (CDP provides a similar, but not identical, bundle of data.) This information is readily available because devices made by Cisco's Small Business Technology Group use DNS-SD to broadcast information using our proprietary service type: <code>csco-sb</code>.</p>
<h4>A Quick Overview of DNS Service Discovery</h4>
<p>So what exactly is DNS Service Discovery? If you're like me, you became aware of DNS-SD because Apple uses it as part of <a href="http://www.apple.com/support/bonjour/" class="newpage">Bonjour</a>. Bonjour is a proprietary implementation of <a href="http://en.wikipedia.org/wiki/Zeroconf" class="newpage">Zeroconf</a>, a set of technologies marked by <a href="http://theangryhedgehog.com/2010/12/19/three-shall-be-the-number-thou-shalt-count/" class="newpage">three</a> key network components:</p>
<ul>
<li/>Address assignment
<li/>Service discovery
<li/>Name resolution
</ul>
<p>The history of Zeroconf is a somewhat quixotic story, based around the shared idea that setting up small networks ought to be a painless and simple process. The components of Zeroconf provide a nice, vendor-agnostic way to set up networks in such a way that no consumer would ever have to manually assign an IP address, set up a DHCP server, or manually enter the address of a printer.</p>
<p>Apple has embraced this idea, with their implementation of Zeroconf called Bonjour, an Apple trademark. If you run iTunes on your Apple or Windows PC, you may well see that there are other users out there running iTunes who would be happy to share their collections with you. This happens more or less with no work on your part, and can be a really nice feature in a big office:<br />
<center></p>
<table border="0">
<tr>
<td><img src="/attachments/2011/dns-sd/Itunes.png"/></td>
</tr>
<tr>
<td><center>Sharing iTunes Libraries</center></td>
</tr>
</table>
<p></center><br />
iTunes accomplishes this sharing using DNS-SD, which is built into OS X and is configured on Windows machines as part of the iTunes installation. Every instance of iTunes that is configured to share its library uses Bonjour to advertise an instance of the <code>daap</code> service. If we look in the official roster of <a href="http://www.dns-sd.org/ServiceTypes.html" class="newpage">registered DNS Service types</a>, we find this record:</p>
<pre>
daap Digital Audio Access Protocol (iTunes)
     Amandeep Jawa &lt;daap at apple.com&gt;
     Defined TXT keys: txtvers, Version, iTSh Version, Machine ID,
                       Database ID, Machine Name, Password
</pre>
<p>This is a pretty simple definition - let's see what it looks like on the network. </p>
<p>On my desktop Linux system, I have the <a href="http://avahi.org/" class="newpage">avahi</a> utilities installed. Avahi provides a nice suite of tools used to implement DNS-SD. I'll use the <code>avahi-browse</code> command to see what these <code>daap</code> services actually look like:</p>
<pre>
mark@ubuntu:~$ avahi-browse _daap._tcp -t
+   eth0 IPv4 Itunes NAS Server on nas                      iTunes Audio Access  local
+   eth0 IPv4 Denise___s Library                            iTunes Audio Access  local
+   eth0 IPv4 Mark___s Library                              iTunes Audio Access  local
mark@ubuntu:~$
</pre>
<p>If I ask <code>avahi-browse</code> to resolve the services, it will query the service provider for the details in the advertisement. A partial output is shown below:</p>
<pre>
mark@ubuntu:~$ avahi-browse _daap._tcp -r -t
+   eth0 IPv4 Itunes NAS Server on nas                      iTunes Audio Access  local
+   eth0 IPv4 Denise___s Library                            iTunes Audio Access  local
+   eth0 IPv4 Mark___s Library                              iTunes Audio Access  local
=   eth0 IPv4 Itunes NAS Server on nas                      iTunes Audio Access  local
   hostname = [nas.local]
   address = [192.168.1.165]
   port = [3689]
   txt = ["ffid=075abcc4" "Password=false" "Version=196610" "iTSh Version=131073"
          "mtd-version=svn-1676" "Machine Name=Itunes NAS Server" "Machine ID=BE8926F6"
          "Database ID=BE8926F6" "txtvers=1"]
</pre>
<p>From this information, I now have everything I need to connect to a server and start playing music. I have a hostname, an IP address, and a port, all of which can be used to access the service. Finally, I have a TXT record containing an arbitrary set of name/value pairs, as defined in the service definition. The use of these fields is up to the creator of the service, and in this case most of them are self-explanatory.</p>
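<p>Turning those TXT strings into something a program can use is straightforward. Below is a minimal Python sketch (the helper name <code>parse_txt</code> is my own, not part of any library) that follows the interpretation rules in RFC 6763: the key is everything before the first '=', keys are compared case-insensitively, a string with no '=' is a boolean attribute, and only the first occurrence of a repeated key counts.</p>

```python
def parse_txt(strings):
    """Turn DNS-SD TXT strings such as "Password=false" into a dict.

    Follows the RFC 6763 conventions: the key is everything before the
    first '=', keys compare case-insensitively, and a string with no '='
    is a boolean attribute that is simply present.
    """
    record = {}
    for s in strings:
        key, sep, value = s.partition("=")
        if not key:
            continue  # RFC 6763: strings with a missing key are ignored
        key = key.lower()
        if key in record:
            continue  # RFC 6763: only the first occurrence of a key counts
        record[key] = value if sep else True
    return record


# TXT strings copied from the Itunes NAS Server record above
txt = ["ffid=075abcc4", "Password=false", "Version=196610",
       "iTSh Version=131073", "Machine Name=Itunes NAS Server", "txtvers=1"]
info = parse_txt(txt)
print(info["machine name"])  # Itunes NAS Server
print(info["password"])      # false (still a string, not a boolean)
```

<p>Values such as <code>Password=false</code> stay strings; deciding what <code>false</code> means is up to the service definition.</p>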
<h4>Browsing the Network</h4>
<p>We use the avahi toolkit in OnPlus to browse the network for devices. It is worth doing a little exploring on my home network to see what kind of information we get out of this process.</p>
<p>To get a high-level view, I can ask <code>avahi-browse</code> to query for a special service: <code>_services._dns-sd._udp</code>. When this browse request goes out on the network, all the active nodes using DNS-SD issue records detailing the types of services they support. The result on my home network looks like this:</p>
<pre>
mark@ubuntu:~$ avahi-browse _services._dns-sd._udp -t
+   eth0 IPv4 _udisks-ssh                                   _tcp                 local
+   eth0 IPv4 _workstation                                  _tcp                 local
+   eth0 IPv4 _ir-hvac-021                                  _tcp                 local
+   eth0 IPv4 _ir-hvac-020                                  _tcp                 local
+   eth0 IPv4 _ir-hvac-000                                  _tcp                 local
+   eth0 IPv4 _pdl-datastream                               _tcp                 local
+   eth0 IPv4 _printer                                      _tcp                 local
+   eth0 IPv4 _tivo-videos                                  _tcp                 local
+   eth0 IPv4 _readynas                                     _tcp                 local
+   eth0 IPv4 _smb                                          _tcp                 local
+   eth0 IPv4 _afpovertcp                                   _tcp                 local
+   eth0 IPv4 _rsp                                          _tcp                 local
+   eth0 IPv4 _daap                                         _tcp                 local
+   eth0 IPv4 _http                                         _tcp                 local
+   eth0 IPv4 _csco-sb                                      _tcp                 local
mark@ubuntu:~$
</pre>
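<p>As a sketch of how a program might use this meta-query, the Python function below collects the service types so each one can then be browsed in turn. It assumes the output was captured with <code>avahi-browse -p</code>, and that each '+' line carries the application part (<code>_daap</code>) and the transport part (<code>_tcp</code>) in the name and type columns, matching the columns shown in the listing; the function name is hypothetical.</p>

```python
def service_types(meta_lines):
    """Collect browsable service types from the _services._dns-sd._udp meta-query.

    Assumes the parsable (-p) form of each '+' line, for example
        +;eth0;IPv4;_daap;_tcp;local
    where the name column holds the application part and the type column
    holds the transport part.
    """
    types = set()  # a set removes repeats across interfaces and IPv4/IPv6
    for line in meta_lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) == 6 and fields[0] == "+":
            types.add(fields[3] + "." + fields[4])  # e.g. "_daap._tcp"
    return sorted(types)
```

<p>Each returned type, such as <code>_daap._tcp</code>, could then be passed to <code>avahi-browse -r</code> on its own.</p>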
<p>As you can see, a surprising number of DNS-SD services are present. Here is what each of the services on my network does:</p>
<table border="0">
<tr>
<td valign="top">_udisks-ssh:</td>
<td>A remote disk management tool being advertised by my Ubuntu systems</td>
</tr>
<tr>
<td valign="top">_workstation:</td>
<td>Some sort of workgroup management interface supported by various Linux systems.</td>
</tr>
<tr>
<td valign="top">_ir-hvac-0xx:</td>
<td>Management interfaces on a Trane thermostat that happens to have wireless access to my network</td>
</tr>
<tr>
<td valign="top"><nobr>_pdl-datastream:</nobr></td>
<td>Printer page description language interface. This is a service that is used in Bonjour printing. Both of my networked printers support it.</td>
</tr>
<tr>
<td valign="top">_printer</td>
<td>Both of my printers use this advertisement to offer TCP port 515 for LPR print spooling</td>
</tr>
<tr>
<td valign="top">_tivo-videos</td>
<td>My TiVo sends out this advertisement, which provides a complete URL I can use to get an XML-formatted version of the <em>Now Playing</em> section of the TiVo UI.</td>
</tr>
<tr>
<td valign="top">_readynas</td>
<td>My Netgear ReadyNAS uses this unregistered service type to advertise something that can be reached on port 9. Exactly what, I don't know, but I think it might be just a way for PC users to find the NAS with RAIDar.</td>
</tr>
<tr>
<td valign="top">_smb</td>
<td>My Netgear ReadyNAS advertises its Windows shares with this service type</td>
</tr>
<tr>
<td valign="top">_afpovertcp</td>
<td>My Netgear ReadyNAS uses this registered service type to advertise its Apple Filing Protocol (AFP) volumes</td>
</tr>
<tr>
<td valign="top">_rsp</td>
<td>I have a Firefly iTunes server running on my NAS. In addition to serving music via DAAP, it also speaks the Roku Server Protocol, presumably for clients that don't support the iTunes protocols.</td>
</tr>
<tr>
<td valign="top">_http</td>
<td>Most of the devices on my network that are running web servers issue an HTTP advertisement, which points to that interface.</td>
</tr>
<tr>
<td valign="top">_csco-sb</td>
<td>My two Cisco SB devices advertise their presence using this service</td>
</tr>
</table>
<p>This special command to list the available service types is not actually used in OnPlus. Instead, we call <code>avahi-browse</code> with the <code>-r</code> and <code>-p</code> options, asking it to resolve every service it discovers and to print the results in a parsable, semicolon-separated format.</p>
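<p>For readers who want to do something similar, here is a minimal Python sketch of parsing one resolved ('=') line of <code>avahi-browse -r -p</code> output. The ten-field, semicolon-separated layout and the <code>\DDD</code> decimal escaping of service names are assumptions based on my reading of avahi's parsable format, not a documented contract, and the function names are hypothetical. This is not the OnPlus code, which is written in PHP.</p>

```python
import re

# Assumed escaping in avahi's parsable (-p) output: special bytes in
# service names appear as \DDD (three decimal digits), and other
# characters may appear as a backslash followed by the literal character.
_LABEL_ESCAPE = re.compile(rb"\\(\d{3}|.)", re.DOTALL)
# TXT strings are assumed to be double-quoted and separated by spaces.
_TXT_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


def _unescape(match):
    token = match.group(1)
    if len(token) == 3 and int(token) in range(256):
        return bytes([int(token)])  # \032 becomes b" ", \226 becomes b"\xe2"
    return token


def unescape_name(text):
    """Decode \\DDD escapes, then interpret the resulting bytes as UTF-8."""
    raw = _LABEL_ESCAPE.sub(_unescape, text.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


def parse_resolved(line):
    """Parse one '=' line of `avahi-browse -r -p` output into a dict.

    Assumed layout, ten semicolon-separated fields:
      =;iface;protocol;name;type;domain;hostname;address;port;txt
    Returns None for '+' and '-' lines or anything else that does not fit.
    """
    # maxsplit=9 keeps any ';' inside the TXT field in the last element
    fields = line.rstrip("\n").split(";", 9)
    if len(fields) != 10 or fields[0] != "=":
        return None
    _, iface, proto, name, stype, domain, host, addr, port, txt = fields
    return {
        "interface": iface,
        "protocol": proto,
        "name": unescape_name(name),
        "type": stype,
        "domain": domain,
        "hostname": host,
        "address": addr,
        "port": int(port),
        # TXT strings are returned as-is, without further unescaping
        "txt": _TXT_STRING.findall(txt),
    }
```

<p>The <code>\DDD</code> decoding is what turns a name like <code>Mark\226\128\153s\032Library</code> back into the curly apostrophe that the non-parsable listing shows as underscores.</p>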
<h4>Cisco Devices</h4>
<p>We get the most interesting results from <code>avahi-browse</code> when we tell it to look specifically for instances of the <code>csco-sb</code> service. That command produces output like this:</p>
<pre>
mark@ubuntu:~$ avahi-browse -r -t _csco-sb._tcp
+   eth0 IPv4 switch32026a                                  _csco-sb._tcp        local
+   eth0 IPv4 onplus005229                                  _csco-sb._tcp        local
=   eth0 IPv4 switch32026a                                  _csco-sb._tcp        local
   hostname = [sg200-26p.local]
   address = [192.168.1.168]
   port = [80]
   txt = ["hostname=sg200-26p" "serialNo=DNI1515005U" "MACAddress=44E4D932026A"
          "PIDVID=SLM2024PT V01" "fmVersion=1.1.1.8"
          "deviceDescr=26-port Gigabit PoE Smart Switch" "deviceType=Switch"
          "model=SG 200-26P"]
=   eth0 IPv4 onplus005229                                  _csco-sb._tcp        local
   hostname = [PLG1000F0AD4E005229.local]
   address = [192.168.1.167]
   port = [80]
   txt = ["accessType=http" "MDFID=Unassigned" "hostname=onplus005229"
          "serialNo=PLGF0AD4E005229" "MACAddress=F0:AD:4E:00:52:29"
          "PIDVID=Unassigned" "fmVersion=6.2.2.007" "deviceDescr=Cisco OnPlus Network Agent"
          "deviceType=Service Appliance" "model=PLG1000" "version=1.0"]
mark@ubuntu:~$
</pre>
<p>If you look at the first '=' record issued by <code>avahi-browse</code>, you can see that when it comes to discovery, I have really hit the jackpot. I've identified a device on my network that I can reach at a specific IP address. I have a MAC address that I can now use as a globally unique identifier. And I have the Cisco product ID, the hardware version, and the firmware version, as well as the user-assigned host name and a friendly model name.</p>
<p>Information like this lets me fill in the details of the topology map quite accurately. Better yet, since this is a Cisco device, the OnPlus appliance can now send it queries to find out more about the network. As an example, the switch's CAM table provides a list of devices and the ports they are attached to, which helps fill in more of the topology picture.</p>
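<p>One practical wrinkle: the two records above report <code>MACAddress</code> in different forms, <code>44E4D932026A</code> on the switch and <code>F0:AD:4E:00:52:29</code> on the OnPlus agent. Before a MAC address can serve as a unique key, it needs to be normalized. A small Python sketch (the helper name is hypothetical):</p>

```python
import re


def normalize_mac(text):
    """Reduce a MAC address to 12 lowercase hex digits, or None if invalid.

    Accepts the bare form (44E4D932026A) as well as colon-, hyphen- and
    dot-separated forms (F0:AD:4E:00:52:29, f0-ad-4e-00-52-29, f0ad.4e00.5229).
    """
    digits = re.sub(r"[:\-.]", "", text.strip()).lower()
    if re.fullmatch(r"[0-9a-f]{12}", digits):
        return digits
    return None


print(normalize_mac("44E4D932026A"))       # 44e4d932026a
print(normalize_mac("F0:AD:4E:00:52:29"))  # f0ad4e005229
```

<p>Comparing on the normalized form means the same device is recognized no matter which advertisement reported it.</p>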
<h4>Processing This Data</h4>
<p>If you are a programmer, the natural question you might be asking is how you access these service advertisements from inside your program. In the case of Cisco OnPlus, most of the code running the inventory task consists of PHP scripts. As far as I know, there are no bindings in PHP to DNS-SD services, and we elected not to try to invent that wheel.</p>
<p>Instead, we use PHP's <code>popen()</code> function to run instances of <code>avahi-browse</code>, collecting the output from each program and parsing it accordingly. We actually have three instances of the browser running at any time. Two are dedicated to Cisco-specific services, while the third looks at all other services. Even though other services might not give us as much information as <code>csco-sb</code>, they still supply host names, MAC addresses, IP addresses, descriptions, and more, and we use whatever we can find.</p>
<p>These instances of <code>avahi-browse</code> run as independent discovery processes, collecting and storing data as it is seen on the network. The records that they collect are then used by the inventory process when it is periodically launched.</p>
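<p>The OnPlus code does this with PHP's <code>popen()</code>. For comparison, here is a rough Python equivalent using the standard <code>subprocess</code> module; the function name and callback design are my own assumptions, not taken from OnPlus.</p>

```python
import subprocess


def stream_lines(cmd, handle_line):
    """Run a long-lived command and hand each line of its stdout to handle_line.

    A rough Python analogue of reading from PHP's popen(). In this setting cmd
    would be something like ["avahi-browse", "-r", "-p", "_csco-sb._tcp"].
    Blocks until the command exits, then returns its exit status.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            handle_line(line.rstrip("\n"))
    # Popen's context manager waits for the process, so returncode is set here
    return proc.returncode
```

<p>Each discovery process could run this in its own thread, for example <code>threading.Thread(target=stream_lines, args=(cmd, on_line), daemon=True).start()</code>, where the hypothetical <code>on_line</code> callback parses and stores each record.</p>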
<p>The DNS-SD discovery processes actually take an active role in the inventory scheme. When one of the avahi-based processes discovers a significant new device on the network, such as one advertising <code>csco-sb</code>, it stores the data record and then triggers an early start of the inventory process. This allows OnPlus to respond to new devices on the network in close to real time.</p>
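<p>The early-trigger idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the OnPlus implementation: the discovery side stores the latest record for each service and sets a <code>threading.Event</code> when a significant service type (here <code>_csco-sb._tcp</code>) shows up for the first time, and the periodic inventory loop waits on that event with a timeout, so it runs on schedule or sooner.</p>

```python
import threading


class DiscoveryStore:
    """Keep the latest record per service and wake the inventory early.

    Hypothetical sketch: records are dicts with 'name', 'type' and 'domain'
    keys, and significant_types lists the service types whose first
    appearance should trigger an early inventory run.
    """

    def __init__(self, significant_types):
        self.significant_types = set(significant_types)
        self.records = {}
        self.inventory_due = threading.Event()
        self._lock = threading.Lock()

    def add(self, record):
        """Store a record; return True if this service was not seen before."""
        key = (record["name"], record["type"], record["domain"])
        with self._lock:
            is_new = key not in self.records
            self.records[key] = record
        if is_new and record["type"] in self.significant_types:
            self.inventory_due.set()  # wakes the inventory loop early
        return is_new


# Sketch of the consuming side (run_inventory and PERIOD are hypothetical):
#   while True:
#       store.inventory_due.wait(timeout=PERIOD)  # on schedule, or earlier if set
#       store.inventory_due.clear()
#       run_inventory(store.records)
```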
<h4>Advertising</h4>
<p>So let's say you decide you want to use DNS-SD for some form of network discovery. I've shown you how to discover network nodes that are advertising a service, but how do you actually publish that advertisement?</p>
<p>In the OnPlus Network we use the avahi package to advertise services as well as to find them. The <code>avahi-publish</code> command does the job, typically started as a daemon to run for the lifetime of the system. When executing <code>avahi-publish</code>, you will normally want to specify:</p>
<ul>
<li>A service type</li>
<li>The name of the service instance</li>
<li>A port to access the service</li>
<li>Text records containing whatever name/value pairs you want to publish</li>
</ul>
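<p>If you drive <code>avahi-publish</code> from a program, it can help to assemble and check the arguments before launching it. The Python sketch below builds an argument list in the <code>avahi-publish -s NAME TYPE PORT [TXT...]</code> form used in the examples that follow, and applies two limits from RFC 6763: a TXT key may not contain '=', and each TXT string is at most 255 bytes. The function name is hypothetical.</p>

```python
def publish_command(name, service_type, port, txt=None):
    """Build an argv list for `avahi-publish -s NAME TYPE PORT [TXT...]`.

    txt maps keys to values; each pair becomes one "key=value" string.
    RFC 6763 forbids '=' in keys and limits each TXT string to 255 bytes,
    so both are checked before anything is launched.
    """
    if port not in range(65536):
        raise ValueError(f"port out of range: {port}")
    args = ["avahi-publish", "-s", name, service_type, str(port)]
    for key, value in (txt or {}).items():
        if not key or "=" in key:
            raise ValueError(f"invalid TXT key: {key!r}")
        entry = f"{key}={value}"
        if len(entry.encode("utf-8")) > 255:
            raise ValueError(f"TXT string too long: {entry!r}")
        args.append(entry)
    return args
```

<p>The resulting list can be handed to <code>subprocess.Popen</code>. Since <code>avahi-publish</code> keeps the service registered only while it is running, the process should be left running for as long as you want to advertise.</p>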
<p>One thing to note is that you can use DNS-SD to advertise things that don't necessarily map directly to a port. For example, if you just want to let everyone know of your existence, the port parameter might be irrelevant, but the name/value pairs could still be valuable.</p>
<p>As an example, I might create a photo sharing service that I advertise on the network using a service I'll call <code>PhotoMark</code>. To let everyone know about this, I execute the following command when my system starts up:</p>
<pre>
avahi-publish -s mark _PhotoMark._tcp 9999 "payload=.gif" "folder=/pictures"
</pre>
<p>Anyone on the network using DNS-SD could then see that I was advertising photo sharing, with GIF photos available in my <code>/pictures</code> folder. Of course, the details of the protocol are not included in the advertisement; that's outside the scope of DNS-SD.</p>
<p>Often a good choice for the service instance name is simply the computer's host name:</p>
<pre>
avahi-publish -s `uname -n` _PhotoMark._tcp 9999 "payload=.gif" "folder=/pictures"
</pre>
<h4>What's Next</h4>
<p>In this article I showed you how we use DNS-SD on the Cisco OnPlus Network Agent. Mostly it involves calling the avahi command line tools and parsing their output with PHP scripts.</p>
<p>In my next post, I'll show you how to do the same thing on a Windows PC using the Apple Bonjour SDK. Unfortunately, Windows does not include DNS-SD support in the OS, so instead of using the Win32 API to do service discovery, you will have to rely on some slightly less elegant methods. But when it comes to network discovery, functionality and interoperability trump elegance every day of the week.</p>
]]></content:encoded>
			<wfw:commentRss>http://marknelson.us/2011/09/30/dns-service-discovery/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

