<h1>Hiding Data in the Filesystem</h1>
<p>Mark Nelson, 2018-07-17, <a href="https://marknelson.us/posts/2018/07/17/hiding-data-in-the-filesystem">https://marknelson.us/posts/2018/07/17/hiding-data-in-the-filesystem</a></p>
<p>In <a href="/posts/2012/10/09/the-random-compression-challenge-turns-ten.html" target="_blank">The Random Compression Challenge Turns Ten</a> I challenged programmers to compress a notoriously random file. A winning entry
will consist of a program plus data file that when executed creates a perfect copy of
<a href="/assets/2012-10-09-the-random-compression-challenge-turns-ten/AMillionRandomDigits.bin">AMillionRandomDigits.bin</a>.
If the size of the program plus data is less than the 415,241 bytes in that file, you will have demonstrated that
you can compress it.</p>
<p>To date, no winners.</p>
<h3 id="why-the-fine-print">Why The Fine Print</h3>
<p>One of the things that I have to insist on in the competition is that you can’t hide data in the filesystem.
This can be done in various ways, and in some cases, programmers don’t even realize they are doing it. In
a nutshell, to prevent this, a winning entry with a data file should be able to run like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat data.bin | decompressor > output.bin
</code></pre></div></div>
<p>I recently had an entry from someone who compressed the data into five files. His decompressor didn’t care
what the names of those five files were, but they had to be found in a specific order. He indicated that this was
because each of the five files had a somewhat different algorithm applied against it.</p>
<p>So whether it is used or not, at a minimum this is hiding 16 bytes of data - each of the first four files has a
specific length, and that is being used by the decompressor - it tells the decompressor when to stop
and switch to a new algorithm. If you were reading data from a stream, with no file
system, it would have to look like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>file-1 length | file-1 data | file-2 length | file-2 data ... | file-5 data
</code></pre></div></div>
<p>Each of those lengths is going to be four bytes long. (I’m not counting the length of file 5, because we take
it as a given that the decompressor operating on just one file has that information.)</p>
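<p>To make the accounting concrete, here is a minimal C++ sketch of that single-stream layout. The function names
<code>pack_stream</code> and <code>hidden_bytes</code> are hypothetical, and the little-endian four-byte length is
an assumption chosen for illustration; the entry under discussion did not specify a format.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Concatenate the chunks into one stream. Every chunk except the last is
// preceded by a 4-byte little-endian length (an assumed layout that mirrors
// the diagram above: length | data | length | data ... | last data).
std::string pack_stream(const std::vector<std::string>& chunks)
{
    assert(!chunks.empty());
    std::string out;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (i + 1 < chunks.size()) {
            const std::uint32_t n = static_cast<std::uint32_t>(chunks[i].size());
            for (int shift = 0; shift < 32; shift += 8)
                out.push_back(static_cast<char>((n >> shift) & 0xFF));
        }
        out += chunks[i];
    }
    return out;
}

// Number of bytes the filesystem was silently storing on our behalf:
// the packed stream size minus the raw payload size.
std::size_t hidden_bytes(const std::vector<std::string>& chunks)
{
    std::size_t payload = 0;
    for (const auto& c : chunks)
        payload += c.size();
    return pack_stream(chunks).size() - payload;
}
```

<p>For any five chunks, <code>hidden_bytes</code> comes out to 16: four length prefixes of four bytes each.</p>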
<h3 id="a-demonstration">A Demonstration</h3>
<p>Here’s a very simple algorithm that would win the competition if not for the fact that it is
violating that rule. I created a short python script that reads the random file, then splits it every time it finds
a 0 byte in the file. When writing the split file out, the zero is deleted from the end:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c">#! /usr/bin/env python</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">last</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">begin</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">end</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">file_no</span> <span class="o">=</span> <span class="mi">1000</span>
<span class="k">while</span> <span class="n">end</span> <span class="o">!=</span> <span class="n">last</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">s</span><span class="p">[</span><span class="n">end</span><span class="p">]</span> <span class="o">==</span> <span class="nb">chr</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
        <span class="k">print</span> <span class="n">end</span>
        <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">file_no</span><span class="p">)</span><span class="o">+</span><span class="s">".bin"</span><span class="p">,</span><span class="s">"wb"</span><span class="p">)</span>
        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="n">begin</span><span class="p">:</span><span class="n">end</span><span class="p">])</span>
        <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
        <span class="n">file_no</span> <span class="o">=</span> <span class="n">file_no</span> <span class="o">+</span> <span class="mi">1</span>
        <span class="n">begin</span> <span class="o">=</span> <span class="n">end</span> <span class="o">+</span> <span class="mi">1</span>
    <span class="n">end</span> <span class="o">=</span> <span class="n">end</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">"last.bin"</span><span class="p">,</span><span class="s">"wb"</span><span class="p">)</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="n">begin</span><span class="p">:</span><span class="n">end</span><span class="o">+</span><span class="mi">1</span><span class="p">])</span>
<span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span></code></pre></figure>
<p>I execute this script on the random file:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat AMillionRandomDigits.bin | ./compress.py
</code></pre></div></div>
<p>The result is a list of 1,641 .bin files with four-character names, which need to be processed in the same order
they were created. The total size of the files?</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls -l ????.bin | tr -s ' ' | cut -f 5 -d\ | awk '{s+=$1} END {print s}'
413601
</code></pre></div></div>
<p>I’ve saved 1640 bytes through this operation. Can I now decompress and get an exact copy? Here’s the script:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/usr/bin/env bash</span>
<span class="nb">echo</span> <span class="nt">-n</span> <span class="o">></span> output.bin
<span class="k">for </span>f <span class="k">in</span> ????.bin
<span class="k">do
</span><span class="nb">echo</span> <span class="nt">-ne</span> <span class="nv">$f</span><span class="s1">'\r'</span> <span class="o">></span>&2
<span class="nb">cat</span> <span class="nv">$f</span> <span class="o">>></span> output.bin
<span class="k">if</span> <span class="o">[</span> <span class="nv">$f</span> <span class="o">!=</span> <span class="s2">"last.bin"</span> <span class="o">]</span>
<span class="k">then
</span><span class="nb">echo</span> <span class="nt">-ne</span> <span class="s2">"</span><span class="se">\x</span><span class="s2">00"</span> <span class="o">>></span> output.bin
<span class="k">fi
done
</span><span class="nb">echo</span> <span class="o">></span>&2
</code></pre></div></div>
<p>This 187 byte file creates <code class="highlighter-rouge">output.bin</code>, which is an identical copy of the random file. Which means I’ve compressed
the file by 1,453 bytes! What information did I manage to conceal?</p>
<p>In this case, the answer is pretty simple. I need to restore 1,640 bytes, each with a value of 0, in the output file.
The place they go is determined by the length of each of the first 1,640 bin files - whose length is hidden
in the filesystem.</p>
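<p>A rough way to see that the savings are illusory is to put the lengths back into the stream and count again.
The sketch below is in C++ rather than the Python used above, and the names <code>split_on_zero</code> and
<code>account</code> are hypothetical. It splits in the spirit of the script and then charges a fixed number of
bytes for each length that a single-file format would have to carry. Treat one byte per length as an optimistic
floor, since a single byte cannot describe a piece longer than 255 bytes.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split on zero bytes; the zero itself is dropped, as in the script above.
std::vector<std::string> split_on_zero(const std::string& data)
{
    std::vector<std::string> pieces(1);
    for (char ch : data) {
        if (ch == '\0')
            pieces.emplace_back();        // start a new piece, discard the zero
        else
            pieces.back().push_back(ch);
    }
    return pieces;
}

struct Accounting {
    std::size_t payload;  // sum of the piece sizes (what ls reports)
    std::size_t honest;   // payload plus explicitly stored lengths
};

// Every piece except the last needs its length stored somewhere once the
// filesystem is no longer allowed to remember it for us.
Accounting account(const std::string& data, std::size_t bytes_per_length)
{
    assert(bytes_per_length > 0);
    const std::vector<std::string> pieces = split_on_zero(data);
    std::size_t payload = 0;
    for (const auto& p : pieces)
        payload += p.size();
    return { payload, payload + (pieces.size() - 1) * bytes_per_length };
}
```

<p>Each removed zero is replaced by one stored length, so even at one byte per length the honest size equals the
original size, and at four bytes per length it is considerably larger.</p>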
<h3 id="avoiding-the-problem">Avoiding the Problem</h3>
<p>Unfortunately, it isn’t necessarily easy to determine that a program is using filesystem data - innocently or
not. As a result, the only way to really accept an entry will be if it can read its input from stdin, or perhaps
just one file with a specific name.</p>
<p>If you’re creating an entry, and you find that you need some sort of specific file structure, you probably need to
analyze your algorithm a bit to see how much leverage you are getting from the filesystem. Odds are, it’s going
to be enough to keep you from beating the challenge.</p>
<h1>Data Compression With Arithmetic Coding</h1>
<p>Mark Nelson, 2014-10-19 (updated 2018-09-10), <a href="https://marknelson.us/posts/2014/10/19/data-compression-with-arithmetic-coding">https://marknelson.us/posts/2014/10/19/data-compression-with-arithmetic-coding</a></p>
<p>Arithmetic coding is a common algorithm used in both lossless and lossy data
compression algorithms.</p>
<p>It is an <i>entropy encoding</i> technique, in which the frequently seen symbols are encoded
with fewer bits than rarely seen symbols. It has some advantages over well-known techniques such as
Huffman coding. This article will describe the
<a href="https://web.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf" target="_blank">CACM87</a>
implementation of arithmetic coding in detail, giving you a good understanding of all the details needed to implement it.</p>
<p>On a historical note, this post is an update of the
<a href="/posts/1991/02/01/arithmetic-coding-statistical-modeling-data-compression.html" target="_blank">Data Compression With Arithmetic Coding</a>
article I wrote over twenty years ago. That article was published in the print edition of Dr. Dobb’s Journal,
which meant that a lot of editing was done in order to avoid excessive page count.
In particular, that Dr. Dobb’s piece combined two topics: a description of arithmetic coding along
with a discussion of compression using PPM (Prediction by Partial Matching).</p>
<p>Because this new piece will be published on the web, space considerations are no longer a big factor,
which I hope will allow me to do justice to the details of arithmetic coding.
PPM, a worthy topic of its own, will be discussed in a later article. I hope that this new effort will be,
while annoyingly long, the thorough explanation of the subject I wanted to do in 1991.</p>
<p>I think the best way to understand arithmetic coding is to break it into two parts, and I’ll use
that idea in this article. First I will give a description of how arithmetic coding works, using
regular floating point arithmetic implemented using standard C++ data types. This allows for
a completely understandable, but slightly impractical implementation. In other words, it works,
but it can only be used to encode very short messages.</p>
<p>The second section of the article will describe an implementation in which we switch to doing a special type of
math on unbounded binary numbers. This is a somewhat mind-boggling topic in itself, so it helps if you already
understand arithmetic coding - you don’t have to get hung up trying to learn two things at once.</p>
<p>To wrap up I will present working sample code written in modern C++. It won’t necessarily be the most optimized code
in the world, but it is portable and easy to add to your existing projects. It should be perfect for
learning and experimenting with this coding technique.</p>
<h2>Fundamentals</h2>
<p>The first thing to understand about arithmetic coding is what it produces. Arithmetic coding takes a
<i>message</i> (often a file) composed of <i>symbols</i> (nearly always eight-bit characters),
and converts it to a floating point number greater than or equal to zero and less than one.
This floating point number can be quite long - effectively your entire output file is one long number - which
means it is not a normal data type that you are used to using in conventional programming languages.
My implementation of the algorithm will have to create this floating point number from scratch, bit by bit,
and likewise read it in and decode it bit by bit.</p>
<p>This encoding process is done incrementally. As each character in a file is encoded, a few bits
will be added to the encoded message, so it is built up over time as the algorithm proceeds.</p>
<p>The second thing to understand about arithmetic coding is that it relies on a <i>model</i> to characterize
the symbols it is processing. The job of the model is to tell the encoder what the probability of a
character is in a given message. If the model gives an accurate probability of the characters in the message,
they will be encoded very close to optimally. If the model misrepresents the probabilities of symbols,
your encoder may actually expand a message instead of compressing it!</p>
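<p>The effect of model accuracy can be estimated without writing an encoder at all. The standard information-theory
result is that a symbol with probability <i>p</i> costs about -log2(<i>p</i>) bits when coded optimally. The sketch
below sums that cost over a message; the function name <code>ideal_bits</code> and the two-symbol models in the
usage note are made up for illustration.</p>

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Ideal cost, in bits, of coding `message` under a model that assigns a
// fixed probability to each symbol. This is an estimate of what a good
// entropy coder approaches, not the output size of any particular encoder.
double ideal_bits(const std::string& message, const std::map<char, double>& model)
{
    double bits = 0.0;
    for (char c : message) {
        const double p = model.at(c);   // throws if the model lacks the symbol
        assert(p > 0.0 && p <= 1.0);
        bits += -std::log2(p);
    }
    return bits;
}
```

<p>Take the message "AAAB" over a two-symbol alphabet, which costs 4 bits at a flat one bit per symbol. A model with
P(A) = .75 and P(B) = .25 brings the estimate below 4 bits, while a model with those probabilities swapped pushes
it above 4 bits: the mismatched model expands the message.</p>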
<h3>Encoding With Floating Point Math</h3>
<p>The term <i>arithmetic coding</i> has to cover two separate processes: encoding messages and decoding them.
I’ll start by looking at the encoding process with sample C++ code that implements the algorithm in a
very limited form using C++ <code>double</code> data. The code in this first section is only useful
for exposition - don’t try to do any real compression with it.</p>
<p>To perform arithmetic encoding, we first need to define a proper model. Remember that the function of
the model is to provide probabilities of a given character in a message. The conceptual idea of an
arithmetic coding model is that each symbol will own its own unique segment of the number line of
real numbers between 0 and 1. It’s important to note that there are many different ways to model
character probabilities. Some models are static, never changing. Others are updated after every
character is processed. The only two things that matter to us are that 1) the model attempts to
accurately predict the probability a character will appear, and 2) the encoder and decoder have
identical models at all times.</p>
<p>As an example, we can start with an encoder that can only encode an alphabet of 100 different
characters. In a simple static model we will start with capital letters, then the lower case
letters. This means that the first symbol, ‘A’, will own the number line from 0 to .01, ‘B’ will
own .01 to .02, and so on. (In all cases, this is strictly a half-closed interval, so the
probability range for ‘A’ is actually >= 0 and < .01.)</p>
<p>With this model my encoder can represent the single letter ‘B’ by outputting a floating point
number that is less than .02 and greater than or equal to .01. So for example, an arithmetic
encoder that wanted to create that single letter could output .015 and be done.</p>
<p>An encoder that just outputs single characters is not much use though. To encode a string of
symbols involves a slightly more complicated process. In this process, the first character defines
a range of the number line that corresponds to the section assigned to it by the model. For the
character ‘B’, that means the message is between .01 and .02.</p>
<p>The next character in the message then further divides that existing range proportionate to its
current ownership of the number line. So some other letter that owns the very end of the number
line, from .99 to 1.0 would change the range from [.01,.02) to [.0199, .020). This progressive
subdividing of the range is just simple multiplication and addition, and is best understood with a
simple code sample. My first pass in C++, which is far from a working encoder, might look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">double</span> <span class="n">high</span> <span class="o">=</span> <span class="mf">1.0</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">low</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">c</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span> <span class="n">input</span> <span class="o">>></span> <span class="n">c</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">pair</span><span class="o"><</span><span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="o">></span> <span class="n">p</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">getProbability</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="kt">double</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span><span class="p">;</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">second</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">first</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">output</span> <span class="o"><<</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">high</span><span class="o">-</span><span class="n">low</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">;</span></code></pre></figure>
<p>After the entire message has been processed, we have a final range, [low, high). The encoder outputs
a floating point number right in the center of that range.</p>
<h3>Examining the Floating Point Prototype</h3>
<p>The first pass encoder is demonstrated in the attached project as <code>fp_proto.cpp</code>. To get
it working I also needed to define a simple model. In this case I’ve created a model that can
encode 100 characters, with each having a fixed probability of .01, starting with ‘A’ in the first
position. To keep things simple I’ve only fleshed the class out enough to encode the capital
letters from the ASCII character set:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="p">{</span>
<span class="k">static</span> <span class="n">std</span><span class="o">::</span><span class="n">pair</span><span class="o"><</span><span class="kt">double</span><span class="p">,</span><span class="kt">double</span><span class="o">></span> <span class="n">getProbability</span><span class="p">(</span> <span class="kt">char</span> <span class="n">c</span> <span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">>=</span> <span class="sc">'A'</span> <span class="o">&&</span> <span class="n">c</span> <span class="o"><=</span> <span class="sc">'Z'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">make_pair</span><span class="p">(</span> <span class="p">(</span><span class="n">c</span> <span class="o">-</span> <span class="sc">'A'</span><span class="p">)</span> <span class="o">*</span> <span class="mf">.01</span><span class="p">,</span> <span class="p">(</span><span class="n">c</span> <span class="o">-</span> <span class="sc">'A'</span><span class="p">)</span> <span class="o">*</span> <span class="mf">.01</span> <span class="o">+</span> <span class="mf">.01</span><span class="p">);</span>
<span class="k">else</span>
<span class="k">throw</span> <span class="s">"character out of range"</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span> <span class="n">model</span><span class="p">;</span></code></pre></figure>
<p>So in this probability model, ‘A’ owns the range from 0.0 to 0.01, ‘B’ from .01 to .02, ‘C’ from
.02 to .03, and so on. (Note that this is not an accurate or effective model, but its simplicity is
useful at this point.) For a representative example, I called this encoder with the string “WXYZ”.
Let’s walk through what happens in the encoder.</p>
<p>We start with <code>high</code> and <code>low</code> set to 1.0 and 0.0. The encoder calls the
model to get the probabilities for letter ‘W’, which returns the interval [0.22, 0.23) - the
range along the probability line that ‘W’ owns in this model. If you step over the next two lines,
you’ll see that <code>low</code> is now set to 0.22, and <code>high</code> is set to 0.23.</p>
<p>If you examine how this works, you’ll see that as each character is encoded, the range between
<code>high</code> and <code>low</code> becomes narrower and narrower, but <code>high</code> will
always be greater than <code>low</code>. Additionally, the value of <code>low</code> is always
increasing, and value of <code>high</code> is always decreasing. These invariants are important in
getting the algorithm to work properly.</p>
<p>So after the first character is encoded, we know that no matter what other values are encoded,
the final number in the message will be less than .23 and greater than or equal to .22. Both
<code>low</code> and <code>high</code> will be greater than or equal to 0.22 and less than .23, and
<code>low</code> will be strictly less than <code>high</code>. This means that when decoding, we
are going to be able to determine that the first character is ‘W’ no matter what happens after
this, because the final encoded number will fall into the range owned by ‘W’. The narrowing
process is roughly shown in the figure below:</p>
<center>
<img src="https://marknelson.us/assets/2014-19-10-arithmetic-coding/Figure1.png" alt="Successive narrowing of the encoder range" title="" />
<br />
<b>Successive narrowing of the encoder range</b>
</center>
<p />
<p>Let’s see how this narrowing works when we process the second character, ‘X’. The model returns a
range of [.23, .24) for this character, and the subsequent recalculation of <code>high</code> and
<code>low</code> results in an interval of [.2223, .2224). So <code>high</code> and
<code>low</code> are still inside the original range of [.22, .23), but the interval has narrowed.</p>
<p>After the final two characters are included, the output looks like:</p>
<pre>
Encoded message: 0.2223242550
</pre>
<p>I’ll talk more about how the exact value we want to output needs to be chosen, but in theory at
least, for this particular message, any floating point number in the interval
[0.22232425,0.22232426) should properly decode to the desired values.</p>
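<p>The walkthrough above can be reproduced with a short standalone sketch. It uses the same toy model and the same
update rule as the prototype, but records the interval after every symbol. The names <code>get_probability</code>
and <code>trace_intervals</code> are mine and are not taken from <code>fp_proto.cpp</code>.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Same toy model as above: 'A'..'Z' each own a slice of width .01.
std::pair<double, double> get_probability(char c)
{
    assert(c >= 'A' && c <= 'Z');
    return { (c - 'A') * .01, (c - 'A') * .01 + .01 };
}

// Returns the interval [low, high) as it stands after each symbol.
std::vector<std::pair<double, double>> trace_intervals(const std::string& message)
{
    double low = 0.0;
    double high = 1.0;
    std::vector<std::pair<double, double>> steps;
    for (char c : message) {
        const std::pair<double, double> p = get_probability(c);
        const double range = high - low;
        high = low + range * p.second;   // uses the old value of low
        low = low + range * p.first;
        steps.emplace_back(low, high);
    }
    return steps;
}
```

<p>Called with "WXYZ", the four recorded intervals are approximately [.22, .23), [.2223, .2224),
[.222324, .222325), and [.22232425, .22232426), up to floating point rounding.</p>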
<h3>Decoding With Floating Point Math</h3>
<p>I find the encoding algorithm to be very intuitive. The decoder reverses the process, and is no
more complicated, but the steps might not seem quite as obvious. A first pass algorithm at decoding
this message would look something like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">decode</span><span class="p">(</span><span class="kt">double</span> <span class="n">message</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">double</span> <span class="n">high</span> <span class="o">=</span> <span class="mf">1.0</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">low</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span>
<span class="p">{</span>
<span class="kt">double</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">c</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">getSymbol</span><span class="p">((</span><span class="n">message</span> <span class="o">-</span> <span class="n">low</span><span class="p">)</span><span class="o">/</span><span class="n">range</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">c</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'Z'</span> <span class="p">)</span>
<span class="k">return</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">pair</span><span class="o"><</span><span class="kt">double</span><span class="p">,</span><span class="kt">double</span><span class="o">></span> <span class="n">p</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">getProbability</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">second</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">first</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>The math in the decoder basically reverses the math from the encode side. To decode a character, the
probability model just has to find the character whose range covers the current value of the
message. When the decoder first starts up with the sample value of 0.22232425, the model sees that
the value falls within the interval owned by ‘W’: [0.22,0.23), so the model returns W. In
<code>fp_proto.cpp</code>, the decoder portion of the simple model looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">static</span> <span class="kt">char</span> <span class="nf">getSymbol</span><span class="p">(</span> <span class="kt">double</span> <span class="n">d</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">d</span> <span class="o">>=</span> <span class="mf">0.0</span> <span class="o">&&</span> <span class="n">d</span> <span class="o"><</span> <span class="mf">0.26</span><span class="p">)</span>
<span class="k">return</span> <span class="sc">'A'</span> <span class="o">+</span> <span class="k">static_cast</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="p">(</span><span class="n">d</span><span class="o">*</span><span class="mi">100</span><span class="p">);</span>
<span class="k">else</span>
<span class="k">throw</span> <span class="s">"message out of range"</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>In the encoder, we continually narrow the range of the output value as each character is processed.
In the decoder we do the same narrowing of the portion of the message we are inspecting for the
next character. After the ‘W’ is decoded, <code>high</code> and <code>low</code> will now define an
interval of [0.22,0.23), with a range of .01. So the formula that calculates the next probability
value to be decoded, <code>(message - low)/range</code>, will be .2324255, which lands squarely inside
the range covered by ‘X’.</p>
<p>This narrowing continues as the characters are decoded, until the hardcoded end of message, letter ‘Z’ is reached. Success!</p>
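<p>As a check on the earlier claim that any value inside the final interval decodes correctly, here is a
self-contained variant of the decoder that returns the decoded string instead of printing it. It is a sketch, not
the code from <code>fp_proto.cpp</code>: the function names are mine, and it keeps the prototype's assumption that
‘Z’ marks the end of the message.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy model: 'A'..'Z' each own a slice of width .01 on [0, .26).
std::pair<double, double> get_probability(char c)
{
    assert(c >= 'A' && c <= 'Z');
    return { (c - 'A') * .01, (c - 'A') * .01 + .01 };
}

// Inverse lookup: which symbol owns the slice that contains d?
char get_symbol(double d)
{
    assert(d >= 0.0 && d < 0.26);
    return static_cast<char>('A' + static_cast<int>(d * 100));
}

// Decode until the hardcoded end-of-message symbol 'Z' appears.
std::string decode(double message)
{
    double low = 0.0;
    double high = 1.0;
    std::string out;
    for (;;) {
        const double range = high - low;
        const char c = get_symbol((message - low) / range);
        out.push_back(c);
        if (c == 'Z')
            return out;
        const std::pair<double, double> p = get_probability(c);
        high = low + range * p.second;
        low = low + range * p.first;
    }
}
```

<p>Values comfortably inside [0.22232425, 0.22232426), such as 0.222324252, 0.222324255, and 0.222324258, all
decode to "WXYZ". Values right at the edges of the interval are best avoided in this prototype, because rounding
in the <code>double</code> arithmetic can tip them into a neighboring symbol.</p>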
<h3>Notes and Summary So Far</h3>
<p>The algorithm demonstrated to this point can be tested with <code>fp_proto.cpp</code> in the
attached project. You need to be very aware that what we have seen so far is not really a valid or
useful encoder. It does, however, demonstrate with some accuracy the general flow of the algorithm.
The key observations so far are:</p>
<ul>
<li />Characters are encoded according to their position in the probability range [0, 1).
<li />We keep track of the current state using variables <code>high</code> and <code>low</code>. A valid output result will be any number in the range [<code>low</code>, <code>high</code>).
<li />As each character is encoded, it compresses the range [<code>low</code>, <code>high</code>) in a way that corresponds exactly to its position in the probability range.
<li />We have a few invariants that result from the math used in the algorithm. The value of <code>low</code> never decreases, the value of <code>high</code> never increases, and <code>low</code> is always less than <code>high</code>.
</ul>
<p>The big problem with this demonstration algorithm is that it depends on C++ <code>double</code>
values to hold the message state. Floating point variables have limited precision, which means that
eventually, as the range between <code>high</code> and <code>low</code> continues to narrow, the
distance between them becomes too small to represent with floating point variables. With the model
used here, you can encode 10 characters or so, but after that the algorithm won’t work.</p>
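<p>The precision limit is easy to observe directly. The sketch below keeps encoding the same symbol with the toy
model and counts how many symbols fit before the invariant <code>low &lt; high</code> fails. The function name
<code>symbols_until_collapse</code> is hypothetical, and the exact count depends on the symbol chosen and on the
platform's <code>double</code> format.</p>

```cpp
#include <cassert>
#include <utility>

// Toy model: 'A'..'Z' each own a slice of width .01.
std::pair<double, double> get_probability(char c)
{
    assert(c >= 'A' && c <= 'Z');
    return { (c - 'A') * .01, (c - 'A') * .01 + .01 };
}

// Encode the symbol c over and over, and report how many symbols were
// encoded before the interval [low, high) became empty.
int symbols_until_collapse(char c)
{
    const std::pair<double, double> p = get_probability(c);
    double low = 0.0;
    double high = 1.0;
    int count = 0;
    while (count < 1000) {                 // safety cap for the sketch
        const double range = high - low;
        const double new_high = low + range * p.second;
        const double new_low = low + range * p.first;
        if (!(new_low < new_high))
            break;                         // low < high no longer holds
        high = new_high;
        low = new_low;
        ++count;
    }
    return count;
}
```

<p>With a symbol such as ‘W’, whose slice sits well away from zero, the count on a typical IEEE 754 implementation
lands in the neighborhood of the ten characters mentioned above. A symbol whose slice starts at zero, such as ‘A’,
lasts much longer, because <code>low</code> stays at 0.0 and only <code>high</code> shrinks.</p>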
<p>The rest of this article will present a fully functional algorithm. Conceptually it will look very
much like the code already presented. The big difference is that the variables used in the encoder
and decoder, <code>high</code>, <code>low</code>, and <code>message</code>, are no longer going to
be of the C++ type <code>double</code>. Instead they will be arbitrarily long binary variables.</p>
<p>In order for our program to handle binary numbers of arbitrary length, they will be processed a
bit at a time, with bits being read in, calculations being performed, and then bits being output
and then discarded as they are no longer needed. The details of how this is done are where all of
the work comes into play in this algorithm.</p>
<p>In addition to those modifications, the rest of this article will cover a few other points that
have not been dealt with yet, including:</p>
<ul>
<li />How to deal with the end of the stream.
<li />How to choose the actual output number from the range [low, high).
<li />Why arithmetic coding is superior to Huffman coding as an entropy coder.
</ul>
<h2>Unbounded Precision Math With Integers</h2>
<p>In this, the second part of the exposition, you are going to have to wrap your head around some
unconventional mathematics. The algorithm that was described in the first part of this article is
still going to be faithfully executed, but it will no longer be implemented using variables of type
<code>double</code>. It will instead be implemented with integer variables and integer math, albeit
in interesting ways.</p>
<p>The basic concept that we implement is this: the values of <code>high</code>, <code>low</code>, and
the actual encoded message, are going to be binary numbers of unbounded length. In other words, by
the time we finish encoding the text of
<a href="https://www.gutenberg.org/cache/epub/2701/pg2701.txt" target="_blank">Moby Dick on Project Gutenberg</a>,
<code>high</code> and <code>low</code> will be millions of bits long, and the output value itself
will be millions of bits long. All three will still represent a number greater than or equal to 0,
and less than 1.</p>
<p>An even more interesting facet of this is that even though the three numbers in play are millions
of bits long, each time we process a character we will only do a few operations of simple integer
math - 32 bits, or perhaps 64 if you like.</p>
<h3>Number Representation</h3>
<p>Recall that in the reference version of the algorithm, low and high were initialized like this:</p>
<pre>
double high = 1.0;
double low = 0.0;
</pre>
<p>In the integer version of the algorithm, we switch to a representation like this:</p>
<pre>
unsigned int high = 0xFFFFFFFFU;
unsigned int low = 0;
</pre>
<p>Both numbers have an implied decimal point leading their values, which would mean that
<code>high</code> is actually (in hex) 0.FFFFFFFF, or in binary, 0.1111…1111, and
<code>low</code> is 0.0. The number that we output will likewise have an implied decimal point
before the first bit.</p>
<p>But this is not quite right - in the first implementation <code>high</code> was 1.0.
The value 0.FFFFFFFF is close, but it is just a bit less than 1.0. How is this dealt with?</p>
<p>This is where a bit of mind-bending math comes into play. Although <code>high</code> has 32 bits in
memory, we consider it to have an infinitely long trail of binary 1’s trailing off the right end.
So it isn’t just 0.FFFFFFFF, that string of F’s (or 1’s) continues off into infinity - they are out
there, but haven’t been shifted into memory yet.</p>
<p>And while it may not be obvious, Google can help convince you that an infinite string of 1’s
starting with 0.1 is actually equal to 1. (The short story: 2x - x = 1.0, therefore x = 1.0.)
Likewise, <code>low</code> is considered to be an infinitely long binary string, with 0’s hanging
off the end out to the last binary place.</p>
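<p>For readers who would rather see the short story written out, here is the same argument as a
worked equation. The symbol x is just a name for the infinite binary fraction; nothing else is
assumed.</p>

```latex
\begin{aligned}
x      &= 0.1111\ldots_2 = \sum_{k=1}^{\infty} 2^{-k} \\
2x     &= 1.1111\ldots_2 = 1 + x \\
2x - x &= 1 \quad\Longrightarrow\quad x = 1
\end{aligned}
```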
<h4>Resulting Changes to the Model</h4>
<p>Remember that the final implementation is going to be entirely implemented using integer math.
Previously, the model returned probabilities as a pair of floating point numbers representing the
range that a particular symbol owns.</p>
<p>In the updated version of the algorithm, a symbol still owns a specific range on the number
line greater than or equal to 0 and less than 1. But we are now going to represent these using a pair
of fractions: <code>upper</code>/<code>denominator</code> and <code>lower</code>/<code>denominator</code>.
This doesn’t actually affect our model code too much. The sample model we used in the previous
section returned, for example, .22 and .23 for character ‘W’. Now, instead, it will return
{22, 23, 100} in a structure called <code>prob</code>.</p>
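<p>A minimal sketch of what such a structure and model function might look like is shown here. The
member order and the function name <code>toy_probability()</code> are assumptions made for
illustration, as is the toy model itself, which gives each capital letter a slot 1/100 wide (that
choice is consistent with ‘W’ mapping to .22 and .23).</p>

```cpp
#include <stdexcept>

// Hypothetical layout: the text only says the structure is called prob
// and holds three integers such as {22, 23, 100}.
struct prob {
    unsigned int lower;
    unsigned int upper;
    unsigned int denominator;
};

// Toy model (an assumption): each capital letter owns a slot 1/100 wide,
// so 'A' owns [0/100, 1/100) and 'W' owns [22/100, 23/100).
prob toy_probability(char c)
{
    if (c < 'A' || c > 'Z')
        throw std::out_of_range("toy model only knows capital letters");
    unsigned int index = static_cast<unsigned int>(c - 'A');
    return prob{ index, index + 1, 100 };
}
```

<p>The point of the integer triple is that the encoder can multiply by <code>upper</code> or
<code>lower</code> and divide by <code>denominator</code> without ever touching floating point.</p>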
<h3>The Real Encoder - First Pass</h3>
<p>Before getting into final, working code, I’m presenting some code that implements the basic
algorithm, but takes a couple of shortcuts around real problems. This not-quite-done code looks
like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">high</span> <span class="o">=</span> <span class="mh">0xFFFFFFFFU</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">low</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">c</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span> <span class="n">input</span> <span class="o">>></span> <span class="n">c</span> <span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">getProbability</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">upper</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">denominator</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">lower</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">denominator</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">high</span> <span class="o"><</span> <span class="mh">0x80000000U</span> <span class="p">)</span>
<span class="n">output_bit</span><span class="p">(</span> <span class="mi">0</span> <span class="p">);</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="mh">0x80000000U</span> <span class="p">)</span>
<span class="n">output_bit</span><span class="p">(</span> <span class="mi">1</span> <span class="p">);</span>
<span class="k">else</span>
<span class="k">break</span><span class="p">;</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">|=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>The first part of this loop is conceptually and functionally identical to the floating point code
given in part 1. We take the range - that is, the difference between <code>low</code> and
<code>high</code>, and we allocate some subset of that range to the character being encoded,
depending on the probability returned from the model. The result is that <code>low</code> gets a
little larger, and <code>high</code> gets a little smaller.</p>
<p>The second part of the code is a little more interesting. The inner <code>for</code> loop is new, and
what it does is new as well - the simplified floating point algorithm didn’t have anything like
this. It performs range checks on <code>low</code> and <code>high</code>, looking for situations
in which the values have the same most significant bit.</p>
<p>The first check looks to see if <code>high</code> has dropped below 0x80000000, in which case its
MSB is 0. Because we know that <code>low</code> is always less than <code>high</code>, its MSB will also be 0.
And because the two values only get closer to one another, the MSB of both values will forever be 0.</p>
<p>The other range check looks to see if <code>low</code> has increased above 0x7FFFFFFF, in
which case both it and <code>high</code> will have MSB values of 1, and will always have an MSB of 1.</p>
<p>In either of these cases, remember that we have three invariants to work with: <code>high</code>
only decreases, <code>low</code> only increases, and <code>high</code> is always greater than
<code>low</code>. So once <code>high</code> has a leading bit of 0, it will never change. Once
<code>low</code> has a leading bit of 1, it will never change. If this is the case, we can output
that bit to the output stream - we <em>know</em> what it is with 100% certainty, so let’s shift it
out to the output stream and get rid of it.</p>
<p>And after we have output the bit, we discard it. Shifting <code>low</code> and <code>high</code>
one bit to the left discards that MSB. We shift a 1 into the least significant bit of
<code>high</code>, and a 0 into the least significant bit of <code>low</code>. Thus, we keep working
on 32 bits of precision by expelling the bits that no longer contribute anything to the precision
of our calculations. In this particular implementation, we just keep 32 bits in working registers,
with some additional number already sent to the output, and some other number pending for input.</p>
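<p>The shifting step can be tried in isolation. The sketch below is only an illustration:
<code>shift_out_settled_bits()</code> is a made-up name, and it collects the emitted bits in a
<code>std::string</code> instead of calling <code>output_bit()</code>.</p>

```cpp
#include <cstdint>
#include <string>

// Emit every leading bit that low and high already agree on, discarding
// each one by shifting. A 0 enters low and a 1 enters high on the right.
std::string shift_out_settled_bits(std::uint32_t &low, std::uint32_t &high)
{
    std::string emitted;
    for ( ; ; ) {
        if ( high < 0x80000000U )
            emitted += '0';          // both MSBs are 0
        else if ( low >= 0x80000000U )
            emitted += '1';          // both MSBs are 1
        else
            break;                   // MSBs differ, nothing is settled yet
        low <<= 1;
        high <<= 1;
        high |= 1;
    }
    return emitted;
}
```

<p>Starting from <code>low = 0x20000000</code> and <code>high = 0x3FFFFFFF</code>, the two values share
the leading bits 001. After those three bits are shifted out, <code>low</code> is 0 and
<code>high</code> is 0xFFFFFFFF again, so the full 32 bits of working precision are restored.</p>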
<p>The figure below shows how the math system now works while in the middle of some arbitrary
compression run. Even though we are using 32-bit math, the algorithm is now dealing with
arbitrarily long versions of <code>high</code> and <code>low</code>:</p>
<center>
<img src="https://marknelson.us/assets/2014-19-10-arithmetic-coding/Figure2.png" border="1" alt="The floating point values of low and high, including bits that have been emitted as well as bits that have not yet been input" title="" />
<br />
<b>The low and high values, including bits not in memory</b>
</center>
<p />
<p>As <code>low</code> and <code>high</code> converge, their matching digits are shifted out on the
left side, presumably to a file. These digits are never going to change, and are no longer needed
as part of the calculation. Likewise, both numbers have an infinite number of binary digits that
are being shifted in from the right - 1’s for <code>high</code> and 0’s for <code>low</code>.</p>
<h4>A Fatal Flaw and the Workaround</h4>
<p>In the code just presented we have a pretty reasonable way of managing the calculation that creates
an arbitrarily long number. This implementation is good, but it isn’t perfect.</p>
<p>We run into problems with this algorithm when the values of <code>low</code> and <code>high</code>
start converging on a value of 0.5, but don’t quite cross over the line that would cause bits to
be shifted out. A sequence of calculations that runs into this problem might produce values like
these:</p>
<pre>
low=7C99418B high=81A60145
low=7FF8F3E1 high=8003DFFA
low=7FFFFC6F high=80000DF4
low=7FFFFFF6 high=80000001
low=7FFFFFFF high=80000000
</pre>
<p>Our algorithm isn’t quite doing what we want here. The numbers are getting closer and closer
together, but because neither has crossed over the 0.5 divider, no bits are getting shifted out.
This process eventually leaves the values of the two numbers in a disastrous position.</p>
<p>Why is it disastrous? The initial calculation of <code>range</code> is done by subtracting
<code>low</code> from <code>high</code>. In this algorithm, <code>range</code> stands in as a proxy
for the number line. The subsequent calculation is intended to find the proper subsection of that
value for a given character. For example, in the earlier example, we might have wanted to encode a
character that has a range of [0.22,0.23). If the value of <code>range</code> is 100, then we can
clearly see that the character will subdivide that with values of 22 and 23.</p>
<p>But what happens if <code>low</code> and <code>high</code> are so close that <code>range</code>
has a value of just 1? Our algorithm breaks down - it doesn’t matter what the probability of the next
character is if we are going to try to subdivide the range [0,1) - we get the same result. And if
we do that identically for every character, it means the decoder is not going to have any way to
distinguish one character from another.</p>
<p>Ultimately this can decay to the point where <code>low == high</code>, which breaks one of our key
invariants. It’s easy to work out what kind of disaster results at that point.</p>
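<p>To make the breakdown concrete, here is a small sketch of the subdivision step on its own. The
helper name <code>subdivide()</code> is hypothetical, the <code>- 1</code> on the upper bound is
borrowed from the decoder listing later in this article, and 64-bit intermediates are used so that
the sketch itself does not overflow.</p>

```cpp
#include <cstdint>
#include <utility>

// Narrow [low, high] to the slice lower/denom .. upper/denom.
// Returns the pair (new_low, new_high).
std::pair<std::uint32_t, std::uint32_t>
subdivide(std::uint32_t low, std::uint32_t high,
          std::uint32_t lower, std::uint32_t upper, std::uint32_t denom)
{
    std::uint64_t range = std::uint64_t(high) - low + 1;
    std::uint32_t new_high =
        static_cast<std::uint32_t>(low + (range * upper) / denom - 1);
    std::uint32_t new_low =
        static_cast<std::uint32_t>(low + (range * lower) / denom);
    return std::make_pair(new_low, new_high);
}
```

<p>With the full range available, the slices {22, 23, 100} and {0, 1, 100} produce different
results, as they should. With <code>low=7FFFFFFF</code> and <code>high=80000000</code> from the table
above, both slices collapse to exactly the same pair, and the new <code>high</code> even ends up
below the new <code>low</code>. The decoder would have no way of telling the two characters apart.</p>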
<p>Later on in this article I’ll go over some finer points in the algorithm that show why you run into
trouble long before the two values are only 1 apart. We’ll come up with specific requirements for
the accuracy needed by the value of <code>range</code>. Even without those detailed requirements,
clearly something needs to be done here.</p>
<p>The fix to the algorithm is shown in the code below. This version of the algorithm still does the
normal output of a single bit when <code>low</code> goes above 0.5 or <code>high</code> drops
below it. But when the two values haven’t converged, it adds a check of the next most significant
bits to see if we are headed towards the problem of <i>near-convergence</i>. This will be the case
when the two most significant bits of <code>high</code> are 10 and the two most significant bits
of <code>low</code> are 01. When this is the case, we know that the two values are converging, but
we don’t yet know what the eventual output bit is going to be.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">high</span> <span class="o">=</span> <span class="mh">0xFFFFFFFFU</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">low</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">pending_bits</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">c</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span> <span class="n">input</span> <span class="o">>></span> <span class="n">c</span> <span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">getProbability</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">upper</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">denominator</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">lower</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">denominator</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">high</span> <span class="o"><</span> <span class="mh">0x80000000U</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">output_bit_plus_pending</span><span class="p">(</span> <span class="mi">0</span><span class="p">,</span> <span class="n">pending_bits</span> <span class="p">);</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">|=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="mh">0x80000000U</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">output_bit_plus_pending</span><span class="p">(</span> <span class="mi">1</span><span class="p">,</span> <span class="n">pending_bits</span> <span class="p">);</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">|=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="mh">0x40000000</span> <span class="o">&&</span> <span class="n">high</span> <span class="o"><</span> <span class="mh">0xC0000000U</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">pending_bits</span><span class="o">++</span><span class="p">;</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">&=</span> <span class="mh">0x7FFFFFFF</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">|=</span> <span class="mh">0x80000001</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">output_bit_plus_pending</span><span class="p">(</span><span class="kt">bool</span> <span class="n">bit</span><span class="p">,</span> <span class="kt">int</span> <span class="o">&</span><span class="n">pending_bits</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">output_bit</span><span class="p">(</span> <span class="n">bit</span> <span class="p">);</span>
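<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="n">pending_bits</span> <span class="o">></span> <span class="mi">0</span> <span class="p">;</span> <span class="n">pending_bits</span><span class="o">--</span> <span class="p">)</span>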
<span class="n">output_bit</span><span class="p">(</span> <span class="o">!</span><span class="n">bit</span> <span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>So what do we do when we are in this near-converged state? We know that sooner or later, either
<code>high</code> is going to go below 0.5 or <code>low</code> will go above it. In the first case,
both values will have leading bits of 01, and in the second, they will both have leading bits of 10.
As this convergence increases, the leading bits will extend to either 01111… or 10000…,
with some finite number of digits extended.</p>
<p>Given this, we know that once we figure out what the first binary digit in the string is going to
be, the subsequent bits will all be the opposite. So in this new version of the algorithm, when we
are in the near-convergence state, we simply discard the second most significant bit of
<code>high</code> and <code>low</code>, shifting the remaining bits left, while retaining the MSB.
Doing that means also incrementing the <code>pending_bits</code> counter to acknowledge that we
need to deal with it when convergence finally happens. An example of how this looks when squeezing
out the converging bit in <code>low</code> is shown below. Removing the bit from
<code>high</code> is basically identical, but of course a 1 is shifted into the LSB during the
process.</p>
<center>
<img src="https://marknelson.us/assets/2014-19-10-arithmetic-coding/Figure3.png" border="1" alt="A view of the value in high, followed by the same value with the 2nd bit erased, followed by the same value after shifting in a new LSB" title="" />
<br />
<b>The two steps of removing a bit to prevent overflow</b>
</center>
<p />
<p>The bit twiddling that makes this happen can be a bit difficult to follow, but the important thing
is that the process has to adhere to the following rules:</p>
<ol>
<li />The low 30 bits of both <code>low</code> and <code>high</code> are shifted left one position.
<li />The least significant bit of <code>low</code> gets a 0 shifted in.
<li />The least significant bit of <code>high</code> gets a 1 shifted in.
<li />The MSB of both words is unchanged - after the operation it will still be set to 1 for <code>high</code> and 0 for <code>low</code>.
</ol>
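<p>The four rules above can be checked with a few lines of code. This is a sketch only:
<code>squeeze_second_bit()</code> is a hypothetical helper name, and it assumes the caller has already
verified that <code>low</code> starts with 01 and <code>high</code> starts with 10.</p>

```cpp
#include <cstdint>

// Remove the second most significant bit from both values while keeping
// the MSB. Assumes low starts with binary 01 and high starts with 10.
void squeeze_second_bit(std::uint32_t &low, std::uint32_t &high)
{
    low <<= 1;              // the old second bit (a 1) lands in the MSB
    low &= 0x7FFFFFFFU;     // clear it: MSB of low stays 0, LSB is 0
    high <<= 1;             // the old second bit (a 0) lands in the MSB
    high |= 0x80000001U;    // set it: MSB of high stays 1, LSB is 1
}
```

<p>Using <code>low=7FFFFFF6</code> and <code>high=80000001</code> from the earlier table, the result is
<code>low=7FFFFFEC</code> and <code>high=80000003</code>. The gap between the two values has doubled
from 12 to 24, which is exactly the breathing room the encoder needs.</p>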
<p>The final change that ties all this together is the introduction of the new function
<code>output_bit_plus_pending()</code>. Each time that we manage this near-convergence process, we
know that another bit has been stored away - and we won’t know whether it is a one or a zero.
We keep a count of all these consecutive bits in <code>pending_bits</code>. When we finally reach
a situation where an actual MSB can be output, we do it, plus all the pending bits that have been
stored up. And of course, the pending bits will be the opposite of the bit being output.</p>
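<p>As a quick illustration of the counting, the sketch below writes bits into a
<code>std::string</code> so the result is easy to inspect. The extra <code>out</code> parameter is an
assumption made for this example and is not part of the listing above.</p>

```cpp
#include <string>

// Write the settled bit, then one opposite bit for each squeeze that was
// recorded in pending_bits. The counter is left at zero afterwards.
void output_bit_plus_pending(bool bit, int &pending_bits, std::string &out)
{
    out += bit ? '1' : '0';
    for ( ; pending_bits > 0 ; pending_bits-- )
        out += bit ? '0' : '1';
}
```

<p>With three pending bits, settling on a 1 writes 1000, and the counter is back at zero. A
following call that settles on a 0 therefore writes just a single 0.</p>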
<p>This fixed up version of the code does everything we need to properly encode. The final working C++
code will have a few differences, but they are mostly just tweaks to help with flexibility.
The code shown above is more or less the final product.</p>
<h3>The Real Decoder</h3>
<p>I’ve talked about some invariants that exist in this algorithm, but one I have skipped over is
this: the values of <code>high</code> and <code>low</code> that are produced as each character is
processed in the encoder will be duplicated in the decoder. These values operate in lockstep,
right down to the least significant bit.</p>
<p>A consequence of this is that the code in the decoder ends up looking a lot like the code in the
encoder. The manipulation of <code>high</code> and <code>low</code> is effectively duplicated.
And in both the encoder and the decoder, the values of these two variables are manipulated using a
calculated <code>range</code>, along with the probability for the given character.</p>
<p>The difference between the two comes from how we get the probability. In the encoder, the character
is known because we are reading it directly from the file being processed. In the decoder, we have to
determine the character by looking at the value of the message we are decoding - where it falls on
the [0,1) number line. It is the job of the model to figure this out in function
<code>getChar()</code>.</p>
<p>The compressed input is read into a variable named <code>value</code>. This variable is another
one of our pseudo-infinite variables, like <code>high</code> and <code>low</code>, with the
primary difference being what is shifted into it in the LSB position. Recall that <code>high</code>
gets an infinite string of 1’s shifted in, and <code>low</code> gets an infinite string of 0’s.
<code>value</code> gets something completely different - it has the bits from the encoded message
shifted into it. So at any given time, <code>value</code> contains 32 of the long string of bits that
represent the number that the encoder created. On the MSB side, bits that are no longer used in
the calculation are shifted out of <code>value</code>, and on the LSB side, as those bits are
shifted out, new bits of the message are shifted in.</p>
<p>The resulting code looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">high</span> <span class="o">=</span> <span class="mh">0xFFFFFFFFU</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">low</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">value</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">32</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">value</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">value</span> <span class="o">+=</span> <span class="n">m_input</span><span class="p">.</span><span class="n">get_bit</span><span class="p">()</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">count</span> <span class="o">=</span> <span class="p">((</span><span class="n">value</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">m_model</span><span class="p">.</span><span class="n">getCount</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="n">range</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c</span><span class="p">;</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="n">m_model</span><span class="p">.</span><span class="n">getChar</span><span class="p">(</span> <span class="n">count</span><span class="p">,</span> <span class="n">c</span> <span class="p">);</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">c</span> <span class="o">==</span> <span class="mi">256</span> <span class="p">)</span>
<span class="k">break</span><span class="p">;</span>
<span class="n">m_output</span><span class="p">.</span><span class="n">putByte</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span><span class="o">*</span><span class="n">p</span><span class="p">.</span><span class="n">high</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span><span class="o">*</span><span class="n">p</span><span class="p">.</span><span class="n">low</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">count</span><span class="p">;</span>
<span class="k">for</span><span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="mh">0x80000000U</span> <span class="o">||</span> <span class="n">high</span> <span class="o"><</span> <span class="mh">0x80000000U</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">|=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">value</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">value</span> <span class="o">+=</span> <span class="n">m_input</span><span class="p">.</span><span class="n">get_bit</span><span class="p">()</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="mh">0x40000000</span> <span class="o">&&</span> <span class="n">high</span> <span class="o"><</span> <span class="mh">0xC0000000U</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">&=</span> <span class="mh">0x7FFFFFFF</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">|=</span> <span class="mh">0x80000001</span><span class="p">;</span>
<span class="n">value</span> <span class="o">-=</span> <span class="mh">0x40000000</span><span class="p">;</span>
<span class="n">value</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">value</span> <span class="o">+=</span> <span class="n">m_input</span><span class="p">.</span><span class="n">get_bit</span><span class="p">()</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>So this code looks very similar to the final encoder. The updating of the values is nearly
identical - it adds the update of <code>value</code> that updates in tandem with
<code>high</code> and <code>low</code>. The nature of the algorithm introduces another invariant:
<code>value</code> will always be greater than or equal to <code>low</code> and less than
<code>high</code>.</p>
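<p>One way to convince yourself of the lockstep behavior is a small round trip. What follows is a
sketch under several assumptions, not the production code: the model is a made-up uniform one (bytes
0 to 255 plus an end-of-stream symbol 256, each with a slice of 1/257), the names
<code>narrow()</code>, <code>encode()</code>, and <code>decode()</code> are hypothetical, 64-bit
intermediates stand in for the precision handling discussed later, and the stream is finished with a
commonly used two-bit flush that the listings above do not show.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical uniform model: symbol c owns the slice [c, c+1) out of 257.
struct prob { std::uint32_t low, high, count; };
const std::uint32_t SYMBOLS = 257;
prob get_probability(int c) {
    return prob{ std::uint32_t(c), std::uint32_t(c) + 1, SYMBOLS };
}

// Narrow [low, high] to the slice owned by p; 64-bit math avoids overflow.
void narrow(std::uint32_t &low, std::uint32_t &high, const prob &p) {
    std::uint64_t range = std::uint64_t(high) - low + 1;
    high = std::uint32_t(low + range * p.high / p.count - 1);
    low  = std::uint32_t(low + range * p.low / p.count);
}

std::vector<bool> encode(const std::string &text) {
    std::vector<bool> bits;
    std::uint32_t low = 0, high = 0xFFFFFFFFU;
    int pending_bits = 0;
    auto put = [&](bool bit) {             // the bit, then the stored opposites
        bits.push_back(bit);
        for ( ; pending_bits > 0; pending_bits--) bits.push_back(!bit);
    };
    for (std::size_t i = 0; i <= text.size(); i++) {
        int c = i < text.size() ? static_cast<unsigned char>(text[i]) : 256;
        narrow(low, high, get_probability(c));
        for ( ; ; ) {
            if (high < 0x80000000U) put(false);
            else if (low >= 0x80000000U) put(true);
            else if (low >= 0x40000000U && high < 0xC0000000U) {
                pending_bits++;            // near-convergence: squeeze out a bit
                low <<= 1;  low &= 0x7FFFFFFFU;
                high <<= 1; high |= 0x80000001U;
                continue;
            } else break;
            low <<= 1; high <<= 1; high |= 1;
        }
    }
    pending_bits++;                        // assumed flush: pin down the value
    put(low >= 0x40000000U);
    return bits;
}

std::string decode(const std::vector<bool> &bits) {
    std::size_t pos = 0;
    auto get = [&]() -> std::uint32_t {    // zeros once the stream runs out
        return (pos < bits.size() && bits[pos++]) ? 1u : 0u;
    };
    std::uint32_t low = 0, high = 0xFFFFFFFFU, value = 0;
    for (int i = 0; i < 32; i++) value = (value << 1) | get();
    std::string text;
    for ( ; ; ) {
        std::uint64_t range = std::uint64_t(high) - low + 1;
        std::uint64_t offset = std::uint64_t(value) - low + 1;
        int c = int((offset * SYMBOLS - 1) / range);  // uniform model: count is the symbol
        if (c == 256) break;               // end-of-stream symbol
        text += char(c);
        narrow(low, high, get_probability(c));
        for ( ; ; ) {
            if (low >= 0x80000000U || high < 0x80000000U) {
                low <<= 1; high <<= 1; high |= 1;
            } else if (low >= 0x40000000U && high < 0xC0000000U) {
                low <<= 1;  low &= 0x7FFFFFFFU;
                high <<= 1; high |= 0x80000001U;
                value -= 0x40000000U;
            } else break;
            value = (value << 1) | get();
        }
    }
    return text;
}
```

<p>Calling <code>decode(encode(text))</code> should hand back the original string. Because every
symbol is equally likely in this toy model, nothing is actually compressed; the only purpose is to
watch <code>low</code>, <code>high</code>, and <code>value</code> stay in step on both sides.</p>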
<h2>A Sample Implementation</h2>
<p>All of this is put together in the production code included in ari.zip. (Links are at the end of
this article.) The use of templates makes this code very flexible, and should make it easy to plug
it into your own applications. All the needed code is in header files, so project inclusion is
simple.</p>
<p>In this section I’ll discuss the various components that need to be put together in order to
actually do some compression. The code package has four programs:</p>
<ul>
<li /><code>fp_proto.cpp</code>, the floating point prototype program. Useful for experimentation,
but not for real work.
<li /><code>compress.cpp</code>, which compresses a file using command line arguments.
<li /><code>decompress.cpp</code>, which decompresses a file using command line arguments.
<li /><code>tester.cpp</code>, which puts a file through a compress/decompress cycle, tests for
validity, and outputs the compression ratio.
</ul>
<p>The compression code is implemented entirely as a set of template classes in header files. These are:</p>
<ul>
<li /><code>compressor.h</code>, which completely implements the arithmetic compressor. The
compressor class is parameterized on input stream type, output stream type, and model type.
<li /><code>decompressor.h</code>, the corresponding arithmetic decompressor with the same type
parameters.
<li /><code>modelA.h</code>, a simple order-0 model that does an acceptable job of demonstrating
compression.
<li /><code>model_metrics.h</code>, a utility class that helps a model class set up some types used
by the compressor and decompressor.
<li /><code>bitio.h</code> and <code>byteio.h</code>, the streaming classes that implement
bit-oriented I/O.
</ul>
<p>This article is going to skip over the details pertaining to the bit-oriented I/O classes
implemented in <code>bitio.h</code> and <code>byteio.h</code>. The classes provided will allow you
to read or write from <code>std::iostream</code> and <code>FILE *</code> sources and destinations,
and can be pretty easily modified for other types. Details on the implementation of these classes
are in my article
<a href="/posts/2014/07/02/c-generic-programming-meets-oop-stdis_base_of.html" target="_blank">C++ Generic Programming Meets OOP</a>,
which includes a bit-oriented I/O class as an illustrative example.</p>
<p>All of the code I use requires a C++11 compiler, but could be modified to work with earlier
versions without much trouble. The makefile will build the code with g++ 4.6.3 or clang 3.4.
The solution file included will build the projects with Visual C++ 2012.</p>
<h3>model_metrics.h</h3>
<p>In the code I’ve shown you so far, I blithely assume that your architecture uses 32-bit math and
efficiently supports unsigned integer math. This is a bit of an unwanted and unneeded restriction,
so my production code will define the math parameters using templates. The way it works is that the
compressor and decompressor classes both take a <code>MODEL</code> type parameter, and they rely
on that type to provide a few typedefs and constants:</p>
<table border="1">
<tr>
<td valign="top"><code>CODE_VALUE</code></td>
<td>The integer type used to perform math. In the sample code shown so far we assumed this was
<code>unsigned int</code>, but signed and longer types are perfectly valid. </td>
</tr>
<tr>
<td valign="top"><code>MAX_CODE</code></td>
<td>The maximum value of a code - in other words the highest value that <code>high</code> can
reach. If we are using all 32 bits of an unsigned int, this would be 0xFFFFFFFF, but as we
will see shortly, this will normally be quite a bit smaller than the max that can fit into
an int or unsigned int.</td>
</tr>
<tr>
<td valign="top"><code>ONE_FOURTH<br />ONE_HALF<br />THREE_FOURTHS</code></td>
<td>The three values used when testing for convergence. In the sample code these were set to
full 32-bit values of 0x40000000, 0x80000000, and 0xC0000000, but they will generally
be smaller than this. The values could be calculated from <code>MAX_CODE</code>, but we
let the model define them for convenience.</td>
</tr>
<tr>
<td valign="top"><code>struct prob</code></td>
<td>This is the structure used to return probability values from the model. It is defined in the
model because the model will know what types it wants the three values in this structure
to be. The three values are <code>low</code>, <code>high</code>, and <code>count</code>,
which define the range of a given character being encoded or decoded.</td>
</tr>
</table>
<p>The header file <code>model_metrics.h</code> contains a utility class that helps the model class
figure out what these types are. In general, the choice of <code>CODE_VALUE</code> is going to be
defined using the natural int or unsigned int type for your architecture. Calculating the value of
<code>MAX_CODE</code> requires a little bit more thinking though.</p>
<table border="0"><tr><td>
<center>
<table border="1" width="80%" cellpadding="5"><tr><td>
<b>Digression - Overflow Avoidance</b>
<br />It's a given that we are doing the math in the encoder and decoder using type
<code>CODE_VALUE</code>. We need to make sure that these three lines of code don't generate an
intermediate result that won't fit in that type:
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">CODE_VALUE</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">high</span> <span class="o">/</span> <span class="n">p</span><span class="p">.</span><span class="n">count</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">low</span> <span class="o">/</span> <span class="n">p</span><span class="p">.</span><span class="n">count</span><span class="p">);</span></code></pre></figure>
The values of <code>high</code> and <code>low</code> will have some number of bits, which in
<code>model_metrics.h</code> we designate by <code>CODE_VALUE_BITS</code>. The maximum number of
bits that can be in the counts <code>low</code> and <code>high</code> returned from the model
in <code>struct prob</code> are defined in the header file by <code>FREQUENCY_BITS</code>. The
same header defines <code>PRECISION</code> as the maximum number of bits that can be contained in
type <code>CODE_VALUE</code>. Looking at the calculation above shows you that this expression must always be true:
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">PRECISION</span> <span class="o">>=</span> <span class="n">CODE_VALUE_BITS</span> <span class="o">+</span> <span class="n">FREQUENCY_BITS</span></code></pre></figure>
On machines commonly in use today, <code>PRECISION</code> will be 31 if we use signed integers,
32 if we use unsigned. If we arbitrarily split the difference, we might decide that
<code>CODE_VALUE_BITS</code> and <code>FREQUENCY_BITS</code> could both be 16. This will avoid
overflow, but in fact it doesn't address a second constraint, discussed next.
</td></tr></table>
</center>
</td></tr></table>
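<p>The overflow rule can be checked numerically. The helper below is a hypothetical sketch, not part
of the code package: it forms the worst-case product of <code>range</code> and a cumulative count
in 64-bit math and reports whether that product would fit in a type that is <code>precision</code>
bits wide. The function name and its parameters are assumptions made for illustration.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the article's code package): returns true when
// the worst-case intermediate product range * p.high fits in an unsigned type
// that is `precision` bits wide.
// Assumes both bit counts are positive and that their sum is at most 64,
// so the 64-bit arithmetic used for the check cannot itself overflow.
inline bool product_fits(int code_value_bits, int frequency_bits, int precision)
{
    assert(code_value_bits > 0 && frequency_bits > 0);
    assert(code_value_bits + frequency_bits <= 64);
    // The largest possible range is MAX_CODE + 1, which is 2^code_value_bits.
    const std::uint64_t max_range = std::uint64_t(1) << code_value_bits;
    // The largest cumulative count is 2^frequency_bits - 1.
    const std::uint64_t max_count = (std::uint64_t(1) << frequency_bits) - 1;
    const std::uint64_t product = max_range * max_count;
    // The product fits if no bit at or above position `precision` is set.
    return precision >= 64 || (product >> precision) == 0;
}
```

<p>With 32 bits of precision, a 16/16 split passes this check, while 17 code bits combined with 16
frequency bits does not, which agrees with the inequality above.</p>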
<table border="0"><tr><td>
<center>
<table border="1" width="80%" cellpadding="5"><tr><td>
<b>Digression - Underflow Avoidance</b>
<br />In the decoder, we have the following code that is executed each time we decode a new character:
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">CODE_VALUE</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">CODE_VALUE</span> <span class="n">scaled_value</span> <span class="o">=</span> <span class="p">((</span><span class="n">value</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">m_model</span><span class="p">.</span><span class="n">getCount</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span> <span class="p">)</span> <span class="o">/</span> <span class="n">range</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c</span><span class="p">;</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="n">m_model</span><span class="p">.</span><span class="n">getChar</span><span class="p">(</span> <span class="n">scaled_value</span><span class="p">,</span> <span class="n">c</span> <span class="p">);</span></code></pre></figure>
<p />
The value returned by <code>m_model.getCount()</code> will be a frequency count, so it will be
represented by <code>FREQUENCY_BITS</code>. We need to make sure that there are enough possible
values of <code>scaled_value</code> so that it can be used to look up the smallest values in the
model.
<p />
Because of the invariants described earlier, we know that <code>high</code> and <code>low</code>
have to be at least <code>ONE_FOURTH</code> apart. The MSB of <code>high</code> has to be 1,
making it greater than or equal to <code>ONE_HALF</code>, and the MSB of <code>low</code> has to
be 0, making it less than <code>ONE_HALF</code>. But in addition, the special processing for
near-convergence ensures that if <code>high</code> is less than <code>THREE_FOURTHS</code>, then
<code>low</code> must be less than <code>ONE_FOURTH</code>. Likewise, if <code>low</code> is
greater than or equal to <code>ONE_FOURTH</code>, then <code>high</code> must be greater than or
equal to <code>THREE_FOURTHS</code>.
So the worst case for convergence between these values is represented, for example, when
<code>low</code> is <code>ONE_HALF-1</code> and <code>high</code> is <code>THREE_FOURTHS</code>.
In this case, <code>range</code> is going to be just a bit over <code>ONE_FOURTH</code>.
<p />
Now let's consider what happens when we are using 16 bits for our frequency counts. The largest
value returned by <code>m_model.getCount()</code> will be 65,535. Let's look at what might
happen if we were using just eight bits in <code>CODE_VALUE</code>, making
<code>MAX_CODE</code> 255. In the worst case, <code>high</code> might be 128 and <code>low</code>
might be 63, giving a range of 66. Because <code>value</code> has to lie between <code>low</code>
and <code>high</code> (back to invariants), this means there are only going to be 66 possible
values of <code>scaled_value</code>. Because the range occupied by a given symbol can be as small as 1,
we need to be able to call <code>m_model.getChar()</code> with any value in the range [0,65536)
to be able to properly decode a rarely appearing character. Thus, <code>MAX_CODE</code> of
255 won't cut it.
<p />
In the worst case we are dividing the range occupied by <code>CODE_VALUE</code> calculations by 4,
so we are in effect chopping two bits off the range. In order to get that scaled up so it can
generate all the needed values of <code>scaled_value</code>, we end up with this prerequisite:
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">CODE_VALUE_BITS</span> <span class="o">>=</span> <span class="p">(</span><span class="n">FREQUENCY_BITS</span> <span class="o">+</span> <span class="mi">2</span><span class="p">)</span></code></pre></figure>
If we have a 32-bit <code>CODE_VALUE</code> type to work with, this means that a comfortable value
of <code>CODE_VALUE_BITS</code> will be 17, making <code>FREQUENCY_BITS</code> 15. In general,
we want <code>FREQUENCY_BITS</code> to be as large as possible, because this provides us with
the most accurate model of character probabilities. Values of 17 and 15 maximize
<code>FREQUENCY_BITS</code> for 32-bit unsigned integer math.
</td></tr></table>
</center>
</td></tr></table>
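<p>Both constraints from the digressions above can be enforced at compile time. What follows is a
minimal sketch of how a helper in the spirit of <code>model_metrics.h</code> might derive the
constants and reject bad parameter choices. The struct name <code>metrics_sketch</code>, the
typedef <code>default_metrics</code>, and the exact layout are assumptions, so the real header may
well differ.</p>

```cpp
#include <limits>

// Illustrative stand-in for the helper in model_metrics.h. The struct name and
// layout are assumptions; only the constant names come from the article.
template<typename CODE_VALUE_, int CODE_VALUE_BITS_, int FREQUENCY_BITS_>
struct metrics_sketch
{
    typedef CODE_VALUE_ CODE_VALUE;

    // Number of value bits in CODE_VALUE: 32 for a 32-bit unsigned, 31 for a 32-bit int.
    static const int PRECISION       = std::numeric_limits<CODE_VALUE>::digits;
    static const int CODE_VALUE_BITS = CODE_VALUE_BITS_;
    static const int FREQUENCY_BITS  = FREQUENCY_BITS_;

    static const CODE_VALUE MAX_CODE      = (CODE_VALUE(1) << CODE_VALUE_BITS) - 1;
    static const CODE_VALUE MAX_FREQ      = (CODE_VALUE(1) << FREQUENCY_BITS) - 1;
    static const CODE_VALUE ONE_FOURTH    = CODE_VALUE(1) << (CODE_VALUE_BITS - 2);
    static const CODE_VALUE ONE_HALF      = 2 * ONE_FOURTH;
    static const CODE_VALUE THREE_FOURTHS = 3 * ONE_FOURTH;

    // Overflow avoidance: range * count must fit in CODE_VALUE.
    static_assert(PRECISION >= CODE_VALUE_BITS + FREQUENCY_BITS,
                  "CODE_VALUE_BITS + FREQUENCY_BITS exceeds the precision of CODE_VALUE");
    // Underflow avoidance: the worst-case range must still cover every count.
    static_assert(CODE_VALUE_BITS >= FREQUENCY_BITS + 2,
                  "CODE_VALUE_BITS must be at least FREQUENCY_BITS + 2");
};

// The 17/15 split discussed above, assuming unsigned int is 32 bits wide.
typedef metrics_sketch<unsigned int, 17, 15> default_metrics;
```

<p>Instantiating the sketch with, say, 16 code bits and 16 frequency bits would trip the second
assertion, so the mistake is caught by the compiler instead of showing up as corrupt output.</p>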
<h3>modelA.h</h3>
<p>I haven’t talked too much about modeling in this article. The focus is strictly on the mechanics of
arithmetic coding. We know that this depends on having an accurate model of character
probabilities, but getting deep into that topic requires another article.</p>
<p>For this article, the sample code uses a simple adaptive model, which I call <code>modelA</code>.
This model has character probabilities for 257 symbols - all possible symbols in an eight-bit
alphabet, plus one additional EOF symbol. All of the characters in the model start out with a count
of 1, meaning each has an equal chance of being seen, and that each will take the same number of
bits to encode.</p>
<p>After each character is encoded or decoded, its count is updated in the model, making it more
likely to be seen, and thus reducing the number of bits that will be required to encode it in
the future.</p>
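<p>To put rough numbers on this, the ideal cost of a symbol is the negative base-2 logarithm of
its probability. The small helper below is a hypothetical illustration, not part of
<code>modelA.h</code>. With 257 symbols that each have a count of 1, every symbol costs about
8.006 bits. Once a symbol has been seen one time, its count is 2 out of a total of 258, and its
cost drops to about 7.011 bits.</p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ideal cost in bits of coding a symbol whose count is
// `count` out of a total of `total`. Requires 0 < count <= total.
inline double bits_for(unsigned count, unsigned total)
{
    assert(count > 0 && count <= total);
    return -std::log2(static_cast<double>(count) / total);
}
```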
<p>The model is implemented by using a <code>CODE_VALUE</code> array of size 258. The range for
a given character i is defined by the count at i and the count at i+1, with the total count
being found at location 257. Thus, the routine to get the probability looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">prob</span> <span class="nf">getProbability</span><span class="p">(</span><span class="kt">int</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="p">{</span> <span class="n">cumulative_frequency</span><span class="p">[</span><span class="n">c</span><span class="p">],</span>
<span class="n">cumulative_frequency</span><span class="p">[</span><span class="n">c</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span>
<span class="n">cumulative_frequency</span><span class="p">[</span><span class="mi">257</span><span class="p">]</span> <span class="p">};</span>
<span class="k">if</span> <span class="p">(</span> <span class="o">!</span><span class="n">m_frozen</span> <span class="p">)</span>
<span class="n">update</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="n">pacify</span><span class="p">();</span>
<span class="k">return</span> <span class="n">p</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Because of the way arithmetic coding works, updating the count for character i means the counts
for all characters greater than i have to be incremented as well, because adjusting the range for
one position on the number line is going to squeeze everyone to the right - the counts in
<code>cumulative_frequency</code> are in fact <i>cumulative</i>, with each entry holding the sum of
the counts of all characters less than i. This means the update function has to work its way
through every entry above i. This is a lot of work, and there are various things we could do to try
to make the model more efficient. But <code>modelA</code> is just for demonstration, and the fact
that the array probably fits neatly in cache means this update is adequate for now.</p>
<p>One additional factor we have to watch out for is that we can’t exceed the maximum number of
frequency bits in our count. This is managed by setting a flag to freeze the model once the
maximum frequency is reached. Again, there are a number of things we could do to make this
more efficient, but for demonstration purposes this is adequate:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="kt">void</span> <span class="kr">inline</span> <span class="nf">update</span><span class="p">(</span><span class="kt">int</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">c</span> <span class="o">+</span> <span class="mi">1</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">258</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span>
<span class="n">cumulative_frequency</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">cumulative_frequency</span><span class="p">[</span><span class="mi">257</span><span class="p">]</span> <span class="o">>=</span> <span class="n">MAX_FREQ</span> <span class="p">)</span>
<span class="n">m_frozen</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The decoder has to deal with the same issues when looking up a character based on the
<code>scaled_value</code>. The code does a linear search of the array, whose only saving grace is
that it ought to be sitting in the cache:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">prob</span> <span class="nf">getChar</span><span class="p">(</span><span class="n">CODE_VALUE</span> <span class="n">scaled_value</span><span class="p">,</span> <span class="kt">int</span> <span class="o">&</span><span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">257</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">scaled_value</span> <span class="o"><</span> <span class="n">cumulative_frequency</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="p">{</span><span class="n">cumulative_frequency</span><span class="p">[</span><span class="n">i</span><span class="p">],</span>
<span class="n">cumulative_frequency</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span>
<span class="n">cumulative_frequency</span><span class="p">[</span><span class="mi">257</span><span class="p">]};</span>
<span class="k">if</span> <span class="p">(</span> <span class="o">!</span><span class="n">m_frozen</span><span class="p">)</span>
<span class="n">update</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="k">return</span> <span class="n">p</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">throw</span> <span class="n">std</span><span class="o">::</span><span class="n">logic_error</span><span class="p">(</span><span class="s">"error"</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
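<p>The interplay between the cumulative array, the update, and the linear lookup is easier to see
on a small alphabet. The class below is a scaled-down sketch written for this purpose. The name
<code>tiny_model</code> and its interface are hypothetical, and it leaves out the freezing logic
entirely.</p>

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy cumulative-count table over `symbols` symbols; a scaled-down stand-in for
// modelA's 258-entry array. The class name and interface are hypothetical.
class tiny_model
{
public:
    explicit tiny_model(int symbols) : m_cum(symbols + 1)
    {
        // Every symbol starts with a count of 1, so entry i simply holds i.
        for (std::size_t i = 0; i < m_cum.size(); i++)
            m_cum[i] = static_cast<unsigned>(i);
    }
    unsigned total() const     { return m_cum.back(); }
    unsigned low(int c) const  { return m_cum[c]; }
    unsigned high(int c) const { return m_cum[c + 1]; }

    // Bump the count of symbol c: every cumulative entry above c moves up by one.
    void update(int c)
    {
        for (std::size_t i = static_cast<std::size_t>(c) + 1; i < m_cum.size(); i++)
            m_cum[i]++;
    }
    // Linear search for the symbol whose range [low, high) contains scaled_value.
    int lookup(unsigned scaled_value) const
    {
        for (std::size_t i = 0; i + 1 < m_cum.size(); i++)
            if (scaled_value < m_cum[i + 1])
                return static_cast<int>(i);
        throw std::logic_error("scaled_value out of range");
    }
private:
    std::vector<unsigned> m_cum;
};
```

<p>With four symbols the table starts as 0, 1, 2, 3, 4. After <code>update(1)</code> it becomes
0, 1, 3, 4, 5, so symbol 1 now owns the range [1,3) and both 1 and 2 look up to symbol 1.</p>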
<p>In a future article, the notion of how to develop efficient models will be a good topic to cover.
You won’t want to use <code>modelA</code> as part of a production compressor, but it does a great
job of letting you dig into the basics of arithmetic compression.</p>
<p>When you instantiate <code>modelA</code> you have three optional template parameters, which you
can use to tinker with the math used by the compressor and decompressor. (Of course, the compressor
and decompressor have to use the same parameters.)</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">CODE_VALUE_</span> <span class="o">=</span> <span class="kt">unsigned</span> <span class="kt">int</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">CODE_VALUE_BITS_</span> <span class="o">=</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">numeric_limits</span><span class="o"><</span><span class="n">CODE_VALUE_</span><span class="o">>::</span><span class="n">digits</span> <span class="o">+</span> <span class="mi">3</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">FREQUENCY_BITS_</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">numeric_limits</span><span class="o"><</span><span class="n">CODE_VALUE_</span><span class="o">>::</span><span class="n">digits</span> <span class="o">-</span> <span class="n">CODE_VALUE_BITS_</span><span class="o">></span>
<span class="k">struct</span> <span class="n">modelA</span> <span class="o">:</span> <span class="k">public</span> <span class="n">model_metrics</span><span class="o"><</span><span class="n">CODE_VALUE_</span><span class="p">,</span> <span class="n">CODE_VALUE_BITS_</span><span class="p">,</span> <span class="n">FREQUENCY_BITS_</span><span class="o">></span>
<span class="p">{</span></code></pre></figure>
<p>On a 32-bit compiler, opting for all defaults will result in you using 17 bits for
<code>CODE_VALUE</code> calculations, and 15 bits for frequency counts. Static assertions in the
<code>model_metrics</code> class check to ensure that your selections will result in correct
compression.</p>
<h3>compressor.h</h3>
<p>This header file contains all the code that implements the arithmetic compressor. The class that
does the compression is a template class, parameterized on the input and output classes, as well as
the model. This allows you to use the same compression code with different types of I/O in as
efficient a manner as possible.</p>
<p>The compressor engine is a simple C++ object that has an overloaded <code>operator()()</code>,
so the normal usage is to instantiate the engine then call it with input, output, and model
parameters. Because of the way that C++ manages template instantiation, the easiest way to actually
use the engine is via a convenience function, <code>compress()</code>, that takes care of those
details, and can be called without template parameters, as in:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">ifstream</span> <span class="n">input</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">std</span><span class="o">::</span><span class="n">ifstream</span><span class="o">::</span><span class="n">binary</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">ofstream</span> <span class="n">output</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">std</span><span class="o">::</span><span class="n">ofstream</span><span class="o">::</span><span class="n">binary</span><span class="p">);</span>
<span class="n">modelA</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">14</span><span class="o">></span> <span class="n">cmodel</span><span class="p">;</span>
<span class="n">compress</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">cmodel</span><span class="p">);</span></code></pre></figure>
<p>You can see the (simple) implementation of the convenience function in the attached source.</p>
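<p>The general shape of such a convenience function is sketched below. This is not the code from
the attached source: <code>engine_sketch</code>, <code>run_engine</code>, and
<code>dummy_model</code> are hypothetical names, and the engine here only copies bytes so that the
example stays focused on how a function template deduces the three type parameters from its
arguments.</p>

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a templated engine. It copies bytes from input to
// output and returns how many it saw; a real engine would encode them instead.
template<typename INPUT, typename OUTPUT, typename MODEL>
struct engine_sketch
{
    engine_sketch(INPUT &input, OUTPUT &output, MODEL &model)
        : m_input(input), m_output(output), m_model(model) {}

    int operator()()
    {
        int count = 0;
        char c;
        while (m_input.get(c)) {
            m_output.put(c);
            ++count;
        }
        return count;
    }

    INPUT  &m_input;
    OUTPUT &m_output;
    MODEL  &m_model;   // unused by this toy engine
};

// The convenience wrapper: the compiler deduces INPUT, OUTPUT, and MODEL from
// the arguments, so the caller never has to spell out the template parameters.
template<typename INPUT, typename OUTPUT, typename MODEL>
int run_engine(INPUT &input, OUTPUT &output, MODEL &model)
{
    engine_sketch<INPUT, OUTPUT, MODEL> e(input, output, model);
    return e();
}

// Placeholder model type used only to give the wrapper a third argument.
struct dummy_model {};
```

<p>A caller can then write <code>run_engine(in, out, model)</code> with any stream types that
provide <code>get()</code> and <code>put()</code>, without naming the engine type at all.</p>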
<p>The <code>operator()()</code> is where all the work happens, and it should look very similar to the
trial code shown earlier in this article. The difference is that what is shown here is fully fleshed
out, with all the details taken care of:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="nf">operator</span><span class="p">()()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">pending_bits</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">CODE_VALUE</span> <span class="n">low</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">CODE_VALUE</span> <span class="n">high</span> <span class="o">=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">MAX_CODE</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">c</span> <span class="o">=</span> <span class="n">m_input</span><span class="p">.</span><span class="n">getByte</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">c</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="mi">256</span><span class="p">;</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="n">m_model</span><span class="p">.</span><span class="n">getProbability</span><span class="p">(</span> <span class="n">c</span> <span class="p">);</span>
<span class="n">CODE_VALUE</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">high</span> <span class="o">/</span> <span class="n">p</span><span class="p">.</span><span class="n">count</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">low</span> <span class="o">/</span> <span class="n">p</span><span class="p">.</span><span class="n">count</span><span class="p">);</span>
<span class="c1">//</span>
<span class="c1">// On each pass there are six possible configurations of high/low,</span>
<span class="c1">// each of which has its own set of actions. When high or low</span>
<span class="c1">// is converging, we output their MSB and upshift high and low.</span>
<span class="c1">// When they are in a near-convergent state, we upshift over the</span>
<span class="c1">// next-to-MSB, increment the pending count, leave the MSB intact,</span>
<span class="c1">// and don't output anything. If we are not converging, we do</span>
<span class="c1">// no shifting and no output.</span>
<span class="c1">// high: 0xxx, low anything : converging (output 0)</span>
<span class="c1">// low: 1xxx, high anything : converging (output 1)</span>
<span class="c1">// high: 10xxx, low: 01xxx : near converging</span>
<span class="c1">// high: 11xxx, low: 01xxx : not converging</span>
<span class="c1">// high: 11xxx, low: 00xxx : not converging</span>
<span class="c1">// high: 10xxx, low: 00xxx : not converging</span>
<span class="c1">//</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">high</span> <span class="o"><</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_HALF</span> <span class="p">)</span>
<span class="n">put_bit_plus_pending</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">pending_bits</span><span class="p">);</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_HALF</span> <span class="p">)</span>
<span class="n">put_bit_plus_pending</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">pending_bits</span><span class="p">);</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span> <span class="o">&&</span> <span class="n">high</span> <span class="o"><</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">THREE_FOURTHS</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">pending_bits</span><span class="o">++</span><span class="p">;</span>
<span class="n">low</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span><span class="p">;</span>
<span class="n">high</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span>
<span class="k">break</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span><span class="o">++</span><span class="p">;</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o">&=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">MAX_CODE</span><span class="p">;</span>
<span class="n">low</span> <span class="o">&=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">MAX_CODE</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">c</span> <span class="o">==</span> <span class="mi">256</span> <span class="p">)</span> <span class="c1">//256 is the special EOF code</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pending_bits</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o"><</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span> <span class="p">)</span>
<span class="n">put_bit_plus_pending</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">pending_bits</span><span class="p">);</span>
<span class="k">else</span>
<span class="n">put_bit_plus_pending</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">pending_bits</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">put_bit_plus_pending</span><span class="p">(</span><span class="kt">bool</span> <span class="n">bit</span><span class="p">,</span> <span class="kt">int</span> <span class="o">&</span><span class="n">pending_bits</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">m_output</span><span class="p">.</span><span class="n">put_bit</span><span class="p">(</span><span class="n">bit</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">pending_bits</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span>
<span class="n">m_output</span><span class="p">.</span><span class="n">put_bit</span><span class="p">(</span><span class="o">!</span><span class="n">bit</span><span class="p">);</span>
<span class="n">pending_bits</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
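<p>The pending-bit logic can be tried in isolation. The sketch below repeats the body of
<code>put_bit_plus_pending()</code>, but appends to a <code>std::vector&lt;bool&gt;</code> instead
of the bit-oriented output class. That substitution, and the function name, are assumptions made
so the example can stand alone.</p>

```cpp
#include <vector>

// Same logic as put_bit_plus_pending(), but writing into a std::vector<bool>
// (an assumption for this sketch) rather than the article's output class.
inline void put_bit_plus_pending_sketch(std::vector<bool> &out, bool bit, int &pending_bits)
{
    out.push_back(bit);                     // the bit that just converged
    for (int i = 0; i < pending_bits; i++)  // then one opposite bit per pending count
        out.push_back(!bit);
    pending_bits = 0;
}
```

<p>If the encoder passed through the near-convergent state three times and <code>high</code> then
dropped below <code>ONE_HALF</code>, the call with <code>bit</code> set to 0 and a pending count of
3 emits the four bits 0, 1, 1, 1 and resets the count to zero.</p>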
<table border="0"><tr><td>
<center>
<table border="1" width="80%" cellpadding="5"><tr><td>
<b>Digression - Internal End of File</b>
<p />The output stream created by this encoder depends on a special EOF character to indicate that
it is complete. That character, 256, is the last symbol encoded and output to the stream. When the
decoder reads a symbol of 256, it stops decoding and storing symbols and knows it is done.
<p />
This method of terminating a stream works pretty well in most scenarios. There are some interesting
alternatives you can take if you are compressing individual files or other data types (such as
network packets) that have a system-imposed EOF. Since a file already has an EOF in place, we are
wasting some information by encoding our own EOF symbol.
<p />
David Scott has done a lot of work in this area, and some of his results are
<a href="http://bijective.dogma.net/" target="_blank">here</a>.
His general technique for eliminating internal EOF signaling in any compression algorithm is to
first make the algorithm
<a href="https://en.wikipedia.org/wiki/Bijection" target="_blank">bijective</a>.
When a compression algorithm is bijective, it means that <i>any file</i> will decompress properly
to some other file. This is generally not the case for most compressors - there are usually
illegal files that will break the decompressor, and the code I have presented here definitely has
that problem.
<p />
The next step is to ensure that when the input file reaches an EOF, we can terminate the
compressed file at that point with a valid sequence of bytes that ends on a byte boundary.
This means that when the decompressor sees an EOF on input, it knows two things. First, it knows
that it has properly decoded and output all symbols up to this point. Second, that it is done.
<p />
Depending on an external EOF to terminate a compressed stream won't always work - if you are
reading data from some sort of stream that doesn't have external termination, you will need an
internal EOF marker, as used here, to delineate the end of the compressed stream.
</td></tr></table>
</center>
</td></tr></table>
<table border="0"><tr><td>
<center>
<table border="1" width="80%" cellpadding="5"><tr><td>
<b>Digression - Ending the Encoded Value</b>
<p />At some point when <code>c==256</code>, <code>high</code> and <code>low</code> will stop
converging, and we will break out of the main encoding loop - we are more or less done.
<p />
The question at this point is: how many more bits do we need to output to ensure that the decoder
can decode the last symbol(s) properly?
The easiest way to do this might be to just output <code>CODE_VALUE_BITS</code> of <code>low</code>,
ensuring that all the precision we have left is used.
But we don't actually need this much precision. All we need to guarantee is that when the decoder
is reading in <code>value</code>, the final bits ensure that <code>low <= value < high</code>.
Because of the way the encoding loop works, we know that when we break out of it, <code>high</code>
and <code>low</code> are in just one of three states:
<table border="0">
<tr><th>high</th><th>low</th></tr>
<tr><td>11xxx</td><td>01yyy</td></tr>
<tr><td>11xxx</td><td>00yyy</td></tr>
<tr><td>10xxx</td><td>00yyy</td></tr>
</table>
Examining this, we can see that if <code>low</code> starts with <code>00</code>, we can write the
bits <code>01</code> and be satisfied that any values for the remaining bits will result in
<code>value</code> being greater than <code>low</code> and less than <code>high</code>.
If <code>low</code> starts with <code>01</code>, we can write the bits <code>10</code> and again be
assured that the desired condition holds.
Because there may be additional pending bits left over from the encoding loop, we write <code>01</code> or <code>10</code> by calling <code>put_bit_plus_pending()</code> with a <code>0</code> or <code>1</code>, after incrementing <code>pending_bits</code>. That incremented value of <code>pending_bits</code> ensures that if a <code>0</code> is written, it will be followed by a <code>1</code>, and vice versa.
So just two bits (plus any pending) are all that are needed to properly terminate the bit stream.
</td></tr></table>
</center>
</td></tr></table>
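<p>As a rough illustration of the termination rule described in the digression above, here is a minimal
self-contained sketch. The <code>BitSink</code> type and the <code>finish_stream()</code> function are
hypothetical names invented for this example; only <code>put_bit_plus_pending()</code> and
<code>pending_bits</code> mirror names used in the article, and the bits are collected in a vector
rather than written to a real stream.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the encoder's output side: collects bits in a vector.
struct BitSink {
    std::vector<int> bits;
    int pending_bits = 0;

    // Writes `bit`, then flushes every pending bit as its complement.
    void put_bit_plus_pending(bool bit) {
        bits.push_back(bit ? 1 : 0);
        for (int i = 0; i < pending_bits; i++)
            bits.push_back(bit ? 0 : 1);
        pending_bits = 0;
    }
};

// Emits the final bits for a coder whose code values are `code_value_bits` wide.
// `low` is the encoder's lower bound after the main encoding loop exits.
inline void finish_stream(BitSink &out, unsigned long low, int code_value_bits) {
    assert(code_value_bits >= 2);
    const unsigned long one_fourth = 1UL << (code_value_bits - 2);
    out.pending_bits++;                   // guarantees a complementary second bit
    if (low < one_fourth)
        out.put_bit_plus_pending(false);  // low = 00yyy: write 01
    else
        out.put_bit_plus_pending(true);   // low = 01yyy: write 10
}
```

<p>With no pending bits, a <code>low</code> in the bottom quarter of the range produces the bits
<code>01</code>. With two bits already pending, the same call produces <code>0111</code>.</p>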
<h3>decompressor.h</h3>
<p>The decompressor code mirrors the compressor code. The engine is a template class that is
parameterized on the input and output stream types, and the model type. A convenience function
called <code>decompress()</code> makes it easy to instantiate the engine and then decompress
without needing to declare template parameters:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">ifstream</span> <span class="n">input</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">std</span><span class="o">::</span><span class="n">ifstream</span><span class="o">::</span><span class="n">binary</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">ofstream</span> <span class="n">output</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">std</span><span class="o">::</span><span class="n">ofstream</span><span class="o">::</span><span class="n">binary</span><span class="p">);</span>
<span class="n">modelA</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">14</span><span class="o">></span> <span class="n">cmodel</span><span class="p">;</span>
<span class="n">decompress</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">cmodel</span><span class="p">);</span></code></pre></figure>
<p>The convenience function is in the attached source package.</p>
<p>The actual operator code is shown here. This is very much like the sample code shown earlier, with
all the additional loose ends tied up so that this is production-ready:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="nf">operator</span><span class="p">()()</span>
<span class="p">{</span>
<span class="n">CODE_VALUE</span> <span class="n">high</span> <span class="o">=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">MAX_CODE</span><span class="p">;</span>
<span class="n">CODE_VALUE</span> <span class="n">low</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">CODE_VALUE</span> <span class="n">value</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">CODE_VALUE_BITS</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">value</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">value</span> <span class="o">+=</span> <span class="n">m_input</span><span class="p">.</span><span class="n">get_bit</span><span class="p">()</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">CODE_VALUE</span> <span class="n">range</span> <span class="o">=</span> <span class="n">high</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">CODE_VALUE</span> <span class="n">scaled_value</span> <span class="o">=</span> <span class="p">((</span><span class="n">value</span> <span class="o">-</span> <span class="n">low</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">m_model</span><span class="p">.</span><span class="n">getCount</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span> <span class="p">)</span> <span class="o">/</span> <span class="n">range</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c</span><span class="p">;</span>
<span class="n">prob</span> <span class="n">p</span> <span class="o">=</span> <span class="n">m_model</span><span class="p">.</span><span class="n">getChar</span><span class="p">(</span> <span class="n">scaled_value</span><span class="p">,</span> <span class="n">c</span> <span class="p">);</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">c</span> <span class="o">==</span> <span class="mi">256</span> <span class="p">)</span>
<span class="k">break</span><span class="p">;</span>
<span class="n">m_output</span><span class="p">.</span><span class="n">putByte</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="n">high</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span><span class="o">*</span><span class="n">p</span><span class="p">.</span><span class="n">high</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">count</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="n">low</span> <span class="o">=</span> <span class="n">low</span> <span class="o">+</span> <span class="p">(</span><span class="n">range</span><span class="o">*</span><span class="n">p</span><span class="p">.</span><span class="n">low</span><span class="p">)</span><span class="o">/</span><span class="n">p</span><span class="p">.</span><span class="n">count</span><span class="p">;</span>
<span class="k">for</span><span class="p">(</span> <span class="p">;</span> <span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">high</span> <span class="o"><</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_HALF</span> <span class="p">)</span> <span class="p">{</span>
<span class="c1">//do nothing, bit is a zero</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_HALF</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">value</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_HALF</span><span class="p">;</span> <span class="c1">//subtract one half from all three code values</span>
<span class="n">low</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_HALF</span><span class="p">;</span>
<span class="n">high</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_HALF</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span> <span class="n">low</span> <span class="o">>=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span> <span class="o">&&</span> <span class="n">high</span> <span class="o"><</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">THREE_FOURTHS</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">value</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span><span class="p">;</span>
<span class="n">low</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span><span class="p">;</span>
<span class="n">high</span> <span class="o">-=</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">ONE_FOURTH</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span>
<span class="k">break</span><span class="p">;</span>
<span class="n">low</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">high</span><span class="o">++</span><span class="p">;</span>
<span class="n">value</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">value</span> <span class="o">+=</span> <span class="n">m_input</span><span class="p">.</span><span class="n">get_bit</span><span class="p">()</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
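<p>The relationship between the <code>scaled_value</code> calculation and the range-narrowing formulas
can be checked in isolation. The sketch below is not taken from the source package: the free functions
<code>narrow()</code> and <code>scaled_value()</code> and the simplified <code>prob</code> struct are
assumptions made for illustration, and it assumes 17-bit code values with counts small enough that
32-bit arithmetic does not overflow.</p>

```cpp
#include <cstdint>

// Simplified stand-in for the model's probability record.
struct prob {
    std::uint32_t low;
    std::uint32_t high;
    std::uint32_t count;
};

// Narrows [low, high] to the sub-interval assigned to a symbol, using the
// same formulas that appear in the decoder loop.
inline void narrow(std::uint32_t &low, std::uint32_t &high, const prob &p) {
    std::uint32_t range = high - low + 1;
    high = low + (range * p.high) / p.count - 1;
    low = low + (range * p.low) / p.count;
}

// Maps a code value back into the model's cumulative count space.
inline std::uint32_t scaled_value(std::uint32_t value, std::uint32_t low,
                                  std::uint32_t high, std::uint32_t count) {
    std::uint32_t range = high - low + 1;
    return ((value - low + 1) * count - 1) / range;
}
```

<p>Starting from the full 17-bit interval and a symbol that owns counts 10 through 19 out of 100,
<code>narrow()</code> gives the interval 13107 to 26213. Every <code>value</code> in that interval maps
back to a <code>scaled_value</code> of at least 10 and less than 20, while the values just outside it
map to 9 and 20, which is what lets the decoder identify the symbol.</p>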
<table border="0"><tr><td>
<center>
<table border="1" width="80%" cellpadding="5"><tr><td>
<b>Digression - Priming the Input Pump</b>
<p />
One of the implementation problems when performing arithmetic coding is the management of the end
of the stream. An example of this problem can be imagined if we have a best-case scenario in which
we encode an entire file using just two or three bits. The actual compressed file will of course
need to have one byte, but that's it.
<p />
We run into a problem in the decode, because we need to be able to fill the initial value of
<code>value</code> with the appropriate number of bits when decoding. A typical number would be 17,
which would require that we read three bytes from the input before we start decoding.
Something like this is executed when the decoder starts:
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="k">for</span> <span class="p">(</span> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">CODE_VALUE_BITS</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">value</span> <span class="o"><<=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">value</span> <span class="o">+=</span> <span class="n">m_input</span><span class="p">.</span><span class="n">get_bit</span><span class="p">()</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
For every eight bits read via <code>get_bit()</code>, there will be one call to read a byte from
the underlying file or other input stream.
<p />
We could just ignore EOF conditions and return 0xFF or 0x00 whenever reading a byte. This would
work pretty well as long as we properly detect the end of stream in our encoded stream. But in the
case of encoder error, or file corruption, it could result in the decoder running forever, reading
in bogus values, decoding them to other bogus values, and writing them to output.
<p />
Things would be much better if this error condition were detected. What we would really like is for
an attempt to read too far past the end of file to generate an error.
<p />
To accomplish this, the bit-oriented I/O class I use for input has a specialized constructor that
asks for the number of bits that will be used in a <code>CODE_VALUE</code>. When constructing it
in the convenience function, I pass this along:
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="n">input_bits</span><span class="o"><</span><span class="n">INPUT_CLASS</span><span class="o">></span> <span class="n">in</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">MODEL</span><span class="o">::</span><span class="n">CODE_VALUE_BITS</span><span class="p">);</span></code></pre></figure>
This value tells the input class that I may need to read that many bits past the EOF, but no more.
The constructor stores that number in the input class member variable <code>m_CodeValueBits</code>.
The code that reads in bytes when they are needed now has processing that looks like this:
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">m_CurrentByte</span> <span class="o">=</span> <span class="n">m_Input</span><span class="p">.</span><span class="n">getByte</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">m_CurrentByte</span> <span class="o"><</span> <span class="mi">0</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">m_CodeValueBits</span> <span class="o"><=</span> <span class="mi">0</span> <span class="p">)</span>
<span class="k">throw</span> <span class="n">std</span><span class="o">::</span><span class="n">logic_error</span><span class="p">(</span><span class="s">"EOF on input"</span><span class="p">);</span>
<span class="k">else</span>
<span class="n">m_CodeValueBits</span> <span class="o">-=</span> <span class="mi">8</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
This gives the error checking we need while still allowing a few reads past the end of file.
</td></tr></table>
</center>
</td></tr></table>
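<p>To see the bounded read-past-EOF idea working end to end, here is a small sketch of a bit reader
over an in-memory buffer. It is not the <code>input_bits</code> class from the source package: the
class name <code>BoundedBitReader</code>, the choice to read from a <code>std::vector</code>, and the
decision to return zero bits as padding are all assumptions made to keep the example short.</p>

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical bit reader that tolerates a limited number of reads past the end.
class BoundedBitReader {
public:
    BoundedBitReader(std::vector<unsigned char> data, int code_value_bits)
        : m_data(std::move(data)), m_code_value_bits(code_value_bits) {}

    // Returns the next bit, most significant bit of each byte first.
    bool get_bit() {
        if (m_mask == 0) {              // current byte exhausted: fetch another
            m_current = next_byte();
            m_mask = 0x80;
        }
        bool bit = (m_current & m_mask) != 0;
        m_mask >>= 1;
        return bit;
    }

private:
    int next_byte() {
        if (m_pos < m_data.size())
            return m_data[m_pos++];
        // Past the end: hand out padding until the bit budget is used up.
        if (m_code_value_bits <= 0)
            throw std::logic_error("EOF on input");
        m_code_value_bits -= 8;
        return 0;                       // padding bits read as zero (an assumption)
    }

    std::vector<unsigned char> m_data;
    std::size_t m_pos = 0;
    int m_current = 0;
    int m_mask = 0;
    int m_code_value_bits;
};
```

<p>With a one-byte input and a budget of 17 bits, this reader supplies the real byte plus three bytes
of padding, which is enough to prime a 17-bit <code>value</code>. The next attempt to fetch a byte
throws.</p>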
<h4>Implementation Notes - Using Exceptions for Fatal Errors</h4>
<p>Using C++ exceptions properly can be a really difficult task. Ensuring that all possible paths
through stack unwinding leave your program in a correct state is not for the faint of heart. This
is less of a problem when you are using exceptions strictly as a fatal error mechanism, as I discuss
<a href="/posts/2007/11/13/no-exceptions.html" target="_blank">here</a>.</p>
<p>The demo code I use here throws exceptions in response to the following error conditions:</p>
<ul>
<li />EOF on the input stream.
<li />A request for a character from the model with a <code>scaled_value</code> larger than the
maximum value currently defined - this is a logic error that should never happen.
</ul>
<p>Tightening up this code would turn up other places where exceptions would be useful - for example,
failures when writing output.</p>
<p>Propagating errors this way is efficient, and helps keep the code clean - no checking for failures
from the model or I/O objects, and it provides a very consistent way to implement error handling
when you write new classes for modeling or I/O.</p>
<p>To catch these errors, the normal template for compressing or decompressing is going to look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">try</span> <span class="p">{</span>
<span class="c1">//</span>
<span class="c1">// set up I/O and model</span>
<span class="c1">// then compress or decompress</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//the success path</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">exception</span> <span class="o">&</span><span class="n">ex</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="s">"Failed with exception: "</span> <span class="o"><<</span> <span class="n">ex</span><span class="p">.</span><span class="n">what</span><span class="p">()</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">255</span><span class="p">;</span> <span class="c1">//failure path</span></code></pre></figure>
<h4>Implementation Notes - Building the Demo Programs</h4>
<p>There are four executables that can be built from the attached source package:</p>
<table border="0">
<tr valign="top">
<td><b>fp_proto:</b></td>
<td>The floating point prototype code. This code is useful to demonstrate the basics of how
arithmetic coding works, but isn't much good for real work, with reasons outlined in
the article.</td>
</tr>
<tr valign="top">
<td><b>compress:</b></td>
<td>A compressor that accepts an input and output file name on the command line, then compresses
using a simple order-0 model. You should see modestly good compression with this file. To get
superior compression requires some work on more advanced modeling.</td>
</tr>
<tr valign="top">
<td><b>decompress:</b></td>
<td>A decompressor that accepts an input and output file name on the command line. This will
work properly with files created by the corresponding compress program.</td>
</tr>
<tr valign="top">
<td><b>tester:</b></td>
<td>A test program that accepts a single file name on the command line. It executes a compress
to a temporary file, then a decompress, then compares the result with the original file. The
app exits with 0 if the comparison succeeds, or 255 in the event of a failure.</td>
</tr>
</table>
<p>If you are working on a Windows box, you can build all four applications from the enclosed
<code>ari.sln</code> solution file. This should work with Visual C++ 12 or later.</p>
<p>If you are working on Linux, you can build all four files with the attached makefile, using:</p>
<pre>
make all
</pre>
<p>By default this will use g++ to compile, but if you change one option in the file you can easily
switch to clang. The code has been tested with g++ version 4.6.3, and clang++ version 3.4.</p>
<p>The code uses a few C++11 features, so if you try to build the projects with earlier versions of
any of these compilers you may run into some problems - mostly with the I/O code. It should be
reasonably easy to strip out some of the functionality and create classes that will support either
<code>iostreams</code> or <code>FILE * </code> objects.</p>
<h4>Implementation Notes - Testing</h4>
<p>The <code>tester</code> program provides a good way to exercise the code. If you are on a Linux
system, you create a directory called <code>corpus</code>, populate it with a tree of as many files
as you like, then execute <code>make test</code>, which will run the tester program against all
files. If a failure occurs, the process should abort with a message, allowing you to see which file
failed:</p>
<pre>
mrn@mrn-ubuntu-12:$ make test
find corpus -type f -print0 | xargs -0r -n 1 ./tester
compressing corpus/about/services/trackback/index.html... 5.41378
compressing corpus/about/services/feed/index.html... 5.71502
compressing corpus/about/services/index.html... 5.41378
compressing corpus/about/trackback/index.html... 5.39995
compressing corpus/about/feed/index.html... 5.25144
compressing corpus/about/serial1/trackback/index.html... 5.43231
compressing corpus/about/serial1/feed/index.html... 5.32922
compressing corpus/about/serial1/index.html... 5.43231
compressing corpus/about/tdcb/trackback/index.html... 5.37245
</pre>
<p>The <code>tester</code> app also prints out the number of bits/byte that was achieved by the
compressor. That output can be collected to provide stats.</p>
<p>It’s a little harder to do this with a one-liner under Windows, so I am afraid I don’t have an
easy bulk test option.</p>
<h4>Implementation Notes - Logging</h4>
<p>When an arithmetic compressor goes bad, it can be exceedingly difficult to determine what went wrong.</p>
<p>There are two common sources of error. The first is the encoder itself going awry, through an error
in bit twiddling or a failure to maintain the invariants that have been discussed ad nauseam here.</p>
<p>The second possible source of error is in the model itself. There are a few things that can go
wrong here, but the most common error is a loss of synch between the encoder and decoder models.
For things to work properly, both models have to operate in perfect lockstep.</p>
<p>Some low level logging can be turned on in the encoder by building with the <code>LOG</code>
constant defined. You can do this by editing the makefile and adding <code>-DLOG</code> to the
compiler command line, or defining it in the <i>C++|Preprocessor|Preprocessor Definitions</i> area
of the project properties for Windows builds.</p>
<p>Once you have the projects reconfigured, you can do a clean and rebuild of the projects. From that
point on, the compressor will create a log file called <code>compressor.log</code>, and the
decompressor will create a file called <code>decompressor.log</code>. These log files will output
the state of the compressor or decompressor as each character is encoded or decoded, with output like this:</p>
<pre>
0x2f(/) 0x0 0x1ffff => 0x5da2 0x5f9f
0x2f(/) 0xd100 0x1cfff => 0xff74 0x1016d
0x0d 0xba00 0x1b6ff => 0xc6b2 0xc7ab
0x0a 0xb200 0x1abff => 0xbb9d 0xbc92
0x2f(/) 0x9d00 0x192ff => 0xcb2f 0xce01
0x2f(/) 0xcbc0 0x1807f => 0xed8d 0xf04f
0x20 0x6340 0x113ff => 0x7a19 0x7ac4
0x20 0x3200 0x189ff => 0x5e4d 0x60e7
0x42(B) 0x2680 0x173ff => 0x83a0 0x84e2
0x57(W) 0xa000 0x1e2ff => 0x11492 0x115c8
0x54(T) 0x9200 0x1c8ff => 0xfe53 0xff7c
0x2e(.) 0x5300 0x17cff => 0x8a98 0x8bb4
0x43(C) 0x9800 0x1b4ff => 0xe994 0xeaa2
0x50(P) 0x9400 0x1a2ff => 0xef56 0xf056
0x50(P) 0x5600 0x156ff => 0xac4c 0xae31
</pre>
<p>The data you see there is the character being encoded, both in hex and text representation (if
printable), followed by the values of <code>low</code> and <code>high</code> before and after the
character is processed. If you diff the two log files, the only difference between the two should
be in the last character position, because the decompressor doesn’t update its state when it sees
an EOF:</p>
<pre>
mrn@mrn-ubuntu-12:$ diff compressor.log decompressor.log
10587c10587
< 0x100 0x27a0 0x10cff => 0x10cfa 0x10cff
---
> 0x100 0x27a0 0x10cff
mrn@mrn-ubuntu-12:$
</pre>
<p>Any other differences indicate an error. You can use the line number of the position to determine
exactly where things went awry.</p>
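<p>If you write your own model or coder and want logs that can be diffed in the same way, a line in
the format shown above could be produced by something like the following. This is only a sketch:
<code>format_log_line()</code> is a hypothetical helper written for this example, not the logging code
in the source package, and the formatting rules are inferred from the sample output.</p>

```cpp
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one log line in the style of the sample logs.
// `c` is the symbol (0-255, or 256 for EOF). The remaining arguments are the
// values of low and high before and after the symbol is processed.
inline std::string format_log_line(int c,
                                   unsigned long low_before, unsigned long high_before,
                                   unsigned long low_after, unsigned long high_after) {
    std::ostringstream line;
    line << std::hex << std::setfill('0');
    line << "0x" << std::setw(2) << c;           // at least two hex digits
    if (c < 256 && std::isgraph(c))              // visible characters only
        line << '(' << static_cast<char>(c) << ')';
    line << " 0x" << low_before << " 0x" << high_before
         << " => 0x" << low_after << " 0x" << high_after;
    return line.str();
}
```

<p>Using <code>std::isgraph()</code> rather than <code>std::isprint()</code> keeps the space character
from being echoed, which matches the <code>0x20</code> lines in the sample log.</p>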
<h2>EOF</h2>
<p>By itself, an entropy encoder is not a magic bullet. A good compressor needs sophisticated
modeling to back it up. With that good model, an arithmetic encoder will give you excellent results,
and should outperform Huffman or other older and more revered coders. There is an entire class of
coders referred to as
<a href="https://en.wikipedia.org/wiki/Range_encoding" target="_blank">Range encoders</a>
that use the ideas here, tailored for greater efficiency.</p>
<p>The computational requirements of an arithmetic encoder are definitely going to be higher than
with something like Huffman, but modern CPU architectures should be able to minimize the impact
this has on your program. Arithmetic encoders are particularly well suited for adaptive models
when compared to Huffman coding.</p>
<p>Source is attached
<a href="https://marknelson.us/assets/2014-19-10-arithmetic-coding/ari.zip" target="_blank">here</a>.</p>
<p>Your comments and corrections are welcome, and will help improve this post as time goes on.</p>
<p>Thanks for reading.</p>
<ul>
<li>Mark</li>
</ul>Mark NelsonArithmetic coding is a common algorithm used in both lossless and lossy data compression algorithms.Highlights of ISO C++142014-09-11T07:00:00+00:002018-08-02T08:00:00+00:00https://marknelson.us/posts/2014/09/11/highlights-of-iso-c14<p>Voting on the C++14 standard was completed in August, and all that remains before we can say it is
officially complete is publication by the ISO. In this article I will visit the high points of
the new standard, demonstrating how the upcoming changes will affect the way you program,
particularly when using the idioms and paradigms of
<a href="https://msdn.microsoft.com/en-us/library/hh279654.aspx?f=255&MSPPError=-2147217396" target="_blank">Modern C++</a>.</p>
<p>The committee seems intent on keeping the standards process in a higher gear than in the past.
This means that C++14, having had just three years since the last standard, is a somewhat
constrained release. Far from being disappointing, this is a boon for programmers, because it means
implementers have been able to push out compliance with the new features in real time. Yes, you can
start using C++14 features today - nearly all of them, if you are flexible about your tool chain.</p>
<p>At this point you can get a free copy of the draft proposal <a target="_blank" href="http://open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf">here</a>. Unfortunately, when the final standard is published, ISO will have it paywalled.</p>
<h2>Conformance</h2>
<p>I think that shortening the time-frame between releases is working to help compiler writers keep
up with the language changes in something closer to real-time. With just three years between
releases, there are fewer changes to adjust to.</p>
<p>The examples in this article were mostly tested with clang 3.4, which has
<a href="https://clang.llvm.org/cxx_status.html" target="_blank">great coverage</a>
of C++14 features. g++ has a
<a href="https://gcc.gnu.org/projects/cxx-status.html#cxx14" target="_blank">somewhat smaller</a>
list of features covered, and Visual C++ seems to be
<a href="https://blogs.msdn.microsoft.com/vcblog/2014/08/21/c1114-features-in-visual-studio-14-ctp3/" target="_blank">trailing the pack</a>.</p>
<p>Interestingly, I actually did all my clang development on a ubuntu-14.04 LTS system hosted on
Microsoft Azure, using my monthly MSDN credits. I’m never sure how much those free credits really
mean, but I haven’t used them up yet, and it’s nice to experiment on a well-managed VM.</p>
<p>My first step after creating the VM and using PuTTY for an SSH connection was to install Dropbox
using the generic installer from the Dropbox site:</p>
<pre>
cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -
~/.dropbox-dist/dropboxd
</pre>
<p>This gives me a URL I can use to authenticate this Dropbox account. I go to my browser to enable
this VM as a legitimate user of my Dropbox account.</p>
<p>After that I want to use the Dropbox CLI script to control it, so I kill <code>dropboxd</code> with CTRL-C, then:</p>
<pre>
wget -O ~/dropbox.py https://www.dropbox.com/download?dl=packages/dropbox.py
chmod u+x ~/dropbox.py
~/dropbox.py start
</pre>
<p>I can now check up on the status with <code>~/dropbox.py status</code>, as my source files are linked to this computer.</p>
<p>Meanwhile, I need to install the components of clang that are going to give a 3.4 compatible workstation:</p>
<pre>
sudo apt-get update
sudo apt-get install clang-3.4
sudo apt-get install libc++-dev
sudo apt-get install binutils
# create helloworld.cpp then test
clang++ -stdlib=libc++ -std=c++1y helloworld.cpp
</pre>
<p>After installing clang-3.4, libc++, and binutils, a typical example is built using this command line on Linux:</p>
<pre>
clang++ -std=c++1y -stdlib=libc++ example.cpp
</pre>
<h2>C++14 changes of note</h2>
<p>What follows are descriptions of the C++14 changes that will have a significant impact on how you
program, along with working code and discussions of when and why you would employ them.</p>
<h3>Return type deduction</h3>
<p>The continuing expansion of <code>auto</code> in the language is an interesting development.
C++ itself continues to be typesafe, but the mechanics of type safety are increasingly being
performed by the compiler instead of the programmer.</p>
<p>In C++11, programmers started using <code>auto</code> for declarations. This was keenly
appreciated for things like iterator creation, when the fully qualified type name might be
horrendous. Newly minted C++ code was much easier to read:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span> <span class="k">auto</span> <span class="n">ii</span> <span class="o">=</span> <span class="n">collection</span><span class="p">.</span><span class="n">begin</span><span class="p">()</span> <span class="p">;</span> <span class="p">...</span></code></pre></figure>
<p>This code is still completely typesafe - the compiler knows what type <code>begin()</code> returns
in that context, so there is no question about what type <code>ii</code> is, and that will be
checked every place it is used.</p>
<p>In C++14, the use of <code>auto</code> was expanded in a couple of ways. One that makes perfect
sense is that of <i>return type deduction</i>. If I write a line of code like this inside a function:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">return</span> <span class="mf">1.4</span><span class="p">;</span></code></pre></figure>
<p>it is obvious to both me and the compiler that the function is returning a double. So in C++14, I
can define the function return type as <code>auto</code> instead of double:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="nf">getvalue</span><span class="p">()</span> <span class="p">{</span></code></pre></figure>
<p>The
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3638.html" target="_blank">details</a>
of this new feature are pretty easy to understand. For example, if a function has multiple return
paths, they need to have the same type. Code like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">auto</span> <span class="nf">f</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">0</span> <span class="p">)</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="k">else</span>
<span class="k">return</span> <span class="mf">2.0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>might seem like it should obviously have a deduced return type of <code>double</code>, but the
standard prohibits this ambiguity, and the compiler properly complains:</p>
<pre>
error_01.cpp:6:5: error: 'auto' in return type deduced as 'double' here but deduced as 'int' in
earlier return statement
return 2.0
^
1 error generated.
</pre>
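<p>For comparison, here is a minimal sketch of two ways to make that function acceptable. The
second function name, <code>g</code>, is my own placeholder. Either every return statement deduces
to the same type, or the return type is spelled out and the usual conversions apply:</p>

```cpp
#include <type_traits>

// Every return statement deduces to double, so deduction succeeds.
auto f(int i)
{
    if ( i < 0 )
        return -1.0;
    else
        return 2.0;
}

// With an explicit return type, -1 is simply converted to double.
double g(int i)
{
    if ( i < 0 )
        return -1;
    else
        return 2.0;
}

static_assert(std::is_same<decltype(f(0)), double>::value,
              "f() is deduced to return double");
```

<p>The deduction rules never look for a common type across return statements; they only check
that each one deduces to exactly the same type.</p>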
<p>There are a couple of good reasons why deducing the return type is a plus for your C++ programs.
First, there are times when you have to return a fairly complex type, such as an iterator, perhaps
when searching into a standard library container. The <code>auto</code> return type makes the
function easier to write properly, and easier to read.</p>
<p>A second, maybe less obvious reason, is that using an <code>auto</code> return type enhances your
ability to refactor. As an example, consider this program:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
</span>
<span class="k">struct</span> <span class="n">record</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">name</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">id</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">auto</span> <span class="nf">find_id</span><span class="p">(</span><span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">record</span><span class="o">></span> <span class="o">&</span><span class="n">people</span><span class="p">,</span>
<span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">auto</span> <span class="n">match_name</span> <span class="o">=</span> <span class="p">[</span><span class="o">&</span><span class="n">name</span><span class="p">](</span><span class="k">const</span> <span class="n">record</span><span class="o">&</span> <span class="n">r</span><span class="p">)</span> <span class="o">-></span> <span class="kt">bool</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">r</span><span class="p">.</span><span class="n">name</span> <span class="o">==</span> <span class="n">name</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">auto</span> <span class="n">ii</span> <span class="o">=</span> <span class="n">find_if</span><span class="p">(</span><span class="n">people</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">people</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span> <span class="n">match_name</span> <span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ii</span> <span class="o">==</span> <span class="n">people</span><span class="p">.</span><span class="n">end</span><span class="p">())</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="k">else</span>
<span class="k">return</span> <span class="n">ii</span><span class="o">-></span><span class="n">id</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">record</span><span class="o">></span> <span class="n">roster</span> <span class="o">=</span> <span class="p">{</span> <span class="p">{</span><span class="s">"mark"</span><span class="p">,</span><span class="mi">1</span><span class="p">},</span>
<span class="p">{</span><span class="s">"bill"</span><span class="p">,</span><span class="mi">2</span><span class="p">},</span>
<span class="p">{</span><span class="s">"ted"</span><span class="p">,</span><span class="mi">3</span><span class="p">}};</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">find_id</span><span class="p">(</span><span class="n">roster</span><span class="p">,</span><span class="s">"bill"</span><span class="p">)</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">find_id</span><span class="p">(</span><span class="n">roster</span><span class="p">,</span><span class="s">"ron"</span><span class="p">)</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>In this example, I’m not saving many brain cells by having <code>find_id()</code> return
<code>auto</code> instead of <code>int</code>. But consider what happens if I decide that I want to
refactor my <code>record</code> structure. Instead of using an integral type to identify the
person in the <code>record</code> object, maybe I have a new <code>GUID</code> type:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="n">record</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">name</span><span class="p">;</span>
<span class="n">GUID</span> <span class="n">id</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Making that change to the <code>record</code> object will cause a series of cascading changes in
things like the return types of functions. But if my function uses automatic return type
deduction, the compiler will silently make the change for me.</p>
<p>Any C++ programmer who has worked on a large project is familiar with this issue. Making a change
to a single data structure can cause a seemingly endless series of iterations through the code
base, changing variable, parameter, and return types. The increased use of <code>auto</code> does a lot to
cut through this bookkeeping.</p>
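<p>A small sketch of that effect follows. The <code>GUID</code> definition here is only a stand-in,
since the real type is not shown, and <code>get_id()</code> is a hypothetical accessor rather than
part of the program above:</p>

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Stand-in for a GUID type; assumed purely for illustration.
struct GUID {
    std::string text;
};

struct record {
    std::string name;
    GUID id;          // was: int id;
};

// Not edited during the refactoring: the return type follows record::id.
auto get_id(const record &r)
{
    return r.id;
}

static_assert(std::is_same<decltype(get_id(std::declval<const record&>())),
                           GUID>::value,
              "get_id() now returns GUID without any change to its definition");
```

<p>One caveat: a function like <code>find_id()</code> that also returns a sentinel such as
<code>-1</code> would still need attention after this change, because its two return statements
would no longer deduce to the same type.</p>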
<hr />
<p><strong>Note</strong>:</p>
<p>In the example above, and in the rest of this article, I create and use a named lambda. I suspect
that most users of lambdas with functions like <code>std::find_if()</code> will define their
lambdas as anonymous inline objects, which is a very convenient style. Due to limited page width,
I think it is a little easier to read code in your browser when lambdas are defined apart from their usage.</p>
<p>So this is not necessarily a style you should emulate; just appreciate that it is
somewhat easier to read. In particular, it will be much easier if you are light on lambda experience.</p>
<hr />
<p />
<p>An immediate consequence of using <code>auto</code> as a return type is the arrival of its
doppelganger, <code>decltype(auto)</code>, and the rules it follows for type deduction. You can
now use it to capture type information automatically, as in this fragment:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">Container</span><span class="o">></span>
<span class="k">struct</span> <span class="n">finder</span> <span class="p">{</span>
<span class="k">static</span> <span class="k">decltype</span><span class="p">(</span><span class="n">Container</span><span class="o">::</span><span class="n">find</span><span class="p">)</span> <span class="n">finder1</span> <span class="o">=</span> <span class="n">Container</span><span class="o">::</span><span class="n">find</span><span class="p">;</span>
<span class="k">static</span> <span class="k">decltype</span><span class="p">(</span><span class="k">auto</span><span class="p">)</span> <span class="n">finder2</span> <span class="o">=</span> <span class="n">Container</span><span class="o">::</span><span class="n">find</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
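<p>The practical difference between the two placeholders shows up most clearly with references.
This is a minimal sketch with made-up names (<code>values</code>, <code>by_value</code>,
<code>by_reference</code>): plain <code>auto</code> drops the reference, while
<code>decltype(auto)</code> keeps exactly what <code>decltype</code> would report for the returned
expression:</p>

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

std::vector<int> values = { 1, 2, 3 };

// auto follows template argument deduction: the reference is stripped,
// so the caller receives a copy of the element.
auto by_value(std::size_t i)
{
    return values[i];
}

// decltype(auto) applies decltype to the expression values[i], which is
// int&, so the caller can modify the element in place.
decltype(auto) by_reference(std::size_t i)
{
    return values[i];
}

static_assert(std::is_same<decltype(by_value(0)), int>::value,
              "auto deduces a copy");
static_assert(std::is_same<decltype(by_reference(0)), int&>::value,
              "decltype(auto) keeps the reference");
```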
<h3>Generic lambdas</h3>
<p>Another place where <code>auto</code> has insinuated itself is in the definitions of lambda
parameters. Defining lambda parameters with an <code>auto</code> type declaration is the loose
equivalent of creating a template function. The lambda will be instantiated in a specific
embodiment based on the deduced types of the arguments.</p>
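<p>To make the template analogy concrete, here is a rough sketch of a hand-written function object
that behaves like a lambda taking two <code>auto</code> parameters. The name
<code>adder_closure</code> is invented for this illustration; the closure type the compiler
actually generates is unnamed:</p>

```cpp
#include <string>

// A generic lambda...
auto adder = [](auto op1, auto op2) { return op1 + op2; };

// ...behaves much like this class with a templated call operator.
struct adder_closure {
    template<typename T, typename U>
    auto operator()(T op1, U op2) const
    {
        return op1 + op2;
    }
};
```

<p>Each distinct pair of argument types instantiates a separate <code>operator()</code>, just as
it would for any other function template.</p>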
<p>This can be convenient for creating lambdas that can be reused in different contexts. In the
simple example below, I’ve created a lambda used as the binary operation in a standard library function.
In the C++11 world, I would have needed to define one lambda for adding integers,
and a second for adding strings.</p>
<p>With the addition of generic lambdas, I can define a single lambda with generic parameters.
Although the syntax doesn’t include the keyword <code>template</code>, this is still clearly a
further extension of C++ generic programming:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <iostream>
#include <vector>
#include <string>
#include <numeric>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">ivec</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span> <span class="n">svec</span> <span class="o">=</span> <span class="p">{</span> <span class="s">"red"</span><span class="p">,</span>
<span class="s">"green"</span><span class="p">,</span>
<span class="s">"blue"</span> <span class="p">};</span>
<span class="k">auto</span> <span class="n">adder</span> <span class="o">=</span> <span class="p">[](</span><span class="k">auto</span> <span class="n">op1</span><span class="p">,</span> <span class="k">auto</span> <span class="n">op2</span><span class="p">){</span> <span class="k">return</span> <span class="n">op1</span> <span class="o">+</span> <span class="n">op2</span><span class="p">;</span> <span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"int result : "</span>
<span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">accumulate</span><span class="p">(</span><span class="n">ivec</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span>
<span class="n">ivec</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span>
<span class="mi">0</span><span class="p">,</span>
<span class="n">adder</span> <span class="p">)</span>
<span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"string result : "</span>
<span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">accumulate</span><span class="p">(</span><span class="n">svec</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span>
<span class="n">svec</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="s">""</span><span class="p">),</span>
<span class="n">adder</span> <span class="p">)</span>
<span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Which produces the following output:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="n">result</span> <span class="o">:</span> <span class="mi">10</span>
<span class="n">string</span> <span class="n">result</span> <span class="o">:</span> <span class="n">redgreenblue</span></code></pre></figure>
<p>Even if you are instantiating anonymous inline lambdas, the use of generic parameters is still
useful for the reasons discussed earlier in this article. When your data structures change, or
functions in your APIs get signature modifications, generic lambdas will adjust with recompilation
instead of requiring rewrites:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"string result : "</span>
<span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">accumulate</span><span class="p">(</span><span class="n">svec</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span>
<span class="n">svec</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="s">""</span><span class="p">),</span>
<span class="p">[](</span><span class="k">auto</span> <span class="n">op1</span><span class="p">,</span><span class="k">auto</span> <span class="n">op2</span><span class="p">){</span> <span class="k">return</span> <span class="n">op1</span><span class="o">+</span><span class="n">op2</span><span class="p">;</span> <span class="p">}</span> <span class="p">)</span>
<span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span></code></pre></figure>
<h3>Initialized lambda captures</h3>
<p>In C++11 we had to start adjusting to the notion of a <i>lambda capture</i> specification. That
declaration guides the compiler during the creation of the <i>closure</i>: an instance of the
function defined by the lambda, along with bindings to variables defined outside the lambda’s scope.</p>
<p>In the earlier example on deduced return types, I had a lambda definition that captured a single
variable <code>name</code>, used as the source of a search string in a predicate:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="k">auto</span> <span class="n">match_name</span> <span class="o">=</span> <span class="p">[</span><span class="o">&</span><span class="n">name</span><span class="p">](</span><span class="k">const</span> <span class="n">record</span><span class="o">&</span> <span class="n">r</span><span class="p">)</span> <span class="o">-></span> <span class="kt">bool</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">r</span><span class="p">.</span><span class="n">name</span> <span class="o">==</span> <span class="n">name</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">auto</span> <span class="n">ii</span> <span class="o">=</span> <span class="n">find_if</span><span class="p">(</span><span class="n">people</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">people</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span> <span class="n">match_name</span> <span class="p">);</span></code></pre></figure>
<p>This particular capture gives the lambda access to the variable by reference. Captures can also be
performed by value, and in both cases, the use of the variable behaves in a way that fits with C++
intuition. Capture by value means the lambda operates on a local copy of a variable, capture by
reference means the lambda operates on the actual instance of the variable from the outer scope.</p>
<p>All this is fine, but it comes with some limitations. I think the one that the committee felt it
needed to address was the inability to initialize captured variables using move-only semantics.</p>
<p>What does this mean? If we expect that a lambda is going to be a <i>sink</i> for a parameter, we
would like to capture the outer variable using move semantics. As an example, consider how you
would get a lambda to sink a <code>unique_ptr</code>, which is a move-only object. A first attempt
to capture by value fails:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">p</span><span class="p">(</span><span class="k">new</span> <span class="kt">int</span><span class="p">);</span>
<span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="mi">11</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">y</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span><span class="p">]()</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"inside: "</span> <span class="o"><<</span> <span class="o">*</span><span class="n">p</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;};</span></code></pre></figure>
<p>This generates a compiler error because <code>unique_ptr</code> has a deleted copy
constructor - it specifically wants to ban making copies.</p>
<p>Changing this so that <code>p</code> is captured by reference compiles fine, but it doesn’t have
the desired effect of sinking the value by moving the value into the local copy. Eventually you
could accomplish this by creating a local variable and calling <code>std::move()</code> on your
captured reference, but this is a bit inefficient.</p>
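<p>For reference, that C++11-era workaround might look something like the sketch below. The
function name <code>sink_cxx11</code> is hypothetical. Note that ownership only transfers when the
lambda is called, and that the lambda is holding a reference that must still be valid at that
point:</p>

```cpp
#include <memory>
#include <utility>

// Capture by reference, then move into a local inside the lambda body.
int sink_cxx11(std::unique_ptr<int> &p)
{
    auto y = [&p]() -> int {
        std::unique_ptr<int> local = std::move(p);  // ownership moves here
        return *local;                              // freed when local dies
    };
    return y();   // after this call, the caller's p is null
}
```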
<p>The fix for this is a modification of the capture clause syntax. Now instead of just declaring a
capture variable, you can do an initialization. The simple case that is used as an example in the
standard looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="k">auto</span> <span class="n">y</span> <span class="o">=</span> <span class="p">[</span><span class="o">&</span><span class="n">r</span> <span class="o">=</span> <span class="n">x</span><span class="p">,</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">]()</span><span class="o">-></span><span class="kt">int</span> <span class="p">{...}</span></code></pre></figure>
<p>This binds <code>r</code> as a reference to the outer <code>x</code>, and initializes the lambda’s own
<code>x</code> with a copy of the outer value plus one. This example is easy to
understand, but I’m not sure it captures the value of this new syntax for sinking move-only
variables. A use case that takes advantage of this is shown here:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <memory>
#include <iostream>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">p</span><span class="p">(</span><span class="k">new</span> <span class="kt">int</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="mi">11</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">y</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span><span class="o">=</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">p</span><span class="p">)]()</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"inside: "</span> <span class="o"><<</span> <span class="o">*</span><span class="n">p</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;};</span>
<span class="n">y</span><span class="p">();</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"outside: "</span> <span class="o"><<</span> <span class="o">*</span><span class="n">p</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>In this case, the captured value <code>p</code> is initialized using move semantics, effectively
sinking the pointer without the need to declare a local variable:</p>
<pre>
inside: 11
Segmentation fault (core dumped)
</pre>
<p>That annoying result is what you expect - the code attempts to dereference <code>p</code> after it
was captured and moved into the lambda.</p>
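<p>A moved-from <code>unique_ptr</code> is guaranteed to be null, so the outer pointer can be
tested instead of dereferenced. The sketch below uses invented names
(<code>capture_result</code>, <code>move_into_lambda</code>) to show both sides of the transfer
without the crash:</p>

```cpp
#include <memory>
#include <utility>

struct capture_result {
    int inside;           // value read through the lambda's own p
    bool outer_is_null;   // true if the outer p no longer owns anything
};

capture_result move_into_lambda()
{
    std::unique_ptr<int> p(new int(11));
    // The init-capture creates a new p inside the closure, move-constructed
    // from the outer p.
    auto y = [p = std::move(p)]() { return *p; };
    int inside = y();
    return { inside, p == nullptr };
}
```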
<h3>The [[deprecated]] attribute</h3>
<p>The first time I saw the use of the deprecated attribute in Java, I admit to a bit of language
envy. Code rot is a huge problem for most programmers. (Ever been praised for deleting code? Me
neither.) This new attribute provides a systematic way to attack it.</p>
<p>Its use is nice and simple - just place the <code>[[deprecated]]</code> tag in front of a
declaration - which can be a class, variable, function or a few other things. The result looks
like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span>
<span class="err">[[</span><span class="nc">deprecated</span><span class="p">]]</span> <span class="n">flaky</span> <span class="p">{</span>
<span class="p">};</span></code></pre></figure>
<p>When your program uses a deprecated entity, the compiler’s reaction is left up to the implementer.
Clearly most people are going to want to see some sort of warning, and most likely be able to turn
that warning off at will. As an example, clang 3.4 gave this warning when instantiating a
deprecated class:</p>
<pre>
dep.cpp:14:3: warning: 'flaky' is deprecated [-Wdeprecated-declarations]
flaky f;
^
dep.cpp:3:1: note: 'flaky' declared here
flaky {
^
</pre>
<p>Note that the syntax of C++ <i>attribute-tokens</i> might seem a bit unfamiliar. The list of
attributes, including <code>[[deprecated]]</code>, comes after keywords like <code>class</code> or
<code>enum</code>, and before the entity name.</p>
<p>This tag has an alternate form that includes a message parameter. Again, it is up to the
implementer to decide what to do with this message. clang 3.4 apparently ignores the message.
The output from this fragment:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span>
<span class="err">[[</span><span class="nc">deprecated</span><span class="p">]]</span> <span class="n">flaky</span> <span class="p">{</span>
<span class="p">};</span>
<span class="p">[[</span><span class="n">deprecated</span><span class="p">(</span><span class="s">"Consider using something other than cranky"</span><span class="p">)]]</span>
<span class="kt">int</span> <span class="n">cranky</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">flaky</span> <span class="n">f</span><span class="p">;</span>
<span class="k">return</span> <span class="n">cranky</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>does not contain the message text:</p>
<pre>
dep.cpp:14:10: warning: 'cranky' is deprecated [-Wdeprecated-declarations]
return cranky();
^
dep.cpp:6:5: note: 'cranky' declared here
int cranky()
^
</pre>
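<p>As for turning the warning off at will, one option on GCC and clang is a diagnostic pragma
around the code that still has to call the old function. This is a compiler-specific sketch, not
something the standard provides, and <code>call_cranky_quietly()</code> is a name I made up for
the example:</p>

```cpp
[[deprecated("Consider using something other than cranky")]]
int cranky()
{
    return 0;
}

// GCC and clang both honor these pragmas; other compilers may ignore them.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
int call_cranky_quietly()
{
    return cranky();   // no deprecation warning inside this region
}
#pragma GCC diagnostic pop
```

<p>The warning name in brackets in the compiler output is the same one used here, and the
command line form <code>-Wno-deprecated-declarations</code> disables it for a whole translation
unit.</p>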
<h3>Binary literals and digit separators</h3>
<p>These two new features aren’t earth-shaking, but they do represent nice syntactic improvements.
Small changes like these give us some incremental improvements in the language that improve
readability and hence reduce bug counts.</p>
<p>C++ programmers can now create binary literals, adding to the existing canon of decimal, hex, and
the rarely used octal radices. Binary literals start with the prefix <code>0b</code> and are
followed by binary digits.</p>
<p>In the US and UK, we are used to using commas as digit separators in written numbers, as in:
$1,000,000. These digit separators are there purely for the convenience of readers, providing
syntactic cues that make it easier for our brains to process long strings of numbers.</p>
<p>The committee added digit separators to C++ for exactly the same reasons. They won’t affect the
evaluation of a number, they are simply present to make it easier to read and write numbers
through <a href="https://en.wikipedia.org/wiki/Chunking_(psychology)" target="_blank">chunking</a>.</p>
<p>What character to use for a digit separator? Virtually every punctuation character already has an
idiosyncratic use in the language, so there are no obvious choices. The final election was to use
the single quote character, making the million dollar value render in C++ as:
<code>1'000'000.00</code>. Remember that the separators don’t have any effect on the evaluation of
the constant, so this value would be identical to <code>1'0'00'0'00.00</code>.</p>
<p>An example combining the use of both new features:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <iostream>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">val</span> <span class="o">=</span> <span class="mi">0</span><span class="n">b11110000</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Output mask: "</span>
<span class="o"><<</span> <span class="mi">0</span><span class="n">b1000</span><span class="err">'</span><span class="mo">0001'1000'0000</span>
<span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Proposed salary: $"</span>
<span class="o"><<</span> <span class="mf">300'000.00</span>
<span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>This program gives the unsurprising output:</p>
<pre>
Output mask: 33152
Proposed salary: $300000
</pre>
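<p>Because the separators are ignored during evaluation, the equivalences can be checked at
compile time. A short sketch, where <code>output_mask</code> is just an illustrative name:</p>

```cpp
// The same bit pattern as the mask printed above.
constexpr int output_mask = 0b1000'0001'1000'0000;

static_assert(output_mask == 0x8180, "binary and hex spell the same value");
static_assert(0b1111'0000 == 0xF0, "separators can group bits into nibbles");
static_assert(1'000'000 == 1000000, "separators do not change the value");
static_assert(1'0'00'0'00 == 1000000, "even when placed unconventionally");
```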
<h3>The remainder</h3>
<p>Some additional features in the C++14 specification don’t require quite as much exposition.</p>
<p>Variable templates are an extension of templates to variables. The example used everywhere is an
implementation of variable <code>pi<T></code>. When instantiated as a double, the variable will
yield 3.14; when instantiated as an int, it might yield 3; and as a
<code>std::string</code>, “3.14” or perhaps “pi”. This would have been a great feature to have when <code><limits></code>
was being written.</p>
<p>The syntax and semantics of variable templates are nearly identical to those for class templates -
you should have no trouble using them without any special study.</p>
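<p>A minimal sketch of the <code>pi</code> example, along the lines of the one used in the
standard. The helper <code>circle_area()</code> is my own addition to show the variable being used
from generic code:</p>

```cpp
// One declaration covers every arithmetic type the variable is used with.
template<typename T>
constexpr T pi = T(3.1415926535897932385L);

// Generic code picks up the precision that matches its own type.
template<typename T>
constexpr T circle_area(T radius)
{
    return pi<T> * radius * radius;
}

static_assert(pi<int> == 3, "pi<int> truncates to 3");
static_assert(circle_area(2) == 12, "integer arithmetic: 3 * 2 * 2");
```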
<p>The restrictions on <code>constexpr</code> functions have been relaxed, allowing, for example,
multiple returns, internal <code>switch</code> and <code>if</code> statements, loops, and more.
This expands the scope of things that can be done at compile time, a trend that really took wing
when templates were introduced.</p>
<p>Additional minor features include sized deallocations and some syntax tidying.</p>
<h2>What next?</h2>
<p>The C++ committee clearly feels pressure to keep the language current through improvements, and is
already working on at least one more standard in this decade, C++17.</p>
<p>Possibly more interesting is the creation of a number of spin-off groups that can create
<a href="https://isocpp.org/std/iso-iec-jtc1-procedures" target="_blank">technical specifications</a>,
documents that won’t rise to the level of a standard but will be published and endorsed by the ISO
committee. Presumably these can be issued at a more rapid clip. The eight areas
<a href="https://isocpp.org/std/status" target="_blank">currently being worked</a> include:</p>
<ul>
<li>File system</li>
<li>Concurrency</li>
<li>Parallelism</li>
<li>Networking</li>
<li>Concepts (the AI of C++ - always one round of specification away)</li>
</ul>
<p>Success of these technical specifications will have to be judged by adoption and use. If we find
that all the implementers line up behind them, then this new track for standardization will be a
success.</p>
<p>C/C++ has held up well over the years. Modern C++, which we might mark as starting with C++11, has
taken dramatic strides in making the language easier to use and safer without making concessions
in the areas of performance. For certain types of work it is hard to think of any reasonable
alternative to C or C++. The C++14 standard doesn’t make any jumps as large as those in the C++11
release, but it keeps the language on a good path. If the committee can keep its current level of
productivity for the rest of the decade, C++ should continue to be the language of choice when
performance is the guiding force.</p>Mark NelsonVoting on the C++14 standard was completed in August, and all that remains before we can say it is officially complete is publication by the ISO. In this article I will visit the high points of the new standard, demonstrating how the upcoming changes will affect the way you program, particularly when using the idioms and paradigms of Modern C++.C++ Generic Programming Meest OOP - std::is_base_of2014-07-02T20:30:34+00:002018-08-02T20:00:00+00:00https://marknelson.us/posts/2014/07/02/c-generic-programming-meets-oop-stdis_base_of<p>Alexander Stepanov’s Standard Template Library provided a huge push towards making template
programming an important part of C++, and helped to ensure that it was included as part of the
first standard in 1998. But adoption of templates by most programmers has been more of an
incremental process, as opposed to a revolutionary one - many of us have literally decades
of <i>Object Oriented == The One True Path</i> to unlearn. And to top things off, templates needed
some refinement over the years before they could do many of the things we expected of them.</p>
<p>In this article, I’ll show you how generic programming with templates makes a noticeable
improvement in a pedestrian but important task I deal with constantly in my data compression work.
Then I’ll show you the annoying but understandable problems with this approach circa 1998, and
finally, how to resolve them with features added in TR1 and C++11.</p>
<h3>Bit-oriented I/O</h3>
<p>Most data compression algorithms read their input data using well-supported byte-oriented I/O.
Text files are read a byte at a time, and most image files are blocked into data structures that
align to byte boundaries.</p>
<p>But the output of compressors tends to be in odd sizes. For example, venerable LZW compression
output usually starts with nine-bit wide codes which grow as time goes on, through ten, eleven,
and twelve bits, up to some arbitrary size.</p>
<p>The standard C++ libraries don’t have I/O APIs that support these odd sizes, so every compressor
implementation has to supply these I/O routines itself.</p>
<p>And of course, the inverse task, decompression, needs to be able to <i>read</i> data in
non-standard sizes in order to write uncompressed data. This is effectively the same problem,
just turned on its head.</p>
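<p>As a rough sketch of that read side, the class below unpacks bits from any
<code>std::istream</code>, most significant bit first. The names <code>bit_reader</code>,
<code>getBit()</code>, and <code>bits_of()</code> are hypothetical and are not part of the sample
code that accompanies this article; the bit order is an assumption chosen to mirror the
<code>putBit()</code> routine shown later.</p>

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the article's sample code:
// unpacks bits from a byte stream, most significant bit first.
class bit_reader
{
public:
    explicit bit_reader(std::istream &stream)
        : m_stream(stream), m_CurrentByte(0), m_Mask(0) {}

    // Returns 0 or 1, or -1 once the underlying stream is exhausted.
    int getBit()
    {
        if (m_Mask == 0) {
            // The current byte is used up, so fetch the next one.
            int c = m_stream.get();
            if (c == std::istream::traits_type::eof())
                return -1;
            m_CurrentByte = c;
            m_Mask = 0x80;
        }
        int bit = (m_CurrentByte & m_Mask) ? 1 : 0;
        m_Mask >>= 1;
        return bit;
    }

private:
    std::istream &m_stream;
    int m_CurrentByte;
    int m_Mask;
};

// Convenience for experimenting: expands bytes into a string of '0'/'1'.
std::string bits_of(const std::string &bytes)
{
    std::istringstream in(bytes);
    bit_reader reader(in);
    std::string result;
    for (int bit = reader.getBit(); bit >= 0; bit = reader.getBit())
        result += static_cast<char>('0' + bit);
    return result;
}
```

<p>A real decompressor would call <code>getBit()</code> directly from its decoding loop;
<code>bits_of()</code> exists only to make the behavior easy to inspect.</p>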
<p>To illustrate how this works, I’ll use the example of arithmetic compression in this article.
Canonical CACM89 arithmetic compression outputs one bit at a time. So whether you are
writing to files, sockets, or shared memory, you are going to need to have some kind of shim
in your library that converts those single bits into bytes suitable for use with standard
libraries.</p>
<h3>The Solution Before Templates</h3>
<p>If you are writing a dedicated compressor that is purpose-built for some application, you
would probably just embed some dedicated code in your input and output routines. This article is
less important for a one-off solution like that.</p>
<p>Things get more complicated if you are a library writer. If you want to provide an arithmetic
compression capability to end users, you don’t have any idea in advance where their I/O is going.
So the classical OOP solution to this is to define a base class with virtual methods that the
compressor needs, then require library users to implement derived classes that provide all missing
functions and obey certain rules.</p>
<p>In this case, my arithmetic compressor will have dedicated routines that do the bit-fiddling,
and then rely on a shim class to provide byte-oriented input and output. This means the library
can read or write from a disparate variety of sources. My pre-template solution to this is
illustrated with a partial code segment from <code>bitio_00.cpp</code>. This uses classic OOP to
create input and output classes, which are then called from inside the compressor.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">compressor</span>
<span class="p">{</span>
<span class="k">public</span> <span class="o">:</span>
<span class="n">compressor</span><span class="p">(</span><span class="n">input_bytes</span> <span class="o">&</span><span class="n">input</span><span class="p">,</span> <span class="n">output_bytes</span> <span class="o">&</span><span class="n">output</span> <span class="p">)</span> <span class="o">:</span>
<span class="n">m_input</span><span class="p">(</span><span class="n">input</span><span class="p">),</span>
<span class="n">m_output</span><span class="p">(</span><span class="n">output</span><span class="p">),</span>
<span class="n">m_NextByte</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
<span class="n">m_Mask</span><span class="p">(</span><span class="mh">0x80</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="o">~</span><span class="n">compressor</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">m_Mask</span> <span class="o">!=</span> <span class="mh">0x80</span> <span class="p">)</span>
<span class="n">m_output</span><span class="p">.</span><span class="n">putByte</span><span class="p">(</span><span class="n">m_NextByte</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="k">operator</span><span class="p">()()</span>
<span class="p">{</span>
<span class="c1">//compression takes place here</span>
<span class="c1">// e.g. putBit( 0 ); putBit( 1 );</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">protected</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">putBit</span><span class="p">(</span> <span class="kt">bool</span> <span class="n">val</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">val</span> <span class="p">)</span>
<span class="n">m_NextByte</span> <span class="o">|=</span> <span class="n">m_Mask</span><span class="p">;</span>
<span class="n">m_Mask</span> <span class="o">>>=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="o">!</span><span class="n">m_Mask</span><span class="p">)</span> <span class="p">{</span>
<span class="n">m_output</span><span class="p">.</span><span class="n">putByte</span><span class="p">(</span><span class="n">m_NextByte</span><span class="p">);</span>
<span class="n">m_Mask</span> <span class="o">=</span> <span class="mh">0x80</span><span class="p">;</span>
<span class="n">m_NextByte</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">private</span> <span class="o">:</span>
<span class="n">output_bytes</span> <span class="o">&</span><span class="n">m_output</span><span class="p">;</span>
<span class="n">input_bytes</span> <span class="o">&</span><span class="n">m_input</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">m_NextByte</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">m_Mask</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>If I were going to perform compression on a pair of <code>iostream</code> objects, I would need
simple derived classes that implemented the necessary shim code for the <code>putByte()</code> and
<code>getByte()</code> functions. A simple version of the input class used in
<code>bitio_00.cpp</code> is shown here:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">input_bytes</span>
<span class="p">{</span>
<span class="k">public</span> <span class="o">:</span>
<span class="k">virtual</span> <span class="kt">int</span> <span class="n">getByte</span><span class="p">()</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">class</span> <span class="nc">input_bytes_istream</span> <span class="o">:</span> <span class="k">public</span> <span class="n">input_bytes</span>
<span class="p">{</span>
<span class="k">public</span> <span class="o">:</span>
<span class="n">input_bytes_istream</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">istream</span> <span class="o">&</span><span class="n">stream</span><span class="p">)</span>
<span class="o">:</span> <span class="n">m_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">getByte</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">m_stream</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">private</span> <span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">istream</span> <span class="o">&</span><span class="n">m_stream</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>This works, and at least for the library writer it is fairly satisfactory.</p>
<p>As a library user it is not really optimal. I’d like to be able to use a library without having to
create shim classes. Of course the library writer is free to implement common shim classes - which
can then be used as input for more customized approaches.</p>
<p>Withholding any other judgment, this is an object oriented solution that is fairly simple to
understand and implement, and there are plenty of libraries out there that use this sort of
implementation. You probably find it somewhat familiar and can imagine how you might deal with
it as a user.</p>
<h3>So Where Are The Problems?</h3>
<p>Although the OOP solution to this problem works, there are a couple of problems.</p>
<p>First, as library writers start identifying additional things they need in their I/O classes,
the abstract base classes sometimes get larded with additional methods, making it more and more
complicated to implement specialized versions. While not a problem in this specific example,
it definitely is a problem in the real world.</p>
<p>For example, the arithmetic compressor writer might decide that block I/O offered some performance
improvements, leading him or her to add methods for <code>putBytes()</code> and
<code>getBytes()</code>. This kind of creeping elegance leads to complex base classes that annoy
users.</p>
<p>However, this is more a stylistic problem, and not necessarily tied directly to OOP, although it
does seem to be a natural outgrowth of that type of programming.</p>
<p>A more important problem, and one that is really a big deal for data compression, is the cost of
using virtual functions to implement I/O. Functions like <code>putBit()</code> are called
repeatedly in tight loops, and the byte I/O they depend on goes through virtual calls. These
are hard for the C++ compiler to resolve at compile time, so each one is typically an indirect
call through a pointer that cannot be inlined.</p>
<p>What would be much better, and what the template-based approach allows for, is to define the input
and output routines as parameterized types using templates. The compiler can then generate the
code that calls those functions inline with the compression code - avoiding those expensive
subroutine calls and giving the compiler leeway to optimize at will. In some circumstances, this
can result in a substantial improvement in runtimes.</p>
<h3>The 1998 Template-based Solution</h3>
<p>So how would my code look if I ditched OOP and used a generic solution? My sample code in
<code>bitio_01.cpp</code> has a compressor class that is declared like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">INPUT</span><span class="p">,</span> <span class="k">typename</span> <span class="n">OUTPUT</span><span class="o">></span>
<span class="k">class</span> <span class="nc">compressor</span>
<span class="p">{</span>
<span class="k">public</span> <span class="o">:</span>
<span class="n">compressor</span><span class="p">(</span><span class="n">INPUT</span> <span class="o">&</span><span class="n">input</span><span class="p">,</span> <span class="n">OUTPUT</span> <span class="o">&</span><span class="n">output</span> <span class="p">)</span> <span class="o">:</span>
<span class="n">m_input</span><span class="p">(</span><span class="n">input</span><span class="p">),</span>
<span class="n">m_output</span><span class="p">(</span><span class="n">output</span><span class="p">),</span>
<span class="n">m_NextByte</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
<span class="n">m_Mask</span><span class="p">(</span><span class="mh">0x80</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="k">protected</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">putBit</span><span class="p">(</span> <span class="kt">bool</span> <span class="n">val</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">val</span> <span class="p">)</span>
<span class="n">m_NextByte</span> <span class="o">|=</span> <span class="n">m_Mask</span><span class="p">;</span>
<span class="n">m_Mask</span> <span class="o">>>=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="o">!</span><span class="n">m_Mask</span><span class="p">)</span> <span class="p">{</span>
<span class="n">m_output</span><span class="p">.</span><span class="n">putByte</span><span class="p">(</span><span class="n">m_NextByte</span><span class="p">);</span>
<span class="n">m_Mask</span> <span class="o">=</span> <span class="mh">0x80</span><span class="p">;</span>
<span class="n">m_NextByte</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span></code></pre></figure>
<p>With this kind of template programming, I can construct my compressor with an input object of <i>any</i> class that has a <code>getByte()</code> method, and an output object of <i>any</i> class that has a <code>putByte()</code> method.</p>
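<p>To make the "any class" point concrete, here is a minimal sketch of a memory-based sink that
has no base class at all. The names <code>vector_output</code>, <code>bit_packer</code>, and
<code>pack_bits()</code> are hypothetical; <code>bit_packer</code> is a stripped-down stand-in
for the compressor that only packs bits, using the same logic as <code>putBit()</code> above.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink: no base class, just the putByte() the template expects.
class vector_output
{
public:
    void putByte(char c) { m_bytes.push_back(static_cast<unsigned char>(c)); }
    const std::vector<unsigned char> &bytes() const { return m_bytes; }
private:
    std::vector<unsigned char> m_bytes;
};

// Stripped-down stand-in for the compressor: it only packs bits,
// most significant bit first.
template<typename OUTPUT>
class bit_packer
{
public:
    explicit bit_packer(OUTPUT &output)
        : m_output(output), m_NextByte(0), m_Mask(0x80) {}
    ~bit_packer() { flush(); }

    void putBit(bool val)
    {
        if (val)
            m_NextByte |= m_Mask;
        m_Mask >>= 1;
        if (!m_Mask)
            emit();
    }

    // Writes out a partially filled final byte, padded with zero bits.
    void flush()
    {
        if (m_Mask != 0x80)
            emit();
    }

private:
    void emit()
    {
        m_output.putByte(static_cast<char>(m_NextByte));
        m_Mask = 0x80;
        m_NextByte = 0;
    }

    OUTPUT &m_output;
    int m_NextByte;
    int m_Mask;
};

// Packs a string of '0'/'1' characters into bytes held in memory.
std::vector<unsigned char> pack_bits(const std::string &bits)
{
    vector_output out;
    {
        bit_packer<vector_output> packer(out);
        for (std::size_t i = 0; i < bits.size(); ++i)
            packer.putBit(bits[i] == '1');
    }   // the destructor flushes any partial byte here
    return out.bytes();
}
```

<p>Because <code>putByte()</code> is resolved at compile time, the compiler is free to inline it
into <code>putBit()</code>, which is the performance argument made in the previous section.</p>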
<p>As a library writer, I am going to go ahead and provide a convenience class that takes an
<code>std::ostream</code> object and implements the needed function. My template class,
<code>output_bytes<T></code>, is defined for <code>std::ostream</code> only. I include a
default implementation that will cause a compiler error if you attempt to construct it with some
other type of object - this is a circa-1998 attempt to perform compile-time assertions.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">output_bytes</span>
<span class="p">{</span>
<span class="k">private</span> <span class="o">:</span>
<span class="c1">//</span>
<span class="c1">// If you try to instantiate an output_bytes<T></span>
<span class="c1">// object for a type that doesn't have a specialization,</span>
<span class="c1">// you will get an error indicating that you are </span>
<span class="c1">// trying to use this private constructor. </span>
<span class="c1">//</span>
<span class="n">output_bytes</span><span class="p">(...);</span>
<span class="k">public</span> <span class="o">:</span>
<span class="kt">void</span> <span class="n">putByte</span><span class="p">(</span><span class="kt">char</span><span class="p">);</span>
<span class="p">};</span>
<span class="c1">//</span>
<span class="c1">// Specialization of output_bytes for class ostream</span>
<span class="c1">//</span>
<span class="k">template</span><span class="o"><></span>
<span class="k">class</span> <span class="nc">output_bytes</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ostream</span><span class="o">></span>
<span class="p">{</span>
<span class="k">public</span> <span class="o">:</span>
<span class="n">output_bytes</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ostream</span> <span class="o">&</span><span class="n">stream</span><span class="p">)</span> <span class="o">:</span> <span class="n">m_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">){}</span>
<span class="kt">void</span> <span class="n">putByte</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">m_stream</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">private</span> <span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">ostream</span> <span class="o">&</span><span class="n">m_stream</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>With that in place, I can now pass in <code>ostream</code> objects to my compressor without the user having to implement a shim class. I use a helper function to deal with some of the mechanics needed to actually construct the compressor and run it:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">//</span>
<span class="c1">// This convenience function takes care of</span>
<span class="c1">// constructing the compressor and the</span>
<span class="c1">// input and output objects, then calling</span>
<span class="c1">// the compressor.</span>
<span class="c1">//</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">INPUT</span><span class="p">,</span> <span class="k">typename</span> <span class="n">OUTPUT</span><span class="o">></span>
<span class="kt">int</span> <span class="n">compress</span><span class="p">(</span><span class="n">INPUT</span> <span class="o">&</span><span class="n">source</span><span class="p">,</span> <span class="n">OUTPUT</span> <span class="o">&</span><span class="n">target</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">input_bytes</span><span class="o"><</span><span class="n">INPUT</span><span class="o">></span> <span class="n">in</span><span class="p">(</span><span class="n">source</span><span class="p">);</span>
<span class="n">output_bytes</span><span class="o"><</span><span class="n">OUTPUT</span><span class="o">></span> <span class="n">out</span><span class="p">(</span><span class="n">target</span><span class="p">);</span>
<span class="n">compressor</span><span class="o"><</span><span class="n">input_bytes</span><span class="o"><</span><span class="n">INPUT</span><span class="o">></span><span class="p">,</span><span class="n">output_bytes</span><span class="o"><</span><span class="n">OUTPUT</span><span class="o">></span> <span class="o">></span> <span class="n">c</span><span class="p">(</span><span class="n">in</span><span class="p">,</span><span class="n">out</span><span class="p">);</span>
<span class="k">return</span> <span class="n">c</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">ostream</span> <span class="o">*</span><span class="n">pOut</span> <span class="o">=</span> <span class="k">new</span> <span class="n">std</span><span class="o">::</span><span class="n">ofstream</span><span class="p">(</span><span class="s">"output0.bin"</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">istream</span> <span class="o">*</span><span class="n">pIn</span> <span class="o">=</span> <span class="k">new</span> <span class="n">std</span><span class="o">::</span><span class="n">ifstream</span><span class="p">(</span><span class="s">"input1.txt"</span><span class="p">);</span>
<span class="n">compress</span><span class="p">(</span><span class="o">*</span><span class="n">pIn</span><span class="p">,</span> <span class="o">*</span><span class="n">pOut</span><span class="p">);</span>
<span class="k">delete</span> <span class="n">pIn</span><span class="p">;</span>
<span class="k">delete</span> <span class="n">pOut</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<h3>Templates Lack OOP Semantics</h3>
<p>So in the code above, I’ve created shim classes that should work with anything derived from
<code>std::istream</code> or <code>std::ostream</code>. I should be able to read and write to
files, the console, memory, etc.</p>
<p>This is true, but you will note right away that this template-based solution won’t quite match up
with your expectations. If you change the code shown above to look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">ofstream</span> <span class="n">out</span><span class="p">(</span><span class="s">"output0.bin"</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">istream</span> <span class="o">*</span><span class="n">pIn</span> <span class="o">=</span> <span class="k">new</span> <span class="n">std</span><span class="o">::</span><span class="n">ifstream</span><span class="p">(</span><span class="s">"input1.txt"</span><span class="p">);</span>
<span class="n">compress</span><span class="p">(</span><span class="o">*</span><span class="n">pIn</span><span class="p">,</span> <span class="n">out</span><span class="p">);</span>
<span class="k">delete</span> <span class="n">pIn</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>You’ll get the error showing that you have tried to instantiate the generalized class, not the
<code>std::ostream</code> specialization:</p>
<pre>
1>------ Build started: Project: traits, Configuration: Debug Win32 ------
1>Compiling...
1>bitio_01.cpp
1>bitio_01.cpp(121) : error C2248: 'output_bytes<T>::output_bytes'
1> cannot access private member declared in class 'output_bytes<T>'
1> with
1> [
1> T=std::ofstream
1> ]
1> bitio_01.cpp(15) : see declaration of 'output_bytes<T>::output_bytes'
1> with
1> [
1> T=std::ofstream
1> ]
</pre>
<p>So what’s going on here? My template class is specialized for <code>std::ostream</code>, and
<code>std::ofstream</code> is an <code>std::ostream</code> object, is it not?</p>
<p>No, it isn’t. As OOP programmers we casually say that an <code>std::ofstream</code> object
<i>is an</i> <code>std::ostream</code> object, but this is syntactic shorthand for a deeper OOP
principle. At the surface level, these are two separate classes, quite distinct from one
another.</p>
<p>And therein lies the problem - template instantiation is simple-minded - when you create a
specialized version of class <code>output_bytes</code> for <code>std::ostream</code>, the compiler
will construct it for that one class, and that one class only.</p>
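<p>The distinction can be checked at compile time. The sketch below assumes a C++11 compiler
(for <code>static_assert</code> and <code>&lt;type_traits&gt;</code>) and uses a hypothetical
template named <code>picked</code> as a stand-in for <code>output_bytes</code>, reduced to a flag
that reports which definition the compiler selected.</p>

```cpp
#include <fstream>
#include <ostream>
#include <type_traits>

// Hypothetical stand-in for output_bytes: the primary template...
template<typename T>
struct picked { static const bool specialization = false; };

// ...and a full specialization for exactly std::ostream.
template<>
struct picked<std::ostream> { static const bool specialization = true; };

// As far as template matching goes, ofstream and ostream are distinct types,
static_assert(!std::is_same<std::ofstream, std::ostream>::value,
              "ofstream and ostream are different types");
// even though the inheritance relationship is there.
static_assert(std::is_base_of<std::ostream, std::ofstream>::value,
              "ofstream derives from ostream");

// So only the exact type selects the specialization.
static_assert(picked<std::ostream>::specialization,
              "exact match uses the specialization");
static_assert(!picked<std::ofstream>::specialization,
              "derived class falls back to the primary template");
```

<p>Specialization matching looks at the exact type and never walks the inheritance graph, which
is why the derived class lands on the primary template.</p>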
<p>So what this means is that my convenience class <code>output_bytes<std::ostream></code> is
not quite as convenient as I would like it to be. Anyone using it for a class derived from
<code>ostream</code> will have to upcast their arguments:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">ofstream</span> <span class="n">output1</span><span class="p">(</span><span class="s">"output0.bin"</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">ifstream</span> <span class="n">file1</span><span class="p">(</span><span class="s">"input1.txt"</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">istream</span> <span class="o">&</span><span class="n">input1</span> <span class="o">=</span> <span class="n">file1</span><span class="p">;</span>
<span class="n">compress</span><span class="p">(</span><span class="n">input1</span><span class="p">,</span> <span class="k">dynamic_cast</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ostream</span><span class="o">&></span><span class="p">(</span><span class="n">output1</span><span class="p">));</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<h3>The Enable If Idiom</h3>
<p>This should be a problem I can fix. What I want to do is modify my template definition of
<code>output_bytes</code> so that it works with <code>std::ostream</code> <i>or any class derived from it</i>.</p>
<p>This can be done with the help of two relatively new classes. <code>std::enable_if</code> is a
template class that was added to the library with C++11, and <code>std::is_base_of</code> is a
struct added with TR1 in 2007. I’ll show the new definition below, then go on to try to analyze
what is happening:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="p">,</span><span class="k">typename</span> <span class="n">Enable</span> <span class="o">=</span> <span class="kt">void</span><span class="o">></span>
<span class="k">class</span> <span class="nc">output_bytes</span>
<span class="p">{</span>
<span class="k">private</span> <span class="o">:</span>
<span class="n">output_bytes</span><span class="p">(...);</span>
<span class="k">public</span> <span class="o">:</span>
<span class="kt">void</span> <span class="n">putByte</span><span class="p">(</span><span class="kt">char</span><span class="p">);</span>
<span class="p">};</span>
<span class="c1">//</span>
<span class="c1">// Specialization of output_bytes for class ostream</span>
<span class="c1">// and classes derived from ostream</span>
<span class="c1">//</span>
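<span class="k">using</span> <span class="n">std</span><span class="o">::</span><span class="n">enable_if</span><span class="p">;</span>
<span class="k">using</span> <span class="n">std</span><span class="o">::</span><span class="n">is_base_of</span><span class="p">;</span>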
<span class="k">using</span> <span class="n">std</span><span class="o">::</span><span class="n">ostream</span><span class="p">;</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">output_bytes</span><span class="o"><</span><span class="n">T</span><span class="p">,</span><span class="k">typename</span> <span class="n">enable_if</span><span class="o"><</span><span class="n">is_base_of</span><span class="o"><</span><span class="n">ostream</span><span class="p">,</span> <span class="n">T</span><span class="o">>::</span><span class="n">value</span><span class="o">>::</span><span class="n">type</span><span class="o">></span>
<span class="p">{</span>
<span class="k">public</span> <span class="o">:</span>
<span class="n">output_bytes</span><span class="p">(</span><span class="n">T</span> <span class="o">&</span><span class="n">stream</span><span class="p">)</span> <span class="o">:</span> <span class="n">m_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">){}</span>
<span class="kt">void</span> <span class="n">putByte</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">m_stream</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">private</span> <span class="o">:</span>
<span class="n">T</span> <span class="o">&</span><span class="n">m_stream</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Comparing this code with what was seen in the 1998 template version, you’ll see that the only real
changes are in the part of the class definition in which template parameters are declared.</p>
<p>First, the primary template now has two parameters - the first being a type that will nominally be a
class derived from <code>std::ostream</code>, with the second being an arbitrary type parameter
called <code>Enable</code>, which defaults to type <code>void</code>. When instantiating objects
of class <code>output_bytes</code>, I’m still going to just use the single type specifier,
as shown below.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">INPUT</span><span class="p">,</span> <span class="k">typename</span> <span class="n">OUTPUT</span><span class="o">></span>
<span class="kt">int</span> <span class="n">compress</span><span class="p">(</span><span class="n">INPUT</span> <span class="o">&</span><span class="n">source</span><span class="p">,</span> <span class="n">OUTPUT</span> <span class="o">&</span><span class="n">target</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">input_bytes</span><span class="o"><</span><span class="n">INPUT</span><span class="o">></span> <span class="n">in</span><span class="p">(</span><span class="n">source</span><span class="p">);</span>
<span class="n">output_bytes</span><span class="o"><</span><span class="n">OUTPUT</span><span class="o">></span> <span class="n">out</span><span class="p">(</span><span class="n">target</span><span class="p">);</span>
<span class="n">compressor</span><span class="o"><</span><span class="n">input_bytes</span><span class="o"><</span><span class="n">INPUT</span><span class="o">></span><span class="p">,</span><span class="n">output_bytes</span><span class="o"><</span><span class="n">OUTPUT</span><span class="o">></span> <span class="o">></span> <span class="n">c</span><span class="p">(</span><span class="n">in</span><span class="p">,</span><span class="n">out</span><span class="p">);</span>
<span class="k">return</span> <span class="n">c</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>So where does the second type parameter come into play?</p>
<p>In the specialization of <code>output_bytes</code> that I have created for classes derived from
<code>std::ostream</code>, you’ll see that I am using an expression for this second parameter:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">enable_if</span><span class="o"><</span><span class="n">is_base_of</span><span class="o"><</span><span class="n">ostream</span><span class="p">,</span> <span class="n">T</span><span class="o">>::</span><span class="n">value</span><span class="o">>::</span><span class="n">type</span></code></pre></figure>
<p>This expression evaluates to a type of void if T is type <code>std::ostream</code> or a class
derived from it. If T does not match up this way, then the expression is not defined.</p>
<h3>SFINAE</h3>
<p>So in some cases a template parameter is defined, and in other cases it is not defined. Let’s see
how this works. The inner part of the expression is:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">is_base_of</span><span class="o"><</span><span class="n">ostream</span><span class="p">,</span> <span class="n">T</span><span class="o">>::</span><span class="n">value</span></code></pre></figure>
<p>If you look up
<a href="https://en.cppreference.com/w/cpp/types/is_base_of" target="_blank">std::is_base_of</a>,
you’ll see that the instantiation of that struct has a constant static member named
<code>value</code>, which is true if T is derived from <code>std::ostream</code> (in this case),
and false if it is not. Since this is a static constant member of the struct, it is known at
compile time and can be used as a template argument.</p>
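<p>As a quick illustration of that compile-time behavior, here is a minimal sketch. The
<code>flag</code> template and the <code>ofstream_flag</code> typedef are hypothetical names made up
for this example; they are not part of the sample code.</p>

```cpp
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// is_base_of<Base, Derived>::value is a compile-time bool. It is true for
// std::ostream itself and for any class derived from it.
static_assert(std::is_base_of<std::ostream, std::ostream>::value, "ostream matches itself");
static_assert(std::is_base_of<std::ostream, std::ofstream>::value, "ofstream derives from ostream");
static_assert(std::is_base_of<std::ostream, std::ostringstream>::value, "ostringstream derives from ostream");

// It is false for unrelated types.
static_assert(!std::is_base_of<std::ostream, std::string>::value, "string is not an ostream");
static_assert(!std::is_base_of<std::ostream, int>::value, "int is not an ostream");

// Because value is a constant expression, it can be used as a template argument.
// (flag is a hypothetical helper that just records the bool it was given.)
template<bool B>
struct flag {
    static const bool on = B;
};
typedef flag<std::is_base_of<std::ostream, std::ofstream>::value> ofstream_flag;
```

<p>If any of these assertions were wrong, the file would simply fail to compile, which is the whole
point: nothing here is evaluated at runtime.</p>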
<p>This means that we are instantiating <code>std::enable_if</code> with a constant that is either
true or false. This is where things get interesting. We are passing
<code>enable_if<is_base_of<ostream, T>::value>::type</code> as the second argument to
<code>output_bytes<T,Enable></code>. Looking at the definition for
<a href="https://en.cppreference.com/w/cpp/types/enable_if" target="_blank">enable_if</a>,
you’ll see that if the argument passed to it is true, it has a typedef named <code>type</code>.
If the template argument is false, the typedef does not exist.</p>
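<p>To make the missing typedef concrete, here is a rough sketch of how such a template can be
written by hand. I have called it <code>my_enable_if</code> (a hypothetical name) so it does not
clash with the standard one; it may differ in detail from the version shipped in any particular
library.</p>

```cpp
#include <type_traits>

// Primary template: used when B is false. Note that it has no member named type.
template<bool B, typename T = void>
struct my_enable_if {
};

// Partial specialization: used when B is true. This is the only place type exists.
template<typename T>
struct my_enable_if<true, T> {
    typedef T type;
};

// With a true argument, type defaults to void ...
static_assert(std::is_same<my_enable_if<true>::type, void>::value, "defaults to void");
// ... or to whatever second argument was supplied.
static_assert(std::is_same<my_enable_if<true, int>::type, int>::value, "passes T through");

// my_enable_if<false>::type does not name anything. Writing it in ordinary
// code is a hard error; writing it in a template argument during substitution
// just removes that candidate.
```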
<p>So if T is the desired <code>std::ostream</code> derived class, we are passing in two type
arguments to the template definition. But if T is some other unwanted class, the second argument
doesn’t exist. What happens then?</p>
<p>It turns out that C++ has a very specific rule in place for this scenario:
<a href="https://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error" target="_blank">Substitution Failure Is Not an Error</a>,
or SFINAE. What this means is that in this case, the compiler finds that it can’t provide the
second argument to the specialization of <code>output_bytes</code>, so it simply drops that
specialization from consideration and moves on. Not an error.</p>
<p>This means that the slightly wonky definition does exactly what I want it to do - it instantiates
for all <code>std::ostream</code> classes, and doesn’t for others.</p>
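<p>The whole mechanism can be sketched in a few lines. This is a simplified stand-in, not the
class from the sample code: <code>byte_sink</code>, <code>uses_ostream</code>,
<code>putByte</code>, and <code>byte_sink_demo</code> are hypothetical names chosen for the
illustration.</p>

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <type_traits>

// Primary template. The second parameter defaults to void, so byte_sink<T>
// really means byte_sink<T, void>.
template<typename T, typename Enable = void>
struct byte_sink {
    static const bool uses_ostream = false;
};

// Partial specialization. Its second argument only exists (as void) when T is
// std::ostream or derived from it. For any other T, substitution fails and
// this specialization is silently discarded.
template<typename T>
struct byte_sink<T, typename std::enable_if<std::is_base_of<std::ostream, T>::value>::type> {
    static const bool uses_ostream = true;

    explicit byte_sink(T &stream) : m_stream(stream) {}
    void putByte(char c) { m_stream.put(c); }

private:
    T &m_stream;
};

// Small demonstration: an ostringstream picks up the ostream specialization.
inline void byte_sink_demo()
{
    std::ostringstream s;
    byte_sink<std::ostringstream> sink(s);
    sink.putByte('A');
    assert(s.str() == "A");
}
```

<p>In this sketch, <code>byte_sink<std::ostringstream></code> selects the specialization, while
<code>byte_sink<FILE *></code> quietly falls back to the primary template.</p>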
<h3>Conclusion</h3>
<p>Changing a template specialization so that it applies to an entire class hierarchy instead of just
a single class can be done fairly easily using modern C++ tools. It involves some fairly benign
changes to a class declaration, and imposes no runtime cost.</p>
<p>As always, this type of template metaprogramming is somewhat difficult to get your head around if
you are a traditional C++ procedural programmer, as so many of us are. You have to get used to the
concept of passing types as parameters at compile time, and this is just completely new territory.</p>
<p>As C++ adds more support for traits-based programming in the post-C++11 future, it may be that
creating this type of class becomes simpler. But for now, it works pretty well without too much
pain, as long as you can get your compiler updated to TR1 or later.</p>
<p>Note that
<a href="https://www.boost.org/" target="_blank">boost</a>
provides libraries that implemented <code>enable_if</code> and <code>is_base_of</code>
long before TR1, so you should be able to implement this type of code with just about any C++
compiler in use today. A home-grown implementation of <code>enable_if</code> is trivial, and is
included in the sample code:</p>
<p><a href="https://marknelson.us/assets/2014-07-02-c-generic-programming-meets-oop-stdis_base_of/bitio.zip" target="_blank">bitio.zip</a></p>Mark NelsonAlexander Stepanov’s Standard Template Library provided a huge push towards making template programming an important part of C++, and helped to insure that it was included as part of the first standard in 1998. But adoption of templates by most programmers has been more of an incremental process, as opposed to a revolutionary one - many of us have literally decades of Object Oriented == The One True Path to unlearn. And to top things off, templates needed some refinement over the years before they could do many of the things we expected of them.Debugging Windows Services Startup Problems2014-04-20T20:30:00+00:002018-08-02T20:00:00+00:00https://marknelson.us/posts/2014/04/20/debugging-windows-services-startup-problems<p>In the days of XP, a Windows Service was more or less an ordinary executable running in the same
session as all other executables. Debugging it was fairly simple, although you did have to deal
with the complication that it was started by the Service Control Manager.</p>
<p>Today Windows Services run in a mysterious Session 0, which is difficult to work with. If your
Service is written in C++, you’ll find that it can be problematic to deal with bugs, particularly
crashes, that occur during Service startup.</p>
<p>In this article I’ll explain why that is, and show you a very simple set of techniques for dealing
with this problem. I hope that the relative simplicity of all this will reduce your frustration
level when debugging your Windows Services.</p>
<h3>The Problem Statement</h3>
<p>In the XP days, Windows Services ran in the same environment as normal executables. Services
handle huge amounts of the workload involved in keeping Windows humming along, and much of this
stuff can drastically change the way the O/S behaves - which means any surface area they expose
represents a real security issue.</p>
<p>When Windows Services are normal executables, they can be bombarded with Windows Messages, COM
requests, DDE, all the normal IPC stuff that Windows uses. Malformed instances of these messages can crash services
or get them to behave improperly. It’s generally just not a good way to do things - it would be
like keeping the keys to the family jewels on a hook in your entry hall. The capture below shows
just a small fraction of the services running on my Windows 7 laptop. If you are a black hat
hacker, you can’t help but drool a bit at what you see there:</p>
<center>
<img src="/assets/2014-04-20-debugging-windows-services-startup-problems/TaskManager.png" alt="This graphic shows a list of services running on a typical Windows system. The bottom of the dialog says there are 136 processes running, the visible part of the list shows 26 services currently running." title="" />
<br />
<b>A Sampling of Maybe 120 Services On a Windows 7 System</b>
</center>
<p>In the Vista era, all services were moved to a special Session 0. (See
<a href="https://blogs.technet.microsoft.com/askperf/2007/07/24/sessions-desktops-and-windows-stations/" target="_blank">Sessions, Desktops and Windows Stations</a>
for discussion of these terms.) Or to be more accurate, everything else was moved <i>out</i> of
Session 0. Basically, this drastically limited the ability of user mode programs to interface with
services - mostly for the good.</p>
<p>In general, executables running in Session 0 don’t communicate with your desktop session - no
windows messages, for example, which makes them much more secure. But it also makes it hard to
debug them. I’ll give some explanation below showing why this is, and how we get around it.</p>
<h3>Debugging a Service</h3>
<p>Unlike a normal application, I don’t start a service by entering its name on the command line, or
by calling
<a href="https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-createprocessa" target="_blank"><code>CreateProcess</code></a>.
Instead I have to rely on the Service Control Manager to start and stop the process by having my
process respond to some very specific commands.</p>
<p>This doesn’t fit very well into the normal debugging paradigm - we normally expect the debugger to
actually start the program in question. But with Windows Services, we must go to the
Services plugin of the Management Console and tell it to start the service - the app then starts
without much help from us.</p>
<p>To debug that app after it has started, you will need to invoke the <i>Debug|Attach to Process</i>
function, which brings up the dialog shown below:</p>
<center>
<img src="/assets/2014-04-20-debugging-windows-services-startup-problems/AttachToProcess.png" alt="This graphic shows the 'Attach to Process' dialog from VisualStudio. It lists running processes on the system, including services, and the selected process is called 'mrnService.exe', the sample service created for this article." title="" />
<br />
<b>The Attach to Process Dialog</b>
</center>
<p>By checking the <i>Show Processes From All Users</i> checkbox, I can see my service, select it, and
attach it to the debugger. I’m now free to set breakpoints, watchpoints, examine variables, and do
all the other things that I need to debug an app. Things are just the way I want them to be.</p>
<h3>So What's the Problem?</h3>
<p>It seems like we have a pretty reasonable way to debug a service, right? So why is this article even being written?</p>
<p>As it happens, I work on an app that runs as a Windows Service, and this app spends a lot of time
at startup figuring out how it is configured. From time to time, things go wrong during that
startup and my state is incorrect. Even worse, there are times when that startup code crashes.</p>
<p>As a toy example, here is some code that I might execute in the startup of a service. It runs at
the start of <code>PreMessageLoop()</code>, a good place to do initialization of a service:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">CRegKey</span> <span class="n">key</span><span class="p">;</span>
<span class="n">DWORD</span> <span class="n">checks_per_second</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">LONG</span> <span class="n">err</span> <span class="o">=</span> <span class="n">key</span><span class="p">.</span><span class="n">Open</span><span class="p">(</span><span class="n">HKEY_LOCAL_MACHINE</span><span class="p">,</span><span class="s">L"SOFTWARE</span><span class="se">\\</span><span class="s">mrn</span><span class="se">\\</span><span class="s">mrnService"</span><span class="p">,</span> <span class="n">KEY_READ</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">err</span> <span class="o">==</span> <span class="n">ERROR_SUCCESS</span> <span class="p">)</span>
<span class="n">key</span><span class="p">.</span><span class="n">QueryDWORDValue</span><span class="p">(</span><span class="s">L"ChecksPerSecond"</span><span class="p">,</span><span class="n">checks_per_second</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">time_in_ms</span> <span class="o">=</span> <span class="mi">1000</span> <span class="o">/</span> <span class="n">checks_per_second</span><span class="p">;</span></code></pre></figure>
<p>It turns out that I don’t handle a failure to open the registry key - and I had
inadvertently stored the key in <code>HKLM/SOFTWARE/mrn/mrnService</code> on my Windows 7 system. I
should have created it in <code>HKLM/SOFTWARE/Wow6432Node/mrn/mrnService</code>. As a result, the
key open failed, and the <code>checks_per_second</code> value was left at 0, so the division on the
last line crashed the service. (I don’t check for illegal
values of the key even if it is read properly - a second representative error.)</p>
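<p>For what it is worth, guarding against both errors takes only a few lines. The following is a
portable sketch of the arithmetic only, with no registry access; the helper names and the fallback
of 10 checks per second are assumptions made up for this example.</p>

```cpp
#include <cassert>

// Hypothetical fallback used when the registry value is missing or zero.
const unsigned long kDefaultChecksPerSecond = 10;

// Convert a checks-per-second setting into a delay in milliseconds.
// A zero setting falls back to the default instead of dividing by zero, and
// settings above 1000 are clamped so the result is never less than 1 ms.
inline int interval_in_ms(unsigned long checks_per_second)
{
    if (checks_per_second == 0)
        checks_per_second = kDefaultChecksPerSecond;
    if (checks_per_second > 1000)
        checks_per_second = 1000;
    return static_cast<int>(1000 / checks_per_second);
}

// Quick sanity checks for the helper above.
inline void interval_self_check()
{
    assert(interval_in_ms(0) == 100);    // missing value: default of 10 per second
    assert(interval_in_ms(4) == 250);    // ordinary value
    assert(interval_in_ms(5000) == 1);   // out-of-range value is clamped
}
```

<p>Of course, the toy bug is there on purpose - the point of this article is what happens when
code like this slips through.</p>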
<p>When I attempted to start this service, I’d get a dialog box from the Service Control Manager, something like this:</p>
<center>
<img src="/assets/2014-04-20-debugging-windows-services-startup-problems/ScmError.png" alt="This graphic shows a dialog from the Windows Service Control Manager. In this case it is telling me that it is unable to start mrnService, the test app discussed in this article. The error message is 'Error 1067: The process terminated unexpectedly.'" title="" />
<br />
<b>Error From Starting a Service</b>
</center>
<p>This would seem like a good point to attach to the service and start debugging, but you can forget
about it - the service is already crashed and gone.</p>
<h3>Where is JIT When I Need It?</h3>
<p>What I really need here is the normal popup that I see on a dev system when a crash occurs - the
one that asks if I would like to debug the troubled process. Why aren’t I seeing it?</p>
<p>If you are quick on the draw, Process Explorer actually shows you what is going on. In the screen
shot below, you can see that my service, <code>mrnService.exe</code> is caught by the Windows
Error Reporting tool, which normally brings up just that dialog:</p>
<center>
<img src="/assets/2014-04-20-debugging-windows-services-startup-problems/wer2.png" alt="This screen cap shows part of the Process Explorer dialog. You see that mrnService.exe has kicked off a copy of WerFault.exe, the Windows Error Reporting tool." title="" />
<br />
<b>Windows Error Reporting Tools</b>
</center>
<p>The problem is that this is all happening in Session 0, which does not have the ability to interact
with my desktop. So Windows Error Reporting pops up a dialog and quickly sees that there is nobody
home to click on it. It simply closes up shop and kills the errant process. I have no opportunity
to catch this in progress.</p>
<p>(This capture also highlights part of the difficulty in debugging services - the process has been started by <code>svchost.exe</code>, not Visual Studio. The lifecycle of a service requires this, like it or not.)</p>
<h3>A Reasonable Solution</h3>
<p>It’s not quite true that Session 0 has no opportunity to communicate with your desktop. Windows
has a Remote Desktop API that allows for just the type of communications we would like. In
particular, I can use the <code>WTSSendMessage</code> function to pop up a message on my desktop
when the service enters that crucial startup phase. The resulting message box gives me an
opportunity to attach to the service and start debugging before it has done anything of importance.</p>
<p>The only thing I need to send a message to my screen is the console session ID, and Microsoft was
kind enough to provide an API for that as well. The lines inserted into my app to support this
look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <Wtsapi32.h>
#pragma comment( lib, "Wtsapi32.lib" )
</span><span class="c1">//</span>
<span class="c1">// other parts of your program ...</span>
<span class="c1">//</span>
<span class="n">HRESULT</span> <span class="n">CmrnServiceModule</span><span class="o">::</span><span class="n">PreMessageLoop</span><span class="p">(</span><span class="kt">int</span> <span class="n">nShowCmd</span><span class="p">)</span>
<span class="p">{</span>
<span class="cp">#ifdef _DEBUG
</span> <span class="kt">wchar_t</span> <span class="n">title</span><span class="p">[]</span> <span class="o">=</span> <span class="s">L"mrnservice in startup - 60 seconds to take action"</span><span class="p">;</span>
<span class="kt">wchar_t</span> <span class="n">message</span><span class="p">[]</span> <span class="o">=</span> <span class="s">L"To debug, first attach to the process with Visual "</span>
<span class="s">L"Studio, then click OK. If you don't want to debug, "</span>
<span class="s">L"just click OK without attaching"</span><span class="p">;</span>
<span class="n">DWORD</span> <span class="n">consoleSession</span> <span class="o">=</span> <span class="o">::</span><span class="n">WTSGetActiveConsoleSessionId</span><span class="p">();</span>
<span class="n">DWORD</span> <span class="n">response</span><span class="p">;</span>
<span class="n">BOOL</span> <span class="n">ret</span> <span class="o">=</span> <span class="o">::</span><span class="n">WTSSendMessage</span><span class="p">(</span> <span class="n">WTS_CURRENT_SERVER_HANDLE</span><span class="p">,</span>
<span class="n">consoleSession</span><span class="p">,</span>
<span class="n">title</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">title</span><span class="p">),</span>
<span class="n">message</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">message</span><span class="p">),</span>
<span class="n">MB_OK</span><span class="p">,</span>
<span class="mi">60</span><span class="p">,</span>
<span class="o">&</span><span class="n">response</span><span class="p">,</span>
<span class="n">TRUE</span> <span class="p">);</span>
<span class="cp">#endif
</span><span class="c1">// my next line of real code</span></code></pre></figure>
<p>With this in place, when I am unsure about what is happening with the startup, my workflow works
like this:</p>
<ul>
<li />I build my project, and install the service with <code>mrnService.exe /service</code>.
<li />I set a breakpoint in my program at the first line of code after
<code>WTSSendMessage</code> - the point where initialization commences. If I am drilling down
with more specificity I might set the breakpoint at a different point in the startup code.
<li />I start the service with <code>sc start mrnService</code>.
<li />I see a dialog popup as shown below.
<li />I switch to Visual Studio, and invoke <em>Debug|Attach to Process</em>.
<li />In the resulting dialog, I make sure that I have <em>Show processes from all users</em> checked.
<li />I navigate down to <code>mrnService.exe</code> as quickly as possible, then click the <em>Attach</em> button.
<li />Finally, I return to the dialog popped up by my service and click <em>OK</em>.
<li />If all is working properly, I will then immediately hit a breakpoint in my service, and I can
start walking through my code to see where things have gone awry.
</ul>
<center>
<img src="/assets/2014-04-20-debugging-windows-services-startup-problems/Dialog.png" alt="The dialog shown in this graphic is the one created in my sample service. It says that to debug, first attach to the process with Visual Studio, then click OK." title="" />
<br />
<b>My Service Startup Popup</b>
</center>
<p>This is simple and works more or less as expected. Compared to other techniques for debugging
service startup, I find it a winner.</p>
<h3>A Few Other Issues</h3>
<p>Microsoft really blew it when they decided to name this special class of processes
<em>Windows Services</em>. The word <em>Services</em> is so heavily overloaded in the world of
Windows that you could easily come up with a dozen different uses for it - starting with the
biggest conflict, COM Services/Servers. This makes it extraordinarily hard to use web searches to
find information on your problem.</p>
<p>*IX is way out ahead on this, with their equivalent processes having the unique name
<em>daemons</em>. The search space is much more constrained, allowing you to drill down a lot faster.</p>
<p>One other problem you might run into when debugging services is that the Service Control Manager
will run out of patience when trying to start your service. After a minute or two, you are going
to fail out. Buried in
<a href="https://support.microsoft.com/en-us/help/824344/how-to-debug-windows-services" target="_blank">MS KB824344</a>
you can find the instructions on modifying your registry settings to allow for long timeouts -
not something you want to do on production machines, but totally appropriate in a development world.</p>
<p>My demo 32-bit C++ Windows Service project for Visual Studio 2012 is
<a href="https://marknelson.us/assets/2014-04-20-debugging-windows-services-startup-problems/mrnService.zip">here</a>.
This is just a shell and doesn’t do much, but you can experiment with it to do sanity testing.</p>
<p>If you are using RDP to connect to the machine where you are debugging, the technique shown here
won’t quite work. It will pop up a dialog on the <em>console</em> of the machine you are
connected to - not the RDP window that you are using. I’ll show you how to get around this problem
in a future article.</p>
<p>None of this stuff matters if you are developing on XP. Of course, being on XP means you have
plenty of other problems to deal with, but Session 0 isolation is not one of them.</p>
<p>I also highly recommend not using the <code>mmc.exe</code> Services plugin to start and stop your
services during development - in fact it is best if you don’t even run <code>mmc.exe</code> while
you are doing service development. There are issues you will run into in which your services
become disabled but can’t be removed, sometimes even requiring a reboot. Open an elevated
<code>CMD.EXE</code> window in the Debug directory of your project, and get used to using these
commands over and over:</p>
<pre>
rem
rem initial registration of your service
rem
servicename.exe /service
rem
rem start and stop service
rem
sc start servicename
sc stop servicename
rem
rem Removal of service
rem
servicename.exe /unregserver
</pre>
<p>Of course, keep in mind that the name of your executable and the name of your service won’t
necessarily be the same. That’s up to you.</p>
<p>Good luck with your service development! Once you get past some of these tricky bits, it’s really
no different than debugging your normal apps. You just have to become a bit of a startup guru.</p>Mark NelsonIn the days of XP, a Windows Service was more or less an ordinary executable running in the same session as all other executables. Debugging it was fairly simple, although you did have to deal with the complication that it was started by the Service Control Manager.My Big Company Code Interview2014-03-26T07:00:00+00:002018-08-02T20:00:00+00:00https://marknelson.us/posts/2014/03/26/my-big-company-code-interview<p>It’s always exciting to get an email like this from an iconic tech giant:</p>
<blockquote>
Hi Mark,
I recently found your profile in our database, and your background is impressive. The (redacted
big company) Media Division will be flying several candidates in for interviews at our (redacted
big city) headquarters in April and considering you.
</blockquote>
<p>I’m not actively seeking a new job, so normally I’d just file this away in my emergency
just-laid-off folder, but this email had a twist that got my attention:</p>
<blockquote>
If interested in exploring Development opportunities with us, the first step will be to complete
our coding challenge ideally within the next 3 to 5 days.
</blockquote>
<p>Who doesn’t love a coding challenge?</p>
<h3>The Code Interview</h3>
<p>It’s been years since I gave up on chat interviews. It just seems like these don’t work so
well - we all feel like we are insightful judges of candidate quality, but somehow the people we
hire just don’t always turn out as well as we hope.</p>
<p>So now when my name shows up on the list of people interviewing a candidate, I forgo the usual
get-to-know-you experience in exchange for one or two simple programming exercises. Watching
someone think on their feet under modest pressure seems more enlightening than anything I can
get through conversation.</p>
<p>Apparently the big company contacting me has decided on a similar approach. The link included with
the email sent me to a site called
<a href="https://www.interviewzen.com/" target="_blank">Interview Zen</a> that promised an
interactive experience:</p>
<blockquote>
In this interview you will be shown a number of questions and asked to answer each one in
turn. ... Your interviewer will be able to see your answer unfold as if they were sitting at
the keyboard next to you.
</blockquote>
<p>This sounded pretty cool, but the actual implementation fell quite short of what I had hoped to see.</p>
<p>First, the promise of seeing me solve the problem interactively was just not met. I was asked a
single fairly complex question by Interview Zen, and given a choice from a dropdown of maybe 25
different languages to use for my solution. Interview Zen doesn’t provide any sort of IDE; they
specifically asked me to develop my answer elsewhere and post it when complete.</p>
<p>After posting my solution to the site, it was reformatted using some sort of pretty printer, and
then… nothing. No message indicating I was done, no next step, just a page with my solution.
So much for interaction.</p>
<p>And any interviewer watching the process would see my screen sit there completely blank for almost
an hour, then a <i>deus ex machina</i> as somewhere around a hundred lines appear on the screen.</p>
<p>I expect that if you want to use Interview Zen properly, you need to create a staged series of
questions so you can actually see this happen, and my big company didn’t bother to do this.</p>
<p>But still, they get what they are looking for, which is a completed code sample of a reasonable
size - big enough to actually demonstrate some work, small enough to evaluate quickly.</p>
<h3>On To the Challenge</h3>
<p>What’s more interesting is the challenge itself. I’ll lay out the assignment for you, show you my
solution, and then you can decide whether I am worth hiring based on my work.</p>
<p>The challenge question is to write a function that produces a list of products based on what your
friends are buying. I am given two functions:</p>
<p><b><code>getFriendListForUser()</code></b>: This function returns a list of users who are friends
of a given user.</p>
<p><b><code>getPurchasesForUser()</code></b>: This function returns a list of products that have
been purchased by a given user.</p>
<p>The function I am asked to write will create a list of all the products purchased by a given
user’s friends, ranked by frequency - the most frequently purchased products should appear first.
The list should exclude any products that the user has already purchased.</p>
<p>I’m going to present the code I created step by step, excluding comments and verbiage that
belong in the finished file. You can see the whole thing by downloading
<a href="https://marknelson.us/assets/2014-03-26-my-big-company-code-interview/getRecommendations.cpp" target="_blank">getRecommendations.cpp</a>.</p>
<p><b>The Interface</b></p>
<p>To kick things off I need to define the types I’m using, the functions provided in the problem
definition, and the function I’m writing.</p>
<p>No big surprises here. About the only thing of interest to note is that I’m using a typedef for
the customer and product ids. It would be easy enough to just leave these as type
<code>std::string</code>, but adding descriptive type names is a nicety - and it actually surfaced a
mistake made during development.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">typedef</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">CustomerId</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">ProductId</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">CustomerId</span><span class="o">></span> <span class="n">getFriendListForUser</span><span class="p">(</span><span class="k">const</span> <span class="n">CustomerId</span> <span class="o">&</span><span class="n">user</span> <span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">ProductId</span><span class="o">></span> <span class="n">getPurchasesForUser</span><span class="p">(</span><span class="k">const</span> <span class="n">CustomerId</span> <span class="o">&</span><span class="n">user</span> <span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">ProductId</span><span class="o">></span> <span class="n">getRecommendations</span><span class="p">(</span><span class="k">const</span> <span class="n">CustomerId</span> <span class="o">&</span><span class="n">user</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span></code></pre></figure>
<p><b>Getting the Product Counts</b></p>
<p>In the body of my function, I need to iterate over all the friends of this user, and add each of
their purchases to a master list, which I store in an <code>std::map</code>. This code is pretty
straightforward. It could have been made prettier by using C++11 features, but I’d rather be sure
that my code could be tested with tools that might not be completely up to date.</p>
<p>Using the <code>size()</code> method of a vector in a loop comparison is a bit controversial as
well, but I erred on the side of readability here.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o"><</span><span class="n">ProductId</span><span class="p">,</span><span class="kt">int</span><span class="o">></span> <span class="n">product_counts</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">CustomerId</span><span class="o">></span> <span class="n">friends</span> <span class="o">=</span> <span class="n">getFriendListForUser</span><span class="p">(</span><span class="n">user</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">friends</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">ProductId</span><span class="o">></span> <span class="n">friend_purchases</span> <span class="o">=</span> <span class="n">getPurchasesForUser</span><span class="p">(</span> <span class="n">friends</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">);</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">size_t</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">friend_purchases</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="p">;</span> <span class="n">j</span><span class="o">++</span> <span class="p">)</span>
<span class="n">product_counts</span><span class="p">[</span><span class="n">friend_purchases</span><span class="p">[</span><span class="n">j</span><span class="p">]]</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
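<p>For comparison, here is roughly what the same tally might look like with C++11 range-based
loops. To keep the sketch self-contained it takes the purchase lists as a parameter instead of
calling the two provided functions; <code>countPurchases</code> is a hypothetical helper, not part
of my submitted solution.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::string ProductId;

// Hypothetical helper: tally how many times each product appears across
// all of the friends' purchase lists.
std::map<ProductId, int> countPurchases(
    const std::vector<std::vector<ProductId>> &purchases_by_friend)
{
    std::map<ProductId, int> product_counts;
    for (const auto &friend_purchases : purchases_by_friend)
        for (const auto &product : friend_purchases)
            ++product_counts[product];  // operator[] starts a new product at 0
    return product_counts;
}
```

<p>The logic is identical; the only real difference is that the index variables disappear.</p>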
<p><b>Cleansing the Product Counts</b></p>
<p>At this point I have a map that contains the products purchased by friends, along
with their counts. The problem definition said I need to remove products that have already been
purchased by this user, so I have to iterate over that list and remove each found match from the map:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">ProductId</span><span class="o">></span> <span class="n">user_purchases</span> <span class="o">=</span> <span class="n">getPurchasesForUser</span><span class="p">(</span> <span class="n">user</span> <span class="p">);</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">user_purchases</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o"><</span><span class="n">ProductId</span><span class="p">,</span><span class="kt">int</span><span class="o">>::</span><span class="n">iterator</span> <span class="n">ii</span> <span class="o">=</span> <span class="n">product_counts</span><span class="p">.</span><span class="n">find</span><span class="p">(</span><span class="n">user_purchases</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">ii</span> <span class="o">!=</span> <span class="n">product_counts</span><span class="p">.</span><span class="n">end</span><span class="p">()</span> <span class="p">)</span>
<span class="n">product_counts</span><span class="p">.</span><span class="n">erase</span><span class="p">(</span><span class="n">ii</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
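<p>A slightly shorter variant, offered only as a sketch: <code>std::map::erase</code> also has an overload that takes a key and returns the number of elements removed, so the find-then-erase pair can collapse into a single call. Here <code>ProductId</code> is assumed to be an <code>int</code> typedef and <code>removeOwned()</code> is a hypothetical helper name; neither comes from the original problem.</p>

```cpp
#include <cstddef>
#include <map>
#include <vector>

typedef int ProductId;   // assumption: the real type was not shown

// Remove every product the user already owns from the friends' counts.
// Returns how many map entries were actually removed.
std::size_t removeOwned(std::map<ProductId, int>& product_counts,
                        const std::vector<ProductId>& user_purchases)
{
    std::size_t removed = 0;
    for (std::size_t i = 0; i < user_purchases.size(); i++)
        removed += product_counts.erase(user_purchases[i]);  // 0 or 1 for a std::map
    return removed;
}
```

<p>Erasing a key that is not present is harmless, so no separate existence check is needed.</p>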
<p><b>Inverting the Map</b></p>
<p>So far so good. I have one last bit of work to do. I need to return the list of products sorted by
the frequency of purchase. I have that information in the map already, but to sort properly I need
the product count to reside in the Key, not the Element.</p>
<p>There are a number of choices for fixing this - I do it by simply copying the elements of the map
into a properly structured map. I could do this without a loop in a number of ways, including a
range constructor and the <code>std::copy</code> algorithm, but the explicit loop should be just
as efficient and again, the simplicity helps with readability. I think. And because multiple
products may have the same count, this needs to be an <code>std::multimap</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">multimap</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span><span class="n">ProductId</span><span class="o">></span> <span class="n">sorted_products</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o"><</span><span class="n">ProductId</span><span class="p">,</span><span class="kt">int</span><span class="o">>::</span><span class="n">iterator</span> <span class="n">ii</span> <span class="o">=</span> <span class="n">product_counts</span><span class="p">.</span><span class="n">begin</span><span class="p">()</span> <span class="p">;</span>
<span class="n">ii</span> <span class="o">!=</span> <span class="n">product_counts</span><span class="p">.</span><span class="n">end</span><span class="p">();</span>
<span class="n">ii</span><span class="o">++</span> <span class="p">)</span>
<span class="n">sorted_products</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">make_pair</span><span class="p">(</span><span class="n">ii</span><span class="o">-></span><span class="n">second</span><span class="p">,</span><span class="n">ii</span><span class="o">-></span><span class="n">first</span><span class="p">));</span></code></pre></figure>
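<p>For comparison, here is one loop-free possibility, sketched with <code>std::transform</code> because each pair has to be swapped on its way into the new container. <code>ProductId</code> is again assumed to be an <code>int</code> typedef, and <code>flipPair()</code> and <code>invertCounts()</code> are hypothetical names used only for illustration.</p>

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <utility>

typedef int ProductId;   // assumption: the real type was not shown

// Swap (product, count) into (count, product).
std::pair<int, ProductId> flipPair(const std::pair<const ProductId, int>& p)
{
    return std::make_pair(p.second, p.first);
}

// Build a multimap keyed by count from a map keyed by product.
std::multimap<int, ProductId> invertCounts(const std::map<ProductId, int>& product_counts)
{
    std::multimap<int, ProductId> sorted_products;
    std::transform(product_counts.begin(), product_counts.end(),
                   std::inserter(sorted_products, sorted_products.end()),
                   flipPair);
    return sorted_products;
}
```

<p>Whether this reads better than the explicit loop is a matter of taste; the work done is essentially the same.</p>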
<p><b>Returning the Results</b></p>
<p>The problem didn’t ask for me to return anything but the product IDs - no counts were requested.
Just the product IDs in the proper order. So I iterate over the map in reverse order, getting the
highest product counts first, stuffing the results into a vector, and return the result:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">ProductId</span><span class="o">></span> <span class="n">result</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="n">std</span><span class="o">::</span><span class="n">multimap</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span><span class="n">ProductId</span><span class="o">>::</span><span class="n">reverse_iterator</span> <span class="n">ii</span> <span class="o">=</span> <span class="n">sorted_products</span><span class="p">.</span><span class="n">rbegin</span><span class="p">();</span>
<span class="n">ii</span> <span class="o">!=</span> <span class="n">sorted_products</span><span class="p">.</span><span class="n">rend</span><span class="p">();</span>
<span class="n">ii</span><span class="o">++</span> <span class="p">)</span>
<span class="n">result</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">ii</span><span class="o">-></span><span class="n">second</span><span class="p">);</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span></code></pre></figure>
<p>All done!</p>
<h3>My Take, Their Take</h3>
<p>I think they scoped this task pretty well. In practice you could probably write and test this on a
system you are familiar with in 15 minutes. But properly documenting it, adding some unit
tests, and so on ran the time up to about an hour. The problem has plenty of pitfalls that can
lead you to do things poorly - either incorrect code or inefficient code.</p>
<p>And my implementation is far from flawless - there are a lot of details that could be improved
upon here, and I’ll leave it up to you to help flush them out.</p>
<p>Not perfect, but good enough, because my next email from the big company had good news:</p>
<blockquote>
A hiring manager from the Media team at (redacted big company) has reviewed your resume and would
like you to speak to a member of their team about the position below. ...
</blockquote>
<p>Well, I wasn’t looking for a job today, so I politely and gratefully declined, but it’s always
good to know you can at least make the first cut.</p>
<h3>Moral and Legal Hazards</h3>
<p>I was kind of surprised that none of this process was cloaked in any sort of corporate secrecy,
either implicit or explicit.</p>
<p>The email from the recruiter didn’t make any mention of confidentiality, and in fact at one point
in the process I was given the chance to share the job posting with others.</p>
<p>The Interview Zen web site didn’t offer up Terms of Service, and had no links to such on any of
the pages I went to.</p>
<p>So it would seem that legally, I am not under any sort of obligation to avoid disclosing parts of
this test.</p>
<p>But what about ethically? By discussing this aren’t I giving away possible answers to future
cheaters?</p>
<p>Maybe so, but I’m not going to sweat this. First, real cheaters would have to find this post, and
since the company name is missing, it’s not going to be trivial. Second, if the code is found,
simply doing a verbatim copy is not going to work - that will raise flags right away. In order to
customize this code and make it your own, you end up having to understand it as well as I do - in
many ways this is just as much a test as it would be to do the original composition.</p>
<p>So I post this with a clean conscience. I hope you enjoyed running through the challenge as much
as I did.</p>Mark NelsonIt’s always exciting to get an email like this from an iconic tech giant: Hi Mark,One Definition to Rule Them All2014-03-03T19:00:00+00:002018-08-02T20:00:00+00:00https://marknelson.us/posts/2014/03/03/one-definition-to-rule-them-all<p>A frequently cited rule in the C++ standard is the
<a href="https://en.wikipedia.org/wiki/One_Definition_Rule" target="_blank">One Definition Rule</a>.
In this article, I’ll show you how I inadvertently blundered into an ODR problem, found my way out,
and uncovered a deeper truth about the language standard.</p>
<p>This story starts with a very mundane exercise in debugging using the Windows API function
<code>OutputDebugString()</code>, an attempt to simplify my life, followed by a puzzle, and
eventually, enlightenment.</p>
<h3>The Pain of OutputDebugString()</h3>
<p>If you are a Windows programmer, you are probably familiar with <code>OutputDebugString()</code>.
This function allows you to send unadorned strings to the debugger. If you are inside Visual
Studio debugging your app, that usually means either your <em>Output</em> or
<em>Immediate</em> window. This is useful, but what is even more useful is its use in conjunction
with the nifty utility
<a href="https://docs.microsoft.com/en-us/sysinternals/downloads/debugview" title="DebugView" target="_blank">DebugView</a>.
DebugView registers with the O/S as a collector of these messages, and offers some nice filtering
and highlighting features - I’ve been using it for what seems like decades, and still rely on it
for help with tough problems. If you are a fan of <code>printf()</code> debugging, this is your tool.</p>
<p>Note that <code>OutputDebugString()</code> is not a general purpose logging tool - and in fact you
generally want it removed from production code. It doesn’t have the kind of flexibility needed for
this purpose, any more than <code>fprintf()</code> would.</p>
<p>As much as I use it, I have to say that <code>OutputDebugString()</code> really sucks in many
respects. First, it only accepts a string as an argument. This means for anything except the most
trivial debugging scenarios, you are going to have to manually format your text using either
something from the <code>sprintf()</code> family of C functions or the
<code>std::ostringstream</code> class.</p>
<p>It’s pretty hard to pull this off without generating an extra three or four lines of code, and
then all that stuff needs to be cleaned up or removed when you are done. And the last thing you
need when debugging a problem is a lot of extra work.</p>
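<p>To make the overhead concrete, here is a sketch of the by-hand routine using <code>std::ostringstream</code>. The function name <code>reportValues()</code> and its parameters are hypothetical, and the non-Windows branch is only a stand-in so the example builds elsewhere; on Windows the real <code>OutputDebugStringA()</code> from <code>Windows.h</code> is used.</p>

```cpp
#include <cstdio>
#include <sstream>
#include <string>

#ifdef _WIN32
#include <Windows.h>
#else
// Stand-in so the sketch builds off Windows; the real API is declared in Windows.h.
inline void OutputDebugStringA(const char* text) { std::fputs(text, stderr); }
#endif

// The manual routine: build the text, extract it, then hand over the C string.
std::string reportValues(int val1, double val2)
{
    std::ostringstream msg;
    msg << "Value 1: " << val1 << ", Value 2: " << val2 << "\n";
    const std::string text = msg.str();
    OutputDebugStringA(text.c_str());
    return text;   // returned only so the result is easy to inspect
}
```

<p>Three lines of scaffolding for one line of output, repeated at every spot you want to instrument.</p>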
<h3>A Blueprint for Improvement</h3>
<p>There are any number of ways to work around this problem. My solution comes about via a number of
self-imposed constraints:</p>
<ul>
<li />Using my debugging output facility should require inclusion of a header file - nothing more.
No objects to be instantiated, nothing to be added to the class under test.
<li />Creating and outputting a formatted debug message of arbitrary complexity should almost
always take just one line of code.
<li />Because debugging code that introduces bugs is a bit problematic, the code should be immune to
buffer overflows and similar issues.
</ul>
<p>Depending on your proclivities, your solution is often going to be one of these two:</p>
<p>Option One is your own creation along the lines of <code>OutputDebugStringF()</code>, taking a
formatting string a la <code>printf()</code>. Programmers who still have C-blood pumping through
their veins are going to tend to this solution, and they can even take advantage of modern
language features to eliminate many of the issues that crop up with <code>printf()</code> - no buffer
overflows, and even no type mismatches.</p>
<p>Option Two is a variation on <code>class OutputDebugStream</code>, which can be used to send text
to the debugger using iostream formatting. This of course gives us immunity from buffer overflows,
type safety in formatting, and a standard way to format user-defined types.</p>
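<p>As a rough illustration of Option One, the sketch below formats into a correctly sized buffer by calling <code>std::vsnprintf()</code> twice, which removes the fixed-buffer overflow risk. The name <code>formatDebugMessage()</code> is hypothetical, and the caller would still pass the result to <code>OutputDebugString()</code>. It assumes a C++11 library, and it does not address type mismatches; that would take compiler format checking or variadic templates.</p>

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format printf-style into a std::string without a fixed-size buffer.
std::string formatDebugMessage(const char* format, ...)
{
    va_list args;
    va_start(args, format);
    va_list args_copy;
    va_copy(args_copy, args);   // the argument list is consumed by each pass

    // First pass: ask how many characters are needed, writing nothing.
    const int needed = std::vsnprintf(NULL, 0, format, args);
    va_end(args);
    if (needed < 0) {           // encoding or format error
        va_end(args_copy);
        return std::string();
    }

    // Second pass: format into a buffer with room for the terminating NUL.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(&buffer[0], buffer.size(), format, args_copy);
    va_end(args_copy);
    return std::string(&buffer[0], static_cast<std::size_t>(needed));
}
```

<p>A call such as <code>formatDebugMessage("foo: %d", foo)</code> returns the finished text, so the one-line goal is met as long as the format string and arguments agree.</p>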
<h3>My Implementation</h3>
<p>My personal choice is for an iostream-based solution. I am able to completely deploy this anywhere
by including the single file below in the C++ code that wants to use it:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#pragma once
</span>
<span class="cp">#include <sstream>
#include <Windows.h>
#ifndef ODS_PREFIX
#define ODS_PREFIX "ODS: "
#endif
</span>
<span class="k">class</span> <span class="nc">ods</span> <span class="p">{</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">friend</span> <span class="n">ods</span><span class="o">&</span> <span class="k">operator</span><span class="o"><<</span><span class="p">(</span><span class="n">ods</span><span class="o">&</span> <span class="n">stream</span><span class="p">,</span> <span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">that</span><span class="p">);</span>
<span class="k">public</span> <span class="o">:</span>
<span class="n">ods</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">m_stream</span> <span class="o"><<</span> <span class="n">ODS_PREFIX</span> <span class="o"><<</span> <span class="s">" "</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">~</span><span class="n">ods</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">m_stream</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="n">OutputDebugString</span><span class="p">(</span> <span class="n">m_stream</span><span class="p">.</span><span class="n">str</span><span class="p">().</span><span class="n">c_str</span><span class="p">()</span> <span class="p">);</span>
<span class="p">}</span>
<span class="k">protected</span> <span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">ostringstream</span> <span class="n">m_stream</span><span class="p">;</span>
<span class="k">public</span><span class="o">:</span>
<span class="k">static</span> <span class="n">ods</span> <span class="n">createTempOds</span><span class="p">(){</span> <span class="k">return</span> <span class="n">ods</span><span class="p">();}</span>
<span class="p">};</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="n">ods</span><span class="o">&</span> <span class="k">operator</span><span class="o"><<</span><span class="p">(</span><span class="n">ods</span><span class="o">&</span> <span class="n">stream</span><span class="p">,</span> <span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">that</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">stream</span><span class="p">.</span><span class="n">m_stream</span> <span class="o"><<</span> <span class="n">that</span><span class="p">;</span>
<span class="k">return</span> <span class="n">stream</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#define ODS ods::createTempOds()</span></code></pre></figure>
<center><b>ods.h - Everything you need to use my <code>OutputDebugString</code> replacement</b></center>
<p>After including this header file in your C++ source, you can send any line you like to the
debugger like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="n">ODS</span> <span class="o"><<</span> <span class="s">"Value 1: "</span> <span class="o"><<</span> <span class="n">val1</span> <span class="o"><<</span> <span class="s">", Value 2: "</span> <span class="o"><<</span> <span class="n">val2</span><span class="p">;</span></code></pre></figure>
<p>Very easy, and of course the formatting is done using standard C++ rules, so there is no need to
learn anything new.</p>
<p>Not shown is the code that you can use to turn this off in production systems. Not only can you
prevent it from sending any text to the debugger, but you can easily make sure that the work done
to format the data is skipped as well.</p>
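<p>One common way to get that effect is sketched here. It is not the omitted code from this post; it is a generic illustration under stated assumptions. The class <code>line_sink</code>, the function <code>debugLog()</code>, the macros <code>DBG_ON</code>, <code>DBG_OFF</code>, and <code>DBG</code>, and the helper <code>countedValue()</code> are all hypothetical, and the sink appends to a string instead of calling <code>OutputDebugString()</code> so the sketch stays portable.</p>

```cpp
#include <sstream>
#include <string>

// Hypothetical sink: collects one line and appends it to a log string when
// the temporary is destroyed (a real header would call OutputDebugString()).
class line_sink {
public:
    explicit line_sink(std::string& log) : m_log(log) {}
    ~line_sink() { m_log += m_stream.str() + "\n"; }
    template<typename T>
    line_sink& operator<<(const T& that) { m_stream << that; return *this; }
private:
    std::ostringstream m_stream;
    std::string& m_log;
};

// Shared destination for the demo output.
std::string& debugLog()
{
    static std::string log;
    return log;
}

// Demo helper: counts how often it is evaluated.
int countedValue(int& evaluations) { ++evaluations; return 42; }

// Enabled form: a temporary sink that flushes at the end of the statement.
#define DBG_ON line_sink(debugLog())
// Disabled form: the else branch is never taken, so neither the sink nor the
// expressions streamed into it are ever evaluated.
#define DBG_OFF if (true) {} else line_sink(debugLog())

// Pick one form at build time; NDEBUG is just one possible switch.
#ifdef NDEBUG
#define DBG DBG_OFF
#else
#define DBG DBG_ON
#endif
```

<p>With the disabled form, a statement like <code>DBG_OFF &lt;&lt; countedValue(n);</code> never calls <code>countedValue()</code>. The usual caveat applies: a macro that expands to an <code>if</code>/<code>else</code> needs care when used inside an unbraced <code>if</code>.</p>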
<h3>Implementation Notes</h3>
<p>The core of this is the <code>ods</code> class. This class contains an
<code>std::ostringstream</code> object that accumulates all of the data that is being formatted in
the single line for debug output. This member, <code>m_stream</code>, is initialized with a prefix
that is defined in a macro. This gives me the flexibility to change it in different source files,
making debug output easier to filter and search. When all of the data has been added to this
object, it is sent out to the debugger by extracting the C string and passing it to
<code>OutputDebugString()</code>.</p>
<p>Getting this to work properly depends on the C++ rules regarding temporary objects. The object
that is collecting all the data is a temporary created by the <code>ODS</code> macro, which calls
<code>ods::createTempOds()</code>. The temporary returned by this function is then the target of
the insertion operator <code><<</code>. Each of the following insertion operators adds its
data to the temporary object. Section 12.2 of the 1998 C++ standard says this about the lifetime of temporaries:</p>
<blockquote>
Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that
(lexically) contains the point where they were created.
</blockquote>
<p>This is nice, because it means that we can count on the <code>ods</code> destructor being invoked
at the end of the expression - so <code>OutputDebugString()</code> will be called when we want it
to be. If we weren’t using a temporary, we couldn’t reliably use the destructor to trigger
output - it would be called when the object goes out of scope, which may be later than we need.</p>
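<p>The difference in timing is easy to observe with a small portable sketch. The <code>Tracer</code> class and <code>lifetimeDemo()</code> function are hypothetical names; the class simply records when its destructor runs.</p>

```cpp
#include <string>
#include <vector>

// Records its own destruction in a shared event list.
class Tracer {
public:
    Tracer(std::vector<std::string>& events, const std::string& name)
        : m_events(events), m_name(name) {}
    ~Tracer() { m_events.push_back("destroy " + m_name); }
private:
    std::vector<std::string>& m_events;
    std::string m_name;
};

std::vector<std::string> lifetimeDemo()
{
    std::vector<std::string> events;
    {
        Tracer named(events, "named");
        Tracer(events, "temporary");         // destroyed at the end of this statement
        events.push_back("next statement");
    }                                        // named is destroyed here
    events.push_back("after scope");
    return events;
}
```

<p>The recorded order is <code>destroy temporary</code>, <code>next statement</code>, <code>destroy named</code>, <code>after scope</code>: the temporary is gone before the next statement runs, while the named object lingers until its block ends.</p>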
<p>One additional complication is that we are using the insertion operator on a user-defined type,
<code>ods</code>, that doesn’t provide one by default. That problem is managed at the end of the
header file with a very simple template function definition. Basically, it defines a generic
insertion function that inserts the object you want printed into the <code>m_stream</code>
object (which actually knows how to deal with it), returning a reference to itself so that
the canonical chaining can occur.</p>
<h3>This Simple, Nothing Could Go Wrong</h3>
<p>As a simple test of this, I created an app with two source files:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include "ods.h"
</span>
<span class="kt">void</span> <span class="n">sub</span><span class="p">();</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">15</span><span class="p">;</span>
<span class="n">ODS</span> <span class="o"><<</span> <span class="s">"foo: "</span> <span class="o"><<</span> <span class="n">foo</span><span class="p">;</span>
<span class="n">sub</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<center><b>main.cpp</b></center>
<p />
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#define ODS_PREFIX "sub.cpp: "
#include "ods.h"
#include <ctime>
</span>
<span class="kt">void</span> <span class="nf">sub</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">ODS</span> <span class="o"><<</span> <span class="s">"current time: "</span> <span class="o"><<</span> <span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<center><b>sub.cpp</b></center>
<p />
<p>Note that in the file <code>sub.cpp</code> I used the preprocessor to define a different prefix
string. This will allow me to easily flag lines that originate in this file. In
<code>main.cpp</code> the default prefix of <em>ODS:</em> will be used.</p>
<p>When I run this program, the debug window gives me the following unexpected output:</p>
<pre>
ODS: foo: 15
ODS: current time: 1393808992
</pre>
<p>Whoops, something is wrong. The formatting is working, but the same prefix is used in both files. I
expected the line with the current time to be prefixed with <code>sub.cpp:</code>. How did this fail?</p>
<h3>The ODR Steps In</h3>
<p>It didn’t take me long to realize the mistake I had made - I had violated the One Definition Rule (ODR).</p>
<p>In the 1998 C++ standard, the ODR is covered in section 3.2. It is a little too lengthy to
transcribe completely here, but I think I can give you the gist of it with two brief excerpts.</p>
<p>The first part, which I think of as the <i>Lesser ODR</i>, says that you can’t have two
definitions of the same thing in a file you are compiling (formally a <i>translation unit</i>):</p>
<blockquote>
No translation unit shall have more than one definition of any variable, function, class type,
enumeration type, or template.
</blockquote>
<p>Makes sense, right? You don’t expect this code to work:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="nf">foo</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">foo</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">3</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The second part, which I think of as the <i>Greater ODR</i>, says that you can’t have two different
definitions of a function or object anywhere in your entire program:</p>
<blockquote>
Every program shall contain exactly one definition of every non-inline function or object that
is used in that program; no diagnostic is required.
</blockquote>
<p>Normally including a class or function definition in a header file doesn’t cause a problem with
this rule - every place you use the function, it will have the same definition. But I slipped up
in one critical place. This line of code in the constructor uses a macro as part of its definition:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="n">m_stream</span> <span class="o"><<</span> <span class="n">ODS_PREFIX</span> <span class="o"><<</span> <span class="s">" "</span><span class="p">;</span></code></pre></figure>
<p>This means that the definition of the constructor used in <code>main.cpp</code> is doing this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">    <span class="n">m_stream</span> <span class="o"><<</span> <span class="s">"ODS: "</span> <span class="o"><<</span> <span class="s">" "</span><span class="p">;</span></code></pre></figure>
<p>while the definition in <code>sub.cpp</code> is doing this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">    <span class="n">m_stream</span> <span class="o"><<</span> <span class="s">"sub.cpp: "</span> <span class="o"><<</span> <span class="s">" "</span><span class="p">;</span></code></pre></figure>
<p>Two different definitions, and a sure violation of the ODR!</p>
<h3>How It Plays Out</h3>
<p>C++ programmers are used to getting a lot of help from the compiler when it comes to obvious
mistakes. Things like type safety are inextricably bound up in a reliance on getting compiler
errors when things are done improperly.</p>
<p>In a recent exchange with <a href="https://herbsutter.com/" target="_blank">Herb Sutter</a> over a
problem I was having with Visual C++, this very notion was exposed as somewhat weak. I was
expecting Visual C++ to reject some invalid C++ code, but for the particular case I was seeing,
Herb rightly pointed out:</p>
<blockquote>
Because it is invalid, compilers can do whatever they want with it.
</blockquote>
<p>Yes, that’s right, there are many, many places where the compiler is not particularly obligated to
call out your mistakes. Often they will anyway, but when faced with some types of errors in your
program, compliant compilers can pretty much do whatever they like.</p>
<p>You might think this sounds like laziness on the part of the compiler writer, but in the case of
the Greater ODR, I think the standard just formalizes the limits of what our current generation of
compilers and linkers can handle.</p>
<p>When a function is compiled to object code in two different files, the linker has to select one,
and only one, to use in your executable. For a linker to be able to flag Greater ODR violations,
it would need to look at the object code generated for every version of a function and guarantee
that it is an identical definition. This would be a tremendous amount of work, and it isn’t
something that linkers do today. So instead, the linker just picks one and goes with it.</p>
<h3>Resolution</h3>
<p>Once I saw what was going on here, the fix was easy enough. Instead of using the
<code>ODS_PREFIX</code> in the body of the constructor, I pass it in to the constructor
as an argument:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#pragma once
</span>
<span class="cp">#include <sstream>
#include <Windows.h>
#ifndef ODS_PREFIX
#define ODS_PREFIX "ODS: "
#endif
</span>
<span class="k">class</span> <span class="nc">ods</span> <span class="p">{</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">friend</span> <span class="n">ods</span><span class="o">&</span> <span class="k">operator</span><span class="o"><<</span><span class="p">(</span><span class="n">ods</span><span class="o">&</span> <span class="n">stream</span><span class="p">,</span> <span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">that</span><span class="p">);</span>
<span class="k">public</span> <span class="o">:</span>
<span class="n">ods</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">prefix</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">m_stream</span> <span class="o"><<</span> <span class="n">prefix</span> <span class="o"><<</span> <span class="s">" "</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">~</span><span class="n">ods</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">m_stream</span> <span class="o"><<</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="n">OutputDebugString</span><span class="p">(</span> <span class="n">m_stream</span><span class="p">.</span><span class="n">str</span><span class="p">().</span><span class="n">c_str</span><span class="p">()</span> <span class="p">);</span>
<span class="p">}</span>
<span class="k">protected</span> <span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">ostringstream</span> <span class="n">m_stream</span><span class="p">;</span>
<span class="k">public</span><span class="o">:</span>
<span class="k">static</span> <span class="n">ods</span> <span class="n">createTempOds</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">prefix</span><span class="p">){</span> <span class="k">return</span> <span class="n">ods</span><span class="p">(</span><span class="n">prefix</span><span class="p">);}</span>
<span class="p">};</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="n">ods</span><span class="o">&</span> <span class="k">operator</span><span class="o"><<</span><span class="p">(</span><span class="n">ods</span><span class="o">&</span> <span class="n">stream</span><span class="p">,</span> <span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">that</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">stream</span><span class="p">.</span><span class="n">m_stream</span> <span class="o"><<</span> <span class="n">that</span><span class="p">;</span>
<span class="k">return</span> <span class="n">stream</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#define ODS ods::createTempOds(ODS_PREFIX)</span></code></pre></figure>
<p>Now all copies of the function are identical, and my output is correct:</p>
<pre>
ODS: foo: 15
sub.cpp: current time: 1393914308
</pre>Mark NelsonA frequently cited rule in the C++ standard is the One Definition Rule. In this article, I’ll show you how I inadvertently blundered into an ODR problem, found my way out, and uncovered a deeper truth about the language standard.How I Spent My Last Few Weeks2012-11-04T19:00:00+00:002018-08-02T20:30:34+00:00https://marknelson.us/posts/2012/11/04/how-i-spent-my-last-few-weeks<p><img src="https://www.var9.com/css/all/promo.png" align="right" style="margin-left:15px;border-style:solid;border-width:2px" />
Instead of writing about C++, data compression, or my usual raft of interesting topics, I’ve been
busy for the past few weeks writing a new game,
<a href="https://www.var9.com" target="_blank">Rack Master</a>.</p>
<p>Rack Master is a free training program that gives you a fresh puzzle every day. Your goal is to
make the best scores out of five racks of letters, using the scoring rules from games like
Scrabble and Words With Friends.</p>
<p>There are four levels, with varying degrees of difficulty. See what you think, give feedback, and by all means, pass the link along to any friends who are word enthusiasts.</p>Mark NelsonInstead of writing about C++, data compression, or my usual raft of interesting topics, I’ve been busy for the past few weeks writing a new game, Rack Master.The Random Compression Challenge Turns Ten2012-10-09T20:00:00+00:002018-08-02T20:00:00+00:00https://marknelson.us/posts/2012/10/09/the-random-compression-challenge-turns-ten<p><img src="/assets/2012-10-09-the-random-compression-challenge-turns-ten/220px-Random_digits.png" align="right" style="margin-left:15px;border-style:solid;border-width:2px" alt="This is a decorative picture of a table of numbers intended to represent the random numbers in the challenge. It does not add any useful content to the post." title="" />
Ten years ago I issued a simple
<a href="https://groups.google.com/forum/#!msg/comp.compression/BrES5syH_Rk/555gYFcmT4EJ" target="_blank">challenge</a>
to the compression community: reduce the size of roughly half a megabyte of <em>random data</em> -
by as little as one byte - and be the first to actually have a legitimate claim to this
accomplishment.</p>
<p>Ten years later, my challenge is still unmet. After making a small cake and blowing out the
candles, I thought this would be a good time to revisit this most venerable quest for coders
who think outside the box.</p>
<h4>Some History</h4>
<p>In George Dyson’s great book on the early history of electronic computers,
<a href="https://www.amazon.com/Turings-Cathedral-Origins-Digital-Universe/dp/0375422773" target="_blank">Turing’s Cathedral</a>,
he describes how much of the impetus for computation in the 40’s and 50’s was from the US
military’s urge to design better fission and fusion bombs. A powerful technique used in this
design work was the
<a href="https://en.wikipedia.org/wiki/Monte_Carlo_method" target="_blank">Monte Carlo method</a>,
which relied on streams of random numbers to drive simulations.</p>
<p>The problem then, as now, is that coming up with random numbers is not always an easy task.
John von Neumann was intimately involved in all this, and is famously quoted as having said:</p>
<blockquote>
any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.
</blockquote>
<p>The book describes how the RAND Corporation, flush with bomb money, took on the task of using
physical processes to generate a handy bucket of random numbers, eventually published as
<a href="https://www.rand.org/pubs/monograph_reports/MR1418.html" target="_blank">A Million Random Digits with 100,000 Normal Deviates</a>.</p>
<p>Since my tax dollars paid for those numbers, I thought it only fair that I make it the basis of my
challenge. I took the decimal digits, converted them to a base two number, stored it in
<a href="/assets/2012-10-09-the-random-compression-challenge-turns-ten/AMillionRandomDigits.bin">AMillionRandomDigits.bin</a>,
and challenged the world to compress it. (The original USENET post was followed with a more
findable <a href="/posts/2006/06/20/million-digit-challenge.html" target="_blank">blog posting</a>
that contains a bit more detail.)</p>
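<p>For readers curious what converting decimal digits to a base two number can look like, here is a minimal sketch. It treats the whole digit string as one big integer and emits its bytes, most significant first. The exact procedure and byte order used for the challenge file are not stated here, so treat both as assumptions; <code>decimalToBytes()</code> is a hypothetical name, and the input is assumed to contain only the characters 0 through 9.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Convert a string of decimal digits into the bytes of the equivalent binary
// number, most significant byte first (the byte order is an assumption).
std::vector<unsigned char> decimalToBytes(const std::string& digits)
{
    std::vector<unsigned char> bytes(1, 0);   // least significant byte first while working
    for (std::size_t d = 0; d < digits.size(); ++d) {
        unsigned carry = static_cast<unsigned>(digits[d] - '0');
        // bytes = bytes * 10 + digit, done one base-256 "digit" at a time
        for (std::size_t i = 0; i < bytes.size(); ++i) {
            const unsigned v = bytes[i] * 10u + carry;
            bytes[i] = static_cast<unsigned char>(v & 0xFFu);
            carry = v >> 8;
        }
        if (carry != 0)
            bytes.push_back(static_cast<unsigned char>(carry));
    }
    // Flip to most-significant-byte-first order for output.
    return std::vector<unsigned char>(bytes.rbegin(), bytes.rend());
}
```

<p>This schoolbook approach is quadratic in the number of digits, so it is fine as an illustration but slow for a full million-digit input.</p>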
<p>Ten years later, there have been no serious entrants, although there are a few dedicated souls who
are continuing to attack the problem. Unfortunately for all who are working on it, it seems that
those RAND scientists back in the 50’s did a really, really good job of scrubbing those
numbers.</p>
<h4>The 2012 Edition</h4>
<p>For a few different reasons, it is time to close down the older versions of the contest and issue
an updated 2012 challenge. Nothing much has changed, but over the years I’ve bumped into a few
points of confusion that I can clear up, and in addition, I would like to slightly widen the
contest’s scope. And most importantly, the comments section of the previous contest is way out of
hand, and issuing an update lets me close that stream down and open a new one.</p>
<p>For the 2012 edition, I am actually giving entrants two possible ways to win the prize. The first
challenge is essentially a reprise of the original, with minor tweaks and updates. The second
poses a more difficult problem that is a superset of the first.</p>
<p>Meeting either challenge brings you worldwide fame, a cash prize of $100, and the knowledge that
you have defeated a problem that many said was untouchable.</p>
<p>Likewise, both problems are governed by one meta-rule which seeks to implement an overarching
principle: the point of this is to win algorithmically, not to game the contest. I will disqualify
any entry that wins through means such as hiding data in filenames, environment variables, kernel
buffers, or whatever. Someone can always find a way to win with monkey business like this, but
that is beside the point of the contest. No hiding data.</p>
<h4>Challenge Version 1 - A Kolmogorov Compressor</h4>
<p>The original version of the challenge is basically unchanged. Your goal is to find the shortest
program possible that will produce the million random digit file. In other words, demonstrate that
its
<a href="https://en.wikipedia.org/wiki/Kolmogorov_complexity" target="_blank">Kolmogorov complexity</a>
is less than its size. So the heart of Challenge 1 is the question of whether the file is
compressible a la Kolmogorov and standard, general purpose computing machines.</p>
<p>The interesting part about this challenge is that it is only <i>very likely</i> impossible.
Turing, and Gödel before him, made sure that we can’t state with any certainty that there is no
program of size less than 415,241 bytes that will produce the file. All it takes is a lucky
strike. Maybe the digits are a prime? Maybe they just happen to be nested in the expansion of some
transcendental number? Or better yet, maybe the RANDians overlooked some redundancy, hidden
somewhere in a fifth order curve, just waiting to be fit. There is no telling how many
different ways you could hit the jackpot.</p>
<p>However, the dismal logic of
<a href="https://en.wikipedia.org/wiki/Pigeonhole_principle" target="_blank">The Counting Argument</a>
tells us that there are always going to be some files of size 415,241 bytes that are not
compressible by the rules of Challenge 1. And of course, it’s actually much worse than that - when
you cast a critical eye on the task, it turns out that nearly all files are incompressible. But
for a given file, we don’t really have any way of proving incompressibility.</p>
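<p>The arithmetic behind the counting argument is easy to sketch. The snippet below is only an illustration of mine, not part of the contest: it measures files in bits to keep the numbers small, and the function names <code>count_exact</code>, <code>count_shorter</code>, and <code>count_saving_at_least</code> are made up for the occasion.</p>

```cpp
#include <cstdint>

// Number of distinct files that are exactly n bits long: 2^n.
// (Only valid for n < 64, so the result fits in 64 bits.)
constexpr std::uint64_t count_exact(unsigned n) {
    return std::uint64_t{1} << n;
}

// Number of distinct files strictly shorter than n bits:
// 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
constexpr std::uint64_t count_shorter(unsigned n) {
    return count_exact(n) - 1;
}

// Upper bound on how many n-bit files can map to an output that is at
// least k bits shorter (requires k <= n): only 2^(n-k+1) - 1 such
// outputs exist, and a lossless scheme cannot reuse an output.
constexpr std::uint64_t count_saving_at_least(unsigned n, unsigned k) {
    return count_shorter(n - k + 1);
}
```

<p>For 16-bit files there are 65,536 possible inputs but only 65,535 shorter outputs, so at least one input cannot shrink. Worse, at most 511 of them, under 1%, can shrink by a byte or more. That fraction does not depend on the file size, so the same bound applies at 415,241 bytes.</p>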
<p><b>Rulings</b></p>
<p>I want everyone to have the best chance possible to win this challenge. The basic rule is that
your program file, possibly combined with a data file, must be less than 415,241 bytes.
Clarifications on questions that have come up in the past include:</p>
<ul>
<li>Programs written in C or some other compiled language can be measured by the length of their
source, not their compiled product.</li>
<li>Using external programs to strip comments, rename variables, etc. is all fine. It would be
nice to have the unmangled source available as your input, with the mangling step part of the
submission process.</li>
<li>Programs that link to standard libraries included with the language don't have to include the
length of those libraries against their total. Hiding data in standard libraries is of course
not allowed. (And don't even think of hiding it in the kernel!)</li>
<li>Source code can be submitted in a compressed container of your choice, and I will only count
the bytes used by the container against you.</li>
<li>Likewise, any data files can be submitted in a compressed container, and I will only count
the bytes used in the container against you.</li>
<li>You own the code, and just because you win, I don't have the right to publish it. If you
insist on an NDA, I may be willing to comply.</li>
<li>In general, you need to submit source code that I can build and execute in a relatively
standard VM or container. If you are paranoid and insist on binaries only, we might be able
to come to terms, but no guarantees.</li>
<li>The nature of this contest is such that gaming the rules is pointless. You aren't entering
in a quest to beat the rules, you are entering in a quest to beat the data.</li>
<li>Your program might take a long, long time to run, but we will have to draw the line
somewhere.</li>
</ul>
<p>If there is anyone who deserves to beat this file, it is
<a href="https://disqus.com/by/ernst_berg/" target="_blank">Ernst Berg</a>, who has been relentlessly
attacking it from various directions for years now. Ernst doesn’t seem to be doing this to feed
his ego - he’ll share his results with all comers, and is always willing to listen to someone’s
new approach. I consider him to be the unofficial Sergeant-at-Arms of this enterprise.</p>
<p>But Ernst will also be the first to tell you what a harsh mistress the file can be - always
taking, but never giving.</p>
<h4>Challenge Version 2 - A General Purpose Random Compressor</h4>
<p>Challenge 1 is interesting because it is nearly, but not assuredly impossible. Challenge 2 is
more along the lines of troll bait, because it is patently impossible: create a system to
compress and then decompress <em>any</em> file of size 415,241 bytes. In other words,
create a compressed file that is smaller than the input, then use only that compressed file
to restore the original data.</p>
<p>Unlike Challenge 1, there are no size limitations on Challenge 2. Your compressor and
decompressor can be as large as you like. Because the programs have to be able to handle
<i>any</i> input data, size is of no particular advantage - there is no data to hide.</p>
<p>This challenge is for the contestant who is sure that he or she has figured out a way to compress
the million digit file, but finds that their program takes 100 MB of space. Okay, that’s fine, we
shall first see if it can compress the file. It then must be able to correctly compress and
decompress a file of the same size. Let’s say, on 1,000 different files.</p>
<p>To keep it simple, the files will be simple permutations of the million digit file - scrambled
with an encryption system, or perhaps a random number generator, XORed with a one-time pad, or
whatever. The point is, they should have all the same numeric characteristics as
the original file, just organized slightly differently.</p>
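<p>As a rough sketch of what such a scrambling step could look like, the function below XORs a buffer with a stream of bytes from a seeded generator. The name <code>xor_scramble</code>, the choice of <code>std::mt19937</code>, and the use of a seed are all assumptions made for illustration; this is not the procedure I have committed to, and a Mersenne Twister is not a cryptographic pad.</p>

```cpp
#include <cstdint>
#include <random>
#include <vector>

// XOR every byte of `data` with a byte drawn from a seeded generator.
// Running the result through the function again with the same seed
// restores the original bytes, because (x ^ k) ^ k == x.
std::vector<std::uint8_t> xor_scramble(std::vector<std::uint8_t> data,
                                       std::uint32_t seed) {
    std::mt19937 gen(seed);
    for (auto &byte : data) {
        // Keep only the low 8 bits of each 32-bit generator output.
        byte ^= static_cast<std::uint8_t>(gen() & 0xFFu);
    }
    return data;
}
```

<p>The output is exactly as long as the input, which is the property the test files need.</p>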
<p>Again, I will emphasize that Challenge 2 is at its heart provably impossible to beat. No program
can compress all files of a given size, and the chances of any program being able to
compress 1,000 different files of this length are so vanishingly small that they can safely be ruled
out, even if every computer on earth were working on it from now until we are engulfed in flames
during Sol’s red giant phase.</p>
<h4>Conclusion</h4>
<p>It seems unlikely that there will be any developments in lossless compression that change the
terms of this contest. No doubt I’ll reissue the challenge in another ten years or so, but if it
is beatable, the tools are already at hand. Good luck to those of you who are tackling the
challenge. Beating this will not get you the fame associated with something like the
<a href="https://www.claymath.org/millennium-problems/" target="_blank">Clay Millennium Prizes</a>,
but in the small world of data compression, you will sit alone on a throne of your own
making, and deservedly so.</p>
<h4>2018 Notes</h4>
<p>My WordPress site did not have a good system for handling all the comments on this page.
For that and a few other reasons, I am converting all my pages to static pages, which
will load much more quickly. In the future, comments will be handled by Disqus, which I think
ought to be a better way of doing things.</p>
<p>An archive of the previous 166 pages of comments can be found in this file:
<a href="/assets/2012-10-09-the-random-compression-challenge-turns-ten/Archived Comments.pdf" target="_blank">Archived Comments.pdf</a>.</p>Mark NelsonTen years ago I issued a simple challenge to the compression community: reduce the size of roughly half a megabyte of random data - by as little as one byte - and be the first to actually have a legitimate claim to this accomplishment.C++11: unique_ptr2012-06-24T19:00:00+00:002012-06-24T19:00:00+00:00https://marknelson.us/posts/2012/06/24/c11-unique_ptrt<p>There are a lot of great features in C++11, but <code>unique_ptr</code> stands out in the area of
code hygiene. Simply put, this is a magic bullet for dynamically created objects. It won’t solve
every problem, but it is really, really good at what it does: managing dynamically created objects
with simple ownership semantics.</p>
<h4>The Basics</h4>
<p>The class template <code>unique_ptr<T></code> manages a pointer to an object of type T.
You will usually construct an object of this type by calling <code>new</code> to create an object
in the <code>unique_ptr</code> constructor:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">foo</span><span class="o">></span> <span class="n">p</span><span class="p">(</span> <span class="k">new</span> <span class="n">foo</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span> <span class="p">);</span></code></pre></figure>
<p>After calling the constructor, you can use the object very much like a raw pointer. The
<code>*</code> and <code>-></code> operators work exactly like you would expect, and are
very efficient - usually generating nearly the same assembly code as raw pointer access.</p>
<p>As you might expect from a class that wraps raw pointers, the first benefit you will get from using
<code>unique_ptr</code> is automatic destruction of the contained object when the pointer goes out
of scope. You don’t have to track every possible exit point from a routine to make sure the
object is freed properly - it is done automatically. And more importantly, it will be destroyed if
your function exits via an exception.</p>
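<p>Here is a minimal sketch of both exit paths. The <code>foo</code> below is a hypothetical stand-in with a destructor counter added so the cleanup can be observed; <code>throws_midway</code> and <code>count_cleanups</code> are names I made up for the example.</p>

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for foo; counts destructor calls.
struct foo {
    static int destroyed;
    int baz;
    explicit foo(int v) : baz(v) {}
    ~foo() { ++destroyed; }
};
int foo::destroyed = 0;

// Leaves through an exception; the foo is still destroyed on the way out.
void throws_midway() {
    std::unique_ptr<foo> p(new foo(42));
    throw std::runtime_error("bail out");
}

// Returns how many foo objects were destroyed by one normal scope exit
// plus one exit via exception.
int count_cleanups() {
    foo::destroyed = 0;
    {
        std::unique_ptr<foo> p(new foo(42));
    }   // p goes out of scope here, deleting the foo
    try {
        throws_midway();
    } catch (const std::runtime_error &) {
        // nothing to free by hand
    }
    return foo::destroyed;
}
```

<p>Neither path contains a <code>delete</code>, yet both objects are released.</p>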
<h4>Containers</h4>
<p>So far this is nice, but hardly revolutionary. Writing a class that just does what I’ve described
is fairly trivial, and you could have done it with the original C++ standard. In fact, the
ill-fated (and now deprecated) <code>auto_ptr</code> was just that, a first stab at an
RAII pointer wrapper.</p>
<p>Unfortunately, the language hadn’t evolved to the point where <code>auto_ptr</code> could be done
properly. As a result, you couldn’t use it for some pretty basic things. For example, you
couldn’t store <code>auto_ptr</code> objects in most containers. Kind of a big problem.</p>
<p>C++11 fixed these problems with the addition of rvalue references and move semantics. As a result,
<code>unique_ptr</code> objects can be stored in containers, work properly when containers are
resized or moved, and will still be destroyed when the container is destroyed. Just like you want.</p>
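<p>A small sketch of that behavior, again using a hypothetical <code>foo</code> with an instance counter bolted on so we can watch the lifetimes. The helper <code>fill_and_count</code> is an illustrative name, not something from a library.</p>

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct foo {
    static int live;   // number of foo objects currently alive
    int baz;
    explicit foo(int v) : baz(v) { ++live; }
    ~foo() { --live; }
};
int foo::live = 0;

// Fill a vector with n owned objects and report how many are alive
// while the vector still exists.
int fill_and_count(std::size_t n) {
    std::vector<std::unique_ptr<foo>> v;
    for (std::size_t i = 0; i < n; ++i) {
        // Growth may reallocate; the unique_ptrs are moved, no foo is copied.
        v.push_back(std::unique_ptr<foo>(new foo(static_cast<int>(i))));
    }
    return foo::live;
}   // v is destroyed here, and every foo with it
```

<p>After the call returns, <code>foo::live</code> is back to zero without a single explicit <code>delete</code>.</p>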
<h4>Uniqueness and Move Semantics</h4>
<p>So what exactly is the meaning of the word <i>unique</i> in this context? Mostly just what it
says: when you create a <code>unique_ptr</code> object, you are declaring that you are going to
have exactly one copy of this pointer. There is never any doubt about who owns it, because
you can’t inadvertently make copies of the pointer.</p>
<p>With a classic raw pointer, this kind of code is a bug lying in wait:</p>
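<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">foo</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="k">new</span> <span class="n">foo</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span>
<span class="n">make_use</span><span class="p">(</span> <span class="n">p</span> <span class="p">);</span></code></pre></figure>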
<p>Here I have allocated an object, and I have a pointer to it. When I call <code>make_use</code>,
what happens to that pointer? Does <code>make_use</code> make a copy of it for later use? Does it
take ownership of it and delete it when done? Does it simply borrow it for a while and then return
it to the caller for later destruction?</p>
<p>We can’t really answer any of these questions with confidence, because C++ doesn’t make it easy
to have a contract regarding the use of a pointer. You end up relying on code inspection, memory,
or documentation. All of these things break regularly.</p>
<p>With <code>unique_ptr</code>, you won’t have these problems. If you want to pass the pointer to
another routine, you won’t make a duplicate copy of the pointer that has to be accounted
for - the compiler prohibits it.</p>
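<p>You can ask the compiler to confirm this directly. The sketch below uses the standard type traits from <code>&lt;type_traits&gt;</code>; the bare-bones <code>foo</code> is just a placeholder type.</p>

```cpp
#include <memory>
#include <type_traits>

struct foo { int baz; };

// Copying is disabled at compile time...
static_assert(!std::is_copy_constructible<std::unique_ptr<foo>>::value,
              "unique_ptr cannot be copy constructed");
static_assert(!std::is_copy_assignable<std::unique_ptr<foo>>::value,
              "unique_ptr cannot be copy assigned");

// ...while moving, which transfers ownership, is allowed.
static_assert(std::is_move_constructible<std::unique_ptr<foo>>::value,
              "unique_ptr can be move constructed");
```

<p>If any of these assumptions were wrong, the file would simply fail to compile.</p>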
<h4>Who owns the pointer</h4>
<p>Let’s take a simple example - I create a pointer and want to store it in a container. As a new user
of <code>unique_ptr</code>, I write some pretty straightforward code:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">foo</span><span class="o">></span> <span class="n">q</span><span class="p">(</span> <span class="k">new</span> <span class="n">foo</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span> <span class="p">);</span>
<span class="n">v</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span> <span class="n">q</span> <span class="p">);</span></code></pre></figure>
<p>This seems reasonable, but doing this gets me into a gray area: who owns the pointer? Will
the container destroy it at some point in its lifetime? Or is it still my job to do so?</p>
<p>The rules of using <code>unique_ptr</code> prohibit this kind of code, and trying to compile it
will lead to the classic cascade of template-based compiler errors (the ones that were going to
be fixed with <em>concepts</em>, remember?), ending thus:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">unique</span><span class="p">.</span><span class="n">cpp</span><span class="p">(</span><span class="mi">26</span><span class="p">)</span> <span class="o">:</span> <span class="n">see</span> <span class="n">reference</span> <span class="n">to</span> <span class="k">class</span> <span class="nc">template</span> <span class="n">instantiation</span>
<span class="err">'</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">_Ty</span><span class="o">></span><span class="err">'</span> <span class="n">being</span> <span class="n">compiled</span></code></pre></figure>
<p>Anyway, the problem here is that we are only allowed to have one copy of the pointer - unique
ownership rules apply. If I want to give the object to another piece of code, I have to invoke
move semantics:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">v</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="p">);</span></code></pre></figure>
<p>After I move the object into the container, my original <code>unique_ptr</code>, <code>q</code>, has given up ownership of
the pointer, which now rests with the container. Although object <code>q</code> still exists, dereferencing
it is undefined behavior and will typically crash, because after the move operation the
internal pointer held by <code>q</code> has been set to <code>nullptr</code>.</p>
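<p>Since the moved-from pointer is empty, it can be tested before anyone touches it. This is a minimal sketch; <code>empty_after_move</code> is a made-up helper and <code>foo</code> is a placeholder type with a <code>baz</code> member.</p>

```cpp
#include <memory>
#include <utility>
#include <vector>

struct foo {
    int baz;
    explicit foo(int v) : baz(v) {}
};

// Moves q into a vector and reports whether q is empty afterwards
// while the object itself is still reachable through the container.
bool empty_after_move() {
    std::vector<std::unique_ptr<foo>> v;
    std::unique_ptr<foo> q(new foo(42));
    v.push_back(std::move(q));
    // A moved-from unique_ptr holds nullptr, so check it before any use.
    return q == nullptr && v.back()->baz == 42;
}
```

<p>In real code the check is usually written as <code>if (q)</code>, which relies on the pointer's conversion to <code>bool</code>.</p>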
<p>Move semantics will be used automatically any place you create an rvalue reference. For example,
returning a unique_ptr from a function doesn’t require any special code:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="k">return</span> <span class="n">q</span><span class="p">;</span></code></pre></figure>
<p>Nor does passing a newly constructed object to a called function:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">process</span><span class="p">(</span> <span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">foo</span><span class="o">></span><span class="p">(</span> <span class="k">new</span> <span class="n">foo</span><span class="p">(</span><span class="mi">41</span><span class="p">)</span> <span class="p">)</span> <span class="p">);</span></code></pre></figure>
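<p>Both cases can be put together in one small sketch. The signatures of <code>make_foo</code> and <code>process</code> here are my assumptions for the sake of a complete example; in particular, I am assuming <code>process</code> takes its argument by value and so becomes the owner.</p>

```cpp
#include <memory>

struct foo {
    int baz;
    explicit foo(int v) : baz(v) {}
};

// Returning a local unique_ptr by value moves it out; no std::move needed.
std::unique_ptr<foo> make_foo(int v) {
    std::unique_ptr<foo> q(new foo(v));
    return q;
}

// Taking the parameter by value means the callee now owns the object.
int process(std::unique_ptr<foo> p) {
    return p->baz;
}   // the foo is destroyed here

int demo() {
    std::unique_ptr<foo> a = make_foo(42);               // moved from the return value
    int b = process(std::unique_ptr<foo>(new foo(41)));  // temporary needs no std::move
    return a->baz + b;
}
```

<p>Passing a named <code>unique_ptr</code> to <code>process</code> would still require an explicit <code>std::move</code>, which keeps ownership transfers visible at the call site.</p>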
<h4>Legacy code</h4>
<p>We all have legacy code to deal with, and even when using <code>unique_ptr</code>, you will find that there are times you just have to pass some function a raw pointer. There are two ways to do this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="n">do_something</span><span class="p">(</span> <span class="n">q</span><span class="p">.</span><span class="n">get</span><span class="p">()</span> <span class="p">);</span> <span class="c1">//retain ownership</span>
<span class="n">do_something_else</span><span class="p">(</span> <span class="n">q</span><span class="p">.</span><span class="n">release</span><span class="p">()</span> <span class="p">);</span> <span class="c1">//give up ownership</span></code></pre></figure>
<p>Calling <code>get()</code> returns a pointer to the underlying object. You really want to avoid
calling this if you can, because as soon as you release that raw pointer into the wild, you have
lost much of the advantage you achieved by switching to <code>unique_ptr</code>. With careful code
inspection you can probably convince yourself that the pointer is indeed only being used
ephemerally, and it will disappear once the called routine is done with it.</p>
<p>Extracting the pointer with <code>release()</code> is a more realistic way to do it. At this point
you are saying “I’m done with the ownership of this pointer, it’s yours now”, a fair enough
thing to say.</p>
<p>As your code base matures, you will need to say this less and less often.</p>
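<p>To make the difference concrete, here is a sketch with two hypothetical legacy routines: <code>legacy_read</code> only borrows the pointer, while <code>legacy_consume</code> takes ownership and deletes the object. Neither name comes from real code; they stand in for whatever older API you are stuck with.</p>

```cpp
#include <memory>

struct foo {
    int baz;
    explicit foo(int v) : baz(v) {}
};

// Hypothetical legacy routine that only borrows the pointer.
int legacy_read(const foo *p) { return p->baz; }

// Hypothetical legacy routine that takes ownership and frees the object.
int legacy_consume(foo *p) {
    int v = p->baz;
    delete p;
    return v;
}

int demo() {
    std::unique_ptr<foo> q(new foo(42));
    int a = legacy_read(q.get());         // q still owns the foo
    int b = legacy_consume(q.release());  // q is now empty; the callee deleted it
    return (q == nullptr) ? a + b : -1;
}
```

<p>The important detail is that <code>release()</code> does not delete anything. If the callee does not really take ownership, the object leaks.</p>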
<p>One other place you will find that your code differs slightly from that using raw pointers is that
you will now often be passing unique_ptr as a reference argument:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">inc_baz</span><span class="p">(</span> <span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">foo</span><span class="o">></span> <span class="o">&</span><span class="n">p</span> <span class="p">)</span>
<span class="p">{</span>
<span class="n">p</span><span class="o">-></span><span class="n">baz</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>When passing by reference, we don’t have to worry about values being inadvertently copied and muddying ownership.
The effective meaning of passing in by reference is that of saying, go ahead and use this pointer, but the caller
is responsible for lifetime management of the object.</p>
<h4>The Cost</h4>
<p>Proponents of unique_ptr argue that the cost of using this wrapper is minimal, and this seems to be true. Below I show you a pair of routines that increment a member of the class <code>foo</code>. One routine is passed a raw pointer, the other a reference to a <code>unique_ptr</code>.</p>
<p>Let’s look at the disassembled code that was compiled in Release mode:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">int</span> <span class="nf">inc_bazp</span><span class="p">(</span> <span class="n">foo</span> <span class="o">*</span><span class="n">p</span> <span class="p">)</span>
<span class="p">{</span>
<span class="mo">01331700</span> <span class="n">push</span> <span class="n">ebp</span>
<span class="mo">01331701</span> <span class="n">mov</span> <span class="n">ebp</span><span class="p">,</span><span class="n">esp</span>
<span class="k">return</span> <span class="n">p</span><span class="o">-></span><span class="n">baz</span><span class="o">++</span><span class="p">;</span>
<span class="mo">01331703</span> <span class="n">mov</span> <span class="n">edx</span><span class="p">,</span><span class="n">dword</span> <span class="n">ptr</span> <span class="p">[</span><span class="n">p</span><span class="p">]</span>
<span class="mo">01331706</span> <span class="n">mov</span> <span class="n">eax</span><span class="p">,</span><span class="n">dword</span> <span class="n">ptr</span> <span class="p">[</span><span class="n">edx</span><span class="o">+</span><span class="mi">18</span><span class="n">h</span><span class="p">]</span>
<span class="mo">0133170</span><span class="mi">9</span> <span class="n">lea</span> <span class="n">ecx</span><span class="p">,[</span><span class="n">eax</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="mo">0133170</span><span class="n">C</span> <span class="n">mov</span> <span class="n">dword</span> <span class="n">ptr</span> <span class="p">[</span><span class="n">edx</span><span class="o">+</span><span class="mi">18</span><span class="n">h</span><span class="p">],</span><span class="n">ecx</span>
<span class="p">}</span>
<span class="mo">0133170</span><span class="n">F</span> <span class="n">pop</span> <span class="n">ebp</span>
<span class="mo">01331710</span> <span class="n">ret</span>
<span class="kt">int</span> <span class="nf">inc_baz</span><span class="p">(</span> <span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o"><</span><span class="n">foo</span><span class="o">></span> <span class="o">&</span><span class="n">p</span> <span class="p">)</span>
<span class="p">{</span>
<span class="mo">00</span><span class="n">AC16E0</span> <span class="n">push</span> <span class="n">ebp</span>
<span class="mo">00</span><span class="n">AC16E1</span> <span class="n">mov</span> <span class="n">ebp</span><span class="p">,</span><span class="n">esp</span>
<span class="k">return</span> <span class="n">p</span><span class="o">-></span><span class="n">baz</span><span class="o">++</span><span class="p">;</span>
<span class="mo">00</span><span class="n">AC16E3</span> <span class="n">mov</span> <span class="n">eax</span><span class="p">,</span><span class="n">dword</span> <span class="n">ptr</span> <span class="p">[</span><span class="n">p</span><span class="p">]</span>
<span class="mo">00</span><span class="n">AC16E6</span> <span class="n">mov</span> <span class="n">edx</span><span class="p">,</span><span class="n">dword</span> <span class="n">ptr</span> <span class="p">[</span><span class="n">eax</span><span class="p">]</span>
<span class="mo">00</span><span class="n">AC16E8</span> <span class="n">mov</span> <span class="n">eax</span><span class="p">,</span><span class="n">dword</span> <span class="n">ptr</span> <span class="p">[</span><span class="n">edx</span><span class="o">+</span><span class="mi">18</span><span class="n">h</span><span class="p">]</span>
<span class="mo">00</span><span class="n">AC16EB</span> <span class="n">lea</span> <span class="n">ecx</span><span class="p">,[</span><span class="n">eax</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="mo">00</span><span class="n">AC16EE</span> <span class="n">mov</span> <span class="n">dword</span> <span class="n">ptr</span> <span class="p">[</span><span class="n">edx</span><span class="o">+</span><span class="mi">18</span><span class="n">h</span><span class="p">],</span><span class="n">ecx</span>
<span class="p">}</span>
<span class="mo">00</span><span class="n">AC16F1</span> <span class="n">pop</span> <span class="n">ebp</span>
<span class="mo">00</span><span class="n">AC16F2</span> <span class="n">ret</span> </code></pre></figure>
<p>The <code>unique_ptr</code> version does indeed have one extra pointer dereference in comparison
to the raw pointer version. While it is not so easy to count cycles these days, it seems likely
that the difference between these two cases is going to be quite low, perhaps as low as 10%. A
routine that performed multiple operations on the object would see the penalty reduced further.</p>
<h4>One Final Note</h4>
<p>I’m sold on <code>unique_ptr</code>, so you will be seeing it plenty in my code. However, you
might be feeling a bit more cautious. And that’s okay, with C++11, it is pretty easy for you to
test the waters without too much extra work. Just make sure that the routines that use
pointers use the <code>auto</code> type to hold all pointers - this means no changes in the
consumer code if you switch between raw pointers and <code>unique_ptr</code>. And routines that
are passed these pointers can be defined as function templates, making it easy to adapt to
whatever type of data is passed in.</p>
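<p>A sketch of that approach follows. The templated <code>inc_baz</code> is my own variation on the routine shown earlier, and <code>demo</code> is just a driver; the point is that the same consumer code compiles against either kind of pointer.</p>

```cpp
#include <memory>

struct foo {
    int baz;
    explicit foo(int v) : baz(v) {}
};

// Works with anything that supports ->, raw pointer or smart pointer.
template <typename Ptr>
int inc_baz(Ptr &p) {
    return p->baz++;
}

int demo() {
    // auto hides the pointer type from the rest of the routine.
    auto p = std::unique_ptr<foo>(new foo(42));
    inc_baz(p);

    foo raw(10);
    auto r = &raw;      // a raw pointer goes through the same consumer code
    inc_baz(r);

    return p->baz + r->baz;   // 43 + 11
}
```

<p>The trade-off is that a template accepts anything with an <code>operator-&gt;</code>, so the ownership contract is no longer spelled out in the signature.</p>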
<p>In a perfect world, with no legacy code requiring raw pointer semantics, <code>unique_ptr</code>
can literally guarantee that you won’t leak data allocated from the free store. This was simply
not possible in a reasonable way until C++11. Use it!</p>Mark Nelson