Alexander Stepanov’s Standard Template Library provided a huge push towards making template programming an important part of C++, and helped to insure that it was included as part of the first standard in 1998. But adoption of templates by most programmers has been more of an incremental process, as opposed to a revolutionary one - many of us have literally decades of Object Oriented == The One True Path to unlearn. And to top things off, templates needed some refinement over the years before they could do many of the things we expected of them.

In this article, I’ll show you how generic programming with templates makes a noticeable improvement in a pedestrian but important task I deal with constantly in my data compression work. Then I’ll show you the annoying but understandable problems with this approach circa 1998, and finally, how to resolve them with features added in TR1 and C++11.

Bit-oriented I/O

Most data compression algorithms read their input data using well-supported byte-oriented I/O. Text files are read a byte at a time, and most image files are blocked into data structures that align to byte boundaries.

But the output of compressors tends to be in odd sizes. For example, venerable LZW compression output usually starts with nine-bit wide codes which grow as time goes on, through ten, eleven, and twelve bits, up to some arbitrary size.

The standard C++ libraries don’t have I/O APIs that support these odd sizes, so any compressor implementation has to build these I/O routines into their program.

And of course, the inverse task, decompression, needs to be able to read data in non-standard sizes in order to write uncompressed data. This is effectively the same problem, just turned on its head.

To illustrate how this works, I’ll use the example of arithmetic compression in this article. Canonical CACM89 arithmetic compression outputs one bit at a time. So whether you are writing to files, sockets, or shared memory, you are going to need to have some kind of shim in your library that converts those single bits into bytes suitable for use with standard libraries.

The Solution Before Templates

If you are writing a dedicated compressor that is purpose-built for some application, you would probably just embed some dedicated code in your input and output routines. This article is less important for a one-off solution like that.

Things get more complicated if you are a library writer. If you want to provide an arithmetic compression capability to end users, you don’t have any idea in advance where their I/O is going. So the classical OOP solution to this is to define a base class with virtual methods that the compressor needs, then require library users to implement derived classes that provide all missing functions and obey certain rules.

In this case, my arithmetic compressor will have dedicated routines that do the bit-fiddling, and then rely on a shim class to provide byte-oriented input and output. This means the library can read or write from a disparate variety of sources. My pre-template solution to this is illustrated with a partial code segment from bitio_00.cpp. This uses classic OOP to create input and output classes, which are then called from inside the compressor.

class compressor
{
public :
  compressor(input_bytes &input, output_bytes &output ) : 
    m_input(input),
    m_output(output),
    m_NextByte(0),
    m_Mask(0x80) 
  {
  }
  ~compressor()
  {
    if ( m_Mask != 0x80 )
      m_output.putByte(m_NextByte);
  }
  int operator()()
  {
    //compression takes place here
    // e.g. putBit( 0 ); putBit( 1 );
  }
protected:
  void putBit( bool val ) {
    if ( val )
      m_NextByte |= m_Mask;
    m_Mask >>= 1;
    if ( !m_Mask) {
      m_output.putByte(m_NextByte);
      m_Mask = 0x80;
      m_NextByte = 0;
    }
  }
private :
  output_bytes &m_output;
  input_bytes &m_input;
  char m_NextByte;
  int m_Mask;
};

If I were going to perform compression on a pair of iostream objects, I would need simple derived classes that implemented the necessary shim code for the putByte() and getByte() functions. A simple version of the input class used in bitio_00.cpp is shown here:

class input_bytes
{
public :
  virtual int getByte() = 0;
};

class input_bytes_istream : public input_bytes
{
public :
  input_bytes_istream(std::istream &stream) 
    : m_stream(stream)
  {
  }
  int getByte()
  {
    return m_stream.get();
  }
private :
  std::istream &m_stream;
};

This works, and at least for the library writer it is fairly satisfactory.

As a library user it is not really optimal. I’d like to be able to use a library without having to create shim classes. Of course the library writer is free to implement common shim classes - which can then be used as input for more customized approaches.

Withholding any other judgment, this is an object oriented solution that is fairly simple to understand and implement, and there are plenty of libraries out there that use this sort of implementation. You probably find it somewhat familiar and can imagine how you might deal with it as a user.

So Where Are The Problems?

Although the OOP solution to this problem works, there are a couple of problems.

First, as library writers start identifying additional things they need in their I/O classes, the abstract base classes sometimes get larded with additional methods, making it more and more complicated to implement specialized versions. While not a problem in this specific example, it definitely is a problem in the real world.

For example, the arithmetic compressor writer might decide that block I/O offered some performance improvements, leading him or her to add methods for putBytes() and getBytes(). This kind of creeping elegance leads to complex base classes that annoy users.

However, this is more a stylistic problem, and not necessarily tied directly to OOP, although it does seem to be a natural outgrowth of that type of programming.

A more important problem, and one that is really a big deal for data compression, is the cost of using virtual base classes to implement I/O. Functions like putBit() are called repeatedly in tight loops. The fact that they have to invoke virtual functions is annoying - these are hard for the C++ compiler to optimize at compile time, and so each call is an expensive (usually one level of pointer indirection) subroutine call.

What would be much better, and what the template-based approach allows for, is to define the input and output routines as parameterized types using templates. The compiler can then generate the code that calls those functions inline with the compression code - avoiding those expensive subroutine calls and giving the compiler leeway to optimize at will. In some circumstances, this can result in a substantial improvement in runtimes.

The 1998 Template-based Solution

So how would my code look if I ditched OOP and used a generic solution? My sample code in bitio_01.cpp has a compressor class that is declared like this:

template<typename INPUT, typename OUTPUT>
class compressor
{
public :
  compressor(INPUT &input, OUTPUT &output ) : 
    m_input(input),
    m_output(output),
    m_NextByte(0),
    m_Mask(0x80) 
  {
  }
  ...
protected:
  void putBit( bool val ) {
    if ( val )
      m_NextByte |= m_Mask;
    m_Mask >>= 1;
    if ( !m_Mask) {
      m_output.putByte(m_NextByte);
      m_Mask = 0x80;
      m_NextByte = 0;
    }
...

With this kind of template programming, I can construct my compressor with an input object of any class that has a getByte() method, and an output object of any class that has a putByte() method.

As a library writer, I am going to go ahead and provide a convenience class that takes an std::ostream object and implements the needed function. My template class, output_bytes<T>, is defined for std::ostream only. I include a default implementation that will cause a compiler error if you attempt to construct it with some other type of object - this is a 1998-circa attempt to perform compile-time assertions.

template<typename T>
class output_bytes
{
private :
  //
  // If you try to instantiate an output_bytes<T>
  // object for a type that doesn't have a specialization,
  // you will get an error indicating that you are 
  // trying to use this private constructor. 
  //
  output_bytes(...);
public :
  void putByte(char); 
};

//
// Specialization of output_bytes for class ostream
//
template<>
class output_bytes<std::ostream>
{
public :
  output_bytes(std::ostream &stream) : m_stream(stream){}
  void putByte(char c)
  {
    m_stream.put(c);
  }
private :
  std::ostream &m_stream;
};

With that in place, I can now pass in ostream objects to my compressor without the user having to implement a shim class. I use a helper function to deal with some of the mechanics needed to actually construct the compressor and run it:

//
// This convenience function takes care of
// constructing the compressor and the
// input and output objects, then calling
// the compressor.
//
template<typename INPUT, typename OUTPUT>
int compress(INPUT &source, OUTPUT &target)
{
  input_bytes<INPUT> in(source);
  output_bytes<OUTPUT> out(target);
  compressor<input_bytes<INPUT>,output_bytes<OUTPUT> > c(in,out);
  return c();
}

int main(int argc, char* argv[])
{
  std::ostream *pOut = new std::ofstream("output0.bin");
  std::istream *pIn = new std::ifstream("input1.txt");
  compress(*pIn, *pOut);
  delete pIn;
  delete pOut;
  return 0;
}

Templates Lack OOP Semantics

So in the code above, I’ve created shim classes that should work with anything derived from std::iostream. I should be able to read and write to files, the console, memory, etc.

This is true, but you will note right away that this template-based solution won’t quite match up with your expectations. If you change the code shown above to look like this:

int main(int argc, char* argv[])
{
  std::ofstream out("output0.bin");
  std::istream *pIn = new std::ifstream("input1.txt");
  compress(*pIn, out);
  delete pIn;
  return 0;
}

You’ll get the error showing that you have tried to instantiate the generalized class, not the std::ostream specialization:

1>------ Build started: Project: traits, Configuration: Debug Win32 ------
1>Compiling...
1>bitio_01.cpp
1>bitio_01.cpp(121) : error C2248: 'output_bytes<T>::output_bytes' 
1> cannot access private member declared in class 'output_bytes<T>'
1> with
1> [
1> T=std::ofstream
1> ]
1> bitio_01.cpp(15) : see declaration of 'output_bytes<T>::output_bytes'
1> with
1> [
1> T=std::ofstream
1> ]

So what’s going on here? My template class is specialized for std::ostream, and std::ofstream is an std::ostream object, is it not?

No, it isn’t. As OOP programmers we casually say that an std::ofstream object is an std::ostream object, but this is syntactic shorthand for a deeper OOP principle. At the surface level, these are two separate classes, quite distinct from one another.

And therein lies the problem - template instantiation is simple-minded - when you create a specialized version of class output_bytes for std::ostream, the compiler will construct it for that one class, and that one class only.

So what this means is that my convenience class output_bytes<std::ostream> is not quite as convenient as I would like it to be. Anyone using it for class derived from ostream will have to upcast their arguments:

int main(int argc, char* argv[])
{
  std::ofstream output1("output0.bin");
  std::istream &input1 = std::ifstream("input1.txt");
  compress(input1, dynamic_cast<std::ostream&>(output1));
  return 0;
}

The Enable If Idiom

This should be a problem I can fix. What I want to do is modify my template definition of output_bytes so that it works with std::ostream or any class derived from it.

This can be done with the help of two relatively new classes. std::enable_if is a template class that was added to the library with C++11, and std::is_base_of is a struct added with TR1 in 2007. I’ll show the new definition below, then go on to try to analyze what is happening:

template<typename T,typename Enable = void>
class output_bytes
{
private :
  output_bytes(...);
public :
  void putByte(char); 
};

//
// Specialization of output_bytes for class ostream
// and classes derived from ostream
//
using std::is_base_of;
using std::ostream;

template<typename T>
class output_bytes<T,typename enable_if<is_base_of<ostream, T>::value>::type>
{
public :
  output_bytes(T &stream) : m_stream(stream){}
  void putByte(char c)
  {
    m_stream.put(c);
  }
private :
  T &m_stream;
};

Comparing this code with what was seen in the 1998 template version, you’ll see that the only real changes are in the part of the class definition in which template parameters are declared.

First, the base class now has two parameters - the first being a type that will nominally be a class derived from std::ostream, with the second being an arbitrary type parameter called Enable, which defaults to type void. When instantiating objects of class output_bytes, I’m still going to just use the single type specifier, as shown below.

template<typename INPUT, typename OUTPUT>
int compress(INPUT &source, OUTPUT &target)
{
  input_bytes<INPUT> in(source);
  output_bytes<OUTPUT> out(target);
  compressor<input_bytes<INPUT>,output_bytes<OUTPUT> > c(in,out);
  return c();
}

So where does the second type parameter come into play?

In the specialization of output_bytes that I have created for classes derived from std::ostream, you’ll see that I am using an expression for this second parameter:

enable_if<is_base_of<ostream, T>::value>::type

This expression evaluates to a type of void if T is type std::ostream or a class derived from it. If T does not match up this way, then the expression is not defined.

SFINAE

So in some cases a template parameter is defined, and in other cases it is not defined. Let’s see how this works. The inner part of the expression is:

is_base_of<ostream, T>::value

If you look up std::is_base_of, you’ll see that the instantiation of that struct has a constant static member named value, which is true if T is derived from std::ostream (in this case), and false if it is not. Since this is a static constant member of the struct, it is known at compile time and can be used as a template argument.

This means that we are instantiating std::enable_if with a constant that is either true or false. This is where things get interesting. We are passing std::enable_if<T>::value as the second argument to output_bytes<T,Enable>. Looking at the definition for enable_if, you’ll see that if the argument passed to it is true, it has a typedef for value. If the template argument is false, the typedef does not exist.

So if T is the desired std::ostream derived class, we are passing in two type arguments to the template definition. But if T is some other unwanted class, the second argument doesn’t exist. What happens then?

It turns out that C++ has a very specific rule in place for this scenario: Substitution Failure Is Not an Error, or SFINAE. What this means is that in this case, the compiler finds that it can’t provide the second argument to the class definition for output_bytes, so it simply skips trying to instantiate it. Not an error.

This means that the slightly wonky definition does exactly what I want it to do - it instantiates for all std::ostream classes, and doesn’t for others.

Conclusion

Changing a template specialization so that it applies to an entire class hierarchy instead of just a single class can be done fairly easily using modern C++ tools. It involves some fairly benign changes to a class declaration, and imposes no runtime cost.

As always, this type of template metaprogramming is somewhat difficult to get your head around if you are a traditional C++ procedural programmer, as so many of us are. You have to get used to the concept of passing types as parameters at compile time, and this is just completely new territory.

As C++ adds more support for traits-based programming in the post-C++11 future, it may be that creating this type of class becomes simpler. But for now, it works pretty well without too much pain, as long as you can get your compiler updated to TR1 or later.

Note that boost provides libraries that implemented enable_if and is_base_of long before TR1, so you should be able to implement this type of code with just about any C++ compiler in use today. A home-grown implementation of enable_if is trivial, and is included in the sample code:

bitio.zip