One Definition to Rule Them All

A frequently cited rule in the C++ standard is the One Definition Rule. In this article, I’ll show you how I inadvertently blundered into an ODR problem, found my way out, and uncovered a deeper truth about the language standard.

This story starts with a very mundane debugging exercise using the Windows API function OutputDebugString(), moves on to an attempt to simplify my life, runs into a puzzle, and eventually ends in enlightenment.

The Pain of OutputDebugString()

If you are a Windows programmer, you are probably familiar with OutputDebugString(). This function allows you to send unadorned strings to the debugger. If you are inside Visual Studio debugging your app, that usually means either your Output or Immediate window. This is useful, but what is even more useful is its use in conjunction with the nifty utility DebugView. DebugView registers with the O/S as a collector of these messages, and offers some nice filtering and highlighting features - I’ve been using it for what seems like decades, and still rely on it for help with tough problems. If you are a fan of printf() debugging, this is your tool.

Note that OutputDebugString() is not a general-purpose logging tool - and in fact you generally want it removed from production code. It doesn’t have the kind of flexibility needed for that purpose, any more than fprintf() would.

As much as I use it, I have to say that OutputDebugString() really sucks in many respects. First, it only accepts a string as an argument. This means for anything except the most trivial debugging scenarios, you are going to have to manually format your text using either something from the sprintf() family of C functions or the std::ostringstream class.

It’s pretty hard to pull this off without generating an extra three or four lines of code, and then all that stuff needs to be cleaned up or removed when you are done. And the last thing you need when debugging a problem is a lot of extra work.

A Blueprint for Improvement

There are any number of ways to work around this problem. My solution comes about via a number of self-imposed constraints:

Using my debugging output facility should require inclusion of a header file - nothing more. No objects to be instantiated, nothing to be added to the class under test.
Creating and outputting a formatted debug message of arbitrary complexity should, ideally, always take just one line of code.
Because debugging code that introduces bugs is a bit problematic, the code should be immune to buffer overflows and similar issues.

Depending on your proclivities, your solution is often going to be one of these two:

Option One is your own creation along the lines of OutputDebugStringF(), taking a formatting string à la printf(). Programmers who still have C blood pumping through their veins will gravitate toward this solution, and they can even take advantage of modern language features to eliminate many of the issues that crop up with printf() - no buffer overflows, and even no type mismatches.

Option Two is a variation on class OutputDebugStream, which can be used to send text to the debugger using iostream formatting. This of course gives us immunity from buffer overflows, type safety in formatting, and a standard way to format user-defined types.
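For comparison, here is a minimal sketch of Option One, assuming C++11 variadic templates and the standard snprintf(); the name formatDebugString() is hypothetical. It calls snprintf() once to measure the result and once to fill a buffer of exactly that size, which removes the buffer-overflow risk. It does not by itself catch type mismatches between the format string and the arguments; that takes more machinery than fits here.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical printf-style helper: formats into a std::string sized to fit,
// so no fixed-size buffer can overflow. The arguments must still be types
// printf understands (int, double, const char*, ...); this sketch does NOT
// check them against the format string.
template <typename... Args>
std::string formatDebugString(const char* format, Args... args)
{
    // First pass: with a null buffer, snprintf only reports the length needed.
    int needed = std::snprintf(nullptr, 0, format, args...);
    if (needed < 0) {
        return std::string();  // formatting error
    }
    // Second pass: format into a buffer of exactly that size plus the '\0'.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buffer.data(), buffer.size(), format, args...);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

On Windows the result would then be handed to the debugger with a call such as OutputDebugStringA(formatDebugString("x = %d\n", x).c_str()).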

My Implementation

My personal choice is for an iostream-based solution. I am able to completely deploy this anywhere by including the single file below in the C++ code that wants to use it:

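#pragma once

#include <sstream>
#include <utility>
#include <Windows.h>
#ifndef ODS_PREFIX
#define ODS_PREFIX "ODS: "
#endif

class ods {
  template<typename T> 
  friend ods&& operator<<(ods&& stream, const T& that);
public :
  ods()
  {
    m_stream << ODS_PREFIX << " ";
  }
  ~ods()
  {
    m_stream << "\n";
    // Call the narrow (char) version explicitly; plain OutputDebugString
    // maps to OutputDebugStringW when UNICODE is defined.
    OutputDebugStringA( m_stream.str().c_str() );
  }
protected :
  std::ostringstream m_stream;
public:
  // ods is not copyable, so returning it by value relies on
  // C++17 guaranteed copy elision.
  static ods createTempOds(){ return ods();}
};

// Taking ods&& lets the temporary created by ODS bind to the first call
// (a temporary cannot bind to a plain ods&); returning ods&& keeps the
// whole chain of << calls working on that same temporary.
template<typename T>
ods&& operator<<(ods&& stream, const T& that)
{
  stream.m_stream << that;
  return std::move(stream);
}

#define ODS ods::createTempOds()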

ods.h - Everything you need to use my OutputDebugString replacement

After including this header file in your C++ source, you can send any line you like to the debugger like this:

  ODS << "Value 1: " << val1 << ", Value 2: " << val2;

Very easy, and of course the formatting is done using standard C++ rules, so there is no need to learn anything new.

Not shown is the code that you can use to turn this off in production systems. Not only can you prevent it from sending any text to the debugger, but you can easily make sure that the work done to format the data is skipped as well.
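Since that code isn't shown, the following is only one common way to get both effects, under stated assumptions: a hypothetical ODS_ENABLED macro, a stripped-down ods class with a member operator<< that is constructed directly rather than through createTempOds(), and a stand-in emitDebugString() that appends to a string instead of calling OutputDebugStringA(), so the sketch builds on any platform. Because the insertion chain sits in the else branch of an if whose condition is a compile-time constant, disabling the macro means no ods object is created and none of the << operands is evaluated.

```cpp
#include <sstream>
#include <string>

// Stand-in for OutputDebugStringA() so the sketch runs on any platform.
std::string g_debugLog;
inline void emitDebugString(const std::string& text) { g_debugLog += text; }

// Hypothetical switch: build with -DODS_ENABLED=0 to compile the output away.
#ifndef ODS_ENABLED
#define ODS_ENABLED 1
#endif

class ods {
public:
    ods() = default;
    ~ods()
    {
        m_stream << "\n";
        emitDebugString(m_stream.str());
    }
    // A member operator<< can be called directly on a temporary.
    template <typename T>
    ods& operator<<(const T& that)
    {
        m_stream << that;
        return *this;
    }
private:
    std::ostringstream m_stream;
};

// With ODS_ENABLED at 0, the whole chain of << calls lands in an else branch
// that never runs: no ods object is built and no operand is evaluated.
#define ODS if (!(ODS_ENABLED)) {} else ods()
```

The empty {} before the else matters: it keeps ODS safe inside an unbraced if/else in the calling code, because the macro's else always pairs with the macro's own if.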

Implementation Notes

The core of this is the ods class. This class contains an std::ostringstream object that accumulates all of the data that is being formatted in the single line for debug output. This member, m_stream, is initialized with a prefix that is defined in a macro. This gives me the flexibility to change it in different source files, making debug output easier to filter and search. When all of the data has been added to this object, it is sent out to the debugger by extracting the C string and passing it to OutputDebugString().

Getting this to work properly depends on the C++ rules regarding temporary objects. The object that is collecting all the data is a temporary created by the ODS macro, which calls ods::createTempOds(). The temporary returned by this function is then the target of the first insertion operator <<, and each subsequent insertion operator adds its data to the same temporary object. Section 12.2 of the 1998 standard says this about the lifetime of temporaries:

Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that (lexically) contains the point where they were created.

This is nice, because it means that we can count on the ods destructor being invoked at the end of the expression - so OutputDebugString() will be called when we want it to be. If we weren’t using a temporary, we couldn’t reliably use the destructor to trigger output - it would be called when the object goes out of scope, which may be later than we need.
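The lifetime rule is easy to check with a small probe. The sketch below uses a hypothetical Probe class that records events in a global vector instead of writing to the debugger, and it assumes C++17, where returning Probe() by value is guaranteed not to create an extra copy. A temporary returned from a factory, like the one from ods::createTempOds(), is destroyed before the next statement runs, while a named object lives until its enclosing scope closes.

```cpp
#include <string>
#include <vector>

// Event log standing in for the debugger output.
std::vector<std::string> g_events;

// Hypothetical probe type: records each insertion and its own destruction.
class Probe {
public:
    ~Probe() { g_events.push_back("destroyed"); }
    Probe& operator<<(const std::string& text)
    {
        g_events.push_back(text);
        return *this;
    }
    // Returns a temporary, playing the role of ods::createTempOds().
    static Probe make() { return Probe(); }
};

void temporaryExample()
{
    Probe::make() << "a" << "b";  // the temporary dies at the end of this full-expression
    g_events.push_back("next statement");
}

void namedExample()
{
    Probe p;
    p << "a";
    g_events.push_back("next statement");
}  // p is destroyed only here, when its scope ends
```

Calling temporaryExample() records a, b, destroyed, next statement, while namedExample() records a, next statement, destroyed.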

One additional complication is that we are using the insertion operator on a user-defined type, ods, which doesn’t support it by default. That problem is handled at the end of the header file with a very simple template function definition. Basically, it defines a generic insertion function that inserts the object you want printed into the m_stream object (which actually knows how to deal with it), and returns a reference to the same ods object so that the canonical chaining can occur.
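To see why one generic template is enough, here is a portable sketch with a hypothetical Collector class standing in for ods (it keeps its text rather than calling OutputDebugString()) and a hypothetical Point type. Because the template simply forwards to the embedded std::ostringstream, any type that already has a std::ostream inserter, such as Point below, works with no extra code.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type with the usual stream inserter.
struct Point {
    int x;
    int y;
};

std::ostream& operator<<(std::ostream& os, const Point& p)
{
    return os << "(" << p.x << ", " << p.y << ")";
}

// Minimal stand-in for ods that keeps its text instead of sending it
// to the debugger, so the forwarding can be checked on any platform.
class Collector {
    template <typename T>
    friend Collector& operator<<(Collector& stream, const T& that);
public:
    std::string str() const { return m_stream.str(); }
private:
    std::ostringstream m_stream;
};

// Generic forwarding inserter: anything the ostringstream can format,
// including Point via the overload above, is accepted here.
template <typename T>
Collector& operator<<(Collector& stream, const T& that)
{
    stream.m_stream << that;
    return stream;
}
```

With this in place, c << "corner: " << Point{3, 4} on a Collector c accumulates the text corner: (3, 4).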

So Simple, Nothing Could Go Wrong

As a simple test of this, I created an app with two source files:

#include "ods.h"

void sub();

int main(int argc, char* argv[])
{
    int foo = 15;
    ODS << "foo: " << foo;
    sub();
    return 0;
}

main.cpp

#define ODS_PREFIX "sub.cpp: "
#include "ods.h"
#include <ctime>

void sub()
{
    ODS << "current time: " << std::time(NULL);
    return;
}

sub.cpp

Note that in the file sub.cpp I used the preprocessor to define a different prefix string. This will allow me to easily flag lines that originate in this file. In main.cpp the default prefix of ODS: will be used.

When I run this program, the debug window gives me the following unexpected output:

ODS:  foo: 15
ODS:  current time: 1393808992

Whoops, something is wrong. The formatting is working, but the same prefix is used in both files. I expected the line with the current time to be prefixed with sub.cpp:. How did this fail?

The ODR Steps In

It didn’t take me long to realize the mistake I had made - I had violated the One Definition Rule (ODR).

In the 1998 C++ standard, the ODR is covered in section 3.2. It is a little too lengthy to transcribe completely here, but I think I can give you the gist of it with two brief excerpts.

The first part, which I think of as the Lesser ODR, says that you can’t have two definitions of the same thing in a file you are compiling (formally a translation unit):

No translation unit shall have more than one definition of any variable, function, class type, enumeration type, or template.

Makes sense, right? You don’t expect this code to work:

int foo()
{
  return 2;
}
int foo()
{
  return 3;
}

The second part, which I think of as the Greater ODR, says that you can’t have two different definitions of a function or object anywhere in your entire program:

Every program shall contain exactly one definition of every non-inline function or object that is used in that program; no diagnostic is required.

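Normally, including a class or function definition in a header file doesn’t cause a problem: every place you use the function, it will have the same definition. Strictly speaking, the clause that applies here is a companion rule in the same section (3.2, paragraph 5). Member functions defined inside a class body are implicitly inline, and an inline function may be defined in more than one translation unit as long as every definition consists of the same sequence of tokens. But I slipped up in one critical place. This line of code in the constructor uses a macro as part of its definition: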

    m_stream << ODS_PREFIX << " ";

This means that the definition of the constructor used in main.cpp is doing this:

    m_stream << "ODS: " << " ";

while the definition in sub.cpp is doing this:

    m_stream << "sub.cpp: " << " ";

Two different definitions, and a sure violation of the ODR!

How It Plays Out

C++ programmers are used to getting a lot of help from the compiler when it comes to obvious mistakes. Things like type safety are inextricably bound up in a reliance on getting compiler errors when things are done improperly.

In a recent exchange with Herb Sutter over a problem I was having with Visual C++, this very notion was exposed as somewhat weak. I was expecting Visual C++ to reject some invalid C++ code, but for the particular case I was seeing, Herb rightly pointed out:

Because it is invalid, compilers can do whatever they want with it.

Yes, that’s right, there are many, many places where the compiler is not particularly obligated to call out your mistakes. Often they will anyway, but when faced with some types of errors in your program, compliant compilers can pretty much do whatever they like.

You might think this sounds like laziness on the part of the compiler writer, but in the case of the Greater ODR, I think the standard just formalizes the limits of what our current generation of compilers and linkers can handle.

When an inline function, such as a member function defined inside its class, is compiled to object code in two different files, the linker has to select one, and only one, to use in your executable. For a linker to be able to flag Greater ODR violations, it would need to compare the object code generated for every copy of a function and confirm that they are identical. This would be a tremendous amount of work, and it isn’t something that linkers do today. So instead, the linker just picks one and goes with it.

Resolution

Once I saw what was going on here, the fix was easy enough. Instead of using ODS_PREFIX in the body of the constructor, I pass it to the constructor as an argument:

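#pragma once

#include <sstream>
#include <utility>
#include <Windows.h>
#ifndef ODS_PREFIX
#define ODS_PREFIX "ODS: "
#endif

class ods {
    template<typename T> 
    friend ods&& operator<<(ods&& stream, const T& that);
public :
    ods(const char *prefix)
    {
        m_stream << prefix << " ";
    }
    ~ods()
    {
        m_stream << "\n";
        // Narrow (char) version, independent of the UNICODE setting.
        OutputDebugStringA( m_stream.str().c_str() );
    }
protected :
    std::ostringstream m_stream;
public:
    // Returning by value relies on C++17 guaranteed copy elision.
    static ods createTempOds(const char *prefix){ return ods(prefix);}
};

// ods&& lets the temporary bind to the first call and keeps the chain on it.
template<typename T>
ods&& operator<<(ods&& stream, const T& that)
{
    stream.m_stream << that;
    return std::move(stream);
}

// ODS_PREFIX is now expanded at each call site, not inside the class.
#define ODS ods::createTempOds(ODS_PREFIX)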

Now every copy of the constructor is identical, because ODS_PREFIX is expanded where ODS is used rather than inside the class, and my output is correct:

ODS:  foo: 15
sub.cpp:  current time: 1393914308