One Definition to Rule Them All
A frequently cited rule in the C++ standard is the One Definition Rule. In this article, I’ll show you how I inadvertently blundered into an ODR problem, found my way out, and uncovered a deeper truth about the language standard.
This story starts with a very mundane exercise in debugging using the Windows API function OutputDebugString(), an attempt to simplify my life, followed by a puzzle, and eventually, enlightenment.
The Pain of OutputDebugString()
If you are a Windows programmer, you are probably familiar with OutputDebugString(). This function allows you to send unadorned strings to the debugger. If you are inside Visual Studio debugging your app, that usually means either your Output or Immediate window. This is useful, but what is even more useful is its use in conjunction with the nifty utility DebugView. DebugView registers with the O/S as a collector of these messages, and offers some nice filtering and highlighting features - I’ve been using it for what seems like decades, and still rely on it for help with tough problems. If you are a fan of printf() debugging, this is your tool.
Note that OutputDebugString() is not a general purpose logging tool - and in fact you generally want it removed from production code. It doesn’t have the kind of flexibility needed for this purpose, any more than fprintf() would.
As much as I use it, I have to say that OutputDebugString() really sucks in many respects. First, it only accepts a string as an argument. This means for anything except the most trivial debugging scenarios, you are going to have to manually format your text using either something from the sprintf() family of C functions or the std::ostringstream class.
It’s pretty hard to pull this off without generating an extra three or four lines of code, and then all that stuff needs to be cleaned up or removed when you are done. And the last thing you need when debugging a problem is a lot of extra work.
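To make the overhead concrete, here is a minimal sketch of the manual approach using std::ostringstream. The names send_to_debugger and report_foo are hypothetical, and the stand-in sink only calls OutputDebugStringA() when built on Windows so that the sketch compiles elsewhere too.

```cpp
#include <sstream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical stand-in for the debugger sink. On Windows it forwards to
// OutputDebugStringA(); elsewhere it does nothing.
inline void send_to_debugger(const std::string& text) {
#ifdef _WIN32
    OutputDebugStringA(text.c_str());
#else
    (void)text;
#endif
}

// Four statements just to report one integer.
inline std::string report_foo(int foo) {
    std::ostringstream oss;               // 1. set up a stream
    oss << "foo: " << foo;                // 2. format the message
    const std::string text = oss.str();   // 3. extract the string
    send_to_debugger(text);               // 4. hand it to the debugger
    return text;  // returned only so the sketch can be checked
}
```

Every one of those lines has to be found and deleted again once the bug is fixed.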
A Blueprint for Improvement
There are any number of ways to work around this problem. My solution comes about via a number of self-imposed constraints:
- Using my debugging output facility should require inclusion of a header file - nothing more. No objects to be instantiated, nothing to be added to the class under test.
- Creating and outputting a formatted debug message of arbitrary complexity should trend to always using just one line of code.
- Because debugging code that introduces bugs is a bit problematic, the code should be immune to buffer overflows and similar issues.
Depending on your proclivities, your solution is often going to be one of these two:
Option One is your own creation along the lines of OutputDebugStringF(), taking a formatting string a la printf(). Programmers who still have C-blood pumping through their veins are going to tend toward this solution, and they can even take advantage of modern language features to eliminate many of the issues that crop up with printf() - no buffer overflows, and even no type mismatches.
Option Two is a variation on class OutputDebugStream, which can be used to send text to the debugger using iostream formatting. This of course gives us immunity from buffer overflows, type safety in formatting, and a standard way to use formatting for user defined types.
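For comparison, Option One might be sketched roughly as follows. The function name format_debug is hypothetical; a real OutputDebugStringF() would pass the result on to OutputDebugStringA(). The sketch sizes the buffer with a first vsnprintf() pass, which takes care of overflows, but it does nothing about type mismatches between the format string and the arguments.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical printf-style formatter that cannot overflow its buffer:
// the first vsnprintf() call only measures, the second one writes.
inline std::string format_debug(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);

    va_list measure;
    va_copy(measure, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, measure);
    va_end(measure);

    std::string out;
    if (needed > 0) {
        // One extra byte for the terminating null that vsnprintf() writes.
        std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buf.data(), buf.size(), fmt, args);
        out.assign(buf.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return out;
}
```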
My Implementation
My personal choice is for an iostream-based solution. I am able to completely deploy this anywhere by including the single file below in the C++ code that wants to use it:
OutputDebugString replacement
After including this header file in your C++ source, you can send any line you like to the debugger like this:
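As a rough sketch of both pieces (a reconstruction from the description in this article, not the original header), the following shows one way such a header and a one-line call could look. Several details are assumptions: the ODS macro constructs the temporary directly, the insertion operator is a member template, and the helpers ods_log and ods_emit exist only so the sketch can be exercised without a debugger attached. It keeps the macro-in-constructor design that the rest of the article examines.

```cpp
#include <sstream>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// A source file may define ODS_PREFIX before including this header.
#ifndef ODS_PREFIX
#define ODS_PREFIX "ODS: "
#endif

// Assumption: a record of every message, kept only for checking the sketch.
inline std::vector<std::string>& ods_log() {
    static std::vector<std::string> log;
    return log;
}

inline void ods_emit(const std::string& text) {
    ods_log().push_back(text);
#ifdef _WIN32
    OutputDebugStringA(text.c_str());
#endif
}

class ods {
public:
    ods() { m_stream << ODS_PREFIX; }       // prefix comes from the macro
    ~ods() { ods_emit(m_stream.str()); }    // output happens on destruction
    ods(const ods&) = delete;
    ods& operator=(const ods&) = delete;

    // Forward anything the underlying stream can format, then allow chaining.
    template <typename T>
    ods& operator<<(const T& value) {
        m_stream << value;
        return *this;
    }

private:
    std::ostringstream m_stream;
};

#define ODS ods()

// The whole debug statement fits on one line.
inline void demo() {
    int foo = 15;
    ODS << "foo: " << foo;
}
```

Because the operator deduces its argument type, stream manipulators such as std::endl would need an extra overload in this sketch.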
Very easy, and of course the formatting is done using standard C++ rules, so there is no need to learn anything new.
Not shown is the code that you can use to turn this off in production systems. Not only can you prevent it from sending any code to the debugger, but you can easily make sure that the work done to format the data is skipped as well.
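One common way to get that effect, offered here purely as an assumption about how it could be done, is to hide the expression behind a branch that is never taken. The names ODS_DISABLED, ods_sketch, and expensive_value are hypothetical, and the destructor merely counts emissions instead of calling OutputDebugString().

```cpp
#include <sstream>
#include <string>

// Hypothetical build switch: set to 0 to re-enable debug output.
#define ODS_DISABLED 1

struct ods_sketch {
    std::ostringstream stream;
    static int& emitted() { static int n = 0; return n; }
    ~ods_sketch() { ++emitted(); }  // stands in for the call to the debugger

    template <typename T>
    ods_sketch& operator<<(const T& value) {
        stream << value;
        return *this;
    }
};

#if ODS_DISABLED
// The else branch is never executed, so nothing to its right is evaluated.
#define ODS if (true) {} else ods_sketch()
#else
#define ODS ods_sketch()
#endif

// Counts how often it is called, to show whether formatting work was done.
inline int expensive_value(int& calls) {
    ++calls;
    return 42;
}

inline int calls_made_while_disabled() {
    int calls = 0;
    ODS << "value: " << expensive_value(calls);
    return calls;
}
```

With the switch on, the call to expensive_value() is skipped along with the output, and an optimizing compiler is free to discard the dead branch entirely.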
Implementation Notes
The core of this is the ods class. This class contains an std::ostringstream object that accumulates all of the data that is being formatted in the single line for debug output. This member, m_stream, is initialized with a prefix that is defined in a macro. This gives me the flexibility to change it in different source files, making debug output easier to filter and search. When all of the data has been added to this object, it is sent out to the debugger by extracting the C string and passing it to OutputDebugString().
Getting this to work properly depends on the C++ rules regarding temporary objects. The object that is collecting all the data is a temporary created by the ODS macro, which calls ods::createTempOds(). The temporary returned by this function is then the target of the insertion operator <<. Each of the following insertion operators adds its data to the temporary object. Section 12.2 of the 1998 standard says this about the lifetime of temporaries:
Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that (lexically) contains the point where they were created.
This is nice, because it means that we can count on the ods destructor being invoked at the end of the expression - so OutputDebugString() will be called when we want it to be. If we weren’t using a temporary, we couldn’t reliably use the destructor to trigger output - it would be called when the object goes out of scope, which may be later than we need.
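The difference in timing can be seen in a small sketch. The tracer type and the events log are hypothetical names used only for this illustration; each destructor records when it runs.

```cpp
#include <string>
#include <utility>
#include <vector>

// Ordered record of what happened, for inspection afterwards.
inline std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

struct tracer {
    std::string name;
    explicit tracer(std::string n) : name(std::move(n)) {}
    ~tracer() { events().push_back("destroy " + name); }
    tracer& touch() { return *this; }  // gives the temporary something to do
};

inline void lifetime_demo() {
    tracer named("named");
    tracer("temporary").touch();        // destroyed at the end of this statement
    events().push_back("next statement");
}                                       // named is destroyed here, at scope exit
```

After lifetime_demo() returns, the log reads: destroy temporary, next statement, destroy named. The temporary is gone before the following statement starts, which is exactly the behavior the ODS macro relies on.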
One additional complication is that we are using the insertion operator on a user defined type, ods, that doesn’t have this by default. That problem is managed at the end of the header file with a very simple template function definition. Basically, it defines a generic insertion function that inserts the object you want printed into the m_stream object (which actually knows how to deal with it), returning a reference to itself so that the canonical chaining can occur.
This Simple, Nothing Could Go Wrong
As a simple test of this, I created an app with two source files:
Note that in the file sub.cpp I used the preprocessor to define a different prefix string. This will allow me to easily flag lines that originate in this file. In main.cpp the default prefix of ODS: will be used.
When I run this program, the debug window gives me the following unexpected output:
ODS: foo: 15
ODS: current time: 1393808992
Whoops, something is wrong. The formatting is working, but the same prefix is used in both files. I expected the line with the current time to be prefixed with sub.cpp: instead. How did this fail?
The ODR Steps In
It didn’t take me long to realize the mistake I had made - I had violated the One Definition Rule (ODR).
In the 1998 C++ standard, the ODR is covered in section 3.2. It is a little too lengthy to transcribe completely here, but I think I can give you the gist of it with two brief excerpts.
The first part, which I think of as the Lesser ODR, says that you can’t have two definitions of the same thing in a file you are compiling (formally a translation unit):
No translation unit shall have more than one definition of any variable, function, class type, enumeration type, or template.
Makes sense, right? You don’t expect this code to work:
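A minimal sketch of the kind of code in question, with the hypothetical function name answer. The second definition is left commented out so that the fragment still compiles.

```cpp
// First definition: fine on its own.
int answer() { return 42; }

// Uncommenting the line below puts a second definition of answer() into the
// same translation unit, and the compiler rejects it as a redefinition.
// int answer() { return 43; }
```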
The second part, which I think of as the Greater ODR, says that you can’t have two different definitions of a function or object anywhere in your entire program:
Every program shall contain exactly one definition of every non-inline function or object that is used in that program; no diagnostic is required.
Normally including a class or function definition in a header file doesn’t cause a problem with this rule - every place you use the function, it will have the same definition. But I slipped up in one critical place. This line of code in the constructor uses a macro as part of its definition:
This means that the definition of the constructor used in main.cpp is doing this:
while the definition in sub.cpp is doing this:
Two different definitions, and a sure violation of the ODR!
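The effect of the preprocessor can be imitated in a single file. In this sketch the namespaces main_cpp and sub_cpp are hypothetical stand-ins for the two translation units, and constructor_prefix stands in for the constructor body. In the real program both bodies belong to the same function, ods::ods(), which is what makes them a violation rather than two unrelated functions.

```cpp
#include <string>

// What main.cpp sees: no override, so the header's default prefix applies.
#define ODS_PREFIX "ODS: "
namespace main_cpp {
    inline std::string constructor_prefix() { return ODS_PREFIX; }
}
#undef ODS_PREFIX

// What sub.cpp sees: it defined its own prefix before including the header.
#define ODS_PREFIX "sub.cpp: "
namespace sub_cpp {
    inline std::string constructor_prefix() { return ODS_PREFIX; }
}
#undef ODS_PREFIX
```

The source text of the two functions is identical, yet after macro expansion one returns "ODS: " and the other returns "sub.cpp: ".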
How It Plays Out
C++ programmers are used to getting a lot of help from the compiler when it comes to obvious mistakes. Things like type safety are inextricably bound up in a reliance on getting compiler errors when things are done improperly.
In a recent exchange with Herb Sutter over a problem I was having with Visual C++, this very notion was exposed as somewhat weak. I was expecting Visual C++ to reject some invalid C++ code, but for the particular case I was seeing, Herb rightly pointed out:
Because it is invalid, compilers can do whatever they want with it.
Yes, that’s right, there are many, many places where the compiler is not particularly obligated to call out your mistakes. Often they will anyway, but when faced with some types of errors in your program, compliant compilers can pretty much do whatever they like.
You might think this sounds like laziness on the part of the compiler writer, but in the case of the Greater ODR, I think the standard just formalizes the limits of what our current generation of compilers and linkers can handle.
When a function is compiled to object code in two different files, the linker has to select one, and only one, to use in your executable. For a linker to be able to flag Greater ODR violations, it would need to look at the object code generated for every version of a function and guarantee that it is an identical definition. This would be a tremendous amount of work, and it isn’t something that linkers do today. So instead, the linker just picks one and goes with it.
Resolution
Once I saw what was going on here, the fix was easy enough. Instead of using the ODS_PREFIX in the body of the constructor, I pass it in to the constructor as an argument:
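A sketch of that change, under the same assumptions as before: the ODS macro constructs the temporary directly, the insertion operator is a member template, and the hypothetical ods_log records messages in place of the call to OutputDebugString(). The functions from_main and from_sub imitate the two source files by changing ODS_PREFIX between them.

```cpp
#include <sstream>
#include <string>
#include <vector>

#ifndef ODS_PREFIX
#define ODS_PREFIX "ODS: "
#endif

// Assumption: stands in for OutputDebugString() so the result can be checked.
inline std::vector<std::string>& ods_log() {
    static std::vector<std::string> log;
    return log;
}

class ods {
public:
    // The prefix is now an ordinary argument, so the constructor body is
    // the same token sequence in every translation unit.
    explicit ods(const char* prefix) { m_stream << prefix; }
    ~ods() { ods_log().push_back(m_stream.str()); }
    ods(const ods&) = delete;
    ods& operator=(const ods&) = delete;

    template <typename T>
    ods& operator<<(const T& value) {
        m_stream << value;
        return *this;
    }

private:
    std::ostringstream m_stream;
};

// The macro is expanded at the call site, where each file's prefix is known.
#define ODS ods(ODS_PREFIX)

inline void from_main() { ODS << "foo: " << 15; }

// Imitate sub.cpp, which defines its own prefix.
#undef ODS_PREFIX
#define ODS_PREFIX "sub.cpp: "

inline void from_sub(long long now) { ODS << "current time: " << now; }
```

The per-file variation has moved out of the class definition and into the call sites, where differing text is harmless.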
Now all copies of the function are identical, and my output is correct:
ODS: foo: 15
sub.cpp: current time: 1393914308