Sapir-Whorf to Dijkstra to Torvalds - Language Bigotry In Our Time

Back in the day, the Sapir-Whorf hypothesis was all the rage in the study of linguistics. With apologies to those who actually work in the field, I’ll crudely summarize it as the idea that the language you speak both constrains and influences how you think. By this reasoning, if your language has only one word for snow, you will have a hard time seeing any difference between light powder and crunchy ice pack.

Sapir-Whorf was seen as completely discredited back when I learned about it. Linguistic relativity has enjoyed a slight comeback in a weaker, restated form, but it seems fairly certain that human thought is by no means confined to a cage built out of vocabulary and grammar.

Our field has long had its own Sapir in E.W. Dijkstra, who spelled it out with money quotes like these:

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence.

Closer to today, we have the famous rant against C++ from Linus Torvalds, who feels that a programmer who uses C++ is going to wreck any project he or she touches:

I've come to the conclusion that any programmer that would prefer the project to be in C++ over C is likely a programmer that I really *would* prefer to piss off, so that he doesn't come and screw up any project I'm involved with.
C++ leads to really really bad design choices.

Dogmatism à la Torvalds

At the large corporate entity that pays my bills, we have a slogan on the back of our badges: No Technology Religion. It might not be easy to live up to this, but yes, I try.

To me, this admonishment means two things:

Try to objectively choose the best tool for the job
Don't let your preferences in tools dictate the way the job should be done

Linus is clearly saying in his rant that anyone who programs in C++ is guilty of breaking both of these rules.

I disagree. I think there are times when C++ is clearly the right tool for the job, and that you can arrive at this conclusion fairly objectively. I think Linus is fogged in by his own particular Technology Religion.

A Simple Example

As the final assignment for my C/C++ programming class last semester, I asked my students to implement a simple token-counting program in C. The goal was to reproduce the behavior of this short C++ program:

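```cpp
#include <iostream>
#include <map>
#include <string>
using namespace std;
int main() {
    map<string,int> counts;     // token -> number of occurrences
    string s;
    while ( cin >> s )          // >> skips whitespace and reads one token
        counts[s]++;
    for ( auto ii = counts.begin() ; ii != counts.end() ; ii++ )  // map iterates in sorted key order
        cout << ii->second << " : " << ii->first << endl;
}
```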

This particular program highlights a number of features of C++ that are not present in C:

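The versatile replacement for C arrays, vector<T>. (This program doesn't need one, because map does the work, but the C version has to manage an array by hand.)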
The string class.
Safe input using iostreams.
Associative arrays as part of the standard library.

This program is quite easy to write in C++, and it is basically complete. One could flesh it out with some error handling on the input stream, but that’s really not even necessary.

Do it in C

Rewriting this in C is a straightforward task with one big speed bump: the standard library has no associative array of any kind. There are a number of ways to deal with this; I chose the following strategy:

Read all the tokens into an array.
Upon completion, sort the array.
Once the array is sorted, walk through it to get the count for each token.

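This algorithm uses more space than the C++ program, because it stores every token rather than one copy of each distinct token. It should take about the same amount of time, though: both approaches are O(N log N) for N tokens, assuming you don’t bump into one of the pathologically bad cases of qsort().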
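Steps 2 and 3 of that strategy can be sketched as follows. This is a minimal sketch, not the assignment's reference solution. It assumes the tokens have already been read into an array of `char *`. The names `compare_tokens()` and `sort_and_count()` are hypothetical. After sorting, identical tokens sit next to each other, so a single linear pass over each run of equal strings produces the counts.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator. Each array element is a char *, so qsort()
   passes pointers to those pointers. */
int compare_tokens(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Sort the tokens, then walk the sorted array. Equal tokens are now
   adjacent, so the length of each run of identical strings is that
   token's count. Prints "count : token" lines in the same format as
   the C++ version and returns the number of distinct tokens. */
size_t sort_and_count(char **tokens, size_t n, FILE *out)
{
    size_t distinct = 0;
    qsort(tokens, n, sizeof tokens[0], compare_tokens);
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && strcmp(tokens[i], tokens[i + run]) == 0)
            run++;
        fprintf(out, "%zu : %s\n", run, tokens[i]);
        distinct++;
        i += run;          /* jump to the first token of the next run */
    }
    return distinct;
}
```

Only the pointers are sorted; the strings themselves never move. This is also where the extra space goes: the array holds one entry per token, not one per distinct token.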

Since I asserted that this program is easier to write in C++ than C, it behooves me to give a list of reasons why.

C I/O deficiencies. Reading strings in C is considerably more difficult because the standard C I/O library has no way to read a string of unbounded length. (Platform-specific extensions, such as the POSIX m modifier for scanf(), can help, but they raise portability problems of their own.) Your input code has to check for a lot of error cases, or you have to build your own string input functions.
Memory management of C arrays is a very manual task. I have to allocate the original space for my array, take care that I reallocate if I exceed its length, and free the space when I am done.
Memory management of C strings has exactly the same problems.
Sorting the array of strings is just a tiny bit more inconvenient with qsort(). Also, the C standard makes no performance guarantee for qsort(), while the C++ standard requires std::sort() to make O(N log N) comparisons (as of C++11).
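The bookkeeping behind the first three items looks roughly like this. This is a minimal sketch that assumes whitespace-delimited tokens, as with `cin >> s`. The helpers `read_token()`, `push_token()`, and `free_tokens()` are hypothetical names, not part of any standard library. Each exists only to do work that `string` and the standard containers do automatically in C++.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one whitespace-delimited token of any length from fp.
   Returns a malloc()ed string the caller must free(), or NULL at EOF
   (NULL is also returned if memory runs out). */
char *read_token(FILE *fp)
{
    int c;
    while ((c = getc(fp)) != EOF && isspace(c))
        ;                                   /* skip leading whitespace */
    if (c == EOF)
        return NULL;

    size_t len = 0, cap = 16;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    do {
        if (len + 1 == cap) {               /* always keep room for '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    } while ((c = getc(fp)) != EOF && !isspace(c));
    buf[len] = '\0';
    return buf;
}

/* Append a token to a growable array of char *, doubling the capacity
   when it fills up. Returns 0 on success, -1 if realloc() fails. */
int push_token(char ***items, size_t *count, size_t *cap, char *token)
{
    if (*count == *cap) {
        size_t new_cap = *cap ? *cap * 2 : 64;
        char **bigger = realloc(*items, new_cap * sizeof **items);
        if (bigger == NULL)
            return -1;
        *items = bigger;
        *cap = new_cap;
    }
    (*items)[(*count)++] = token;
    return 0;
}

/* Release every token and then the array itself. */
void free_tokens(char **items, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(items[i]);
    free(items);
}
```

Doubling the capacity, rather than growing by a fixed amount, keeps the total cost of all the realloc() calls linear in the number of tokens. Every one of these allocations is also a chance to leak memory or write past the end of a buffer, which is the point the list above is making.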

The C version of my function has more lines of code, and more bookkeeping tasks that need to be done manually. There are more opportunities to make mistakes.

A final reason I like the C++ version of the program better is that it lends itself well to working with other types. Any type that has insertion and extraction operators and a less-than operator can use the same code with nothing changed but the declarations. Turning it into a function template accomplishes the same thing with no code changes needed at all.

Some of My Best Friends are C Programmers

So am I a language bigot for preferring the C++ version of this code?

I hope not. For one thing, I can see that the C version of the program has some nice advantages:

You can write this program using POSIX system calls for almost everything except memory allocation and sorting, resulting in an extremely small footprint.
The C version of the program will be faster due to the use of low-level I/O. C++ iostreams get better all the time, but their layered approach will always be at a disadvantage when it comes to efficiency.
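One way to get that small footprint is sketched below. Read the whole input with the POSIX read() system call into a single buffer, then split that buffer in place, so each token is just a pointer into it and needs no allocation of its own. This is a sketch under stated assumptions: the input fits in memory, and `read_all()` and `split_in_place()` are hypothetical names. The resulting pointer array can then be sorted with qsort() and counted using the same sort-and-walk strategy described earlier.

```c
#include <ctype.h>
#include <stdlib.h>
#include <unistd.h>

/* Read everything from file descriptor fd with read(2) into one
   malloc()ed, NUL-terminated buffer. Stores the byte count in
   *len_out. Returns NULL on error. Assumes the input fits in memory. */
char *read_all(int fd, size_t *len_out)
{
    size_t len = 0, cap = 4096;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (cap - len < 2) {                  /* room for data plus '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
        ssize_t got = read(fd, buf + len, cap - len - 1);
        if (got < 0) {                        /* a real program would retry on EINTR */
            free(buf);
            return NULL;
        }
        if (got == 0)
            break;                            /* end of input */
        len += (size_t)got;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

/* Split buf in place. Whitespace bytes become '\0', and each token is
   a pointer into buf. buf[len] must already be '\0', as read_all()
   guarantees. Returns a malloc()ed array of pointers, or NULL. */
char **split_in_place(char *buf, size_t len, size_t *count_out)
{
    size_t count = 0;
    /* Tokens need a separator between them, so there are at most
       (len + 1) / 2 of them; len / 2 + 1 is a safe upper bound. */
    char **tokens = malloc((len / 2 + 1) * sizeof *tokens);
    if (tokens == NULL)
        return NULL;
    for (size_t i = 0; i < len; ) {
        while (i < len && isspace((unsigned char)buf[i]))
            buf[i++] = '\0';                  /* terminate the previous token */
        if (i == len)
            break;
        tokens[count++] = &buf[i];
        while (i < len && !isspace((unsigned char)buf[i]))
            i++;
    }
    *count_out = count;
    return tokens;
}
```

The whole program then needs only two allocations besides qsort()'s internal work: one for the buffer and one for the pointer array. The trade-off is that memory use grows with the size of the input, which a streaming design would avoid.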

So for a program like this, the choice of language really comes down to context. If you believe in the 80/20 rule, you might think this code should be written in C++ if it is outside of the expensive core part of your program. With fewer lines of code you have fewer chances for error, and efficiency is probably not a big consideration.

If this is in a critical section of code that is executed frequently, you might decide C is your best choice. Make sure to put a little extra time into code review to ensure that the code is free of memory leaks and pointer errors, and you are in business.

Sic Temper Linus

So how does Linus’s rant hold up when looking at the C++ code shown at the top of the post? I would venture a guess that given the assignment, any decent C++ programmer would produce code similar to this. Linus says:

You invariably start using the "nice" library features of the language like STL and Boost and other total and utter crap, that may "help" you program, but causes infinite amounts of pain when they don't work (and anybody who tells me that STL and especially Boost are stable and portable is just so full of BS that it's not even funny)

In this program I make good use of the standard library components that were once part of the STL. Having been part of the standard for over a decade, they work really well and have no portability or correctness issues in any compiler I am aware of. Saying that components like map and vector are problematic is just wrong.

inefficient abstracted programming models where two years down the road you notice that some abstraction wasn't very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.

Well, despite the fact that C++ has made it impossible for me to do so, I managed to write the program without creating any abstractions of my own: no new classes, no interfaces. It is basically just straight procedural C code that happens to employ a few useful library classes.

And at least with the people I work with, I think this is the rule rather than the exception.

In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C.

When C has container classes, a string class, type-safe I/O, and the programmer’s gift from the gods, RAII, then this statement will be true. For now, it is bollocks.

Before modern C++ was available, I probably would have stuck with a simple pipeline to accomplish this task:

```sh
tr '[:blank:]' '\n' | grep -v '^$' | sort | uniq -c
```

The fact that I can do the same thing just as easily in a compiled language gives me some flexibility. I think I can appreciate that fact without being a bigot.

Can you?