One of the first articles I wrote for Dr. Dobb’s Journal, LZW Data Compression, turned out to be very popular, and still generates a fair amount of traffic and email over twenty years later.
One of the reasons for its popularity seems to be that LZW compression is a popular homework assignment for CS students around the world. And that audience sometimes found the article to be bit of a struggle. My code was modeled on the UNIX compress program, which was written in terse C for maximum efficiency. And sometimes optimization comes at the expense of comprehension.
By using C++ data structures I can model the algorithm in a much more straightforward way – the language doesn’t get in the way of a clear implementation. And after 20 years of answering puzzled queries I think I can improve on the overall explanation of just how LZW works.
In this updated look at LZW, I will first give a description of how LZW works, then describe the core C++ code that I use to implement the algorithm. I’ll then walk you through the use of the algorithm with a few varieties of I/O. Finally, I’ll show you some benchmarks.
Continue reading this post…