The Data Compression Book 2nd edition
by Mark Nelson and Jean-loup Gailly
M&T Books, New York, NY 1995
ISBN 1-55851-434-1
541 pages
List price in the US is $39.95

The best all-around book on the subject - Andrew Schulman, Dr. Dobb's Journal

The book hits its target audience right between the eyes - Jeff Duntemann, PC Techniques

One of my favorite books on applied computer technology is The Data Compression Book - Jeff Prosise, PC Magazine


I’m sorry to say this book is long out of print. The first edition is probably very difficult to find, but the second edition is widely available for resale.

This authoritative guide details various data compression techniques used on personal and mid-sized computers. It explores different data compression methods, explaining the theory behind each and showing C programmers how to apply them to significantly increase the storage capacity of their system. Each technique is fully illustrated with complete, working programs written in portable C. These programs not only demonstrate how data compression works but can also be used to build your own data compression programs.

Topics include:

  • Fractal compression.
  • Shannon-Fano and Huffman coding.
  • Differences between modeling and coding.
  • Expanding and improving Huffman coding with Adaptive Huffman techniques.
  • Arithmetic coding.
  • Implementing powerful statistical models.
  • Dictionary compression methods using LZ77 and LZ78.
  • Applying lossy compression techniques to computer graphics and digitized sound data.
  • The JPEG compression algorithm.
  • Developing a complete archiving program.

What’s new in the second edition?

The second edition of this book was printed in November 1995. The text and source code of the book were cleaned up somewhat to bring them up to date. In addition, Jean-loup Gailly added a chapter on Fractal Image Compression, and performed some work on the rest of the text.


First edition, first printing only, page 310

The top of this page starts with a declaration for find_child_node(). The two local variables are defined as:

unsigned int index;
int offset;

These need to be changed to:

unsigned int index;
unsigned int offset;

The code is correct on the disk, in the second printing, and in the second edition.

First edition, page 121 / Second edition, page 112

The listing for ahuff.c was truncated, resulting in the loss of a fairly big chunk of code. The listing on the diskette is complete, so most people won’t miss it. Those who bought the first edition sans disk can get the updated copy here.

Second edition only, page 50

The listing for main-c.c did not include the terminating curly brace ( ‘}’ ) character. The code is correct on the diskette.

Second edition only, page 332

A nasty transcription error rendered the formula shown in Figure 11.8 invalid. The bottom half of the formula shows a term that takes the cosine of ((2j+1)*j*n)/2N. This is incorrect; it should take the cosine of ((2j+1)*i*n)/2N. In other words, n is multiplied by i, not j.

The code and supporting documentation are correct as printed.