The Data Compression Book

2nd edition

by Mark Nelson and Jean-loup Gailly, M&T Books, New York, NY 1995
ISBN 1-55851-434-1
541 pages
List price in the US is $39.95

Update: I'm sorry to say this book is out of print. However, you can almost always find used copies on Amazon.com for very reasonable prices. Be sure to get the second edition!

"The best all-around book on the subject" - Andrew Schulman, Dr. Dobb's Journal

"The book hits its target audience right between the eyes." - Jeff Duntemann, PC Techniques

"One of my favorite books on applied computer technology is The Data Compression Book" - Jeff Prosise, PC Magazine

This authoritative guide details various data compression techniques used on personal and mid-sized computers. It explores different data compression methods, explaining the theory behind each and showing C programmers how to apply them to significantly increase the storage capacity of their system. Each technique is fully illustrated with complete, working programs written in portable C. These programs not only demonstrate how data compression works but can also be used to build your own data compression programs.

Topics include:

  • Fractal compression.
  • Shannon-Fano and Huffman coding.
  • Differences between modeling and coding.
  • Expanding and improving Huffman Coding with Adaptive Huffman techniques.
  • Arithmetic coding.
  • Implementing powerful statistical models.
  • Dictionary compression methods using LZ77 and LZ78.
  • Applying lossy compression techniques to computer graphics and digitized sound data.
  • The JPEG compression algorithm.
  • Developing a complete archiving program.

What's new in the second edition?

The second edition of this book was printed in November 1995. The text and source code of the book were cleaned up somewhat to match up with current events. In addition, Jean-loup Gailly added a chapter on Fractal Image Compression, and performed some work on the rest of the text.

Want more information on data compression?

To keep up with the field of data compression, you should monitor the newsgroup comp.compression. Before you post any questions or comments in this group, you probably ought to read the comp.compression FAQ.

Want permission to use the code in the book? Check out my liberal code use policy.

Errata

First Edition, First printing only! Page 310.

The top of this page starts with a declaration for find_child_node(). The two local variables are defined as:

    unsigned int index;
    int offset;

These need to be changed to:

    unsigned int index;
    unsigned int offset;

The code is correct on the disk, in the second printing, and in the second edition.


First Edition, Page 121
Second Edition, Page 112

The listing for ahuff.c was truncated, resulting in the loss of a fairly big chunk of code. The listing on the diskette is complete, so most people won't miss it. Those who bought the first edition sans disk can get the updated copy here.


Second Edition only, Page 50

The listing for main-c.c did not include the terminating curly brace ('}') character. The code is correct on the diskette.


Second Edition only, Page 332

A nasty transcription error rendered the formula shown in Figure 11.8 invalid. The bottom half of the formula shows a term that takes the cosine of ((2j+1)*j*π)/2N. This is incorrect; it should take the cosine of ((2j+1)*i*π)/2N. Note that the change means that we multiply π by i, not j.

The code and supporting documentation are correct as printed.