![]() |
![]() |
![]() |
![]() |
The Data Compression Book
2nd edition
by Mark Nelson and Jean-loup Gailly, M&T Books, New York, NY 1995
ISBN 1-55851-434-1
541 pages
List price in the US is $39.95
|
Update: I'm sorry to say this book is out of print. However, you can almost always find used copies on Amazon.com for very reasonable prices. Be sure to get the second edition!
"The best all-around book on the subject" - Andrew Schulman, Dr. Dobb's Journal "The book hits its target audience right between the eyes." Jeff Duntemann, PC Techniques "One of my favorite books on applied computer technology is The Data Compression Book" - Jeff Prosise, PC Magazine |
This authoritative guide details various data compression techniques used on personal and mid-sized computers. It explores different data compression methods, explaining the theory behind each and showing C programmers how to apply them to significantly increase the storage capacity of their system. Each technique is fully illustrated with complete, working programs written in portable C. These programs not only demonstrate how data compression works but can also be used to build your own data compression programs.
Topics include:
-
Fractal Compression
Shannon-Fano and Huffman coding.
Differences between modeling and coding.
Expanding and improving Huffman Coding with Adaptive Huffman techniques.
Arithmetic coding.
Implementing powerful statistical models.
Dictionary compression methods using LZ77 and LZ78.
Applying lossy compression techniques to computer graphics and digitized sound data.
The JPEG compression algorithm.
Developing a complete archiving program.
What's new in the second edition?
The second edition of this book was printed in November, 1995. The text and source code of the book was cleaned up somewhat to match up with current events. In addition, Jean-loup Gaillyadded a chapter on Fractal Image Compression, and performed some work on the rest of the text.
Want more information on data compression?
To keep up with the field of data compression, you should monitor the news groups: comp.compression. Before you post any questions or comments in this group, you probably ought to read the comp.compression FAQ.
Want permission to use the code in the book? Check out my liberal code use policy.
Errata
First Edition, First printing only! Page 310.
The top of this page starts with a declaration for find_child_node(). The two local variables are defined as:
-
unsigned int index;
-
int offset;
These need to be changed to:
-
unsigned int index;
-
unsigned int offset;
The code is correct on the disk, and is correct in the second printing, and the second edition.
First Edition, Page 121
Second Edition, Page 112
The listing for ahuff.c was truncated, resulting in the loss of a fairly big chunk of code. The listing on the diskette is complete, so most people won't miss it. Those who bought the first edition sans disk can get the updated copy here.
Second Edition only, Page 50
The listing for main-c.c did not include the terminating angle bracket ( '}' )character. The code is correct on the diskette.
Second Edition Ony, Page 332
A nasty transcription error rendered the formula shown in Figure 11.8 invalid. The bottom half of the formula shows a term that gets the cosine of ((2j+1)*j*π)/2N. This is incorrect, it should take the cosine of ((2j+1)*i*π)/2N. Note that the change means that we multiply π by i, not j.
The code and supporting documentation are correct as printed.




25 users commented in " The Data Compression Book "
Follow-up comment rss or Leave a Trackback[...] The Data Compression Book [...]
I need to hide 4 digit binary in 8 digit binary. Can it be done?.
@Alex:
Hi Alex,
I'm not sure I know what you mean by "Hide".
Hi Mr. Mark Nelson,
Finally had a chance to buy your book. Was used, but in very good state. Only wnated to know if you have the contents somewhere in this site, to download and study it, if possible...
Regards,
Sven Bleckwedel
Hi Mr Mark Nelson,
I forgot to mention that about the contents I refer to the diskette contents (archived in one .zip) that could be available in your site...
Regards,
Sven Bleckwedel
hi Mr. Mark Nelson
this about book "The Data Compression Book" errata in Chapter 12 "An Archiving Package". In carman.c source code
i have compile it in Borland C++ 5.5.1 for Win32
and i have got several error that make me confuse like :
Error E2129 carman2.c 228: Character constant must be one or two characters long in function main
- Maybe '' is String so "", are i'am right
Error E2451 carman2.c 1149: Undefined symbol 'OL' in function Insert
- OL is not define variable
Error in line 1045 which number_of_bytes-- > ? bigger from what
and at last i try to fix but failed with error in linking
Error: Unresolved external '_CalculatedBlockCRC32' referenced from C:\BORLAND\BC
C55\BIN\CARMAN.OBJ
i so appricate if you want give me the fix source code...
best regard
@evilkyro:
It looks like you must be typing in the source code by hand, right? If so I can help you out with a copy of the source in a zip file. Let me know if you are interested.
yes, sure think mark i have type it. give me copy of the source in zip file very helpfull to me. i so appreciate. thank you. by the way where the link that i can download it mark. and again thanx
best regard
and i want to ask where can i find all the source code of your book Mr. Mark. cause, i have search but can find it. and this book i borrow from library and i live in indonesia. so i hard to find this book. can you give me the link. so i can download it. i'm so appreciate if you give me all source in your book.
best regard
Thank Your for send me the souce. It will be very usefull to learn compression.
best regard
I have the 1st edition I read years ago and have to review it again because I have some data compression work to do. The book is a great help to my understanding.
However, Appendix A (Compression Statistics) gives some numbers without describing their units. There are some 4 digit numbers of the form NN.NN. I assume they are compression ratio percentages, but perhaps they are execution time in seconds! If they are percentages, they might be compressed size / original size or (original size - compressed size) / original size. Also, there are numbers listed between 3 digits and 5 digits long in other tables, are these bytes/second, total seconds, or something else? Maybe someone that understands could respond or an entry could be made under Errata.
@Jacques:
Note that the head of the first table says "Ratios" - the numbers you are seeing are compression ratios. The ratio used is described in the front of the book in the section "Keeping Score". A value of 100 indicates perfect compression, and 0 indicates no compression.
dear mr.Mark nelson
now i'm doing my final exam about lossless compression using context tree modeling. i have download several references, but until now i'm still confuse about context modeling and context tree algorithm. can you give me suggestion, a book or an article about this algorithm? thank you
@sophie:
Actually, I don't have anything in book about Context Trees or the Context Tree Algorithm - because I don't know anything about them. Sorry.
Post to comp.compression and ask for references, I'm sure the good people there will be happy to give you a pointer or two.
- Mark
I just wanna see how iG:Syntax Hiliter works:
Mark-
Love the book. It's been on my bookshelf since '92.
I was referring to it this morning, and thought I'd pass along some errata, in case you plan to come out with a revised edition sometime soon.
These are pertaining to the first edition.
Page 125, in your table, all 18 of your "greater than" signs should be "less than signs". Example,
SPACE 1/10 0.00 >= r > 0.10
(no value of r can be both less than 0 and greater than 0.1)
Page 234, second last paragraph ends with:
So this token is encoded as: 14, 4, ' .' - which should be
So this token is encoded as: 14, 4, ' '.
(I'm sure some English-expert editor (or program) incorrectly put the period inside the quote for you. If you want to satisfy the editor, perhaps put it inside of double quotes, like you did on page 235:
So this token is encoded as: "14, 4, ' '."
I found a few more minor things... not sure if you want them. Very helpful book, though! Thanks!
@J:
Thanks for the corrections - the first edition was done in the dark ages of publishing - believe it or not I had to send in hardcopy, so it had more than it's share of problems, particularly in the code.
- Mark
Hello Mark
I have read your book "The Data Compression" and just have one question about 'The Arithmatic code method', I can't find an example of decoding the example message "BILL GATE" by shifting the MSB from low and High values. I will appreciate that if you can give me the decoding example as I understood the encoding but can't understood the decoding from the C code.
Best Wishes
Yaser Hosaney
Egypt
@Yasr:
I've given the best explanation I know how in the book. It's possible you might get a better explanation here, but it's mostly the same text as the book:
http://dogma.net/markn/articles/arith/part1.htm
You could try the wikipedia article also:
http://en.wikipedia.org/wiki/Arithmetic_coding
I recommend actually running the code in a debugger and stepping through it for better results.
Not sure where I came across this book originally, maybe a reference to it in jpeg.org.
It is a pretty good book but is there any chance of an updated version of it coming out any time soon ?
I am especially interested in video compression and need a more recent book urgently !
@donnyab:
I would really love to update the book, but I don't have the time in my schedule in the near future.
- Mark
Hi,
i need shannon fano data compression code for compressing file.
I need to write a report on it.can anybody help me ASAP.
THANKS
[/c]
if ( nodes[ i ].count != 0)
{
if ( nodes[ i ].count < nodes[ min_1 ].count)
{
min_2 + min_1;
min_1 = i;
}
else if ( nodes[ i ].count < nodes[ min_2 ].count)
min_2 = i;
}
[/c]
What is meant by min_2 + min_1,please?Shouldn´t it be min_2 = min_2 + min_1?Code is from function build_tree from The Huffman algorithm chapter...
Thank you for answer
@johny:
There were many typos in the book - back in they days when we printed long listings like this, it was tough on the proofreaders - they didn't understand C. Not only that, but the people who typeset the book had to take my code for printouts and type it in by hand - and they didn't know C either.
The disk that accompanied the book had correct code, but if you are looking at the book right now it is very unlikely that you have a copy of the diskette.
The line in question should be:
min_2 = min_1;
Somehow that = sign was converted to a plus sign.
Thanks for reading!
- Mark
Dear Sir,
Thanks for the unbelievably simple and straight approach in the book. no other book in data compression gives ideas without support of mathematical explanations. Feels like the author talking to me when I read it. Thanks, thanks a lot.
I've tried your code for fractal compression, I'm getting it working, but there are problems with the decoded file. What shall I do to get pixel values from the decoded file?
I'm working on some different schemes for domain classification.
Leave A Reply
You can insert source code in your comment without fear of too much mangling by tagging it to use the iG:Syntax Hiliter plugin. A typical usage of this plugin will look like this:[c]
Note that tags are enclosed in square brackets, not angle brackets. Tags currently supported by this plugin are: as (ActionScript), asp, c, cpp, csharp, css, delphi, html, java, js, mysql, perl, python, ruby, smarty, sql, vb, vbnet, xml, code (Generic). If you post your comment and you aren't happy with the way it looks, I will do everything I can to edit it to your satisfaction.int main()
{
printf( "Hello, world!\n" );
return 1;
}
[/c]