<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: LZW Data Compression</title>
	<link>http://marknelson.us/1989/10/01/lzw-data-compression/</link>
	<description>Programming, mostly.</description>
	<pubDate>Fri, 12 Mar 2010 02:30:07 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: funman</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-323507</link>
		<dc:creator>funman</dc:creator>
		<pubDate>Wed, 03 Mar 2010 15:23:34 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-323507</guid>
		<description>Hello,

I noticed your code crashes on 64-bit architectures:
In input_code(), left-shifting the bit buffer doesn't clear the top 32 bits, so I just clear the non-significant bits in the return value

[c]
--- lzw.c.orig	2010-03-03 16:08:05.000000000 +0100
+++ lzw.c	2010-03-03 16:17:13.000000000 +0100
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define BITS 12                   /* Setting the number of bits to 12, 13*/
 #define HASHING_SHIFT (BITS-8)    /* or 14 affects several constants.    */
@@ -317,6 +318,9 @@
     input_bit_count += 8;
   }
   return_value=input_bit_buffer &#62;&#62; (32-BITS);
+#if ULONG_MAX &#62; (1
[/c]</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>I noticed your code crashes on 64-bit architectures:<br />
In input_code(), left-shifting the bit buffer doesn't clear the top 32 bits, so I just clear the non-significant bits in the return value</p>
<div class="igBar"><span id="lc-1"><a href="#" onclick="javascript:showPlainTxt('c-1'); return false;">PLAIN TEXT</a></span></div>
<div class="syntax_hilite"><span class="langName">C:</span>
<div id="c-1">
<div class="c">
<ol>
<li class="li1">
<div class="de1">--- lzw.<span class="me1">c</span>.<span class="me1">orig</span>&nbsp; <span class="nu0">2010</span>-<span class="nu0">03</span>-<span class="nu0">03</span> <span class="nu0">16</span>:<span class="nu0">08</span>:<span class="nu0">05</span>.<span class="nu0">000000000</span> +<span class="nu0">0100</span></div>
</li>
<li class="li2">
<div class="de2">+++ lzw.<span class="me1">c</span>&nbsp; &nbsp;<span class="nu0">2010</span>-<span class="nu0">03</span>-<span class="nu0">03</span> <span class="nu0">16</span>:<span class="nu0">17</span>:<span class="nu0">13</span>.<span class="nu0">000000000</span> +<span class="nu0">0100</span></div>
</li>
<li class="li1">
<div class="de1">@@ -<span class="nu0">13</span>,<span class="nu0">6</span> +<span class="nu0">13</span>,<span class="nu0">7</span> @@</div>
</li>
<li class="li2">
<div class="de2">&nbsp;<span class="co2">#include </span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="co2">#include </span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;<span class="co2">#include </span></div>
</li>
<li class="li1">
<div class="de1">+<span class="co2">#include </span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="co2">#define BITS 12&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/* Setting the number of bits to 12, 13*/</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;<span class="co2">#define HASHING_SHIFT (BITS-8)&nbsp; &nbsp; /* or 14 affects several constants.&nbsp; &nbsp; */</span></div>
</li>
<li class="li1">
<div class="de1">@@ -<span class="nu0">317</span>,<span class="nu0">6</span> +<span class="nu0">318</span>,<span class="nu0">9</span> @@</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp;input_bit_count += <span class="nu0">8</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp;return_value=input_bit_buffer &gt;&gt; <span class="br0">&#40;</span><span class="nu0">32</span>-BITS<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">+<span class="co2">#if ULONG_MAX &gt; (1 </span></div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
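<p>A minimal sketch of the kind of masking described above, not the patch itself (which is cut off in this feed). The helper name extract_code and its bits parameter are hypothetical; the sketch assumes the pending bits sit in the low 32 bits of an unsigned long, as the shift by 32-BITS in the diff suggests.</p>

```c
#include <assert.h>

/* Hypothetical helper, not part of lzw.c: return the code of width `bits`
   held at the top of the low 32 bits of the buffer. Where unsigned long is
   64 bits wide, stale bits above bit 31 survive the right shift, so the
   result is masked down to `bits` bits. */
static unsigned int extract_code(unsigned long bit_buffer, int bits)
{
    assert(bits > 0 && bits < 32);                       /* valid code width */
    unsigned long code = bit_buffer >> (32 - bits);      /* may carry stale high bits */
    return (unsigned int)(code & ((1UL << bits) - 1));   /* keep only `bits` bits */
}
```

<p>Where unsigned long is 32 bits wide the mask changes nothing, so the same code should behave identically on both kinds of platform.</p>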
]]></content:encoded>
	</item>
	<item>
		<title>By: nikos</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321790</link>
		<dc:creator>nikos</dc:creator>
		<pubDate>Sun, 24 Jan 2010 08:43:30 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321790</guid>
		<description>I wrote a small blog article about this code, including a C++ wrapper that extends it to variable-bit-length encoding, which should help improve the compression ratio regardless of source file size:
http://zabkat.com/blog/24Jan10-lzw-compression-code.htm</description>
		<content:encoded><![CDATA[<p>I wrote a small blog article about this code, including a C++ wrapper that extends it to variable-bit-length encoding, which should help improve the compression ratio regardless of source file size:<br />
<a href="http://zabkat.com/blog/24Jan10-lzw-compression-code.htm" rel="nofollow">http://zabkat.com/blog/24Jan10-lzw-compression-code.htm</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321692</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Thu, 21 Jan 2010 12:47:45 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321692</guid>
		<description>@YH:

I don't really think LZW is suitable for a static dictionary. There would be a lot of problems creating it, and what you would end up with would be something a bit different from LZW.

- Mark</description>
		<content:encoded><![CDATA[<p>@YH:</p>
<p>I don't really think LZW is suitable for a static dictionary. There would be a lot of problems creating it, and what you would end up with would be something a bit different from LZW.</p>
<p>- Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: YH</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321689</link>
		<dc:creator>YH</dc:creator>
		<pubDate>Thu, 21 Jan 2010 11:13:21 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321689</guid>
		<description>Dear Mark,

I am glad to use your code to compress binary data in a PC-based program, and it compresses very well. 
However, I don't have enough RAM to decompress the data on a small-footprint device, so I'm thinking of hard-coding the tables. From my experiments, it seems that compression and decompression make use of different tables, i.e. prefix_code, append_character. Do you think it is possible to implement fixed tables for both compression and decompression? 
Thank you

Regards,
YH</description>
		<content:encoded><![CDATA[<p>Dear Mark,</p>
<p>I am glad to use your code to compress binary data in a PC-based program, and it compresses very well.<br />
However, I don't have enough RAM to decompress the data on a small-footprint device, so I'm thinking of hard-coding the tables. From my experiments, it seems that compression and decompression make use of different tables, i.e. prefix_code, append_character. Do you think it is possible to implement fixed tables for both compression and decompression?<br />
Thank you</p>
<p>Regards,<br />
YH</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nikos</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321513</link>
		<dc:creator>nikos</dc:creator>
		<pubDate>Mon, 18 Jan 2010 14:13:33 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-321513</guid>
		<description>thanks for this minimal and clear-cut implementation of LZW

I've been trying to adapt your code to use variable bit encoding so that it behaves well for both small and large files

I've hit a bug in your code, mentioned earlier by "jj", that appears when you define BITS (constant) as 10 or 9

to reproduce, take your code and add these 2 lines _after_ the definition of the hash table size

[c]
#define BITS 10
#define MAX_VALUE (1 &lt;&lt; BITS) - 1
[/c]

this way the hash table will be defined large enough for BITS=14 (any value is ok) then you redefine BITS to a lower value to demonstrate the bug

for certain kinds of input files, the final file test.out is DIFFERENT from the input file by a couple of bytes near the end

I am certain that the problem is your output_code() implementation, which for 10 bits and lower isn't guaranteed to flush the final integer to the file (despite you writing MAX_VALUE and a zero at the end of your compress function).

It is easy to reproduce this bug if you run your code on a few files in your %TEMP% folder; I'm sure one of them will fail the test. If you can't find one, I can send you a small test file.</description>
		<content:encoded><![CDATA[<p>thanks for this minimal and clear-cut implementation of LZW</p>
<p>I've been trying to adapt your code to use variable bit encoding so that it behaves well for both small and large files</p>
<p>I've hit a bug in your code, mentioned earlier by "jj", that appears when you define BITS (constant) as 10 or 9</p>
<p>to reproduce, take your code and add these 2 lines _after_ the definition of the hash table size</p>
<div class="igBar"><span id="lc-2"><a href="#" onclick="javascript:showPlainTxt('c-2'); return false;">PLAIN TEXT</a></span></div>
<div class="syntax_hilite"><span class="langName">C:</span>
<div id="c-2">
<div class="c">
<ol>
<li class="li1">
<div class="de1"><span class="co2">#define BITS 10</span></div>
</li>
<li class="li2">
<div class="de2"><span class="co2">#define MAX_VALUE (1 &lt;&lt; BITS) - 1 </span></div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>this way the hash table will be defined large enough for BITS=14 (any value is ok) then you redefine BITS to a lower value to demonstrate the bug</p>
<p>for certain kinds of input files, the final file test.out is DIFFERENT from the input file by a couple of bytes near the end</p>
<p>I am certain that the problem is your output_code() implementation, which for 10 bits and lower isn't guaranteed to flush the final integer to the file (despite you writing MAX_VALUE and a zero at the end of your compress function).</p>
<p>It is easy to reproduce this bug if you run your code on a few files in your %TEMP% folder; I'm sure one of them will fail the test. If you can't find one, I can send you a small test file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320050</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Tue, 15 Dec 2009 16:34:55 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320050</guid>
		<description>Mohan - 

Sorry the program does not meet your needs. Looks like you have some work to do, better get busy.

- Mark</description>
		<content:encoded><![CDATA[<p>Mohan - </p>
<p>Sorry the program does not meet your needs. Looks like you have some work to do, better get busy.</p>
<p>- Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mohan</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320049</link>
		<dc:creator>Mohan</dc:creator>
		<pubDate>Tue, 15 Dec 2009 16:26:45 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320049</guid>
		<description>Thank you Mark. I got the output, but test.lzw is not in a readable form. I would also kindly ask for code for other compression algorithms, like the code you have posted for LZW, as it is easy to compile and execute. If I could get LZSS, BWT, or others, it would be very helpful for my dissertation. Thanks for your immediate response.

Regards,
Mohan</description>
		<content:encoded><![CDATA[<p>Thank you Mark. I got the output, but test.lzw is not in a readable form. I would also kindly ask for code for other compression algorithms, like the code you have posted for LZW, as it is easy to compile and execute. If I could get LZSS, BWT, or others, it would be very helpful for my dissertation. Thanks for your immediate response.</p>
<p>Regards,<br />
Mohan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320046</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Tue, 15 Dec 2009 16:07:24 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320046</guid>
		<description>Mohan, the executable will either take input and output file names from the command line, or will ask you for them if you leave the command line empty.

That is the basic procedure.

- Mark</description>
		<content:encoded><![CDATA[<p>Mohan, the executable will either take input and output file names from the command line, or will ask you for them if you leave the command line empty.</p>
<p>That is the basic procedure.</p>
<p>- Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mohan</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320044</link>
		<dc:creator>Mohan</dc:creator>
		<pubDate>Tue, 15 Dec 2009 15:58:38 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-320044</guid>
		<description>Dear Mark,

I'm doing a dissertation evaluating compression algorithms. The code compiles fine in Dev-C++, but when I execute it, it produces an empty file. Since I'm not very good at code development, I'm not able to understand the problem. I wanted to know whether this code has any database connectivity and what I should do in order to compress a text file. Do I need to change anything in the code? Please let me know the basic procedure for running it. Thank you.

Regards,
Mohan</description>
		<content:encoded><![CDATA[<p>Dear Mark,</p>
<p>I'm doing a dissertation evaluating compression algorithms. The code compiles fine in Dev-C++, but when I execute it, it produces an empty file. Since I'm not very good at code development, I'm not able to understand the problem. I wanted to know whether this code has any database connectivity and what I should do in order to compress a text file. Do I need to change anything in the code? Please let me know the basic procedure for running it. Thank you.</p>
<p>Regards,<br />
Mohan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-318204</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Sun, 08 Nov 2009 01:04:44 +0000</pubDate>
		<guid>http://marknelson.us/1989/10/01/lzw-data-compression/#comment-318204</guid>
		<description>@ICodeLikeAGirl:

First, your comment about wanting to compress binary and ASCII doesn't make any sense to me. The code as written doesn't distinguish between types of data - as long as it comes in as a stream of bytes, it's going to compress properly.

If you want to write 12-bit chunks you can certainly just use the code in the article, which is bit-oriented. Or you can save up two 12-bit tokens and write them out as a three-byte sequence. This would be very efficient, but you might have to pad the file with one additional token at the end.

- Mark</description>
		<content:encoded><![CDATA[<p>@ICodeLikeAGirl:</p>
<p>First, your comment about wanting to compress binary and ASCII doesn't make any sense to me. The code as written doesn't distinguish between types of data - as long as it comes in as a stream of bytes, it's going to compress properly.</p>
<p>If you want to write 12-bit chunks you can certainly just use the code in the article, which is bit-oriented. Or you can save up two 12-bit tokens and write them out as a three-byte sequence. This would be very efficient, but you might have to pad the file with one additional token at the end.</p>
<p>- Mark</p>
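<p>A minimal sketch of the three-byte packing described above, not code from the article. The names pack12 and unpack12 and the byte layout (first token in the high-order bits) are assumptions made for illustration.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of lzw.c. Assumed layout: the first
   12-bit token occupies the high-order bits of the three-byte group. */
static void pack12(unsigned a, unsigned b, uint8_t out[3])
{
    assert(a <= 0xFFF && b <= 0xFFF);              /* both must fit in 12 bits */
    out[0] = (uint8_t)(a >> 4);                    /* top 8 bits of a */
    out[1] = (uint8_t)(((a & 0x0F) << 4) | (b >> 8)); /* low 4 of a, top 4 of b */
    out[2] = (uint8_t)(b & 0xFF);                  /* low 8 bits of b */
}

/* Inverse of pack12: recover the two 12-bit tokens from three bytes. */
static void unpack12(const uint8_t in[3], unsigned *a, unsigned *b)
{
    *a = ((unsigned)in[0] << 4) | (in[1] >> 4);
    *b = ((unsigned)(in[1] & 0x0F) << 8) | in[2];
}
```

<p>An odd number of tokens leaves the last one unpaired, which is where the extra padding token mentioned above would come in.</p>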
]]></content:encoded>
	</item>
</channel>
</rss>
