<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data Compression with the Burrows-Wheeler Transform</title>
	<atom:link href="http://marknelson.us/1996/09/01/bwt/feed/" rel="self" type="application/rss+xml" />
	<link>http://marknelson.us/1996/09/01/bwt/</link>
	<description>Programming, mostly.</description>
	<lastBuildDate>Mon, 30 Jan 2012 17:56:19 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: wsu123</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-405789</link>
		<dc:creator>wsu123</dc:creator>
		<pubDate>Thu, 24 Nov 2011 19:07:12 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-405789</guid>
		<description>Can anyone help me find the links for BWT Compression Algorithm code in MATLAB??</description>
		<content:encoded><![CDATA[<p>Can anyone help me find the links for BWT Compression Algorithm code in MATLAB??</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: speedy</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-374053</link>
		<dc:creator>speedy</dc:creator>
		<pubDate>Tue, 06 Sep 2011 12:35:10 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-374053</guid>
		<description>I AM TESTING BWT WITH MTF, WHICH USES THE LOCALITY OF THE ALPHABETICAL SET OF THE LAST COLUMN. THEREFORE bwt-mtf -&gt; bwt-mtf MAY PRODUCE BETTER RESULTS. THEN lzw OR ARITHMETIC OR HUFFMAN AS THE FINAL STAGE!</description>
		<content:encoded><![CDATA[<p>I AM TESTING BWT WITH MTF, WHICH USES THE LOCALITY OF THE ALPHABETICAL SET OF THE LAST COLUMN. THEREFORE bwt-mtf -&gt; bwt-mtf MAY PRODUCE BETTER RESULTS. THEN lzw OR ARITHMETIC OR HUFFMAN AS THE FINAL STAGE!</p>
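<p>A minimal sketch of the move-to-front (MTF) step mentioned above, in case it helps to see it concretely. The function name mtf_encode and the choice to start the table in byte-value order are assumptions of this sketch, not something taken from the article.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Move-to-front encode len bytes from in to out.
 * Assumption of this sketch: the 256-entry table starts in byte-value order.
 * Each output byte is the current table position of the input byte, which is
 * then moved to the front, so runs of the same byte become runs of zeros. */
void mtf_encode(const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char table[256];
    size_t k;
    int i;

    assert(in != NULL && out != NULL);

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;

    for (k = 0; k < len; k++) {
        unsigned char c = in[k];
        int pos = 0;
        int j;

        /* find where c currently sits in the table */
        while (table[pos] != c)
            pos++;
        out[k] = (unsigned char)pos;

        /* shift everything in front of c back by one, then put c first */
        for (j = pos; j > 0; j--)
            table[j] = table[j - 1];
        table[0] = c;
    }
}
```

<p>With ASCII input "aaabbb" this produces 97, 0, 0, 98, 0, 0: the clustered characters that BWT tends to produce turn into small values, which is what the final LZW, arithmetic, or Huffman stage can then exploit.</p>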
]]></content:encoded>
	</item>
	<item>
		<title>By: Ericke</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-370833</link>
		<dc:creator>Ericke</dc:creator>
		<pubDate>Mon, 08 Aug 2011 04:32:25 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-370833</guid>
		<description>Damn.....

Where the hell is my post about the link that I suggested...


Okay, it doesn&#039;t matter, Mark...!!!

lol....</description>
		<content:encoded><![CDATA[<p>Damn&#8230;..</p>
<p>Where the hell is my post about the link that I suggested&#8230;</p>
<p>Okay, it doesn&#8217;t matter, Mark&#8230;!!!</p>
<p>lol&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SPEEDY</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-368642</link>
		<dc:creator>SPEEDY</dc:creator>
		<pubDate>Mon, 18 Jul 2011 17:33:55 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-368642</guid>
		<description>After reading the posts here, I think BWT can be understood in a much simpler way, as follows:

1. The sorted first column is embedded in all columns.
2. The adjacency relationships within rows are preserved under any permutation of the rows.

3. Two adjacent columns and an index to the original row are enough to decode. The last and first columns are adjacent in the cyclic view.

4. Sorting speed is important in the encoding.</description>
		<content:encoded><![CDATA[<p>After reading the posts here, I think BWT can be understood in a much simpler way, as follows:</p>
<p>1. The sorted first column is embedded in all columns.<br />
2. The adjacency relationships within rows are preserved under any permutation of the rows.</p>
<p>3. Two adjacent columns and an index to the original row are enough to decode. The last and first columns are adjacent in the cyclic view.</p>
<p>4. Sorting speed is important in the encoding.</p>
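<p>Point 3 can be sketched in code. This is only an illustration under stated assumptions: the name bwt_decode is made up, and "primary" is taken to be the row of the original string in the sorted rotation matrix, which may differ from the index convention used in the article's source. The first column is never stored; it is rebuilt by counting the characters of the last column.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Rebuild the original n bytes from the last column of the sorted rotation
 * matrix. primary is assumed to be the row holding the original string.
 * Returns 0 on success, -1 on a bad index or allocation failure. */
int bwt_decode(const unsigned char *last, size_t n, size_t primary,
               unsigned char *out)
{
    size_t start[256] = {0};
    size_t *lf;
    size_t i, sum = 0, row;
    int c;

    assert(last != NULL && out != NULL);
    if (n == 0)
        return 0;
    if (primary >= n)
        return -1;

    lf = malloc(n * sizeof *lf);
    if (lf == NULL)
        return -1;

    /* count each byte value; the sorted counts ARE the first column */
    for (i = 0; i < n; i++)
        start[last[i]]++;

    /* prefix sums: start[c] = first row whose first-column byte is c */
    for (c = 0; c < 256; c++) {
        size_t count = start[c];
        start[c] = sum;
        sum += count;
    }

    /* lf[i] = row where the byte last[i] appears in the first column */
    for (i = 0; i < n; i++)
        lf[i] = start[last[i]]++;

    /* walk backwards: the last byte of a row precedes its first byte */
    row = primary;
    for (i = n; i > 0; i--) {
        out[i - 1] = last[row];
        row = lf[row];
    }

    free(lf);
    return 0;
}
```

<p>For "BANANA" the sorted rotations end in "NNBAAA" and the original string sits in row 3, so bwt_decode("NNBAAA", 6, 3, out) gives back "BANANA".</p>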
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-344585</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Mon, 17 Jan 2011 15:18:05 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-344585</guid>
		<description>@Shoaib:

Google is your friend.</description>
		<content:encoded><![CDATA[<p>@Shoaib:</p>
<p>Google is your friend.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shoaib</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-344366</link>
		<dc:creator>Shoaib</dc:creator>
		<pubDate>Sun, 16 Jan 2011 16:17:24 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-344366</guid>
		<description>Or anyone who has done it, please let me know... It would be of great help.</description>
		<content:encoded><![CDATA[<p>Or anyone who has done it, please let me know&#8230; It would be of great help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shoaib</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-344365</link>
		<dc:creator>Shoaib</dc:creator>
		<pubDate>Sun, 16 Jan 2011 16:16:43 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-344365</guid>
		<description>Hi Mark, could you put up a Java implementation?</description>
		<content:encoded><![CDATA[<p>Hi Mark, could you put up a Java implementation?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philips Telaumbanua</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-337464</link>
		<dc:creator>Philips Telaumbanua</dc:creator>
		<pubDate>Mon, 22 Nov 2010 04:09:45 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-337464</guid>
		<description>Thank you, Mark, for responding again!

Yeah, I just realized that the EOF character should be less than every other character in the block, so it reduces the number of comparisons. In other words, the comparison never continues once x or y = N (x or y = EOF): since the EOF character is always &lt; the other characters, the comparison can be terminated as soon as one of them reaches EOF. A good example is when we&#039;d like to transform the string &quot;aaaaa$&quot;.

I think I&#039;ve got the point.

Thank you very much, Mark!

Ooo, I just want to tell you that your articles are great: well written, neat, and clear (article + source code = perfect). They have been a huge help.

Thanks again, Mark...

- Philips Tel</description>
		<content:encoded><![CDATA[<p>Thank you, Mark, for responding again!</p>
<p>Yeah, I just realized that the EOF character should be less than every other character in the block, so it reduces the number of comparisons. In other words, the comparison never continues once x or y = N (x or y = EOF): since the EOF character is always &lt; the other characters, the comparison can be terminated as soon as one of them reaches EOF. A good example is when we&#039;d like to transform the string &quot;aaaaa$&quot;.</p>
<p>I think I&#039;ve got the point.</p>
<p>Thank you very much, Mark!</p>
<p>Ooo, I just want to tell you that your articles are great: well written, neat, and clear (article + source code = perfect). They have been a huge help.</p>
<p>Thanks again, Mark&#8230;</p>
<p>- Philips Tel</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-337440</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Sun, 21 Nov 2010 20:49:29 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-337440</guid>
		<description>There is a big advantage to placing an EOF character at the end of the string. If the character is not found anywhere else in the input text, it means that you can use normal string compares to sort the strings - so instead of checking to see if the index wraps around at N, you can be guaranteed that the two strings won&#039;t match when one index reaches N.

That speeds things up a lot, and it means you can use a native sort, which might even be faster.

The only disadvantage is that you have to take the EOF characters out of the block when you decompress. No problem.

I&#039;d suggest continued study of the Burrows-Wheeler suggestions for improving the sort speed; there is a lot of good in there.
- Mark</description>
		<content:encoded><![CDATA[<p>There is a big advantage to placing an EOF character at the end of the string. If the character is not found anywhere else in the input text, it means that you can use normal string compares to sort the strings &#8211; so instead of checking to see if the index wraps around at N, you can be guaranteed that the two strings won&#8217;t match when one index reaches N.</p>
<p>That speeds things up a lot, and it means you can use a native sort, which might even be faster.</p>
<p>The only disadvantage is that you have to take the EOF characters out of the block when you decompress. No problem.</p>
<p>I&#8217;d suggest continued study of the Burrows-Wheeler suggestions for improving the sort speed; there is a lot of good in there.<br />
- Mark</p>
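<p>A small sketch of the EOF idea, not the article's actual code. It assumes the block contains no NUL bytes, so C's own string terminator can play the role of the unique EOF character, and the names compare_suffixes and sort_suffixes are made up for the illustration. Because the terminator occurs only once, two different suffixes always differ before either runs off the end, so plain strcmp and the native qsort are enough and no wrap-around check is needed.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: each element is a pointer into the text buffer. */
static int compare_suffixes(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;

    /* the unique terminator guarantees this stops before the buffer ends */
    return strcmp(sa, sb);
}

/* Fill order[0 .. strlen(text)] with suffix start offsets in sorted order.
 * The terminator-only suffix is included, so order needs strlen(text) + 1
 * entries. Returns 0 on success, -1 if memory cannot be allocated. */
int sort_suffixes(const char *text, size_t *order)
{
    const char **suffix;
    size_t n, i;

    assert(text != NULL && order != NULL);

    n = strlen(text) + 1; /* +1 for the suffix that is just the EOF */
    suffix = malloc(n * sizeof *suffix);
    if (suffix == NULL)
        return -1;

    for (i = 0; i < n; i++)
        suffix[i] = text + i;

    qsort(suffix, n, sizeof *suffix, compare_suffixes);

    for (i = 0; i < n; i++)
        order[i] = (size_t)(suffix[i] - text);

    free(suffix);
    return 0;
}
```

<p>For "BANANA" this yields the offsets 6, 5, 3, 1, 0, 4, 2, with the EOF-only suffix sorting first. The worst case is still slow on highly repetitive input, which is where the sort improvements in the Burrows-Wheeler paper come in.</p>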
]]></content:encoded>
	</item>
	<item>
		<title>By: Philips Telaumbanua</title>
		<link>http://marknelson.us/1996/09/01/bwt/comment-page-2/#comment-337397</link>
		<dc:creator>Philips Telaumbanua</dc:creator>
		<pubDate>Sun, 21 Nov 2010 08:14:42 +0000</pubDate>
		<guid isPermaLink="false">/1996/09/01/bwt/#comment-337397</guid>
		<description>Hi Mark,
Thanks for the fast response!

I&#039;m using Delphi, and my comparePresorted function looks like this:
[c]
function comparePresorted(myString, x, y) {
   n = length(myString); /* the length of the string */
   i = 0;

   /** keep comparing the two rotations a byte at a time until
    *  they differ or all n characters have been compared
    */
   while ((myString[x] == myString[y]) &amp;&amp; (i != n)) {
      x++;
      y++;
      if (x &gt;= n) {
          x = 0; /* wrap around to the start of the string */
      }
      if (y &gt;= n) {
          y = 0;
      }
      i++;
   } // end while

   /* TRUE if the rotation starting at x sorts before the one starting at y */
   if (myString[x] &lt; myString[y]) {
      return TRUE;
   } else {
      return FALSE;
   }
}
[/c]
That function compares the two rotations a byte at a time, starting from the positions passed in x and y. It returns TRUE if the first differing character at position x is &lt; the character at position y, and FALSE otherwise.

Example: the string &quot;BANANA&quot; (character positions 0,1,2..5).

If we call comparePresorted(&quot;BANANA&quot;, 2, 4), it compares the rotations that visit positions 2,3,4,5,0,1 and 4,5,0,1,2,3.
In this case comparePresorted stops the comparison when x = 4 and y = 0 (since position 4 is &#039;N&#039; and position 0 is &#039;B&#039;, and they differ).

The original MergeSort algorithm compares two numbers to sort a sequence of numbers. It looks something like this:
[c]
if number1 &lt; number2 {
   // some MergeSort&#039;s codes go here
} else {
   // some MergeSort&#039;s codes go here
}
[/c]
Then I modified it to call the comparePresorted() function instead, so it looks something like this:
[c]
if comparePresorted(&quot;BANANA&quot;, 2, 4) {
  // some MergeSort&#039;s codes go here
} else {
  // some MergeSort&#039;s codes go here
}
[/c]
After that, we get the sorted indices.

But, as I said it just works for file  100 kb (very slow)

Is that a reasonable figure for BWT, Mark?
I don&#039;t know how to improve it, but the Burrows paper suggests using an EOF ($), so our previous &quot;BANANA&quot; would become &quot;BANANA$&quot;. But I don&#039;t know what the benefit of the EOF &#039;$&#039; is when sorting the rotated strings.

Could you tell me, Mark, what the EOF $ is actually used for?
I&#039;m really confused.

Thanks in advance
- Philips Telaumbanua</description>
		<content:encoded><![CDATA[<p>Hi Mark,<br />
Thanks for the fast response!</p>
<p>I'm using Delphi, and my comparePresorted function looks like this:</p>
<div class="syntax_hilite"><span class="langName">C:</span>
<div id="c-1">
<div class="c">
<ol>
<li class="li1">
<div class="de1"><span class="kw2">function</span> comparePresorted<span class="br0">&#40;</span>myString, x, y<span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp;n = length<span class="br0">&#40;</span>myString<span class="br0">&#41;</span>; <span class="coMULTI">/* the length of string */</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;i = <span class="nu0">0</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="coMULTI">/** keep comparing the two rotations a byte at a time until </span></div>
</li>
<li class="li2">
<div class="de2"><span class="coMULTI">&nbsp; &nbsp; *&nbsp; they differ or all n characters have been compared </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; */</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp;<span class="kw1">while</span> <span class="br0">&#40;</span>myString<span class="br0">&#91;</span>x<span class="br0">&#93;</span> == myString<span class="br0">&#91;</span>y<span class="br0">&#93;</span><span class="br0">&#41;</span> &amp;&amp; <span class="br0">&#40;</span> i != n<span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; x++;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; y++;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="kw1">if</span><span class="br0">&#40;</span> x&gt;= n <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; x = <span class="nu0">0</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span> <span class="kw1">if</span><span class="br0">&#40;</span> y&gt;= n <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; y = <span class="nu0">0</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; i++;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span> <span class="co1">// end while</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">if</span><span class="br0">&#40;</span> myString<span class="br0">&#91;</span>x<span class="br0">&#93;</span> &lt;myString<span class="br0">&#91;</span>y<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">TRUE</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span> <span class="kw1">else</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">FALSE</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
That function compares the two rotations a byte at a time, starting from the positions passed in x and y. It returns TRUE if the first differing character at position x is &lt; the character at position y, and FALSE otherwise.</p>
<p>Example: the string &quot;BANANA&quot; (character positions 0,1,2..5).</p>
<p>If we call comparePresorted(&quot;BANANA&quot;, 2, 4), it compares the rotations that visit positions 2,3,4,5,0,1 and 4,5,0,1,2,3.<br />
In this case comparePresorted stops the comparison when x = 4 and y = 0 (since position 4 is &#039;N&#039; and position 0 is &#039;B&#039;, and they differ).</p>
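<p>For reference, the rotation comparison described above can be written as compilable C roughly like this. It is a sketch only: the name compare_rotations and the explicit length parameter are choices made for the illustration, not the Delphi code itself.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the rotation of s (length n) starting at x sorts strictly
 * before the rotation starting at y, and 0 otherwise. Both indexes wrap
 * around at n, and at most n bytes are compared, so two identical
 * rotations return 0 instead of looping forever. */
int compare_rotations(const unsigned char *s, size_t n, size_t x, size_t y)
{
    size_t i;

    assert(s != NULL && x < n && y < n);

    for (i = 0; i < n; i++) {
        if (s[x] != s[y])
            return s[x] < s[y]; /* first differing byte decides the order */
        if (++x == n)
            x = 0;              /* wrap around to the start of the string */
        if (++y == n)
            y = 0;
    }
    return 0; /* all n bytes matched: the rotations are identical */
}
```

<p>With "BANANA" and start positions 2 and 4, the loop stops at x = 4, y = 0 ('N' against 'B') and returns 0, because the rotation starting at 4 sorts first; swapping the arguments returns 1.</p>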
<p>The original MergeSort algorithm compares two numbers to sort a sequence of numbers. It looks something like this:</p>
<div class="syntax_hilite"><span class="langName">C:</span>
<div id="c-2">
<div class="c">
<ol>
<li class="li1">
<div class="de1"><span class="kw1">if</span> number1 &lt; number2 <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp;<span class="co1">// some MergeSort's codes go here</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> <span class="kw1">else</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp;<span class="co1">// some MergeSort's codes go here</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
Then I modified it to call the comparePresorted() function instead, so it looks something like this:</p>
<div class="syntax_hilite"><span class="langName">C:</span>
<div id="c-3">
<div class="c">
<ol>
<li class="li1">
<div class="de1"><span class="kw1">if</span> comparePresorted<span class="br0">&#40;</span><span class="st0">"BANANA"</span>, <span class="nu0">2</span>, <span class="nu0">4</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="co1">// some MergeSort's codes go here</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> <span class="kw1">else</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="co1">// some MergeSort's codes go here</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
After that, we get the sorted indices.</p>
<p>But, as I said it just works for file  100 kb (very slow)</p>
<p>Is that a reasonable figure for BWT, Mark?<br />
I don't know how to improve it, but the Burrows paper suggests using an EOF ($), so our previous "BANANA" would become "BANANA$". But I don't know what the benefit of the EOF '$' is when sorting the rotated strings.</p>
<p>Could you tell me, Mark, what the EOF $ is actually used for?<br />
I'm really confused.</p>
<p>Thanks in advance<br />
- Philips Telaumbanua</p>
]]></content:encoded>
	</item>
</channel>
</rss>

