<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Fast String Searching With Suffix Trees</title>
	<atom:link href="http://marknelson.us/1996/08/01/suffix-trees/feed/" rel="self" type="application/rss+xml" />
	<link>http://marknelson.us/1996/08/01/suffix-trees/</link>
	<description>Programming, mostly.</description>
	<lastBuildDate>Wed, 16 May 2012 08:30:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: to do or learn &#124; Pearltrees</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-523215</link>
		<dc:creator>to do or learn &#124; Pearltrees</dc:creator>
		<pubDate>Fri, 20 Apr 2012 08:39:00 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-523215</guid>
		<description>[...] Even you can make a tree McCreight&#039;s original algorithm for constructing a suffix tree had a few disadvantages. Principal among them was the requirement that the tree be built in reverse order, meaning characters were added from the end of the input. This ruled the algorithm out for online processing, making it much more difficult to use for applications such as data compression. In fact, the reduction in the number of nodes is such that the time and space requirements for constructing a suffix tree are reduced from O(N^2) to O(N). In the worst case, a suffix tree can be built with a maximum of 2N nodes, where N is the length of the input text. Fast String Searching With Suffix Trees [...]</description>
		<content:encoded><![CDATA[<p>[...] Even you can make a tree McCreight&#039;s original algorithm for constructing a suffix tree had a few disadvantages. Principal among them was the requirement that the tree be built in reverse order, meaning characters were added from the end of the input. This ruled the algorithm out for online processing, making it much more difficult to use for applications such as data compression. In fact, the reduction in the number of nodes is such that the time and space requirements for constructing a suffix tree are reduced from O(N^2) to O(N). In the worst case, a suffix tree can be built with a maximum of 2N nodes, where N is the length of the input text. Fast String Searching With Suffix Trees [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: suffix tree &#171; demonstrate 的 blog</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-466408</link>
		<dc:creator>suffix tree &#171; demonstrate 的 blog</dc:creator>
		<pubDate>Sun, 29 Jan 2012 06:38:00 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-466408</guid>
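		<description>[...] suffix tree, but I never saw a follow-up. Of the implementations that can be found online, such as the relatively early ones, the C versions feel fairly limited (libstree and this one), and there are two C++ [...]</description>
		<content:encoded><![CDATA[<p>[...] suffix tree, but I never saw a follow-up. Of the implementations that can be found online, such as the relatively early ones, the C versions feel fairly limited (libstree and this one), and there are two C++ [...]</p>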
]]></content:encoded>
	</item>
	<item>
		<title>By: Sergey Makarenko</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-461192</link>
		<dc:creator>Sergey Makarenko</dc:creator>
		<pubDate>Wed, 18 Jan 2012 06:32:53 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-461192</guid>
		<description>Mark,
Thanks for a great article.
I believe that in streed2006.cpp the signature of
[c]
 ostream &amp;operator&lt;&lt;( ostream &amp;s, Aux const&amp;a )
[/c]
 should include const for the second parameter. Otherwise numbers are output instead of characters (at least with MS Visual Studio 2010).</description>
		<content:encoded><![CDATA[<p>Mark,<br />
Thanks for a great article.<br />
I believe that in streed2006.cpp the signature of</p>
<div class="syntax_hilite"><span class="langName">C:</span>
<div id="c-1">
<div class="c">
<ol>
<li class="li1">
<div class="de1">ostream &amp;operator&lt;&lt;<span class="br0">&#40;</span> ostream &amp;s, Aux const&amp;a <span class="br0">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
 should include const for the second parameter. Otherwise numbers are output instead of characters (at least with MS Visual Studio 2010).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Young</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-446821</link>
		<dc:creator>Jason Young</dc:creator>
		<pubDate>Thu, 05 Jan 2012 05:24:54 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-446821</guid>
		<description>Mark,

Thank you for the code and kind explanation.

I don&#039;t think I fully understand the concept of the suffix tree.


In your code, what is the role of origin_node of Suffix?

thank you.

Jason.</description>
		<content:encoded><![CDATA[<p>Mark,</p>
<p>Thank you for the code and kind explanation.</p>
<p>I don't think I fully understand the concept of the suffix tree.</p>
<p>In your code, what is the role of origin_node of Suffix?</p>
<p>thank you.</p>
<p>Jason.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Garret Wilson</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-422820</link>
		<dc:creator>Garret Wilson</dc:creator>
		<pubDate>Thu, 15 Dec 2011 19:50:02 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-422820</guid>
		<description>(In my comment above, I of course meant &quot;suffix tree&quot; instead of &quot;syntax tree&quot;.)

I have posted an overview of suffix trees and their application, along with links to my implementation in Java:

http://www.garretwilson.com/blog/2011/12/15/suffix-trees-java.xhtml

I referenced your article. Thanks again.

Garret</description>
		<content:encoded><![CDATA[<p>(In my comment above, I of course meant "suffix tree" instead of "syntax tree".)</p>
<p>I have posted an overview of suffix trees and their application, along with links to my implementation in Java:</p>
<p><a href="http://www.garretwilson.com/blog/2011/12/15/suffix-trees-java.xhtml" rel="nofollow">http://www.garretwilson.com/blog/2011/12/15/suffix-trees-java.xhtml</a></p>
<p>I referenced your article. Thanks again.</p>
<p>Garret</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-421553</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Tue, 13 Dec 2011 21:03:15 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-421553</guid>
		<description>@Garret:

&gt;Again, these observations are less criticisms of your approach 

No worries, as the years go by I always see plenty of room for improvement, and I would really like to rewrite this article someday. I finally did a rewrite of an LZW article after over 20 years, and I was happy with that - maybe this one is next!

Thanks for your comments, I hope they help others trying to work through this stuff.


- Mark</description>
		<content:encoded><![CDATA[<p>@Garret:</p>
<p>>Again, these observations are less criticisms of your approach </p>
<p>No worries, as the years go by I always see plenty of room for improvement, and I would really like to rewrite this article someday. I finally did a rewrite of an LZW article after over 20 years, and I was happy with that - maybe this one is next!</p>
<p>Thanks for your comments, I hope they help others trying to work through this stuff.</p>
<p>- Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Garret Wilson</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-421547</link>
		<dc:creator>Garret Wilson</dc:creator>
		<pubDate>Tue, 13 Dec 2011 20:47:22 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-421547</guid>
		<description>Mark, I cannot express how invaluable this article has been to me in implementing a syntax tree. I needed a Java syntax tree implementation, and after days of wrestling with tutorials (mostly yours), I am starting to wrap my head around it. I am by no means an algorithm expert, but after the mental sweat I&#039;ve poured into this, I&#039;ve come away with a few insights that make your algorithm a bit clearer for me. Please don&#039;t take any of these thoughts negatively---they merely reflect how my mind has come to understand the algorithm.

The Suffix class was a little confusing from its name, as it represents a suffix from the root, yet there is some optional &quot;leftover&quot; part past the active node. In my implementation I called this class State following Ukkonen.

The Suffix/State and Edge end offsets you use are inclusive. In 2011, in my mind anyway, many programmers are accustomed to end positions being exclusive, and indeed it makes many of the length calculations cleaner and more intuitive.

Several equivalent variables were arbitrarily intermingled, and logic could have been further refactored and isolated. For example, in Edge::SplitEdge() the new near edge is created extending from suffix.origin_node---which should always be equivalent to the old edge&#039;s edge.start_node. Using the Suffix origin node when splitting an edge obscures the fact that edge-splitting logically is an operation completely independent of suffixes. That is, to split an edge I merely need to provide the edge to split and a length position at which to split (even though this length may have originally come from the suffix). This loosens the coupling of edge-splitting logic from the overall suffix-tree building operation, making the isolated routine more understandable and testable on its own.

In the comments to stree2006.cpp, saying &quot;... we ... set first_char_index &gt; last_char_index ... to flag [an explicit node]...&quot; was confusing to me, and I looked for code that set first_char_index &gt; last_char_index as a flag. What you mean, of course, is that the algorithm sometimes advances first_char_index &gt; last_char_index, and we interpret this situation as such a flag. The difference is subtle, and may only reflect my particular way of thinking about this.

Lastly, whenever there is an infinite for(;;) loop, I pause to see if there is some more concise representation I am missing in my iteration logic. In fact, if I understand this correctly, it turns out that in the present implementation, when going to the next shorter suffix from the origin node (active.first_char_index++), first_char_index may be greater than last_char_index, but only by two characters (or, with an end-exclusive implementation such as mine, by only one character). When this occurs, we not only know that there is no shorter suffix in this round, but that the previous iteration must also have been explicit, which means that it must have added a new edge. Once we note this, we can exit the loop immediately. I accomplish this by turning the for(;;) loop into a do{} while(state.nextSmallerSuffix()) loop, in which nextSmallerSuffix() returns false if start&gt;end after advancing start++ (again, using exclusive end positions). The current implementation in this situation needlessly loops back around and does the same check it made before, finding the edge that was added the previous time around before breaking. I could be wrong about this, but so far my implementation is passing the validation tests.

Again, these observations are less criticisms of your approach than awe that I was able to understand this at all, thanks mainly to reading your article. I have finished my Java implementation. After tidying up the code and adding methods to make the class useful in actual string processing, I&#039;ll post a link here. Thanks again for publishing this.</description>
		<content:encoded><![CDATA[<p>Mark, I cannot express how invaluable this article has been to me in implementing a syntax tree. I needed a Java syntax tree implementation, and after days of wrestling with tutorials (mostly yours), I am starting to wrap my head around it. I am by no means an algorithm expert, but after the mental sweat I've poured into this, I've come away with a few insights that make your algorithm a bit clearer for me. Please don't take any of these thoughts negatively---they merely reflect how my mind has come to understand the algorithm.</p>
<p>The Suffix class was a little confusing from its name, as it represents a suffix from the root, yet there is some optional "leftover" part past the active node. In my implementation I called this class State following Ukkonen.</p>
<p>The Suffix/State and Edge end offsets you use are inclusive. In 2011, in my mind anyway, many programmers are accustomed to end positions being exclusive, and indeed it makes many of the length calculations cleaner and more intuitive.</p>
<p>Several equivalent variables were arbitrarily intermingled, and logic could have been further refactored and isolated. For example, in Edge::SplitEdge() the new near edge is created extending from suffix.origin_node---which should always be equivalent to the old edge's edge.start_node. Using the Suffix origin node when splitting an edge obscures the fact that edge-splitting logically is an operation completely independent of suffixes. That is, to split an edge I merely need to provide the edge to split and a length position at which to split (even though this length may have originally come from the suffix). This loosens the coupling of edge-splitting logic from the overall suffix-tree building operation, making the isolated routine more understandable and testable on its own.</p>
<p>In the comments to stree2006.cpp, saying "... we ... set first_char_index &gt; last_char_index ... to flag [an explicit node]..." was confusing to me, and I looked for code that set first_char_index &gt; last_char_index as a flag. What you mean, of course, is that the algorithm sometimes advances first_char_index &gt; last_char_index, and we interpret this situation as such a flag. The difference is subtle, and may only reflect my particular way of thinking about this.</p>
<p>Lastly, whenever there is an infinite for(;;) loop, I pause to see if there is some more concise representation I am missing in my iteration logic. In fact, if I understand this correctly, it turns out that in the present implementation, when going to the next shorter suffix from the origin node (active.first_char_index++), first_char_index may be greater than last_char_index, but only by two characters (or, with an end-exclusive implementation such as mine, by only one character). When this occurs, we not only know that there is no shorter suffix in this round, but that the previous iteration must also have been explicit, which means that it must have added a new edge. Once we note this, we can exit the loop immediately. I accomplish this by turning the for(;;) loop into a do{} while(state.nextSmallerSuffix()) loop, in which nextSmallerSuffix() returns false if start&gt;end after advancing start++ (again, using exclusive end positions). The current implementation in this situation needlessly loops back around and does the same check it made before, finding the edge that was added the previous time around before breaking. I could be wrong about this, but so far my implementation is passing the validation tests.</p>
<p>Again, these observations are less criticisms of your approach than awe that I was able to understand this at all, thanks mainly to reading your article. I have finished my Java implementation. After tidying up the code and adding methods to make the class useful in actual string processing, I'll post a link here. Thanks again for publishing this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-409763</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Thu, 01 Dec 2011 17:35:48 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-409763</guid>
		<description>The sentence I have given describes an invariant property of the suffix tree. If you find that character &#039;D&#039; is a descendant of &#039;ABC&#039;, it means that the tree *must* also contain BCD and CD. If this were not the case, it wouldn&#039;t be a suffix tree.

The algorithm must be written to guarantee this invariant holds.

&quot;How&quot; is what the article is all about!

- Mark</description>
		<content:encoded><![CDATA[<p>The sentence I have given describes an invariant property of the suffix tree. If you find that character 'D' is a descendant of 'ABC', it means that the tree *must* also contain BCD and CD. If this were not the case, it wouldn't be a suffix tree.</p>
<p>The algorithm must be written to guarantee this invariant holds.</p>
<p>"How" is what the article is all about!</p>
<p>- Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tony Bruguier</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-408330</link>
		<dc:creator>Tony Bruguier</dc:creator>
		<pubDate>Tue, 29 Nov 2011 19:29:14 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-408330</guid>
		<description>Mark,

Thanks for writing this article. It&#039;s quite helpful. I think I understand almost everything, except one sentence. You say:

&quot;Knowing how the construction algorithm works, you can see that if you find a certain character as a descendant of a particular suffix, you are bound to also find it as a descendant of every smaller suffix.&quot;

How can we guarantee this?

Thanks
Tony (different from another Tony above)</description>
		<content:encoded><![CDATA[<p>Mark,</p>
<p>Thanks for writing this article. It's quite helpful. I think I understand almost everything, except one sentence. You say:</p>
<p>"Knowing how the construction algorithm works, you can see that if you find a certain character as a descendant of a particular suffix, you are bound to also find it as a descendant of every smaller suffix."</p>
<p>How can we guarantee this?</p>
<p>Thanks<br />
Tony (different from another Tony above)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Burrow Wheeler transform, Suffix Arrays and FM Index &#171; Homologus</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/comment-page-3/#comment-390677</link>
		<dc:creator>Burrow Wheeler transform, Suffix Arrays and FM Index &#171; Homologus</dc:creator>
		<pubDate>Thu, 20 Oct 2011 20:35:04 +0000</pubDate>
		<guid isPermaLink="false">/1996/08/01/suffix-trees/#comment-390677</guid>
		<description>[...] version of constructing suffix trees was presented in a paper by Edward McCreight in 1976. I found this link most helpful on suffix [...]</description>
		<content:encoded><![CDATA[<p>[...] version of constructing suffix trees was presented in a paper by Edward McCreight in 1976. I found this link most helpful on suffix [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

