<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Fast String Searching With Suffix Trees</title>
	<link>http://marknelson.us/1996/08/01/suffix-trees/</link>
	<description>Programming, mostly.</description>
	<pubDate>Thu, 11 Mar 2010 18:54:29 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: suffix tree resources &#171; KcodeL</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-322899</link>
		<dc:creator>suffix tree resources &#171; KcodeL</dc:creator>
		<pubDate>Tue, 16 Feb 2010 18:13:02 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-322899</guid>
		<description>[...] http://marknelson.us/1996/08/01/suffix-trees/ [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] <a href="http://marknelson.us/1996/08/01/suffix-trees/" rel="nofollow">http://marknelson.us/1996/08/01/suffix-trees/</a> [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ziman</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-322664</link>
		<dc:creator>ziman</dc:creator>
		<pubDate>Wed, 10 Feb 2010 11:12:56 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-322664</guid>
		<description>Thanks very much for this overview; it definitely helped me get an A on today's exam! :)</description>
		<content:encoded><![CDATA[<p>Thanks very much for this overview; it definitely helped me get an A on today&#8217;s exam! :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Its Entirely True &#187; C# Suffix Tree</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-319837</link>
		<dc:creator>Its Entirely True &#187; C# Suffix Tree</dc:creator>
		<pubDate>Thu, 10 Dec 2009 02:31:12 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-319837</guid>
		<description>[...] I was unable to find a C# implementation of the suffix tree so I ported one I found at Mark Nelson&#8217;s Blog. The project of the C# port is located here [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] I was unable to find a C# implementation of the suffix tree so I ported one I found at Mark Nelson&#8217;s Blog. The project of the C# port is located here [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Legistrate</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-319191</link>
		<dc:creator>Legistrate</dc:creator>
		<pubDate>Wed, 25 Nov 2009 22:06:06 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-319191</guid>
		<description>Hmm, well, it looks like I made an implementation mistake.  The algorithm does correctly handle my Phase 11 ('$').  I am still unsure how you discovered the proper way to track the position of the active extension, but since the code works, I guess I could read through it a few more times.  I do, however, think that the suffix pointers added are not always necessary (i.e., some are extras), but that could be a mistake too :P.</description>
		<content:encoded><![CDATA[<p>Hmm, well, it looks like I made an implementation mistake.  The algorithm does correctly handle my Phase 11 (&#8216;$&#8217;).  I am still unsure how you discovered the proper way to track the position of the active extension, but since the code works, I guess I could read through it a few more times.  I do, however, think that the suffix pointers added are not always necessary (i.e., some are extras), but that could be a mistake too :P.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Legistrate</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-319181</link>
		<dc:creator>Legistrate</dc:creator>
		<pubDate>Wed, 25 Nov 2009 19:39:53 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-319181</guid>
		<description>I found your implementation the most useful of the various implementations out there, but the divergence from Gusfield was very confusing at first.  There are also some things that Gusfield doesn't seem to address directly.

For example, if you just wanted to implement suffix links, how would you know what the characters are when you walk up one edge to arrive at the parent (y in the text)?  You need to know more than just the one character leading from that parent to the edge, so that you can walk back down after you traverse the suffix link, possibly through multiple nodes.

I ask because I finally realized that your code doesn't actually walk down the edge to either a newly created inner node or to new children.  Instead, you have the Suffix start track the position of the char that indicated the edge to the correct node.  While I'm still fuzzy on exactly why that works, in general I would say it does.

The start being larger than stop signals an external node, and the converse an internal node.  So in a phase, the extensions are applied in the order Rule 1, Rule 2 (splits), Rule 2 (new kids), Rule 3.  But when I use the string "mississippi$" I get an unexpected split in the last phase.  In Phase 10, adding the last 'i', Suffix start is 9.  The active point splits ppi$ into p: pi$ and i$, and the active point moves to the root.  Thus start is incremented by one (to 10), and the next extension should be explicit (start==stop), which it is, since 'i' is already an internal node off the root.  Rule 3 breaks the loop and we move to the next phase ('$').  But here the stop has increased too, and now the algorithm thinks that the active point should be an implicit node.  This is not true, so a split is performed on the explicit node, and there is now an extra empty child in the final tree.

Could you maybe say a little about how you devised a start counter that tracks the relevant index in the string as you move from extension to extension, rather than a for loop covering each extension?</description>
		<content:encoded><![CDATA[<p>I found your implementation the most useful of the various implementations out there, but the divergence from Gusfield was very confusing at first.  There are also some things that Gusfield doesn&#8217;t seem to address directly.</p>
<p>For example, if you just wanted to implement suffix links, how would you know what the characters are when you walk up one edge to arrive at the parent (y in the text)?  You need to know more than just the one character leading from that parent to the edge, so that you can walk back down after you traverse the suffix link, possibly through multiple nodes.</p>
<p>I ask because I finally realized that your code doesn&#8217;t actually walk down the edge to either a newly created inner node or to new children.  Instead, you have the Suffix start track the position of the char that indicated the edge to the correct node.  While I&#8217;m still fuzzy on exactly why that works, in general I would say it does.</p>
<p>The start being larger than stop signals an external node, and the converse an internal node.  So in a phase, the extensions are applied in the order Rule 1, Rule 2 (splits), Rule 2 (new kids), Rule 3.  But when I use the string &#8220;mississippi$&#8221; I get an unexpected split in the last phase.  In Phase 10, adding the last &#8216;i&#8217;, Suffix start is 9.  The active point splits ppi$ into p: pi$ and i$, and the active point moves to the root.  Thus start is incremented by one (to 10), and the next extension should be explicit (start==stop), which it is, since &#8216;i&#8217; is already an internal node off the root.  Rule 3 breaks the loop and we move to the next phase (&#8216;$&#8217;).  But here the stop has increased too, and now the algorithm thinks that the active point should be an implicit node.  This is not true, so a split is performed on the explicit node, and there is now an extra empty child in the final tree.</p>
<p>Could you maybe say a little about how you devised a start counter that tracks the relevant index in the string as you move from extension to extension, rather than a for loop covering each extension?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Illya Havsiyevych</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-309141</link>
		<dc:creator>Illya Havsiyevych</dc:creator>
		<pubDate>Fri, 03 Jul 2009 21:47:56 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-309141</guid>
		<description>Hello,

FYI, 
Some other Suffix Trees based Java Applets:
* Generalized Suffix Tree - http://illya-keeplearning.blogspot.com/2009/06/generalized-suffix-trees-java-applet.html
* Diff - http://illya-keeplearning.blogspot.com/2009/07/suffix-trees-based-diff-java-applet.html

Thanks,
illya</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>FYI,<br />
Some other Suffix Trees based Java Applets:<br />
* Generalized Suffix Tree - <a href="http://illya-keeplearning.blogspot.com/2009/06/generalized-suffix-trees-java-applet.html" rel="nofollow">http://illya-keeplearning.blogspot.com/2009/06/generalized-suffix-trees-java-applet.html</a><br />
* Diff - <a href="http://illya-keeplearning.blogspot.com/2009/07/suffix-trees-based-diff-java-applet.html" rel="nofollow">http://illya-keeplearning.blogspot.com/2009/07/suffix-trees-based-diff-java-applet.html</a></p>
<p>Thanks,<br />
illya</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xutao</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-306840</link>
		<dc:creator>xutao</dc:creator>
		<pubDate>Mon, 15 Jun 2009 21:01:53 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-306840</guid>
		<description>[c]
/*
Input: query[], the string to search for.
Output: start_index and end_index, the position of the match within T (the string the tree was built from).
*/
void search_tree(char query [], int  &#038; start_index,
    int  &#038; end_index)

{
    int start_node = 0;
    int qp=0; //query position
    start_index = -1;
    end_index=-1;

    bool stop =false;
    while(!stop){
        Edge edge = Edge::Find(start_node, query[qp]);
        if ( edge.start_node == -1) {
            stop=true;

            break;
        }
        if (start_node ==0) start_index = edge.first_char_index;
        print_edge(edge);
        for (int i = edge.first_char_index; i &lt;=edge.last_char_index; i++){
            if(qp &gt;= strlen(query)) {
                //cout&lt;&lt;"whole query matched"&lt;&lt;endl;
                stop=true;
                break;
            }
            else if (query[qp] == T[i]){
                //cout&lt;&lt;query[qp]&lt;&lt;" ";
                qp++;
                end_index = i; start_index = end_index - qp + 1; //start of the occurrence ending at i
            }
            else{
                //cout&lt;&lt;"partially matched"&lt;&lt;endl;
                stop=true;
                break;
            }
        }
        if (!stop){ //proceed with next node
            start_node = edge.end_node;
            if(start_node==-1) stop=true;
            cout&lt;&lt;"next node    "&lt;&lt;start_node;
        }   
    }
}
[/c]</description>
		<content:encoded><![CDATA[<div class="igBar"><span id="lc-1"><a href="#" onclick="javascript:showPlainTxt('c-1'); return false;">PLAIN TEXT</a></span></div>
<div class="syntax_hilite"><span class="langName">C:</span>
<div id="c-1">
<div class="c">
<ol>
<li class="li1">
<div class="de1"><span class="coMULTI">/*</span></div>
</li>
<li class="li2">
<div class="de2"><span class="coMULTI">Input: query[], the string to search for.</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">Output: start_index and end_index, the position of the match within T (the string the tree was built from).</span></div>
</li>
<li class="li2">
<div class="de2"><span class="coMULTI">*/</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw4">void</span> search_tree<span class="br0">&#40;</span><span class="kw4">char</span> query <span class="br0">&#91;</span><span class="br0">&#93;</span>, <span class="kw4">int</span>&nbsp; &amp; start_index,</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw4">int</span>&nbsp; &amp; end_index<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2"><span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw4">int</span> start_node = <span class="nu0">0</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw4">int</span> qp=<span class="nu0">0</span>; <span class="co1">//query position</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; start_index = -<span class="nu0">1</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; end_index=-<span class="nu0">1</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; bool stop =<span class="kw2">false</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">while</span><span class="br0">&#40;</span>!stop<span class="br0">&#41;</span><span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; Edge edge = Edge::<span class="me2">Find</span><span class="br0">&#40;</span>start_node, query<span class="br0">&#91;</span>qp<span class="br0">&#93;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> edge.<span class="me1">start_node</span> == -<span class="nu0">1</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; stop=<span class="kw2">true</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">break</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>start_node ==<span class="nu0">0</span><span class="br0">&#41;</span> start_index = edge.<span class="me1">first_char_index</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; print_edge<span class="br0">&#40;</span>edge<span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span><span class="kw4">int</span> i = edge.<span class="me1">first_char_index</span>; i &lt;=edge.<span class="me1">last_char_index</span>; i++<span class="br0">&#41;</span><span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span><span class="br0">&#40;</span>qp&gt;= strlen<span class="br0">&#40;</span>query<span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//cout&lt;&lt;&quot;whole query matched&quot;&lt;&lt;endl;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; stop=<span class="kw2">true</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">break</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span> <span class="kw1">if</span> <span class="br0">&#40;</span>query<span class="br0">&#91;</span>qp<span class="br0">&#93;</span> == T<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//cout&lt;&lt;query[qp]&lt;&lt;&quot; &quot;;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; qp++;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; end_index = i;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span><span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//cout&lt;&lt;&quot;partially matched&quot;&lt;&lt;endl;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; stop=<span class="kw2">true</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">break</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>!stop<span class="br0">&#41;</span><span class="br0">&#123;</span> <span class="co1">//proceed with next node</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; start_node = edge.<span class="me1">end_node</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span><span class="br0">&#40;</span>start_node==-<span class="nu0">1</span><span class="br0">&#41;</span> stop=<span class="kw2">true</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cout&lt;&lt;<span class="st0">"next node&nbsp; &nbsp; "</span>&lt;&lt;start_node;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span>&nbsp; &nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xutao</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-306839</link>
		<dc:creator>xutao</dc:creator>
		<pubDate>Mon, 15 Jun 2009 21:00:41 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-306839</guid>
		<description>The following code queries the tree with a string. If the query is a substring of a suffix, it returns the position of that substring.

example code following tree construction
[c]
int start, end;
search_tree(q, start, end);
[/c]</description>
		<content:encoded><![CDATA[<p>The following code queries the tree with a string. If the query is a substring of a suffix, it returns the position of that substring.</p>
<p>example code following tree construction<br />
[c]<br />
int start, end;<br />
search_tree(q, start, end);<br />
[/c]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xutao</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-306814</link>
		<dc:creator>xutao</dc:creator>
		<pubDate>Mon, 15 Jun 2009 18:35:12 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-306814</guid>
		<description>Hi Mark,
Good article and useful code. However, when I ran walk_tree on "banana$", it took 2816 iterations! That is certainly nowhere near linear. Please help.</description>
		<content:encoded><![CDATA[<p>Hi Mark,<br />
Good article and useful code. However, when I ran walk_tree on "banana$", it took 2816 iterations! That is certainly nowhere near linear. Please help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/1996/08/01/suffix-trees/#comment-306426</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Sat, 13 Jun 2009 22:24:09 +0000</pubDate>
		<guid>http://marknelson.us/1996/08/01/suffix-trees/#comment-306426</guid>
		<description>@Maria:

Check out Illya's Java applet and you can see the results of construction in real time. That might be helpful.</description>
		<content:encoded><![CDATA[<p>@Maria:</p>
<p>Check out Illya's Java applet and you can see the results of construction in real time. That might be helpful.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
