Looking for a suffix tree implementation in C #?

I implemented a basic search for a research project. I am trying to make the search more efficient by creating an entity tree . I am interested in the implementation of the C # Ukkonen algorithm . I don't want to waste time rolling around if such an implementation exists.

+12
c # data-structures suffix-tree
04 Oct '08 at 23:49
source share
3 answers

Hey, just finished implementing a .NET library (C #) containing various trie implementations. Among them:

  • Classic three
  • Patricia trie
  • Suffix three
  • Using trie with the Ukkonen algorithm

I tried to make the source code easy to read. Use is also very straightforward:

using Gma.DataStructures.StringSearch; ... var trie = new UkkonenTrie<int>(3); //var trie = new SuffixTrie<int>(3); trie.Add("hello", 1); trie.Add("world", 2); trie.Add("hell", 3); var result = trie.Retrieve("hel"); 

The library is well tested and also published as the TrieNet NuGet package.

See github.com/gmamaladze/trienet

+2
Sep 27 '17 at
source share

The tough question. Here I can find the one closest to coincidence: http://www.codeproject.com/KB/recipes/ahocorasick.aspx , which is an implementation of the Aho-Corasick string matching algorithm. The algorithm now uses the suffix tree structure per: http://en.wikipedia.org/wiki/Aho-Corasick_algorithm

Now, if you want a prefix tree, this article claims to have an implementation for you: http://www.codeproject.com/KB/recipes/prefixtree.aspx

<HUMOR> Now that I have done my homework, how about you mowing my lawn. (Link: http://flyingmoose.org/tolksarc/homework.htm ) </HUMOR>

Edit: I found an implementation of the C # suffix tree that was the C ++ ports published on the blog: http://code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree

Edit: Codeplex has a new project focused on suffix trees: http://suffixtree.codeplex.com/

+13
Oct 11 '08 at 5:31
source share

Here is a suffix tree implementation that is reasonably efficient. I did not study the implementation of Ukkonen, but the running time of this algorithm, in my opinion, is quite reasonable, approximately O(N Log N) . Note that the number of internal nodes in the created tree is equal to the number of letters in the parent row.

 using System; using System.Collections.Generic; using System.Diagnostics; using System.Linq; using NUnit.Framework; namespace FunStuff { public class SuffixTree { public class Node { public int Index = -1; public Dictionary<char, Node> Children = new Dictionary<char, Node>(); } public Node Root = new Node(); public String Text; public void InsertSuffix(string s, int from) { var cur = Root; for (int i = from; i < s.Length; ++i) { var c = s[i]; if (!cur.Children.ContainsKey(c)) { var n = new Node() {Index = from}; cur.Children.Add(c, n); // Very slow assertion. Debug.Assert(Find(s.Substring(from)).Any()); return; } cur = cur.Children[c]; } Debug.Assert(false, "It should never be possible to arrive at this case"); throw new Exception("Suffix tree corruption"); } private static IEnumerable<Node> VisitTree(Node n) { foreach (var n1 in n.Children.Values) foreach (var n2 in VisitTree(n1)) yield return n2; yield return n; } public IEnumerable<int> Find(string s) { var n = FindNode(s); if (n == null) yield break; foreach (var n2 in VisitTree(n)) yield return n2.Index; } private Node FindNode(string s) { var cur = Root; for (int i = 0; i < s.Length; ++i) { var c = s[i]; if (!cur.Children.ContainsKey(c)) { // We are at a leaf-node. // What we do here is check to see if the rest of the string is at this location. for (var j=i; j < s.Length; ++j) if (Text[cur.Index + j] != s[j]) return null; return cur; } cur = cur.Children[c]; } return cur; } public SuffixTree(string s) { Text = s; for (var i = s.Length - 1; i >= 0; --i) InsertSuffix(s, i); Debug.Assert(VisitTree(Root).Count() - 1 == s.Length); } } [TestFixture] public class TestSuffixTree { [Test] public void TestBasics() { var s = "banana"; var t = new SuffixTree(s); var results = t.Find("an").ToArray(); Assert.AreEqual(2, results.Length); Assert.AreEqual(1, results[0]); Assert.AreEqual(3, results[1]); } } } 
+1
Sep 13 '13 at 13:47 on
source share



All Articles