How to remove a substring from a suffix tree?

I looked through a lot of literature, but I did not find any information about deleting or inserting a substring into the suffix tree. There are only Ukkonen or McCreight algorithms for building a tree.
The worst way is to restore the tree after deleting or inserting a substring. But I think there is a better way to do this.
For example (positions are calculated from 0):
I have a suffix tree with "abcdef" and I need to remove characters from 1 to 3. And then I will have a suffix tree with "aef". And then I need to add the line “how” from line 1. And after that I will have a suffix tree with "aasef". Can you help me?

+7
source share
2 answers

you mix two tasks in your question, first look for the symbol, then replace the symbol. The suffix tree does the first part of the character search for you, now you need a second algorithm to replace this character with a new character. When replacing characters, the original suffix tree becomes invalid, so the tree needs to be displayed again for the second replacement.

What you need is two things: first, a "suffix array", this will give you more control over the search for characters and their location, and secondly, a "cache algorithm", this will help you with the replacement.

+1
source

I just started working with suffix trees, so I might be wrong, but it seems that inserting or deleting can change the tree in quite radical ways.

"abcdef" is a really trivial suffix tree:

abcdef ├a..$ ├b..$ ├c..$ ├d..$ ├e..$ └f$ 

Adding “g” to the end or removing “a” at the beginning is incredibly simple.

But let's say we run another 'a' in the middle:

 abcadef ├a │├b..$ │└d..$ ├b ├c ├... 

We have to go back and check each letter from the very beginning to see if we need to insert a node based on this. Same thing if we have a character from the end:

 abafef ├a │├bafef$ │└fef$ ├bafef$ ├f │├ef$ │└$ └ef$ 

If you inserted something like "ef" to the end, you have to go through and add new nodes everywhere!

Inserting a character looks like it will include a re-view of each character in the string, i.e. linear time. Since Ukkonen's algorithm already takes linear time, you should not use any dynamic insertion algorithm, you should just regenerate the tree from scratch every time with the confidence that it is still pretty good.

If you care about the space, you can always cache every step of the tree generation algorithm, and then when it comes time to insert or delete at point x, just load the tree as constructed to point x.

0
source

All Articles