Shortest non-standard substring: The shortest substring of one string, which is not a substring of another string

We need to find the shortest unusual substring between two lines i.e. if we have two lines a and b , so we need to find the length of the shortest substring a , which is not substring b .

How to solve this problem using suffix array?

To solve complexity no more than n * lg (n)

+7
source share
1 answer

This can be resolved O (N) times with the Generalized suffix tree .

After building a generalized suffix tree in O (N) time, you need to do a width search and find the first node that does not belong to both lines. The path from the root to this node gives the shortest unusual substring.


The same can be done using a generic suffix array for two input lines, also in O (N) time.

Build a generic suffix array along with the LCP array (or build an LCP array later from the suffix array). Add one null element as a prefix of the LCP array; add another null element as a suffix. Find a pair of minimal LCP entries so that only one line suffix exists, separated by these entries. This means that you need to perform a linear scan of the LCP array, extracting two minimum values, but reset as minimum values ​​to infinity every time you see the suffix of the other line, or if you see the suffix belonging to both lines. The larger element of the best of these pairs (which has the least value for the larger element in the pair) gives the length of the shortest unusual substring. This works because this pair of minimum values ​​limits all descendants of the first node (closest to the root) that do not belong to both lines in the corresponding suffix tree.

+9
source

All Articles