This can be resolved O (N) times with the Generalized suffix tree .
After building a generalized suffix tree in O (N) time, you need to do a width search and find the first node that does not belong to both lines. The path from the root to this node gives the shortest unusual substring.
The same can be done using a generic suffix array for two input lines, also in O (N) time.
Build a generic suffix array along with the LCP array (or build an LCP array later from the suffix array). Add one null element as a prefix of the LCP array; add another null element as a suffix. Find a pair of minimal LCP entries so that only one line suffix exists, separated by these entries. This means that you need to perform a linear scan of the LCP array, extracting two minimum values, but reset as minimum values ββto infinity every time you see the suffix of the other line, or if you see the suffix belonging to both lines. The larger element of the best of these pairs (which has the least value for the larger element in the pair) gives the length of the shortest unusual substring. This works because this pair of minimum values ββlimits all descendants of the first node (closest to the root) that do not belong to both lines in the corresponding suffix tree.
Evgeny Kluev
source share