The relationship between the KMP algorithm and the Z algorithm

KMP and Z algorithms are well-known algorithms for finding strings,

KMP algorithm deals with finding patterns using the KMP reject function, which is defined as ( pat is a search pattern)

lps [i] = the longest valid prefix pat [0..i], which is also the suffix pat [0..i].

for example, for string "abcab" it will be [0, 0, 0, 1, 2]

where the function z is used as the algorithm Z , which is defined as:

Given a string S of length n, the Z-algorithm creates an array Z, where Z [i] is the length of the longest substring, starting with pat [i], which is also the prefix pat.

Now the question is, can we achieve function Z using the KMP algorithm? I am looking for several modifications in the lps array, which leads to the same results as the Z[i] array.

+4
string algorithm search
source share
3 answers

NOTE: incorrect algorithm

 for i in range(0, len(s)): if lps[i] != 0: Z[i - lps[i] + 1] = lps[i] 

After that, in Z[i] will be a maximum suffix length that starts at position i and which is also the line prefix.

EDIT

As nikhil_vyas noted, the proposed algorithm does not solve your problem. In fact, this partially fills the Z array with long suffixes and some others. Such an incomplete array may help you solve several problems of "finding the longest in a string", but this does not answer your question.

The easiest way to rebuild the Z array with the lps array that comes to my mind is to build a line corresponding to the lps array and then build the Z array for that line. But I'm not sure if this matches your definition of “some changes to the lps array”.

+2
source share

Mikhail Melnik’s solution cannot calculate Z for all indices in a string like “aaaaa”, we need an additional iteration to populate the indices left empty in the first iteration.

 for i in range(0, len(s)): Z[i - lps[i] + 1] = lps[i] for i in range(0, len(s)): Z[i] = max(Z[i], Z[i - 1] - 1) ` 
0
source share

I think it will be done.

 Z = [0] * len(lps) # initialize most to 0 iterator = enumerate(zip(lps, lps[1:]), start=1) for i, (prev, cur) in iterator: # step through adjacent pairs if cur <= prev: # suffix stopped growing Z[i - prev] = prev # Mark this suffix at its beginning. else: # end of loop # Ending the string is also a way to stop growing the suffix. if cur > 0: # if we were still growing a suffix # At end of loop, cur is the new prev, and i+1 is the new i. # (i == len(lps) - 1, cur == lps[-1]) Z[i+1 - cur] = cur 

Examples:

 lps = [0,0,0,1,2] #=> [0,0,0,2,0] lps = [0,0,1,2,1,2] #=> [0,0,2,0,2,0] 
0
source share

All Articles