String Matching: calculating the longest prefix suffix in the kmp algorithm

KMP algorithm for string matching . Below is the code that I found online to calculate the longest array of prefix-suffix:
Defination:

lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i]. 

the code:

 void computeLPSArray(char *pat, int M, int *lps) { int len = 0; // length of the previous longest prefix suffix int i; lps[0] = 0; // lps[0] is always 0 i = 1; // the loop calculates lps[i] for i = 1 to M-1 while(i < M) { if(pat[i] == pat[len]) { len++; lps[i] = len; i++; } else // (pat[i] != pat[len]) { if( len != 0 ) { // This is tricky. Consider the example AAACAAAA and i = 7. len = lps[len-1]; //***************** // Also, note that we do not increment i here } else // if (len == 0) { lps[i] = 0; i++; } } } } 

Is it possible to use len = len-1 instead of len = lps[len-1] ?
because len always considers the length of the prefix as from [0 .. someIndex]. Then why use lps for assignment here? Below are the cases in which I tested, which work fine (the first line is the template, and the next two lines are the result for the original and modified len assignment):

 aaababc 0 1 2 0 1 0 0 0 1 2 0 1 0 0 abcbabc 0 0 0 0 1 2 3 0 0 0 0 1 2 3 aabcbab 0 1 0 0 0 1 0 0 1 0 0 0 1 0 

Code here with both spellings: http://ideone.com/qiSrUo

+7
c ++ string-matching algorithm
source share
3 answers

Following an example for which it does not work:

 i 0 1 2 3 4 5 p ABABBA c1 0 0 1 2 0 1 c2 0 0 1 2 2 3 

Cause:

 At i=4, len=2 p[i]='B' and p[len]='A' //Mismatch! lps string upto i=3: AB(0-1 prefix), (2-3 suffix) ------------------------------- i=4 Next charecter: B len=2 // longest prefix suffix length Charecter looking for : A (=p[len]) 

Thus, until i = 3, we used AB (0-1) as the prefix that corresponded to the suffix AB (2-3), but now there is a mismatch in i = 4, so we see that t extends the original prefix (0- 1), so the checked position is the prefix found before "AB", which is executed lps [len-1] -1, since the array starts with 0>, and this is not necessarily len-1, since we may need to step further, to get a new long prefix suffix.

+3
source share

Here is the best explanation I've ever seen. An example in it will clearly answer your question.

Knuth-Morris-Pratt Pattern Matching (KMP) (substring search)

0
source share

Here is the MY KMP code: -

 #include <bits/stdc++.h> using namespace std; int main(void){ int t; scanf("%d",&t); while(t--){ string s; cin>>s; int n = s.length(); int arr[n]; arr[0] = 0; int len = 0; for(int i = 1;i<n;){ if(s[len]==s[i]){ len++; arr[i++] = len; } else{ if(len!=0){ len = arr[len-1]; } else{ arr[i] = 0; i++; } } } cout<<arr[n-1]<<endl; } return 0; } 

The complexity of time is O (N)

0
source share

All Articles