Find the smallest string input period in O (n)?

Given the following problem:

Definition:

Let S be a string in the alphabet Σ. S' is the smallest period S if S' is the smallest string such that:

S = (S')^k (S'') ,

where S'' is the prefix of S If such S' does not exist, then S not periodic.

Example: S = abcabcabcabca . Then abcabc is the period with S = abcabc abcabc a , but the shortest period is abc , since S = abc abc abc abc a .

Let the algorithm find the shortest input period for string S or that S not periodic.

Hint: you can do it in O(n) ...

My solution: we use KMP, which works in O (n).

By the definition of the problem S = (S ') ^ k (S' '), then I think that if we create automata in the shortest period, and find a way to find this shortest period, then I am finished.

The problem is where to put the arrow of the FAIL automata ...

Any ideas would be highly appreciated

Hi

+6
string algorithm pattern-matching knuth-morris-pratt
source share
5 answers

Okay, so this problem can definitely be solved in O (n), we just have to cunningly use KMP, as you suggested.

Solving the longest valid prefix, which is also a suffix issue, is an important part of KMP that we will use.

The longest valid prefix, which is also a suffix issue, is a complete gulp, so for now, let's just call it the suffix suffix problem .

The problem with the prefix suffix can be quite difficult to understand, so I'll give some examples.

The prefix suffix solution for "abcabc" is "abc" because it is the longest line, which is both the correct prefix and the proper suffix (correct prefixes and suffixes cannot be the entire line).

The prefix suffix solution for "abcabca" is "a"

Hmmmmmmmmm, wait a minute, if we just chop off "a" at the end of "abcabca", we will have "abcabc", and if we get a solution ("abc") for this new line and cut it off again, we will stay with "abc" Hmmmmmmmmm Very interesting. (This is pretty much a solution, but I will talk about why this works)

Well, let's try to formalize this intuition a bit and see if we can find a solution.

I will use one key assumption in my argument:

The smallest period of our pattern is the valid period of each larger period in our pattern.

Let's save the prefix suffix solution for the first characters i our template to lps[i] . This lps array can be calculated in O(n) and it is used in the KMP algorithm, you can learn more about how to calculate it in O(n) here: https://www.geeksforgeeks.org/kmp-algorithm- for -search patterns /

Just for clarity, lps some examples of some lps arrays

Picture: "aaaaa"

lps: [0, 1, 2, 3, 4]

Picture: "AABBCC"

lps: [0, 1, 0, 0, 0, 0]

Picture: "abcabcabc"

lps: [0, 0, 0, 1, 2, 3, 4, 5, 6]

Ok, now let's define some variables to help us figure out why this lps array lps useful.

Let l be the length of our pattern, and let k be the last value in our lps array ( k=lps[l-1] )

The value k tells us that the first k characters of our string match the last k characters of our string. And we can use this fact to find the point!

Using this information, we can now show that the prefix consisting of the first characters lk our string forms a valid period. This is understandable because the next k characters that are not in our prefix must match the first k characters of our prefix because of how we defined our lps array. The first k characters from our prefix must match the last k characters that form our suffix.

In practice, you can implement this with a simple while loop, as shown below, where index denotes the end of the suffix that you currently consider the shortest period.

 public static void main(String[] args){ String pattern="abcabcabcabca"; int[] lps= calculateLPS(pattern); //start at the end of the string int index=lps.length-1; while(lps[index]!=0){ //shift back index-=lps[index]; } System.out.println(pattern.substring(0,index+1)); } 

And since lps is calculated in O(n) , and you always move at least 1 step back in the while loop, the time complexity for the whole procedure is just O(n)

I borrowed a lot from the geeksForGeeks KMP implementation in my CalculateLPS () method if you want to see my exact code below, but I recommend that you also look at their explanation: https://www.geeksforgeeks.org/kmp -algorithm-for -search patterns /

 static int[] calculateLPS(String pat) { int[] lps = new int[pat.length()]; int len = 0; int i = 1; lps[0] = 0; while (i < pat.length()) { if (pat.charAt(i) == pat.charAt(len)) { len++; lps[i] = len; i++; } else { if (len != 0) { len = lps[len - 1]; } else { lps[i] = len; i++; } } } System.out.println(Arrays.toString(lps)); return lps; } 

And last but not least, thanks for posting such an interesting issue, it was pretty interesting to figure it out! I am also new to this, so please let me know if any part of my explanation does not make sense.

+1
source share

I'm not sure I understand your decision. KMP is a useful routine, although the smallest period is how KMP moves the needle string (i.e. S ) after full matching.

0
source share

this problem can be solved with the Z function, this tutorial can help you.

0
source share

This problem can be easily solved with KMP.

Let n be the length of the original string. Find the first value> = n in the KMP array. This value should be at position k> = n (based on 0). Then k - n + 1 is the length of the shortest period of the string.

Example:

 Original string = abaaba n = 6 New string = abaabaabaaba KMP values for this new string: 0 0 1 1 2 3 4 5 6 7 8 9 

The first value> = n is 6, which is located at position 8. 8 - 6 + 1 = 3 is the length of the shortest period of the string (aba).

0
source share

See if this solution works for O (n). I used line rotation.

 public static int stringPeriod(String s){ String s1= s; String s2= s1; for (int i=1; i <s1.length();i++){ s2=rotate(s2); if(s1.equals(s2)){ return i; } } return -1; } public static String rotate(String s1){ String rotS= s1; rotS = s1.substring(1)+s1.substring(0,1); return rotS; } 

The full program is available in this github repository

-one
source share

All Articles