Find the shortest repeating pattern in a string

I was wondering if there is a way to do pattern matching in Octave / Matlab? I know that Maple 10 has commands for this, but I'm not sure what I need to do in Octave / Matlab. Therefore, if the number were 12341234123412341234 , the pattern match would be 1234 . I am trying to find the shortest pattern that after repetiton generates the whole string.

Please note: numbers (only numbers will be used) will not be so simple. In addition, I will not know the template ahead of time (this is what I am trying to find). See Maple Example 10 , which shows that the pattern is not known in advance, but the team finds the pattern.

Maple 10 pattern matching example:

 ns:=convert(12341234123412341234,string); ns := "12341234123412341234" StringTools:-PrimitiveRoot(ns); "1234" 

How to do it in Octave / Matlab? Ps: I am using Octave 3.8.1

+5
source share
3 answers

To find the shortest pattern that generates an entire string when repeated, you can use regular expressions as follows:

 result = regexp(str, '^(.+?)(?=\1*$)', 'match'); 

Some examples:

 >> str = '12341234123412341234'; >> result = regexp(str, '^(.+?)(?=\1*$)', 'match') result = '1234' >> str = '1234123412341234123'; >> result = regexp(str, '^(.+?)(?=\1*$)', 'match') result = '1234123412341234123' >> str = 'lullabylullaby'; >> result = regexp(str, '^(.+?)(?=\1*$)', 'match') result = 'lullaby' >> str = 'lullaby1lullaby2lullaby1lullaby2'; >> result = regexp(str, '^(.+?)(?=\1*$)', 'match') result = 'lullaby1lullaby2' 
+9
source

I'm not sure if this can be done with regular expressions. Here is a script that will do what you need in case of a repeated word called pattern .

It moves through the characters of a string named str , trying to match another string with the name pattern . If the match fails, the pattern string expands as needed.

EDIT . I made the code more compact.

 str = 'lullabylullabylullaby'; pattern = str(1); matchingState = false; sPtr = 1; pPtr = 1; while sPtr <= length(str) if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string matchingState = true; pPtr = pPtr + 1; pPtr = mod(pPtr-1,length(pattern)) + 1; else %// if match fails, extend pattern string and start again if matchingState sPtr = sPtr - 1; %// don't change str index when transitioning out of matching state end matchingState = false; pattern = str(1:sPtr); pPtr = 1; end sPtr = sPtr + 1; end display(pattern); 

Conclusion:

 pattern = lullaby 

Note:

This does not allow arbitrary delimiters between occurrences of the pattern string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2'; then

 pattern = lullaby1lullaby2 

It also allows the pattern end in the middle of the loop without changing the result. For example, str = 'lullaby1lullaby2lullaby1'; still lead to

 pattern = lullaby1lullaby2 

To fix this, you can add lines

 if pPtr ~= length(pattern) pattern = str; end 
+3
source

Another approach is as follows:

  • determine the string length and find all possible factors of the string length value
  • for each possible factor length, change the line and check for repeated substring

To find all possible factors, see this solution on SO. The next step can be performed in many ways, but I implement it in a simple cycle, starting with the smallest factor length.

 function repeat = repeats_in_string(str); ns = numel(str); nf = find(rem(ns, 1:ns) == 0); for ii=1:numel(nf) repeat = str(1:nf(ii)); if all(ismember(reshape(str,nf(ii),[])',repeat)); break; end end 
+2
source

Source: https://habr.com/ru/post/1215052/


All Articles