Regex for maximal periodic substrings

This is a follow-up to an earlier question about a regex for detecting periodic strings.

A period of a string w is any positive integer p such that w[i] = w[i+p] whenever both sides of this equation are defined. Let per(w) denote the size of the smallest period of w. We say that a string w is periodic iff per(w) <= |w|/2.

So informally, a periodic string is just a string made up of another string repeated at least once. The only complication is that at the end of the string we do not require a full copy of the repeated string, as long as it is repeated in its entirety at least once.

For example, consider the string x = abcab. per(abcab) = 3, as x[1] = x[1+3] = a, x[2] = x[2+3] = b and there is no smaller period. The string abcab is therefore not periodic. However, the string ababa is periodic, as per(ababa) = 2.

As further examples, abcabca, ababababa and abcabcabc are also periodic.
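
The definition can be checked directly with a small brute-force sketch in Python. The helper names per and is_periodic are chosen here for illustration and do not come from the original post; note also that Python strings are 0-indexed, while the examples above are 1-indexed.

```python
def per(w: str) -> int:
    """Return the smallest period of w, found by brute force."""
    n = len(w)
    for p in range(1, n + 1):
        # p is a period if w[i] == w[i + p] wherever both sides are defined.
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return 0  # only reached for the empty string


def is_periodic(w: str) -> bool:
    """A string is periodic iff per(w) <= |w| / 2."""
    return len(w) > 0 and 2 * per(w) <= len(w)


for s in ["abcab", "ababa", "abcabca", "ababababa", "abcabcabc"]:
    print(s, per(s), is_periodic(s))
```

Only abcab is reported as non-periodic, matching the examples above.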

@horcruz, amongst others, gave a very nice regex to recognize a periodic string:

\b(\w*)(\w+\1)\2+\b
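
As a quick sanity check, the pattern can be tried on the examples with Python's re module. This is only a sketch: the wrapper matches_periodic is a name invented here, and fullmatch is used so that the whole string, rather than a word inside it, has to be periodic.

```python
import re

# The pattern quoted above, unchanged.
# Group 1 is the (possibly empty) partial copy at the start,
# group 2 is the repeated unit, which must end with a copy of group 1.
PERIODIC = re.compile(r"\b(\w*)(\w+\1)\2+\b")


def matches_periodic(s: str) -> bool:
    """True if the whole string s matches the periodic-string pattern."""
    return PERIODIC.fullmatch(s) is not None


for s in ["abcab", "ababa", "abcabca", "ababababa", "abcabcabc"]:
    print(s, matches_periodic(s))
```

For ababa, for instance, the match decomposes the string as a + ba + ba.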

I would like to find all the maximal periodic substrings in a longer string. These are sometimes called runs in the literature.

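Formally, a substring w[i,j] of w is a maximal periodic substring if it is periodic with smallest period p, and neither w[i-1] = w[i-1+p] nor w[j+1] = w[j+1-p] holds. Informally, the "run" cannot be contained in a larger "run" with the same period.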

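  • The four maximal periodic substrings (runs) of the string T = atattatt are: T[4,5] = tt, T[7,8] = tt, T[1,4] = atat, T[2,8] = tattatt.

  • The string T = aabaabaaaacaacac contains the following 7 maximal periodic substrings (runs): T[1,2] = aa, T[4,5] = aa, T[7,10] = aaaa, T[12,13] = aa, T[13,16] = acac, T[1,8] = aabaabaa, T[9,15] = aacaaca.

  • The string T = atatbatatb contains the following three runs: T[1,4] = atat, T[6,9] = atat and T[1,10] = atatbatatb.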
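
For reference, the runs in these examples can be reproduced without any regex by applying the definition to every substring. The following is a brute-force sketch, far too slow for long inputs; the function names per and runs are illustrative and not part of the original question.

```python
def per(w: str) -> int:
    """Smallest period of a non-empty string w (brute force)."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return 0  # only reached for the empty string


def runs(t: str) -> list:
    """Return all runs of t as (start, end, substring), 1-indexed and inclusive."""
    n = len(t)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            w = t[i:j + 1]
            p = per(w)
            if 2 * p > len(w):
                continue  # not periodic
            if i > 0 and t[i - 1] == t[i - 1 + p]:
                continue  # extends to the left with the same period
            if j + 1 < n and t[j + 1] == t[j + 1 - p]:
                continue  # extends to the right with the same period
            found.append((i + 1, j + 1, w))
    return found


print(runs("atattatt"))
# [(1, 4, 'atat'), (2, 8, 'tattatt'), (4, 5, 'tt'), (7, 8, 'tt')]
```

The same function gives 7 runs for aabaabaaaacaacac and 3 for atatbatatb, in agreement with the lists above.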

Is there a regex (with backreferences) that will capture all maximal periodic substrings?

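I don't mind which regex flavor is used, but ideally it would be something supported by Python's re module. PCRE would also be fine.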

(See also: https://codegolf.stackexchange.com/questions/84592/compute-the-maximum-number-of-runs-possible-for-as-large-a-string-as-possible)


Solutions using the third-party regex module are also welcome: https://pypi.python.org/pypi/regex


This should be possible with Python's re module:

(?<=(.))(?=((\w*)(\w*(?!\1)\w\3)\4+))

Fiddle: https://regex101.com/r/aA9uJ0/2

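Notes:

  • The string has to be preceded by a dummy character; # is used in the fiddle. Without it, the lookbehind cannot match at the first position.
  • Capture group 2 of each match holds the periodic substring.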

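Explanation:

  • (?<=(.)) - lookbehind for the preceding character, captured in group 1
  • (?=...) - lookahead, so that overlapping matches can be found
  • (...) - captures the periodic substring (group 2)
  • (\w*)(\w*...\w\3)\4+ - the regex from @horcruz, as quoted by the OP
  • (?!\1) - checks that this character differs from group 1, so that the match cannot be extended to the left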
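
A minimal usage sketch with Python's re module follows. The wrapper regex_runs and the choice of # as the padding character are assumptions made for illustration; any character that is not matched by \w should work as padding.

```python
import re

# The pattern from this answer, unchanged.
RUN = re.compile(r"(?<=(.))(?=((\w*)(\w*(?!\1)\w\3)\4+))")


def regex_runs(t: str, dummy: str = "#") -> list:
    """Return (start, substring) pairs, with start 1-indexed in t."""
    padded = dummy + t
    # Every match is zero-width, so finditer reports at most one match per
    # position. m.start() is an index into padded, which equals the
    # 1-indexed position in t because of the one-character padding.
    return [(m.start(), m.group(2)) for m in RUN.finditer(padded)]


print(regex_runs("atattatt"))
# [(1, 'atat'), (2, 'tattatt'), (4, 'tt'), (7, 'tt')]
```

For T = atattatt this yields exactly the four runs listed in the question.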

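As @ClasG pointed out, the result of this regex can be incomplete. This happens when two runs start at the same index. Examples:

  • aabaab has 3 runs: aabaab, aa and aa. The first two start at the same index, so only one of them is returned.
  • atatbatatb has 3 runs: atatbatatb, atat and atat. The same problem occurs here; only one of the two runs starting at the first index is returned.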
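
The effect can be reproduced with a short sketch that compares the regex matches against the runs named in the two examples. The helper missed_runs and the EXPECTED table are written here for illustration only; the expected runs are given as (1-indexed start, substring) pairs.

```python
import re

RUN = re.compile(r"(?<=(.))(?=((\w*)(\w*(?!\1)\w\3)\4+))")

# Runs of the two example strings, as (1-indexed start, substring).
EXPECTED = {
    "aabaab": {(1, "aabaab"), (1, "aa"), (4, "aa")},
    "atatbatatb": {(1, "atatbatatb"), (1, "atat"), (6, "atat")},
}


def missed_runs(text: str) -> set:
    """Return the expected runs of text that the regex does not report."""
    # The pattern is zero-width, so there is at most one match per start index.
    found = {(m.start(), m.group(2)) for m in RUN.finditer("#" + text)}
    return EXPECTED[text] - found


for text in EXPECTED:
    print(text, "missed:", sorted(missed_runs(text)))
```

In both cases the longer run starting at index 1 is reported and the shorter one is missed, since a lookahead-only pattern can produce a single match per position.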


This kind of depends on your input criteria... There is no such thing as an infinite string of characters. Using backreferences, you can build a suitable representation of the number of occurrences of the pattern you want to match. Personally, I would define buckets by input length and then fill them.

I would then use automata to search for patterns within the buckets, and finally merge them into larger patterns.

In this case the question is not so much how fast the regex is, but how quickly you can recognize the pattern and rule out invalid candidates.
