Finding a duplicate substring in a large string

Question

Take, for example, the following lines:

0.714285714285714285714285714285714285714285
0.111111111111111111111111111111111111111111
0.166666666666666666666666666666666666666666

I want to find the repeating substring in each of them:

714285
1
6

How can I do this in Python? Using regex is fine. I tried the following:

import re

testString = "0.714285714285714285714285714285714285714285"
print(re.search(r"(.+)\1", testString).group(1))

This gives me the (wrong) result:

714285714285714285

It should be 714285

How can I fix it? Can my regex be improved, or is regex the wrong tool for this job? Maybe Python has a built-in for this that works without regex?

EDIT: Before posting an answer, please test it with this case:

0.0022271714922048997772828507795100222717149220489977728285077951002227171492204899777282850779510022

It must return 00222717149220489977728285077951

+4

python string regex

user89239213892389 May 31 '16 at 15:07

2 Answers

(\d\d)(\d+?)(?=\1)

http://ideone.com/OpqJ9c

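import re

testString = "0.714285714285714285714285714285714285714285"
# group(0) is the whole match: one digit plus the shortest run of digits
# that is followed by that same digit again.
print(re.search(r"(\d)(\d+?)(?=\1)", testString).group(0))
# 714285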

(\d)(\d+?)(?=\1)

Match the regex below and capture its match into backreference number 1 «(\d)»
   Match a single character that is a "digit" (ASCII 0–9 only) «\d»
Match the regex below and capture its match into backreference number 2 «(\d+?)»
   Match a single character that is a "digit" (ASCII 0–9 only) «\d+?»
      Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=\1)»
   Match the same text that was most recently matched by capturing group number 1 (case insensitive for A-Z; fail if the group did not participate in the match so far) «\1»
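
As a quick check, the pattern can be run against the three inputs from the question. This is only a sketch for testing it; the names `pattern`, `samples` and `results` are made up for the illustration.

```python
import re

# Pattern from the answer above: a digit, a lazy run of digits, and a
# lookahead that requires the first digit to come next.
pattern = re.compile(r"(\d)(\d+?)(?=\1)")

samples = [
    "0.714285714285714285714285714285714285714285",
    "0.111111111111111111111111111111111111111111",
    "0.166666666666666666666666666666666666666666",
]

# group(0) is the first digit plus the lazy run, i.e. the text up to the
# next occurrence of that first digit.
results = [pattern.search(s).group(0) for s in samples]
print(results)  # ['714285', '11', '66']
```

The first input gives the expected unit, but the other two give 11 and 66 rather than 1 and 6, because this pattern always consumes at least two digits.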

0

Pedro Lobito 31 '16 15:27

Casimir et Hippolyte · Accepted Answer · 2016-05-31T17:33:48+0000

Pattern (the repeating unit is captured in group 3):

(?=(\d+)\1+(.*))(\d+?)\3+\2$

To get the result as the whole match (group 0) instead:

(?=(\d+)\1+(.*))(\d+?)(?=\3+\2$)
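
A minimal sketch of how the first pattern might be applied to the inputs from the question, including the long test case. The helper name `repeating_unit` is hypothetical and not part of the answer.

```python
import re

# First pattern from this answer; the repeating unit ends up in group 3.
PATTERN = re.compile(r"(?=(\d+)\1+(.*))(\d+?)\3+\2$")


def repeating_unit(text):
    """Return the repeating unit captured in group 3, or None if no match."""
    match = PATTERN.search(text)
    return match.group(3) if match else None


samples = [
    "0.714285714285714285714285714285714285714285",
    "0.111111111111111111111111111111111111111111",
    "0.166666666666666666666666666666666666666666",
    "0.0022271714922048997772828507795100222717149220489977728285077951002227171492204899777282850779510022",
]

units = [repeating_unit(s) for s in samples]
for unit in units:
    print(unit)
```

With the second pattern the same values would be read from `match.group(0)` instead of `match.group(3)`.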

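How does it work?

The lookahead (?=(\d+)\1+(.*)) finds the longest repetition starting at the current position: the greedy (\d+) captures the largest block that is immediately repeated by \1+, and whatever follows the repetition is captured in group 2.

Once the end of the repetition is known, (\d+?)\3+ looks for the smallest block that, repeated, is followed by the text of group 2 and the end of the string (\2$). Because (\d+?) is lazy, the shortest unit is tried first.

The result is in group 3.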
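
The role of each group can be inspected directly on the first input from the question. This is just an illustrative sketch; the variable name `m` is arbitrary.

```python
import re

m = re.search(
    r"(?=(\d+)\1+(.*))(\d+?)\3+\2$",
    "0.714285714285714285714285714285714285714285",
)

print(m.start())   # 2: the match starts at the first digit after the dot
print(m.group(1))  # 714285714285714285: the greedy block from the lookahead
print(m.group(2))  # 714285: what is left after the greedy repetition
print(m.group(3))  # 714285: the shortest unit
```

Here group 2 happens to equal the unit only because 42 digits leave six digits over after two blocks of 18; in general it is whatever remains after the greedy repetition.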

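To control where the repetition may start, anchor the pattern to the decimal point:

\.(?=(\d+)\1+(.*))(\d+?)\3+\2$ # the repetition starts immediately after the dot

\..*?(?=(\d+)\1+(.*))(\d+?)\3+\2$ # the first repetition after the dot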
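
The difference between the two anchored variants shows up on an input such as 0.1666..., where the repetition does not start right after the dot. A small sketch, with the names `right_after_dot` and `first_after_dot` chosen only for this example:

```python
import re

right_after_dot = re.compile(r"\.(?=(\d+)\1+(.*))(\d+?)\3+\2$")
first_after_dot = re.compile(r"\..*?(?=(\d+)\1+(.*))(\d+?)\3+\2$")

text = "0.166666666666666666666666666666666666666666"

strict = right_after_dot.search(text)
loose = first_after_dot.search(text)

print(strict)          # None: the digit right after the dot is 1, which never repeats
print(loose.group(3))  # 6
```

Both variants return 714285 in group 3 for the first input from the question, since its repetition begins immediately after the dot.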

To collect the results at every position with re.findall, put the second part in a lookahead too:

(?=(\d+)\1+(.*))(?=(\d+?)\3+\2$)
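
A sketch of what `re.findall` returns with this all-lookahead pattern, assuming the first input from the question. Since the pattern consumes no characters, it is tried at every offset, so later offsets report rotations of the same unit.

```python
import re

pattern = re.compile(r"(?=(\d+)\1+(.*))(?=(\d+?)\3+\2$)")
text = "0.714285714285714285714285714285714285714285"

# findall returns one (group 1, group 2, group 3) tuple per matching position.
matches = pattern.findall(text)
units = [unit for _, _, unit in matches]

print(units[0])            # 714285
print(sorted(set(units)))  # the unit and its rotations found at later offsets
```

If only the first unit is wanted, `re.search` with one of the earlier patterns is probably simpler.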
