Regular expression to find the longest repeating sequence of characters in a string

How to write a regular expression to find the longest repeating sequence of characters in a string?

+6
regex
source share
6 answers

You can find all the matching character sequences with the regular expression /(.)\1*/ .

Finding the longest such sequence is best done using a tool other than regular expressions.

+2
source share

You can run the following regular expression to find duplicate characters:

 (.)\1+ 

but should use your programming language to correctly determine the longest match.

+2
source share

The easiest way to do this is in a loop:

 #!/usr/bin/perl my $string = "this aaa and bbbb for ### ## ppppppp"; my $max = ""; while ($string =~ /((.)\2+)/gs) { $max = $1 if length($1) > length($max); } print "$max\n"; 

You can also use reduce , but this is less efficient:

 #!/usr/bin/perl use List::Util "reduce"; my $string = "this aaa and bbbb for ### ## ppppppp"; my $max = reduce { length($b) > length($a) ? $b : $a } "", $string =~ /((.)\2+)/gs; print "$max\n"; 

If you want to get only one task, simply:

 #!/usr/bin/perl my $string = "this aaa and bbbb for ### ## ppppppp"; my $max = ( sort { length($b) <=> length($a) } "", $string =~ /((.)\2+)/g)[0]; print "$max\n"; 

All three answers create ppppppp for this example line.

They also return an empty string if there is no such sequence, and they return the first such sequence in case of a link.

+2
source share

You don't, it's impossible to put a state, such as "longest", into a regular expression. The only thing you can do is make a regular expression and match it with a sequence. If it matches, get the length of the repeated characters and make a longer regular expression matching more characters. Keep doing this until you find a match.
This is a dumb alternative to simply writing simple parsing for text.

In pseudo code, which can be a parser:

 for(i = beginning to end, i++) { recurring_length = recurring(i, 1); if(recurring_length > max) max = recurring_length; } function recurring(i, length) { if(i+1 != EOF && (character at i == character i+1) ) return recurring(i+1, length + 1); else return length; } 
0
source share

Here's how to do it in Python (no need for regular expressions):

 >>> str = 'iamastriiiiiingwaitwaaaaaaaaaaaaaatttt' >>> lchar = '' >>> longest = 0 >>> cnt = 1 >>> for i in str: if lchar == i: cnt += 1 else: cnt = 1 if cnt > longest: longest = cnt longchar = i lchar = i >>> longchar 'a' >>> longest 14 

And if you want to save it in a string (quite simple):

 >>> string = '' >>> for x in range(longest): string += longchar >>> string 'aaaaaaaaaaaaaa' 
0
source share

You can try the following:

 #!/usr/bin/perl use 5.10.1; use strict; use warnings; use Data::Dumper; my $str = 'ahhhhhhhhhhjjjjjjjiiiieeeeeeeeeeeeeeei'; my ($char, $long) = ('',0); while($str=~/(.)\1*/g) { if (length $& > $long) { $long = length$&; $char = $1, } } say "$char : $long"; 

Output:

 e : 15 
0
source share

All Articles