Python: how to trim sequences of more than two identical characters in a string

Question

I am looking for an efficient way to change a string so that all sequences of more than two identical characters are cut off after the first two.

Some examples of input -> output:

    hellooooooooo -> helloo
    woooohhooooo -> woohhoo

I'm currently iterating over the characters, but that's a bit slow. Does anyone have another solution (regex or something else)?

EDIT: current code:

    word_new = ""
    for i in range(0, len(word) - 2):
        if not word[i] == word[i+1] == word[i+2]:
            word_new = word_new + word[i]
    for i in range(len(word) - 2, len(word)):
        word_new = word_new + word[i]

+6

python string regex

Bart Nov 25 '10 at 14:51

5 answers

The following code (unlike other regular expression based answers) does exactly what you say you want: replace all sequences of more than two identical characters with 2 of them.

    >>> import re
    >>> text = 'the numberr offf\n\n\n\ntheeee beast is 666 ...'
    >>> pattern = r'(.)\1{2,}'
    >>> repl = r'\1\1'
    >>> re.sub(pattern, repl, text, flags=re.DOTALL)
    'the numberr off\n\nthee beast is 66 ..'

You may not want to apply this treatment to some or all of: digits, punctuation, spaces, tabs, newlines, etc. In that case, replace the . with a more restrictive sub-pattern.

For example:

ASCII letters: [A-Za-z]

Any letters, depending on the locale: [^\W\d_] in combination with the re.LOCALE flag
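As a rough sketch of the letters-only restriction described above, assuming Python 3: there, str patterns are Unicode-aware by default, so [^\W\d_] already covers non-ASCII letters without re.LOCALE (which Python 3 only accepts with bytes patterns). The helper name trim_letter_runs is made up for illustration and does not come from the answer.

```python
import re

# Hypothetical helper, not part of the original answer.
# [^\W\d_] means "word characters minus digits and underscore", i.e. letters.
LETTER_RUN = re.compile(r"([^\W\d_])\1{2,}")

def trim_letter_runs(text):
    """Cut runs of three or more identical letters down to two."""
    return LETTER_RUN.sub(r"\1\1", text)

# Digits and punctuation are left untouched.
print(trim_letter_runs("theeee beast is 666 ..."))  # thee beast is 666 ...
```

Swapping the character class for [A-Za-z] would limit the treatment to ASCII letters instead.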

+2

John Machin Nov 25 '10 at 20:12

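Also using regex, but without a replacement function:

    import re

    # (.) captures one character; \1{2,} requires at least two more copies,
    # so the pattern matches runs of three or more identical characters.
    expr = r'(.)\1{2,}'
    replace_by = r'\1\1'

    mystr1 = 'hellooooooo'
    print(re.sub(expr, replace_by, mystr1))

    mystr2 = 'woooohhooooo'
    print(re.sub(expr, replace_by, mystr2))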

+1

André Paramés Nov 25 '10 at 15:06

I don't know Python's regex syntax, but you can adapt this Perl substitution:

 s/((.)\2)\2+/$1/g;
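The substitution above carries over to Python almost unchanged. A minimal sketch, assuming Python 3: the pattern stays the same and the $1 in the replacement becomes \1. The function name trim_runs is hypothetical. Note that without re.DOTALL the . does not match newlines, so runs of newlines are left alone here.

```python
import re

def trim_runs(s):
    # Group 2 is a single character; group 1 is that character doubled.
    # \2+ consumes the rest of the run, and the whole match is replaced
    # by group 1, i.e. the doubled character.
    return re.sub(r"((.)\2)\2+", r"\1", s)

print(trim_runs("hellooooooooo"))  # helloo
print(trim_runs("woooohhooooo"))   # woohhoo
```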

0

Toto Nov 25 '10 at 15:05

I am posting my code. It does not use a regular expression, but since you mentioned "or something else" ...

    def removeD(input):
        if len(input) < 3:
            return input
        output = input[0:2]
        for i in range(2, len(input)):
            if not input[i] == input[i-1] == input[i-2]:
                output += input[i]
        return output

It's not as nice as bgporter's (no joke, I like it more than mine!), but at least on my system, time reports that it always runs faster.
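One way to check a speed claim like this on your own machine is the standard timeit module. The sketch below assumes Python 3; the names trim_regex, trim_loop and compare are invented for illustration, and the loop version collects pieces in a list rather than concatenating strings. Results will vary with input length and interpreter, so no particular winner is implied.

```python
import re
import timeit

PATTERN = re.compile(r"(.)\1{2,}", re.DOTALL)

def trim_regex(s):
    # Regex approach: replace each run of 3+ identical characters with 2.
    return PATTERN.sub(r"\1\1", s)

def trim_loop(s):
    # Same check as removeD above, but joining a list once at the end.
    if len(s) < 3:
        return s
    out = [s[0], s[1]]
    for i in range(2, len(s)):
        if not s[i] == s[i - 1] == s[i - 2]:
            out.append(s[i])
    return "".join(out)

def compare(word, number=10000):
    """Return the seconds each approach takes for `number` calls."""
    return {
        "regex": timeit.timeit(lambda: trim_regex(word), number=number),
        "loop": timeit.timeit(lambda: trim_loop(word), number=number),
    }

if __name__ == "__main__":
    print(compare("woooohhooooo"))
```

Trying both short words and long strings is worthwhile, since the relative cost of the two approaches may differ between them.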

0

Simone Nov 25 '10 at 15:21

bgporter · Accepted Answer · 2010-11-25T15:01:37+0000

Edit: after applying helpful comments

    import re

    def ReplaceThreeOrMore(s):
        # pattern to look for three or more repetitions of any character, including
        # newlines.
        pattern = re.compile(r"(.)\1{2,}", re.DOTALL)
        return pattern.sub(r"\1\1", s)
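To illustrate what the re.DOTALL flag in the function above changes, here is a small sketch assuming Python 3. The sample string is made up. With the flag, . also matches newlines, so a run of blank lines is trimmed like any other run; without it, newlines are skipped.

```python
import re

text = "soooo\n\n\n\nlong"

# With DOTALL, "." matches "\n", so the four newlines are cut to two.
with_dotall = re.sub(r"(.)\1{2,}", r"\1\1", text, flags=re.DOTALL)

# Without DOTALL, "." never matches "\n", so the newlines stay as they are.
without_dotall = re.sub(r"(.)\1{2,}", r"\1\1", text)

print(repr(with_dotall))     # 'soo\n\nlong'
print(repr(without_dotall))  # 'soo\n\n\n\nlong'
```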

(Original answer follows.) Try something like this:

    import re

    # look for a character followed by at least one repetition of itself.
    pattern = re.compile(r"(\w)\1+")

    # a function to perform the substitution we need:
    def repl(matchObj):
        char = matchObj.group(1)
        return "%s%s" % (char, char)

    >>> pattern.sub(repl, "Foooooooooootball")
    'Football'
