The following code (unlike the other regular-expression-based answers) does exactly what you say you want: it replaces every run of more than two identical characters with two of them.
>>> import re
>>> text = 'the numberr offf\n\n\n\ntheeee beast is 666 ...'
>>> pattern = r'(.)\1{2,}'
>>> repl = r'\1\1'
>>> re.sub(pattern, repl, text, flags=re.DOTALL)
'the numberr off\n\nthee beast is 66 ..'
>>>
You may not want to apply this treatment to some or all of: digits, punctuation, spaces, tabs, newlines, and so on. In that case, you need to replace the . with a more restrictive sub-pattern.
For example:
ASCII letters: [A-Za-z]
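Any letters, depending on the locale: [^\W\d_] in combination with the re.LOCALE flag (in Python 3, re.LOCALE is accepted only for bytes patterns; a str pattern with [^\W\d_] already matches Unicode letters by default)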
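To make the sub-pattern swap concrete, here is a minimal sketch, assuming Python 3 and str input. The helper name squeeze_repeats and its char_class parameter are made up for illustration; they are not part of the re module.

```python
import re

def squeeze_repeats(text, char_class=r'.', flags=re.DOTALL):
    """Collapse runs of 3+ identical characters matched by char_class to 2.

    squeeze_repeats and char_class are hypothetical names for this sketch.
    """
    # Wrap the sub-pattern in a group so \1 refers to the matched character.
    pattern = '(' + char_class + r')\1{2,}'
    return re.sub(pattern, r'\1\1', text, flags=flags)

text = 'the numberr offf\n\n\n\ntheeee beast is 666 ...'

# ASCII letters only: digits, dots and newlines are left alone.
print(repr(squeeze_repeats(text, r'[A-Za-z]')))
# 'the numberr off\n\n\n\nthee beast is 666 ...'

# Any letter: with a str pattern in Python 3, \W and \d are Unicode-aware.
print(squeeze_repeats('sooo süüüß', r'[^\W\d_]'))
# soo süüß
```

Passing the sub-pattern as a string keeps the backreference logic in one place; the default of . with re.DOTALL reproduces the behaviour of the original snippet.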
John Machin