Regex to search for any character used more than 3 times per line, but not sequentially

Question

I found all sorts of answers that come really close, but none that quite fit.

I need to look at a string and find any character that is used more than three times in it. Basically, the goal is to limit passwords, for example to disallow "mississippi" because it has more than 3 s's. I think it only needs to apply to letters, but it should be Unicode-aware, so I guess (:alpha:) is the character set to match on.

I found (\w)\1+{4,} that finds consecutive characters like ssss or missssippi, but not if they are not consecutive.

I am working through the other regex questions to see if anyone has already answered this, but not much joy so far.

+4

regex

geoffc Dec 03 '09 at 10:38

3 answers

 (\w)(.*\1){2,}

Match a “word character”, then 2 or more copies of “anything, then the first thing again.” So at least 3 copies of the first thing, with anything in between.
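As written, the `{2,}` bound flags a character on its third occurrence, while the question asks for "more than 3 times". The sketch below compares the pattern with a variant that uses `{3,}`. The choice of Python, the `{3,}` variant, and the names `three_or_more`, `four_or_more`, and `has_match` are assumptions for illustration, not part of the answer.

```python
import re

# Pattern from the answer: one word character, then two or more later
# repeats of that same character, with anything in between (3+ in total).
three_or_more = re.compile(r"(\w)(.*\1){2,}")

# Assumed variant (not in the answer): three or more later repeats,
# i.e. at least four occurrences, which is "more than 3 times".
four_or_more = re.compile(r"(\w)(.*\1){3,}")


def has_match(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for word in ("banana", "mississippi", "password"):
        print(word, has_match(three_or_more, word), has_match(four_or_more, word))
```

With these inputs, "banana" (three a's) should only trip the `{2,}` pattern, while "mississippi" trips both.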

+1

ephemient Dec 03 '09 at 10:40

 .*(\w).*\1.*\1.*\1.*

This will match a string containing any number of characters, then one particular character, and then that same character three more times (four in total), with any number of characters (0..n) in between. That's what you want, right?

Test it at, e.g., http://www.regexplanet.com/simple/index.html

This regular expression matches, for example, "mississippi" (> 3 s's) and "twinkle twinkle little star" (> 3 t's).
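For anyone who prefers to check this locally rather than on a web tester, here is a minimal sketch in Python. The language and the function name `contains_char_four_times` are assumptions for illustration; the pattern itself is the one from this answer.

```python
import re

# The explicit form from this answer: the captured character must show up
# three more times later in the string, with anything (or nothing) between.
EXPLICIT = re.compile(r".*(\w).*\1.*\1.*\1.*")


def contains_char_four_times(text):
    """Return True if some word character occurs at least four times."""
    # fullmatch is used because the pattern is written to cover the
    # whole string (leading and trailing .*).
    return EXPLICIT.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("mississippi", "banana", "missouri"):
        print(sample, contains_char_four_times(sample))
```

Note that `.` does not match a newline by default, so this sketch assumes single-line input such as a password.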

+1

Erik A. Brandstadmoen Dec 03 '09 at 22:45

Mark Byers · Accepted Answer · 2009-12-03T22:41:43+0000

This should do it:

 /(.)(.*\1){3}/

It doesn't make sense to try to combine this with the character validation. First check that all the characters are valid, and then run this test afterwards. That's why it's OK to use '.' here.
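The two-step idea could look roughly like this in Python. This is only a sketch: `str.isalpha()` stands in for whatever character rule actually applies, and the function name `check_password` and its return strings are made up for illustration.

```python
import re

# The pattern from this answer: any character, then three later repeats.
REPEATED = re.compile(r"(.)(.*\1){3}")


def check_password(candidate):
    """Validate the character set first, then test for over-used characters."""
    # Step 1 (assumed rule): letters only. str.isalpha() is Unicode-aware
    # and returns False for the empty string.
    if not candidate.isalpha():
        return "invalid characters"
    # Step 2: '.' is fine here because step 1 already vetted the characters.
    if REPEATED.search(candidate):
        return "character repeated too often"
    return "ok"


if __name__ == "__main__":
    for pw in ("mississippi", "banana", "pass word"):
        print(repr(pw), "->", check_password(pw))
```

Keeping the two checks separate also makes it easy to report which rule a password failed.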

However, it will be slow. It would be faster to iterate once over the string and count the characters. Although for your purposes, I doubt it matters much, since the strings are so short.
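The counting approach mentioned above might be sketched as follows in Python. The function name `overused_characters` and the `limit` parameter are hypothetical, chosen only for this example.

```python
from collections import Counter


def overused_characters(text, limit=3):
    """Return, in sorted order, the characters that occur more than `limit` times."""
    # Counter makes a single pass over the string and tallies each character.
    counts = Counter(text)
    return sorted(ch for ch, n in counts.items() if n > limit)


if __name__ == "__main__":
    print(overused_characters("mississippi"))  # ['i', 's']
    print(overused_characters("banana"))       # []
```

Unlike the regex, this runs in time linear in the length of the string and also reports which characters are over the limit.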
