Counting overlapping matches with Regex in C#

The following code evaluates to 2 instead of 4:

Regex.Matches("020202020", "020").Count; 

I assume that the regex engine starts searching for the next match from the end of the previous match. Is there any way to prevent this? I have a string of "0"s and "2"s, and I am trying to count how many times I have three "2"s in a row, four "2"s in a row, etc.

+8
c# regex
5 answers

This will return 4 as you expect:

 Regex.Matches("020202020", @"0(?=20)").Count; 

The lookahead checks for 20 without consuming it, so the next match attempt starts at the position following the first 0. You can even write the whole regex as a lookahead:

 Regex.Matches("020202020", @"(?=020)").Count; 

The regex engine automatically bumps ahead one position after every zero-length match. So, to find all runs of three 2s or four 2s, you can use:

 Regex.Matches("22222222", @"(?=222)").Count; // 6 

... and:

 Regex.Matches("22222222", @"(?=2222)").Count; // 5 
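
As a minimal sketch of the same idea, the lookahead can be wrapped in a small helper that counts overlapping occurrences of any literal substring. The names OverlapCounter and CountOverlapping are made up for illustration; only Regex.Matches and Regex.Escape are real .NET calls.

```csharp
using System.Text.RegularExpressions;

public static class OverlapCounter
{
    // Hypothetical helper: counts every position at which `literal` starts,
    // including positions that overlap an earlier occurrence.
    public static int CountOverlapping(string input, string literal)
    {
        // Escape the literal so characters such as '.' or '*' are taken
        // verbatim, then wrap it in a zero-width lookahead so that no
        // characters are consumed by a match.
        string pattern = "(?=" + Regex.Escape(literal) + ")";
        return Regex.Matches(input, pattern).Count;
    }
}
```

With this sketch, OverlapCounter.CountOverlapping("020202020", "020") should give 4 and OverlapCounter.CountOverlapping("22222222", "222") should give 6, matching the counts above. Note that an empty literal would match at every position, so a caller may want to reject it.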

EDIT: Rereading your question, it occurs to me that you might be looking for 2s interspersed with 0s:

 Regex.Matches("020202020", @"(?=20202)").Count; // 2 

If you don't know how many 0s there will be, you can use this:

 Regex.Matches("020202020", @"(?=20*20*2)").Count; // 2 

And of course, you can use quantifiers to reduce repetition in the regular expression:

 Regex.Matches("020202020", @"(?=2(?:0*2){2})").Count; // 2 
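
If the number of 2s varies, the quantified pattern can be built from a parameter. This is only a sketch under that assumption; InterspersedCounter and CountTwosWithZeros are hypothetical names, and the pattern is the one shown above with the repeat count substituted in.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InterspersedCounter
{
    // Hypothetical helper: counts positions where a sequence of `n` twos
    // starts, allowing any number of 0s (including none) between the 2s.
    public static int CountTwosWithZeros(string input, int n)
    {
        if (n < 2)
        {
            throw new ArgumentOutOfRangeException(nameof(n), "n must be at least 2");
        }
        // For n = 3 this produces (?=2(?:0*2){2}), the pattern used above.
        string pattern = "(?=2(?:0*2){" + (n - 1) + "})";
        return Regex.Matches(input, pattern).Count;
    }
}
```

For "020202020", n = 3 should give 2 as above. Because 0* also matches an empty string, a plain run such as "222" counts as well.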
+8

Indeed, the regex engine continues from where the last match ended. You can work around this using lookahead patterns. I'm not a .NET guy, but try this: "(?=020)." Translation: "find me a single character, where this character and the next two characters are 020". The trick is that the match is only one character wide, not three, so you get all the matches in the string, even if they overlap.

(You could also write it as "0(?=20)", but that's less clear to humans, at least :p)
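
To see that each match really is one character wide, here is a small sketch that lists where each match starts and what it contains. MatchInspector and Describe are hypothetical names; Match.Index and Match.Value are the standard .NET properties.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchInspector
{
    // Hypothetical helper: returns one "index:value" entry per match,
    // in the order the engine finds them.
    public static List<string> Describe(string input, string pattern)
    {
        var entries = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            entries.Add(m.Index + ":" + m.Value);
        }
        return entries;
    }
}
```

For "020202020" and the pattern "(?=020)." this should list 0:0, 2:0, 4:0 and 6:0, i.e. four one-character matches. The plain pattern "020" should list only 0:020 and 4:020, because each match consumes three characters.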

+4

Try this, using a zero-width positive lookbehind:

 Regex.Matches("020202020",@"(?<=020)").Count; 

It worked for me and gave 4 matches.

My favorite reference for regex: Regular Expression Language - Quick Reference. Also, as a quick way to try out a regex, I often use this for complex ones: Free Regular Expression Designer

+1

Assuming you're really looking for sequences of consecutive 2s, there is another option that doesn't use lookaheads at all. (This does not work for arbitrary patterns of 0s and 2s.)

Enumerate all non-overlapping sequences of three or more 2s (how?), and then deduce the number of shorter subsequences they contain.

For example, if you find one sequence of six consecutive 2s and one of five consecutive 2s, then you know that you must have (6-3+1) + (5-3+1) = 7 sequences of three consecutive 2s (potentially overlapping), etc.:

 0002222220000002222200
    222
     222
      222
       222
                222
                 222
                  222

For large strings, this should be somewhat faster than using lookaheads.
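
One possible answer to the "(how?)" above is a greedy quantifier: 2{n,} matches each maximal run of at least n twos, and each run of length L contributes L - n + 1 windows. This is a sketch of that approach, not necessarily what the answer had in mind; RunCounter and CountWindows are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RunCounter
{
    // Hypothetical helper: counts (possibly overlapping) windows of `n`
    // consecutive '2' characters without using lookaheads.
    public static int CountWindows(string input, int n)
    {
        if (n < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(n), "n must be at least 1");
        }
        int total = 0;
        // The greedy quantifier makes each match span a whole run of 2s,
        // so runs are found once each and never overlap.
        foreach (Match run in Regex.Matches(input, "2{" + n + ",}"))
        {
            // A run of length L contains L - n + 1 windows of length n.
            total += run.Length - n + 1;
        }
        return total;
    }
}
```

For the string in the example, RunCounter.CountWindows("0002222220000002222200", 3) should give (6-3+1) + (5-3+1) = 7.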

0

That is because the source contains only two non-overlapping "020" sequences that match your regular expression pattern. Try changing the pattern:

 Regex.Matches("020202020", "02").Count; 

Now it matches each consecutive 02, and you get four this time.

-4
