Non-greedy regular expression quantifier gives a greedy result

I have a .NET regex that I'm testing with Windows PowerShell. The output is as follows:

 > [System.Text.RegularExpressions.Regex]::Match("aaa aaa bbb", "aaa.*?bbb")

 Groups   : {aaa aaa bbb}
 Success  : True
 Captures : {aaa aaa bbb}
 Index    : 0
 Length   : 11
 Value    : aaa aaa bbb

I expected that using the ? quantifier would result in a match of aaa bbb , since the second group of a's is sufficient to satisfy the expression. Is my understanding of non-greedy quantifiers wrong, or am I testing incorrectly?

Note: this is clearly not the same problem as the one in "non-greedy regex is greedy".

+6
regex non-greedy
4 answers

This is a common misunderstanding. Lazy quantifiers do not guarantee the shortest possible match. They only ensure that the quantifier, starting from the current position, matches no more characters than are necessary for the overall match to succeed.

If you really want to ensure the shortest match, you need to make that explicit. In this case, it means that instead of .*? you want a subexpression that matches anything that is neither aaa nor bbb . The resulting regular expression is

 aaa(?:(?!aaa|bbb).)*bbb 
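
The difference between the two patterns can be checked with a short sketch. It uses Python's re module rather than .NET, on the assumption that both backtracking engines treat these particular constructs the same way; the variable names are illustrative only.

```python
import re

text = "aaa aaa bbb"

# Lazy quantifier: the match still begins at the first "aaa".
lazy = re.search(r"aaa.*?bbb", text)

# Tempered pattern from this answer: "." is consumed only at positions
# where neither "aaa" nor "bbb" begins.
tempered = re.search(r"aaa(?:(?!aaa|bbb).)*bbb", text)

print(repr(lazy.group()))      # 'aaa aaa bbb'
print(repr(tempered.group()))  # 'aaa bbb'
print(tempered.start())        # 4
```

With the tempered pattern, the attempt starting at index 0 fails because the second aaa blocks the repetition, so the engine moves on and succeeds at index 4.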
+5

Compare the results for aaa aaa bbb bbb :

 regex: aaa.*?bbb    result: aaa aaa bbb
 regex: aaa.*bbb     result: aaa aaa bbb bbb

The regex engine finds the first occurrence of aaa and then, with the lazy quantifier ( .*? ), skips characters only until the first occurrence of bbb . With the greedy quantifier ( .* ), it keeps looking for a longer result and therefore matches up to the last occurrence of bbb .
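
A minimal sketch of this comparison, written in Python's re module instead of .NET (assumed here to behave the same way for these two patterns):

```python
import re

text = "aaa aaa bbb bbb"

lazy = re.search(r"aaa.*?bbb", text)   # stops at the first "bbb"
greedy = re.search(r"aaa.*bbb", text)  # extends to the last "bbb"

print(lazy.group())    # aaa aaa bbb
print(greedy.group())  # aaa aaa bbb bbb

# Both matches begin at the first "aaa"; only the end differs.
print(lazy.start(), greedy.start())  # 0 0
```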

+5

This is not a greedy / lazy issue. The problem is that your string is scanned from left to right. Once the first aaa has matched, the regex engine adds characters one by one until the whole pattern is satisfied.

Note that with a greedy quantifier you get the same result in your example: once the first aaa has matched, the regex engine takes all the remaining characters and then gives them back one by one until the whole pattern matches.
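
To illustrate the left-to-right point, here is a small sketch in Python's re module (used as a stand-in for .NET, which is an assumption). It shows that both quantifiers report the same span on the example string, and that the shorter match only appears once the search is forced to begin after index 0.

```python
import re

text = "aaa aaa bbb"

# Lazy and greedy versions both start at the leftmost "aaa".
spans = {p: re.search(p, text).span() for p in (r"aaa.*?bbb", r"aaa.*bbb")}
print(spans)  # both patterns map to (0, 11)

# Starting the scan at index 1 skips the first "aaa" as a candidate.
later = re.compile(r"aaa.*?bbb").search(text, 1)
print(later.span(), repr(later.group()))  # (4, 11) 'aaa bbb'
```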

+1

Well, it's really simple. We have the following string:

aaa aaa bbb

and the regular expression aaa.*?bbb . In the steps below, the text matched so far is shown in brackets. The regex engine starts by matching aaa :

[aaa] aaa bbb

Now the regex engine is left with .*?bbb . Since bbb does not match at this position, .*? consumes the space:

[aaa ]aaa bbb

But there are still characters before bbb , so the regex engine continues on its way and consumes the second aaa and the space after it:

[aaa aaa ]bbb

Finally, the regex engine matches bbb :

[aaa aaa bbb]
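
The walk-through above can be imitated with a small sketch in plain Python. The helper lazy_trace is hypothetical and written only for illustration: it handles the first occurrence of the prefix and assumes the text contains no newlines, whereas a real engine would also retry from later start positions.

```python
def lazy_trace(text, head, tail):
    """Imitate head.*?tail anchored at the first occurrence of head."""
    start = text.find(head)
    if start < 0:
        return None, []
    pos = start + len(head)
    steps = []
    # Let .*? consume 0, 1, 2, ... characters, trying tail after each.
    for consumed in range(len(text) - pos + 1):
        attempt = pos + consumed
        ok = text.startswith(tail, attempt)
        steps.append((text[start:attempt], ok))
        if ok:
            return text[start:attempt + len(tail)], steps
    return None, steps


match, steps = lazy_trace("aaa aaa bbb", "aaa", "bbb")
for matched_so_far, tail_fits in steps:
    print(repr(matched_so_far), tail_fits)
print(repr(match))  # 'aaa aaa bbb'
```

Each printed row is the text consumed so far and whether bbb fits right after it; the lazy quantifier stops at the first row where it does.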


So, if we only want to match the second aaa , we can use the following regular expression:

(?<!^)aaa.*?bbb , which matches an aaa that is not at the beginning of the string.

We can also use aaa(?= bbb).*?bbb , which matches an aaa only if it is followed by a space and bbb .
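
Both lookaround patterns can be tried out with the sketch below. It uses Python's re module in place of .NET, assuming the two engines agree on these constructs; the second input string is a made-up example added to show a limitation.

```python
import re

text = "aaa aaa bbb"

# Lookbehind: reject an "aaa" that sits at the very start of the string.
not_at_start = re.search(r"(?<!^)aaa.*?bbb", text)

# Lookahead: accept an "aaa" only when " bbb" follows it directly.
followed_by = re.search(r"aaa(?= bbb).*?bbb", text)

print(not_at_start.span(), repr(not_at_start.group()))  # (4, 11) 'aaa bbb'
print(followed_by.span(), repr(followed_by.group()))    # (4, 11) 'aaa bbb'

# The lookbehind only excludes index 0, so a leading character defeats it.
shifted = re.search(r"(?<!^)aaa.*?bbb", "xaaa aaa bbb")
print(repr(shifted.group()))  # 'aaa aaa bbb'
```

The last line suggests that the lookbehind variant depends on where the first aaa happens to sit, while the lookahead variant depends only on what follows the aaa .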


It just occurred to me: why don't you simply use aaa bbb ?

0
