PHP: understanding greedy vs. non-greedy matching

This call:

preg_match('~foo(.*?)(bar)?~','foo bar',$m); 

gives me the following:

 Array ( [0] => foo [1] => ) 

I am a little confused by this. I understand that group 1 gives me an empty string because it is a lazy match. But shouldn't (bar)? be greedy and thus give me capture group 2?

It seems reasonable to me that I should receive

 Array ( [0] => foo [1] => [2] => bar ) 

where [1] is the space. And yet this does not happen. Why?

php regex

The answer here is surprisingly simple. On the first attempt, the first group matches nothing, not even the space. The second group then tries to match "bar" against the space, which of course fails, so it falls back to matching nothing. If something after it still had to match, the engine would now backtrack and expand the first capturing group to include the space. But the match already succeeds as it is (the second group is allowed to be empty), so it just stays that way.
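A minimal sketch (assuming PHP 7 or later; the variable name is just for illustration) that checks the behaviour described above on the question's own pattern and string:

```php
<?php
// Same pattern as in the question: lazy (.*?) followed by an optional (bar)?.
preg_match('~foo(.*?)(bar)?~', 'foo bar', $m);

// The match succeeds as soon as "foo" is matched: (.*?) takes zero
// characters and (bar)? falls back to zero occurrences, so nothing
// forces the engine to backtrack into (.*?).
var_dump($m[0]);        // string(3) "foo"
var_dump($m[1]);        // string(0) ""

// PHP leaves out trailing groups that took no part in the match,
// so there is no $m[2] at all.
var_dump(isset($m[2])); // bool(false)
```

The missing `$m[2]` is expected: without extra flags, `preg_match` does not report trailing groups that did not participate in the match.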

To get what you expect, try the following:

 preg_match('~foo(.*?)(bar)?_~', 'foo bar_', $m); 
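Here is a sketch of that idea with the result checked, written with `preg_match` (the function that fills `$m` with the groups). The second call, anchored with `$` instead of an extra `_`, is my own variant rather than part of this answer; it gets the same effect on the question's original `'foo bar'` string. Variable names are illustrative.

```php
<?php
// The trailing "_" must match, so the match can no longer stop right after
// "foo". The engine backtracks and lets (.*?) grow one character at a time
// until (bar)? followed by "_" succeeds.
preg_match('~foo(.*?)(bar)?_~', 'foo bar_', $withUnderscore);
// $withUnderscore: [0] => 'foo bar_', [1] => ' ', [2] => 'bar'

// Variant (not from the answer): the end-of-subject anchor $ plays the same
// role on the unmodified string.
preg_match('~foo(.*?)(bar)?$~', 'foo bar', $anchored);
// $anchored: [0] => 'foo bar', [1] => ' ', [2] => 'bar'
```

In both cases the key is that something after `(bar)?` is mandatory, so an empty `(.*?)` is no longer good enough.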


In your edit, you added another capturing group: (.*) is now group 2. It matches to the end of the line, as you expected. So you are right about that, you just changed the example ^^

No, this is the correct behavior. From the documentation on lazy matching:

if the quantifier is followed by a question mark, then it becomes lazy and instead matches the minimum number of times possible

So since (bar)? is optional, (.*?) does not have to match anything for the regular expression to succeed. Because the space between foo and bar is never consumed, the expression cannot go on to match bar.
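To contrast the two quantifiers, here is a small sketch (variable names are illustrative) that runs the question's pattern next to a greedy variant. Making the first group greedy does not help either: `(.*)` swallows ` bar` itself, so `(bar)?` again matches nothing.

```php
<?php
// Lazy: (.*?) matches the minimum, i.e. zero characters.
preg_match('~foo(.*?)(bar)?~', 'foo bar', $lazy);
// $lazy: [0] => 'foo', [1] => ''

// Greedy: (.*) runs to the end of the subject, leaving nothing for (bar)?.
preg_match('~foo(.*)(bar)?~', 'foo bar', $greedy);
// $greedy: [0] => 'foo bar', [1] => ' bar'

// In neither case does group 2 participate, so neither array has index 2.
```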


Entry 0 always contains the match for the full pattern, which in this case is foo. The first capturing group consumes nothing, because its *? is lazy, and the second group is optional.
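`(bar)?` is itself greedy; it is simply tried only at the position right after whatever `(.*?)` consumed. A sketch with a second, hypothetical input `'foobar'` (chosen for illustration, not from the question) shows the difference:

```php
<?php
$pattern = '~foo(.*?)(bar)?~'; // the question's pattern

// "bar" directly after "foo": the greedy ? in (bar)? does take it.
preg_match($pattern, 'foobar', $adjacent);
// $adjacent: [0] => 'foobar', [1] => '', [2] => 'bar'

// A space in between: (bar)? is tried at the space, fails, and falls back
// to zero occurrences; the overall match is then just "foo".
preg_match($pattern, 'foo bar', $spaced);
// $spaced: [0] => 'foo', [1] => ''
```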

