How to work around PHP's fixed-width lookbehind constraint?

I had a problem trying to match all the numbers found between specific words on my page. How would you match all the numbers in the following text, but only those between the words "begin" and "end"?

11 a b 13 begin t 899 y 50 f end 91 h 

This works:

    preg_match("/begin(.*?)end/s", $text, $out);
    preg_match_all("/[0-9]{1,}/", $out[1], $result);

But can this be done in one expression?

I tried this, but it doesn't do the trick:

 preg_match_all("/begin.*([0-9]{1,}).*end/s", $text, $out); 
2 answers

You can use the \G anchor like this, plus some lookarounds to make sure you don't "leave the territory" (the area between the two words):

 (?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+) 

    (?:                    # Begin of first non-capture group
      begin                # Match 'begin'
     |                     # Or
      (?!^)\G              # Start the match from the end of the previous match
    )                      # End of first non-capture group
    (?:                    # Second non-capture group
      (?=                  # Positive lookahead
        (?:(?!begin).)*    # Negative lookahead to prevent running into another 'begin'
        end                # And make sure that there is an 'end' ahead
      )                    # End positive lookahead
      \D                   # Match a non-digit
    )*?                    # Second non-capture group repeated many times, lazily
    (\d+)                  # Capture digits
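
As a minimal sketch of how this pattern could be used from PHP, the snippet below wraps it in a helper and runs it on the sample text from the question. The function name `numbers_between_markers` is hypothetical and only serves the illustration; the pattern itself is unchanged.

```php
<?php
// Hypothetical helper name; the pattern is the one explained above.
function numbers_between_markers(string $text): array
{
    $pattern = '/(?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+)/';

    // Each successive match continues from the end of the previous one
    // (via \G), so every number in the region lands in capture group 1.
    preg_match_all($pattern, $text, $out);

    return $out[1];
}

$text = '11 a b 13 begin t 899 y 50 f end 91 h';
print_r(numbers_between_markers($text));
```

If the text between "begin" and "end" can span several lines, the `s` modifier would probably be needed so that `.` inside the lookahead can cross newlines.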


The ideal solution

This really calls for a positive variable-width lookbehind. The regular expression would look like this:

 ~(?<=begin.*)\d+(?=.*end)~s 

However, at the time of this writing, PHP's regular expression engine does not support this feature. Only fixed-width lookbehind is supported (the .NET flavor does support it, though).
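
To see the limitation in practice, here is a small sketch that tries to run the pattern anyway. It assumes a PCRE build that rejects unbounded lookbehind, in which case the pattern fails to compile and `preg_match_all` returns `false` along with a warning (suppressed here with `@`).

```php
<?php
$text = '11 a b 13 begin t 899 y 50 f end 91 h';

// The lookbehind contains .*, which has no fixed (or bounded) width.
// @ hides the compile warning so that only the return value is inspected.
$result = @preg_match_all('~(?<=begin.*)\d+(?=.*end)~s', $text, $out);

if ($result === false) {
    echo "The engine rejected the pattern\n";
}
```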

Workaround

To achieve our goal, we can use preg_replace_callback with the following regex:

 ~(?<token>begin|end)|(?<number>\d+)|.*?~s 

Code example

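    function extract_number($input) {
        $in_region = false; // true while we are between 'begin' and 'end'

        // A closure is used instead of a nested named function, which would
        // be redeclared (fatal error) on a second call to extract_number().
        $matchNumbers = function ($match) use (&$in_region) {
            // 'token' is absent when neither named group took part in the match
            switch ($match['token'] ?? '') {
                case 'begin': $in_region = true;  break;
                case 'end':   $in_region = false; break;
            }
            if ($in_region && isset($match['number'])) {
                return $match['number'] . ',';
            }
            return '';
        };

        $ret = preg_replace_callback('~(?<token>begin|end)|(?<number>\d+)|.*?~s', $matchNumbers, $input);

        // 'strlen' drops only the empty trailing element, so a literal "0" is kept
        return array_filter(explode(',', $ret), 'strlen');
    }

    $str = '11 a b 13 begin t 899 y 50 f end 91 h';
    echo '<pre>';
    var_dump(extract_number($str));
    echo '</pre>';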

Output (with the OP's example)

    array(2) {
      [0]=>
      string(3) "899"
      [1]=>
      string(2) "50"
    }