Matching optional string in regex

I am having trouble matching optional groups in a regex. The quantifiers * and + are greedy, so I assumed the ? quantifier would be greedy too, but it doesn't seem to work the way I expected.

In theory, I assumed that if a group is made optional, then when the group is found in the string it would be returned in the match results, and when it is not found we would still get the overall match, just with nothing captured for that group.

What actually happens is that even when the group's pattern is present in the string, it is not included in the match results. It looks as if the regex notices that the group is optional and doesn't even try to match it.

If, as a test, we make this optional group mandatory, the regex does include it in the match results. That only works for the test, though, because sometimes this pattern will not be present in the string.

The reason I need the match included in the results is that I need the results for later analysis.

In case I did not describe the scenario very well, a very simple example in PHP follows.

    $string = 'This is a test, Stackoverflow. 2014 Cecili0n';

    if (preg_match_all("~(This).*?(Stackoverflow)?~i", $string, $match)) {
        print_r($match);
    }

Results:

    Array
    (
        [0] => Array
            (
                [0] => This
            )

        [1] => Array
            (
                [0] => This
            )

        [2] => Array
            (
                [0] => 
            )

    )

(Stackoverflow)? is an optional group. If we run the code above, then although this pattern is present in the string, it is not returned in the match results.

If we make this group mandatory, it is returned in the results, as shown below.

    if (preg_match_all("~(This).*?(Stackoverflow)~i", $string, $match)) {
        print_r($match);
    }

Results:

    Array
    (
        [0] => Array
            (
                [0] => This is a test, Stackoverflow
            )

        [1] => Array
            (
                [0] => This
            )

        [2] => Array
            (
                [0] => Stackoverflow
            )

    )

How can I achieve this? It's important for me to get accurate data on how the match was found.

Thanks for any thoughts on this.

php regex
2 answers

What's going on here

This may be surprising, but it is actually the expected behavior. Let me take the regular expression apart and translate it into human-readable terms:

    (This)            Match "This" literally
    .*?               Match any character **as few times as possible**, while still allowing the rest of the expression to match
    (Stackoverflow)?  Match "Stackoverflow" literally **if possible**

So what happens:

  • The regex engine matches "This".
  • Then it has to decide how many characters the lazy quantifier .*? should match.
  • Suppose it matches zero characters. Can the rest of the expression still match? In other words, can (Stackoverflow)? match at the start of " is a test, Stackoverflow. 2014 Cecili0n"?
  • It can, because the subpattern is optional! Therefore .*? matches zero characters.
  • What does the final subpattern (Stackoverflow)? match? Obviously nothing, because "Stackoverflow" does not occur at the position where the match is attempted.

End result: both quantified subpatterns match the empty string.
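To make this trace observable, here is a minimal sketch. It assumes one change to the pattern from the question: the lazy .*? is wrapped in an extra capture group, added purely for illustration, so that what it consumed shows up in the results.

```php
<?php
$string = 'This is a test, Stackoverflow. 2014 Cecili0n';

// Same pattern as in the question, except that .*? sits inside an extra
// capture group (an addition for illustration) so we can see what it consumed.
preg_match_all('~(This)(.*?)(Stackoverflow)?~i', $string, $match);

var_dump($match[0][0]); // overall match: "This"
var_dump($match[2][0]); // consumed by .*?: empty string
var_dump($match[3][0]); // captured by (Stackoverflow)?: empty string
```

The overall match stops right after "This", so the optional group never gets a chance to look further ahead in the string.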

How to get the expected result

If everything is optional, how can you get "Stackoverflow" captured? By explicitly spelling out the acceptable alternatives for the regex engine:

 ~(This)(.*?(Stackoverflow)|.*?)~i 

This tells the engine that it must either match as few characters as possible followed by the literal Stackoverflow, or otherwise just match as few characters as possible. Because the "Stackoverflow included" alternative is listed first, you can be sure that if it exists in the text, it will be matched.

Obviously, the second alternative .*? doesn't make much sense in this example, but I am leaving it as it is because I wanted to describe a "mechanical" transformation that works regardless of the actual regular expression.

Note that for full equivalence with the original regular expression, the extra group introduced for structural purposes should be non-capturing:

 ~(This)(?:.*?(Stackoverflow)|.*?)~i 
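As a quick check, the sketch below runs the non-capturing form with a lazy .*? in both alternatives against two inputs. The second input, without the keyword, is a made-up string for illustration and not part of the question.

```php
<?php
// Try "lazy run followed by Stackoverflow" first, then fall back to a bare lazy run.
$pattern = '~(This)(?:.*?(Stackoverflow)|.*?)~i';

$with    = 'This is a test, Stackoverflow. 2014 Cecili0n';
$without = 'This is a test. 2014 Cecili0n'; // hypothetical input without the keyword

preg_match_all($pattern, $with, $matchWith);
preg_match_all($pattern, $without, $matchWithout);

print_r($matchWith);    // group 2 holds "Stackoverflow"
print_r($matchWithout); // still matches; group 2 is an empty string
```

With preg_match_all in its default pattern order, a group that did not take part in the match is reported as an empty string, so the array has the same shape in both cases and $match[2][0] can be tested directly.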



I experimented with this, but I can't seem to crack it. In the meantime, one viable option is to run two tests, as in the example below.

    $string = 'This is a test, Stackoverflow. 2014 Cecili0n';

    $pattern1 = "~(This).*?(Stackoverflow)~i"; // keyword required
    $pattern2 = "~(This).*?~i";                // fallback without the keyword

    if (preg_match_all($pattern1, $string, $match)) {
        print_r($match);
    } elseif (preg_match_all($pattern2, $string, $match)) {
        print_r($match);
    }
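If more than two patterns are involved, the same try-in-order idea could be wrapped in a small helper. This is only a sketch: the function name firstMatchingPattern and the second test string are hypothetical and chosen for illustration.

```php
<?php
/**
 * Hypothetical helper: try each pattern in order and return the matches
 * of the first one that succeeds, or null if none of them match.
 */
function firstMatchingPattern(array $patterns, string $subject): ?array
{
    foreach ($patterns as $pattern) {
        // preg_match_all returns the number of matches (0 or false is falsy).
        if (preg_match_all($pattern, $subject, $match)) {
            return $match;
        }
    }
    return null;
}

$patterns = [
    '~(This).*?(Stackoverflow)~i', // preferred: keyword required
    '~(This).*?~i',                // fallback: keyword absent
];

$result   = firstMatchingPattern($patterns, 'This is a test, Stackoverflow. 2014 Cecili0n');
$fallback = firstMatchingPattern($patterns, 'This is a test. 2014 Cecili0n');

print_r($result);
print_r($fallback);
```

Note that the fallback pattern has only one capture group, so its result has no index 2 at all. Code that consumes the result would need to check isset($match[2]) rather than compare against an empty string.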

I will update the answer when I find something better.
