Combining multiple regexes into one and getting all matching groups

I have a list of regular expressions:

 suresnes|suresne|surenes|surene
 pommier|pommiers
 ^musique$
 ^(faq|aide)$
 ^(file )?loss( )?less$
 paris
 faq    <<< this matches twice

My use case: every pattern that matches displays a link to the user, so several patterns can match the same input.

I test the patterns against a short line of text, such as "live in paris", "faq" or "pom".

An easy way to do this is to iterate over all the patterns with preg_match, but this runs on a performance-critical page, so I would like to avoid that approach.

Here's what I tried: combining all those expressions into one, using named groups:

 preg_match("@(?P<group1>^(faq|aide|todo|paris)$)|(?P<group2>(paris)$)@im", "paris", $groups); 

As you can see, each pattern is grouped: (?P<GROUPNAME>PATTERN) , and they are all separated by a pipe | .

The result is not what I expect: only the first matching group is returned. As soon as one alternative matches, the engine stops trying the others.

What I want is a list of all matching groups. preg_match_all does not help either.

Thanks!

3 answers

What about:

 preg_match("@(?=(?P<group1>^(faq|aide|todo|paris)$))(?=(?P<group2>(paris)$))@im", "paris", $groups);
 print_r($groups);

output:

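 Array
 (
     [0] => 
     [group1] => paris
     [1] => paris
     [2] => paris
     [group2] => paris
     [3] => paris
     [4] => paris
 )

(?= ) is called a lookahead: it checks that its subpattern matches at the current position without consuming any characters, so both groups can capture the same text. Because the lookaheads are chained, the overall match succeeds only when every one of them matches.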

Regular expression explanation:

 (?=                        # start lookahead
   (?P<group1>              # start named group group1 (also numbered group #1)
     ^                      # start of string (or of a line, with the m flag)
     (                      # start capture group #2
       faq|aide|todo|paris  # match any of faq, aide, todo or paris
     )                      # end capture group #2
     $                      # end of string (or of a line, with the m flag)
   )                        # end of named group group1
 )                          # end of lookahead
 (?=                        # start lookahead
   (?P<group2>              # start named group group2 (also numbered group #3)
     (                      # start capture group #4
       paris                # paris
     )                      # end capture group #4
     $                      # end of string (or of a line, with the m flag)
   )                        # end of named group group2
 )                          # end of lookahead
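One way to let each pattern succeed or fail independently is to give every lookahead an empty alternative, so a pattern that does not match simply leaves its group unset. The sketch below is only an illustration under assumptions: the helper names build_combined_regex and matching_patterns and the group names p0, p1, ... are hypothetical, the patterns must not contain the @ delimiter, and PREG_UNMATCHED_AS_NULL needs PHP 7.2 or later. The leading .*? lets unanchored patterns such as paris match anywhere in the input.

```php
<?php
// Hypothetical helper: wraps every pattern in a lookahead with an empty
// alternative, so a non-matching pattern does not abort the whole match.
function build_combined_regex(array $patterns): string
{
    $parts = [];
    foreach (array_values($patterns) as $i => $pattern) {
        // ".*?" lets an unanchored pattern be found anywhere in the subject;
        // the empty alternative after "|" makes the lookahead always succeed.
        $parts[] = "(?=.*?(?P<p{$i}>{$pattern})|)";
    }
    // \A pins the single attempt to the start of the subject.
    return '@\A' . implode('', $parts) . '@is';
}

// Hypothetical helper: returns the patterns whose named group took part
// in the match.
function matching_patterns(array $patterns, string $subject): array
{
    $matched = [];
    $regex = build_combined_regex($patterns);
    if (preg_match($regex, $subject, $groups, PREG_UNMATCHED_AS_NULL)) {
        foreach (array_values($patterns) as $i => $pattern) {
            // Unmatched groups are null (or absent), so isset() is false.
            if (isset($groups["p{$i}"])) {
                $matched[] = $pattern;
            }
        }
    }
    return $matched;
}

$patterns = [
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    '^(file )?loss( )?less$',
    'paris',
    'faq',
];

print_r(matching_patterns($patterns, 'faq'));
print_r(matching_patterns($patterns, 'live in paris'));
```

This is still a single preg_match call, but every lookahead may scan the whole input, so it is not guaranteed to beat a plain loop; it is worth measuring on real data.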
+4
source

Try this approach:

 // Input string
 $str_1 = "{STRING HERE}";

 // Patterns to test
 $reg_arr = array(
     'suresnes|suresne|surenes|surene',
     'pommier|pommiers',
     '^musique$',
     '^(faq|aide)$',
     '^(file )?loss( )?less$',
     'paris',
     'faq'
 );

 // Callback: returns the matched text, or '' when the pattern does not match
 function cb_reg($reg_t) {
     global $str_1;
     if (preg_match("/{$reg_t}/ims", $str_1, $matches)) {
         // $matches[0] is the full match and always exists on success;
         // $matches[1] exists only if the pattern has a capture group,
         // which is not the case for patterns such as 'paris'
         return $matches[0];
     }
     return '';
 }

 // Apply the callback to every pattern
 $results = array_map('cb_reg', $reg_arr);
 // Drop the patterns that did not match; the remaining keys still index into $reg_arr
 $results = array_diff($results, array(''));

Basically, this is the fastest way I could think of.

  • Combining hundreds of regexes into one call is impractical: the combined pattern would be very complex to build and could fail in many ways. Testing the patterns one at a time is one of the best ways to do this.

  • In my opinion, combining a large number of regular expressions into one (where that is even possible) will run slower with preg_match than this callback approach. The key here is applying a callback function to each array element.

Also note that using array_map does not change the algorithmic cost: like an explicit loop, it still runs one preg_match per pattern, so the work is O(n) in the number of patterns. Any difference between the two is a constant factor, and it should be measured on your PHP version rather than assumed.
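To check the loop-versus-callback question on a given setup, a small timing sketch like the following can help. It is not a rigorous benchmark: the function names match_with_loop, match_with_array_map and time_ns are hypothetical, the round count is arbitrary, and hrtime needs PHP 7.3 or later. Both variants return the same matches, keyed by pattern index.

```php
<?php
$patterns = [
    'suresnes|suresne|surenes|surene',
    'pommier|pommiers',
    '^musique$',
    '^(faq|aide)$',
    '^(file )?loss( )?less$',
    'paris',
    'faq',
];
$subject = 'live in paris';

// Variant 1: explicit loop, one preg_match per pattern.
function match_with_loop(array $patterns, string $subject): array
{
    $hits = [];
    foreach ($patterns as $i => $pattern) {
        if (preg_match("/{$pattern}/i", $subject, $m)) {
            $hits[$i] = $m[0];
        }
    }
    return $hits;
}

// Variant 2: array_map with a closure, still one preg_match per pattern.
function match_with_array_map(array $patterns, string $subject): array
{
    $results = array_map(
        function ($pattern) use ($subject) {
            return preg_match("/{$pattern}/i", $subject, $m) ? $m[0] : null;
        },
        $patterns
    );
    // array_map and array_filter both preserve the keys of a single input array.
    return array_filter($results, function ($r) {
        return $r !== null;
    });
}

// Runs $fn $rounds times and returns the elapsed time in nanoseconds.
function time_ns(callable $fn, int $rounds): int
{
    $start = hrtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $fn();
    }
    return hrtime(true) - $start;
}

$rounds = 1000; // arbitrary; raise it for steadier numbers
$loopNs = time_ns(function () use ($patterns, $subject) {
    match_with_loop($patterns, $subject);
}, $rounds);
$mapNs = time_ns(function () use ($patterns, $subject) {
    match_with_array_map($patterns, $subject);
}, $rounds);

printf("foreach:   %d ns for %d rounds\n", $loopNs, $rounds);
printf("array_map: %d ns for %d rounds\n", $mapNs, $rounds);
```

The absolute numbers depend on the PHP version, the opcache settings and the patterns themselves, so only a comparison on the target machine is meaningful.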

+1
source

You can combine all your regular expressions with "|" between them, then apply the techniques described at http://www.rexegg.com/regex-optimizations.html to optimize the result, for example by factoring out common subexpressions.
