Regular Expressions for Google Operators

Using PHP, I'm trying to improve the search on my site by supporting Google-like operators, for example:

  • keyword = natural / default
  • "keyword" or "search phrase" = exact match
  • keyword* = partial match

To do this, I need to split the string into two arrays: one for exact phrases (without the double quotes) in $Array1, and one for everything else (natural and partial keywords) in $Array2.

What regular expressions would achieve this for the following string?


Example string ($string)

today i'm "trying" 'out' a* "google search" "test"

Desired Result

$Array1 = array( [0]=>trying [1]=>google search [2]=>test );
$Array2 = array( [0]=>today [1]=>i'm [2]=>out [3]=>a* );

1) Exact: I tried the following regexp for the exact matches, but it returns two arrays: one with the double quotes and one without. I could just use $result[1], but there might be a trick I'm missing here.

 preg_match_all( '/"([^"]+)"/iu', 'today i\'m "trying" \'out\' a* "google search" "test"', $result ); 

2) Natural / partial: The following rule returns the correct keywords, but together with a few empty values. Can this regex be tidied up, or should I just run the array through array_filter()?

 preg_split( '/"([^"]+)"|(\s)/iu', 'today i\'m "trying" \'out\' a* "google search" "test"' ); 
1 answer

You can use strtok to tokenize a string.
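As a minimal sketch of why strtok suits this problem: it remembers its position between calls, and the delimiter may change from one call to the next, so a quoted phrase can be read in one step. The input string and variable names below are made up purely for illustration.

```php
<?php
// Hypothetical input, chosen only to illustrate the calls.
$input = 'one "two three" four';

// The first call receives the string and a delimiter.
$first = strtok($input, ' ');   // "one"

// Later calls pass only a delimiter. Switching to the quote character
// skips the opening quote and reads up to the closing one.
$phrase = strtok('"');          // "two three"

// Back to splitting on spaces for the rest of the string.
$last = strtok(' ');            // "four"

// Nothing left: strtok signals the end with false.
$end = strtok(' ');             // false

var_dump($first, $phrase, $last, $end);
```

Note that strtok keeps a single global position, so two tokenizations cannot be interleaved.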

See, for example, this tokenizeQuoted function, derived from the tokenizedQuoted function in the comments on the strtok manual page:

 // Split a string into space-delimited tokens, taking double-quoted and
 // single-quoted strings into account.
 // Returns array(0 => quoted phrases without their quotes, 1 => all other tokens).
 function tokenizeQuoted($string, $quotationMarks='"\'')
 {
     $tokens = array(array(), array());
     for ($nextToken = strtok($string, ' '); $nextToken !== false; $nextToken = strtok(' ')) {
         if (strpos($quotationMarks, $nextToken[0]) !== false) {
             if (strpos($quotationMarks, $nextToken[strlen($nextToken)-1]) !== false) {
                 // Single quoted word: strip the quote at each end.
                 $tokens[0][] = substr($nextToken, 1, -1);
             } else {
                 // Quoted phrase: read on up to the next occurrence of the opening quote.
                 $tokens[0][] = substr($nextToken, 1) . ' ' . strtok($nextToken[0]);
             }
         } else {
             // Unquoted token: natural or partial keyword.
             $tokens[1][] = $nextToken;
         }
     }
     return $tokens;
 }

Here is a usage example:

 $string = 'today i\'m "trying" out a* "google search" "test"';
 var_dump(tokenizeQuoted($string));

Output:

 array(2) {
   [0]=>
   array(3) {
     [0]=>
     string(6) "trying"
     [1]=>
     string(13) "google search"
     [2]=>
     string(4) "test"
   }
   [1]=>
   array(4) {
     [0]=>
     string(5) "today"
     [1]=>
     string(3) "i'm"
     [2]=>
     string(3) "out"
     [3]=>
     string(2) "a*"
   }
 }
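If you would rather stay with regular expressions, the two patterns from the question can be kept almost as they are. The sketch below is only one possible arrangement: splitSearchTerms is a hypothetical helper name, not something from the strtok manual page. It relies on two documented behaviours: preg_match_all stores capture group 1 in index 1 of the result, so reading that index is the usual way to get the phrases without quotes, and the PREG_SPLIT_NO_EMPTY flag makes preg_split drop empty pieces, so array_filter is not needed.

```php
<?php
// Hypothetical helper (not from the original answer): splits a search string
// into exact phrases and remaining keywords using regular expressions only.
function splitSearchTerms($query)
{
    // Index 0 of $matches holds the full matches (with quotes),
    // index 1 holds capture group 1 (the text between the quotes).
    preg_match_all('/"([^"]+)"/u', $query, $matches);
    $exact = $matches[1];

    // Split on quoted phrases or runs of whitespace. PREG_SPLIT_NO_EMPTY
    // discards the empty strings left between adjacent delimiters.
    $other = preg_split('/"[^"]+"|\s+/u', $query, -1, PREG_SPLIT_NO_EMPTY);

    return array($exact, $other);
}

list($exact, $other) = splitSearchTerms('today i\'m "trying" \'out\' a* "google search" "test"');
print_r($exact); // trying, google search, test
print_r($other); // today, i'm, 'out', a*
```

Unlike tokenizeQuoted, this version treats only double quotes as phrase delimiters, so 'out' keeps its single quotes and lands in the second array.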