Parsing Search String for Phrases and Keywords

I need to parse a search string into keywords and phrases in PHP. For example,

line 1: value of "measured response" detect goal "method valuation" study

should give: value,of,measured response,detect,goal,method valuation,study

I also need it to work if the string has:

  • no phrases in quotation marks,
  • any number of phrases enclosed in quotation marks, with any number of keywords outside the quotation marks,
  • only phrases in quotation marks,
  • keywords separated by spaces.

I am leaning toward using preg_match with the pattern '/(\".*\")/' to collect the phrases into an array, then removing the phrases from the string, and finally adding the remaining keywords to the array. I just can't put it all together!

I am also considering replacing the spaces outside the quotes with commas and then exploding the string into an array. If that is the better option, how can I do it with preg_replace?

Is there a better way to do this? Help! Thank you all very much.

+7
3 answers
 preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);
 for ($i = 0; $i < count($result[0]); $i++) {
     # Matched text = $result[0][$i];
 }

This will give the results you are looking for.
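
As a quick check, the pattern can be run against the example string from the question. This is only a minimal sketch: the `implode` call is there purely for display, and the limitation mentioned afterwards is an observation about the pattern, not something stated in the original answer.

```php
<?php
// Example string from the question.
$subject = 'value of "measured response" detect goal "method valuation" study';

// First alternative: a word that is not directly preceded by a quote.
// Second alternative: the text after a quote, up to the next quote.
preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);

// $result[0] holds the full text of every match, in order.
echo implode(',', $result[0]), PHP_EOL;
// value,of,measured response,detect,goal,method valuation,study
```

Note that the `\b` after the lookbehind requires a word character right after the opening quote, so a phrase such as "-foo bar" would be split into separate keywords rather than kept together.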

Explanation:

 # (?<!")\b\w+\b|(?<=")\b[^"]+
 #
 # Match either the regular expression below (attempting the next alternative only if this one fails) «(?<!")\b\w+\b»
 #    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!")»
 #       Match the character “"” literally «"»
 #    Assert position at a word boundary «\b»
 #    Match a single character that is a "word character" (letters, digits, etc.) «\w+»
 #       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
 #    Assert position at a word boundary «\b»
 # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match) «(?<=")\b[^"]+»
 #    Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=")»
 #       Match the character “"” literally «"»
 #    Assert position at a word boundary «\b»
 #    Match any character that is NOT a “"” «[^"]+»
 #       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
+10
 $s = 'value of "measured response" detect goal "method valuation" study';
 preg_match_all('~(?|"([^"]+)"|(\S+))~', $s, $matches);
 print_r($matches[1]);

Output:

 Array
 (
     [0] => value
     [1] => of
     [2] => measured response
     [3] => detect
     [4] => goal
     [5] => method valuation
     [6] => study
 )

The trick here is to use a branch-reset group: (?|...|...). It works exactly like an alternation contained in a non-capturing group, (?:...|...), except that within each branch the capturing-group numbering starts at the same number. (For more information, see the PCRE docs and search for DUPLICATE SUBPATTERN NUMBERS.)

Thus, the text we are interested in is always captured in group #1. You can retrieve the contents of group #1 for all matches via $matches[1]. (This assumes the PREG_PATTERN_ORDER flag is set; I did not specify it, as @FailedDev did, because it is the default. For more details, see the PHP docs.)
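
To see what the branch reset buys, it may help to compare it with a plain non-capturing group. The sketch below is only an illustration, and the variable names `$plain` and `$reset` are made up for it. Without the reset, phrases land in group 1 and keywords in group 2, so neither array alone holds the full token list.

```php
<?php
$s = 'value of "measured response" detect';

// Plain non-capturing group: the two branches get separate group numbers.
preg_match_all('~(?:"([^"]+)"|(\S+))~', $s, $plain);
// $plain[1] holds phrases only (empty string where a keyword matched).
// $plain[2] holds keywords only (empty string where a phrase matched).

// Branch reset: both branches write to group 1.
preg_match_all('~(?|"([^"]+)"|(\S+))~', $s, $reset);

print_r($plain[1]);
print_r($plain[2]);
print_r($reset[1]); // value, of, measured response, detect
```

With the plain group you would have to merge `$plain[1]` and `$plain[2]` position by position; the branch reset makes that step unnecessary.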

+2

There is no need to use a regular expression; the built-in function str_getcsv can be used to explode a string with any given delimiter, enclosure, and escape characters.

Really, it is as simple as this.

 // where $string is the string to parse
 $array = str_getcsv($string, ' ', '"');
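
One behaviour worth keeping in mind: because the string is parsed as CSV, consecutive spaces are treated as empty fields, so input with doubled spaces yields empty entries. Below is a minimal sketch that filters them out. The helper name `parse_search_string` is hypothetical, and the escape character is passed explicitly only to make the default visible.

```php
<?php
// Hypothetical helper, not part of the original answer.
function parse_search_string(string $string): array
{
    // Space as delimiter, double quote as enclosure, backslash as escape.
    $fields = str_getcsv($string, ' ', '"', '\\');

    // Drop the empty fields produced by repeated spaces, then reindex.
    $tokens = array_filter($fields, static function ($field) {
        return $field !== null && $field !== '';
    });

    return array_values($tokens);
}

// Note the double space between "value" and "of".
print_r(parse_search_string('value  of "measured response" detect'));
```

The filter also covers an empty search string, for which the helper returns an empty array.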
+1