Fastest PHP Matching Procedure

What is the fastest way in PHP to take a list of keywords and match it against search results (such as an array of titles), requiring every word to be present?

For example, if the key phrase is "great leather shoes", then the following titles would match ...

  • Get Some Really Great Leather Shoes
  • Leather Shoes Are Great
  • Great Day! Those Are Some Cool Leather Shoes!
  • Shoes, Made of Leather, Can Be Great

... while these would not match:

  • Leather Shoes on Sale Today!
  • You'll Love These Leather Shoes Greatly
  • Great Shoes Don't Come Cheap

I imagine there is some trick with array functions or a regular expression (RegEx) that achieves this quickly.

+5
6 answers

I would build an index of the words in each title and test whether every search term is in that index:

$terms = explode(' ', 'great leather shoes');
$titles = array(
    'Get Some Really Great Leather Shoes',
    'Leather Shoes Are Great',
    'Great Day! Those Are Some Cool Leather Shoes!',
    'Shoes, Made of Leather, Can Be Great'
);
foreach ($titles as $title) {
    // extract words in lowercase and use them as key for the word index
    $wordIndex = array_flip(preg_split('/\P{L}+/u', mb_strtolower($title), -1, PREG_SPLIT_NO_EMPTY));
    // look up if every search term is in the index
    foreach ($terms as $term) {
        if (!isset($wordIndex[$term])) {
            // if one is missing, continue with the outer foreach
            continue 2;
        }
    }
    // echo matched title
    echo "match: $title\n";
}
+4

You can preg_grep() your array against something like

 /^(?=.*?\bgreat)(?=.*?\bleather)(?=.*?\bshoes)/

or (maybe faster) grep for each word separately and then array_intersect the results.
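Both suggestions can be sketched roughly as follows. This is only an illustration under a few assumptions that are not part of the answer: the sample titles are borrowed from elsewhere in this thread, the variable names are made up, and the patterns add the `i` flag plus a trailing `\b` so that capitalised titles match and "Greatly" does not count as "great".

```php
<?php
// Sample data (assumption: titles borrowed from elsewhere in this thread).
$titles = [
    'Get Some Really Great Leather Shoes',
    'Leather Shoes Are Great',
    'Great Day! Those Are Some Cool Leather Shoes!',
    'Shoes, Made of Leather, Can Be Great',
    'Leather Shoes on Sale Today!',
    'You\'ll Love These Leather Shoes Greatly',
    'Great Shoes Don\'t Come Cheap',
];
$words = ['great', 'leather', 'shoes'];

// Variant 1: a single pattern with one lookahead per word.
$lookaheads = '';
foreach ($words as $word) {
    $lookaheads .= '(?=.*?\b' . preg_quote($word, '/') . '\b)';
}
$single = preg_grep('/^' . $lookaheads . '/i', $titles);

// Variant 2: grep for each word separately, then intersect the results.
$perWord = $titles;
foreach ($words as $word) {
    $hits = preg_grep('/\b' . preg_quote($word, '/') . '\b/i', $titles);
    $perWord = array_intersect($perWord, $hits);
}

print_r($single);
print_r($perWord);
```

Both variants keep the original array keys, so the results can be compared directly. Which one is faster depends on the data, so it is worth timing both on a realistic set of titles.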

+3

Split each title into words and check that every keyword is among them (compared case-insensitively):

$keywords = array(
    'great',
    'leather',
    'shoes'
);

$titles = array(
    'Get Some Really Great Leather Shoes',
    'Leather Shoes Are Great',
    'Great Day! Those Are Some Cool Leather Shoes!',
    'Shoes, Made of Leather, Can Be Great',
    'Leather Shoes on Sale Today!',
    'You\'ll Love These Leather Shoes Greatly',
    'Great Shoes Don\'t Come Cheap'
);

$matches = array();
foreach( $titles as $title )
{
  // split on word boundaries; -1 means "no limit" (passing null is deprecated since PHP 8.1)
  $wordsInTitle = preg_split( '~\b(\W+\b)?~', $title, -1, PREG_SPLIT_NO_EMPTY );
  if( array_uintersect( $keywords, $wordsInTitle, 'strcasecmp' ) == $keywords )
  {
    // we have a match
    $matches[] = $title;
  }
}

var_dump( $matches );


+2

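You could also split the title into words and simply use in_array:

// $list holds the lowercase words of a single title
if (in_array('great', $list) && in_array('leather', $list) && in_array('shoes', $list)) {
    // Do something
}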
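
The snippet above does not show where `$list` comes from. A minimal sketch, assuming it is built by lowercasing one title and splitting it on non-word characters (the helper name `titleHasAllWords` is hypothetical):

```php
<?php
// Hypothetical helper: true if every keyword occurs as a whole word in $title.
function titleHasAllWords(string $title, array $keywords): bool
{
    // Lowercase the title and split it on runs of non-word characters.
    $list = preg_split('/\W+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($keywords as $keyword) {
        // Strict comparison avoids PHP's loose type juggling.
        if (!in_array($keyword, $list, true)) {
            return false;
        }
    }
    return true;
}

var_dump(titleHasAllWords('Shoes, Made of Leather, Can Be Great', ['great', 'leather', 'shoes']));
var_dump(titleHasAllWords('Leather Shoes on Sale Today!', ['great', 'leather', 'shoes']));
```

Note that in_array scans the list linearly, so for long titles or many keywords the array_flip/isset index from the first answer is likely to do less work.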
+1

/(?=.*?\bgreat\b)(?=.*?\bshoes\b)(?=.*?\bleather\b)/

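a) Put word boundaries (\b) on both sides of each keyword; otherwise a word such as "Greatly" would still count as a match for "great".

b) Use lazy matching (i.e. .*?). A greedy .* first runs to the end of the string and then has to backtrack to find the word, whereas the lazy .*? advances one character at a time and stops at the first occurrence.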

+1

Here is another regex that does the job without lookaheads:

'#(?:\b(?>great\b()|leather\b()|shoes\b()|\w++\b)\W*+)++\1\2\3#i'

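Each keyword is followed by an empty capturing group that acts like a "check box". The back-references at the end (\1\2\3) succeed only if all three groups took part in the match, that is, if every keyword was seen.

The pattern also avoids needless backtracking by using possessive quantifiers (++, *+) and an atomic group ((?>...)).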
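
The single-regex pattern above can be tried out with preg_grep. This is only a sketch: the sample titles are an assumption (borrowed from elsewhere in this thread) and the variable names are illustrative.

```php
<?php
// Pattern copied from the answer: one empty "check box" group per keyword.
$pattern = '#(?:\b(?>great\b()|leather\b()|shoes\b()|\w++\b)\W*+)++\1\2\3#i';

// Sample data (assumption: titles borrowed from elsewhere in this thread).
$titles = [
    'Get Some Really Great Leather Shoes',
    'Leather Shoes Are Great',
    'Great Day! Those Are Some Cool Leather Shoes!',
    'Shoes, Made of Leather, Can Be Great',
    'Leather Shoes on Sale Today!',
    'You\'ll Love These Leather Shoes Greatly',
    'Great Shoes Don\'t Come Cheap',
];

// preg_grep keeps only the titles in which all three groups were set.
$matches = preg_grep($pattern, $titles);
print_r($matches);
```

The trick relies on PCRE treating a back-reference to a group that never participated in the match as a failure, which is its default behaviour.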

However, I would still stick with the lookahead approach unless I knew it was causing a bottleneck. In most cases, its greater readability is worth the trade-off in performance.

+1