You might consider using a ZSET (sorted set) in Redis for something like that. Even if you have very large data sets, you can do something like this:
    $words = explode(" ", $input); // Pseudo-code for breaking a block of data into individual words.
    $word_count = count($words);

    $r = new Redis(); // Owlient PHPRedis PECL extension
    $r->connect("127.0.0.1", 6379);

    // Increment the score of a single phrase in the sorted set.
    function process_phrase($phrase) {
        global $r;
        $phrase = implode(" ", $phrase);
        $r->zIncrBy("trending_phrases", 1, $phrase);
    }

    // Count every contiguous run of words. Note the <=, so that phrases
    // ending on the last word (including the last word itself) are counted.
    for ($i = 0; $i < $word_count; $i++)
        for ($j = 1; $j <= $word_count - $i; $j++)
            process_phrase(array_slice($words, $i, $j));
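The nested loop visits every contiguous phrase, so an input of n words yields n(n+1)/2 increments, which adds up quickly on long inputs. Here is a minimal sketch of one way to bound that, assuming a hypothetical helper `phrases_up_to()` and a hypothetical cap `$max_len`; neither is part of the code above.

```php
<?php
// Hypothetical helper: list every contiguous phrase of at most $max_len words.
function phrases_up_to(array $words, int $max_len): array {
    $phrases = [];
    $n = count($words);
    for ($i = 0; $i < $n; $i++) {
        // Never read past the end of the input.
        $longest = min($max_len, $n - $i);
        for ($j = 1; $j <= $longest; $j++) {
            $phrases[] = implode(" ", array_slice($words, $i, $j));
        }
    }
    return $phrases;
}
```

For the words `a b c` and a cap of 2, this returns `a`, `a b`, `b`, `b c`, `c`. Each returned phrase could then be passed to `zIncrBy` in the same way `process_phrase` does.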
To get the top phrases you should use this:
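    // Assume $r is instantiated as it is above.
    // Ranks are inclusive, so 0 through 9 returns ten members.
    // (Current phpredis releases name this method zRevRange.)
    $trending_phrases = $r->zReverseRange("trending_phrases", 0, 9);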
$trending_phrases will be an array of the ten most popular phrases. To do things like recent phrases (as opposed to a persistent global set of phrases), duplicate all of the Redis interactions above. For each interaction, use keys that include a day number (i.e. days since January 1, 1970): one for today and one for tomorrow. When retrieving results into $trending_phrases, simply fetch both today's and tomorrow's (or yesterday's) keys and use array_merge and array_unique to find the union.
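As a rough sketch of the day-based keys and the merge step, assuming a hypothetical key format of `prefix:daynumber` and two hypothetical helpers, `day_key()` and `merge_phrases()`, that are not from the code above:

```php
<?php
// Hypothetical helper: build a per-day key such as "trending_phrases:3".
function day_key(string $prefix, int $timestamp): string {
    // 86400 seconds per day; intdiv gives whole days since January 1, 1970 (UTC).
    return $prefix . ":" . intdiv($timestamp, 86400);
}

// Union of two phrase lists with duplicates removed.
function merge_phrases(array $a, array $b): array {
    // array_unique preserves the original keys, so reindex with array_values.
    return array_values(array_unique(array_merge($a, $b)));
}
```

With a connected client, the increment might then look like `$r->zIncrBy(day_key("trending_phrases", time()), 1, $phrase)`. Note that the merged list is a plain union, so it is no longer ordered by combined score.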
Hope this helps!
mattbasta