I have a constantly growing keyword database. I need to parse incoming text inputs (articles, channels, etc.) and find which keywords from the database are present in each text. The keyword database is much larger than any single text.
Because the database is constantly growing (users keep adding more keywords to watch), I believe the best option is to break the text into words and compare those words against the database. My main dilemma is how to implement this comparison (PHP and MySQL will be used for this project).
The most naive implementation would be a simple SELECT query on the keyword table with a giant IN clause listing all the words found in the text:
SELECT user_id,keyword FROM keywords WHERE keyword IN ('keyword1','keyword2',...,'keywordN');
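To make the IN-clause idea concrete, here is a minimal sketch of how I imagine it. It is written in Python with an in-memory SQLite database standing in for MySQL, only so that it runs on its own; the same logic should carry over to PHP with prepared statements. The helper names (`extract_words`, `find_keywords`), the sample rows, the word-splitting regex, and the chunk size of 500 are all assumptions for illustration, not part of the real project.

```python
import re
import sqlite3

def extract_words(text):
    """Lowercase the text and return its distinct words, sorted."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))

def find_keywords(conn, words, chunk_size=500):
    """Look up the words in the keywords table, one IN list per chunk."""
    matches = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        # One placeholder per word, so values are bound and never
        # concatenated into the SQL string.
        placeholders = ",".join("?" for _ in chunk)
        query = (
            "SELECT user_id, keyword FROM keywords "
            f"WHERE keyword IN ({placeholders})"
        )
        matches.extend(conn.execute(query, chunk).fetchall())
    return sorted(matches)

# Hypothetical sample data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (user_id INTEGER, keyword TEXT)")
conn.execute("CREATE INDEX idx_keyword ON keywords (keyword)")
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [(1, "mysql"), (2, "php"), (2, "mysql"), (3, "python")],
)

words = extract_words("Parsing articles with PHP and MySQL.")
result = find_keywords(conn, words)
print(result)
```

Deduplicating the words first keeps the IN list proportional to the number of distinct words in the text rather than its length, and the index on `keyword` is what I would expect to keep the lookup cheap as the table grows.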
Another approach would be to keep a hash table in memory (using something like memcache) and check each word against it in the same way.
Does anyone have experience with this kind of search, and any suggestions on how best to implement it? I have not tried either approach yet; I am just collecting ideas at this stage.
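A rough sketch of the hash-table variant, again in Python for the sake of something runnable. A plain in-process dictionary stands in for memcache, and the names `build_index` and `match_text` plus the sample rows are hypothetical. It only handles single-word keywords; multi-word phrases would need something more than a per-word lookup.

```python
import re
from collections import defaultdict

def build_index(rows):
    """Map each lowercased keyword to the set of user ids watching it."""
    index = defaultdict(set)
    for user_id, keyword in rows:
        index[keyword.lower()].add(user_id)
    return index

def match_text(index, text):
    """Return {keyword: [user_ids]} for every keyword found in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # Membership tests do not create entries in a defaultdict, so the
    # index is not polluted by words that are not keywords.
    return {word: sorted(index[word]) for word in words if word in index}

# Hypothetical rows, as they might be loaded from the keywords table.
rows = [(1, "mysql"), (2, "php"), (2, "mysql"), (3, "python")]
index = build_index(rows)

matches = match_text(index, "Parsing articles with PHP and MySQL.")
print(matches)
```

Each lookup is a constant-time hash probe on average, so the cost depends on the number of distinct words in the text rather than on the size of the keyword set. The trade-off is that the whole keyword set has to fit in memory and be kept in sync as users add keywords.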