Implementing a keyword comparison scheme (reverse search)

Question

I have a constantly growing keyword database. I need to parse incoming text inputs (articles, channels, etc.) and find which keywords from the database are present in the text. The keyword database is much larger than the text.

As the database is constantly growing (users keep adding more keywords to watch for), I believe the best option would be to break the text into words and compare those words against the database. My main dilemma is how to implement this comparison scheme (PHP and MySQL will be used for this project).

The most naive implementation would be a simple SELECT query on the keyword table with a giant IN clause listing all the words found in the text.

SELECT user_id,keyword FROM keywords WHERE keyword IN ('keyword1','keyword2',...,'keywordN');
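
As a rough illustration of the IN-clause idea, here is a minimal sketch. It is written in Python with an in-memory SQLite database standing in for PHP and MySQL, purely so it can be run as-is. The `keywords` table follows the query above; the function name `find_watched_keywords`, the tokenizing regex, and the sample rows are assumptions made for the example.

```python
import re
import sqlite3

def find_watched_keywords(conn, text):
    """Return (user_id, keyword) rows whose keyword appears as a word in text."""
    # Assumed tokenizer: lowercase runs of letters/digits, de-duplicated.
    words = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    if not words:
        return []
    # One bound placeholder per word, so no word is spliced into the SQL text.
    # A long text may exceed the driver's parameter limit; chunk the list then.
    placeholders = ",".join("?" for _ in words)
    sql = ("SELECT user_id, keyword FROM keywords "
           f"WHERE keyword IN ({placeholders}) ORDER BY user_id, keyword")
    return conn.execute(sql, words).fetchall()

# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (user_id INTEGER, keyword TEXT)")
conn.executemany("INSERT INTO keywords VALUES (?, ?)",
                 [(1, "mysql"), (2, "php"), (2, "sphinx"), (3, "mysql")])
rows = find_watched_keywords(conn, "Tuning MySQL for a PHP site")
print(rows)
```

Binding the words as parameters avoids quoting problems with user-supplied text. An index on `keyword` would be needed for this to scale.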

Another approach would be to create a hash table in memory (using something like memcache) and check it in the same way.
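
The in-memory variant might look roughly like this. It is a sketch in Python, with a plain dictionary standing in for memcache. The names `tokenize`, `match_keywords`, and `keyword_owners`, and the sample data, are assumptions for illustration only.

```python
import re

def tokenize(text):
    """Split text into a set of lowercase words (assumed tokenization rule)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_keywords(text, keyword_owners):
    """Look up every word of the text in a keyword -> user ids mapping."""
    hits = {}
    for word in tokenize(text):
        owners = keyword_owners.get(word)  # average O(1) hash lookup
        if owners:
            hits[word] = owners
    return hits

# Hypothetical keyword table loaded into memory.
keyword_owners = {"mysql": {1, 3}, "php": {2}, "sphinx": {2}}
hits = match_keywords("Tuning MySQL for a PHP site", keyword_owners)
print(hits)
```

The cost depends on the number of words in the text rather than the size of the keyword set. The sketch only handles single-word keywords; multi-word phrases would need a different matching strategy.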

Does anyone have experience with this kind of search, and any suggestions on how best to implement it? I have not tried any of these approaches yet; I am just collecting ideas at this stage.

php mysql keyword search tokenize

Eran Galperin Jan 2 '09 at 20:12

6 answers

Norman Ramsey · Answer 1 · 2009-01-02T23:00:06+0000

Aho-Corasick, , , . , , , , , .

fgrep. , Preston Briggs C, , . ( "" ".) Preston Noweb. PHP PHP --- 220 C, - 135 .

, Aho-Corasick, :

, , .
, , , .

Aho-Corasick , , . , , , , , . . DAWG scrabble , , .
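
For readers unfamiliar with the Aho-Corasick automaton named in this answer, here is a minimal sketch in Python. It is not the C implementation the answer refers to. The function names and the sample keywords are assumptions, and it matches raw substrings without any word-boundary handling.

```python
from collections import deque

def build_automaton(keywords):
    """Build goto/fail/output tables for a set of non-empty keywords."""
    goto, fail, out = [{}], [0], [[]]
    for kw in keywords:
        state = 0
        for ch in kw:                      # insert the keyword into the trie
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(kw)
    queue = deque(goto[0].values())        # depth-1 states fail to the root
    while queue:                           # breadth-first: set failure links
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit suffix matches
    return goto, fail, out

def find_all(text, automaton):
    """Scan text once and return (start_index, keyword) for every match."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kw in out[state]:
            matches.append((i - len(kw) + 1, kw))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
matches = find_all("ushers", automaton)
print(matches)
```

The text is scanned once regardless of how many keywords exist, which is why this approach suits a keyword set much larger than the text. A growing keyword database means the automaton has to be rebuilt or extended as keywords are added.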

Hugh Bothwell · Answer 2 · 2009-01-02T21:04:39+0000

.
.
. strip
, , ( )

, - 3 4 , ; , .

ʞɔıu · Answer 3 · 2009-01-02T20:47:42+0000

100% , , , ?

Update:

.

. A ( ) :

inverted_index
-----
document_id keyword

:

select document_id, count(*) from inverted_index
  where keyword in ('keyword1', 'keyword2', 'keyword3')
  group by document_id
  having count(*) = 3

, , , in():

keyword_table
----
keyword othercols

select keyword_table.keyword, keyword_table.othercols from inverted_index
  inner join keyword_table on keyword_table.keyword = inverted_index.keyword
  where inverted_index.document_id = id_of_some_new_document

- , ?
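
The join above can be tried end to end with a small sketch. It uses Python and an in-memory SQLite database as a stand-in for MySQL. The table and column names come from the answer; the helper names, the tokenizing regex, and the sample rows are assumptions for illustration.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inverted_index (document_id INTEGER, keyword TEXT);
    CREATE TABLE keyword_table (keyword TEXT PRIMARY KEY, othercols TEXT);
    -- Hypothetical watched keywords.
    INSERT INTO keyword_table VALUES ('mysql', 'user 1'), ('sphinx', 'user 2');
""")

def index_document(conn, document_id, text):
    """Store one inverted_index row per distinct word of the document."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    conn.executemany("INSERT INTO inverted_index VALUES (?, ?)",
                     [(document_id, w) for w in words])

def keywords_in_document(conn, document_id):
    """Join the document's words against the keyword table, as in the answer."""
    sql = """
        SELECT keyword_table.keyword, keyword_table.othercols
        FROM inverted_index
        INNER JOIN keyword_table
            ON keyword_table.keyword = inverted_index.keyword
        WHERE inverted_index.document_id = ?
        ORDER BY keyword_table.keyword
    """
    return conn.execute(sql, (document_id,)).fetchall()

index_document(conn, 42, "Tuning MySQL for a PHP site")
found = keywords_in_document(conn, 42)
print(found)
```

Compared with a giant IN list, the words of the new document are inserted first and the database performs the match as an ordinary indexed join.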

Nick Gerakines · Answer 4 · 2009-01-02T21:01:53+0000

2 .

( ) . , , . Aka, , userb slice two ..

-, - - , . , , . n- , n, memcached. , x , , . / , .

, , , , , , .

: SQL .

Bill Karwin · Answer 5 · 2009-01-02T22:41:23+0000

, Sphinx?

, . . , , , , .

Sphinx MySQL.

Graham Toal · Answer 6 · 2009-02-11T22:44:47+0000

, dawg ( , Scrabble), , , AHO .

http://www.gtoal.com/wordgames/spell/multiscan.c.html

, wordgame, , , :

http://www.gtoal.com/wordgames/spell/multidawg.c.html
