Search text for multiple keywords

I need to find text (about 500 words) for words in an English dictionary (about 275,000 keywords) English words and right now the query I'm using is not really optimized, it takes more than 10 seconds to complete (there is a table of words and texts table):

SELECT word FROM words WHERE 'The quick brown fox jumps over the lazy dog' LIKE CONCAT( '%', word, '%' );

Got an idea here .

I already set the word field as an index and saw some examples of people who store text in a database or put it directly in a query.

Other examples showed that people using FULLTEXT search, although they have 300 thousand words, I don’t think that FULLTEXT will work, I think it’s good to search with +brown +lazy -apple , but in my case I don’t need much logic.

Another example I saw is to comment on words with the IN (...) clause IN (...) , although with 500 m keywords the query will be insanely long.

Any ideas what to do?

Right now the text is saved as a text field and words like varchar(50) in InnoDB encoded with utf8_unicode_ci , I heard that InnoDB is slow, so I can use MyISAM or any other. I am using MySQL 5.5, although I could upgrade to 5.6 if that helped.

+5
source share
2 answers

LIKE comparisons are basically tests with uniform alignment. They are not a general keyword search engine.

WHERE foo LIKE '%ab%' will find records containing literal text ab anywhere in the field foo, they will not search separately for a or b , ab is one monolithic word and this word is fully executed.

If you want to find a few “words” using LIKE , you need to do

 WHERE foo LIKE '%a%' OR foo LIKE '%b%' OR etc... 

which quickly becomes ugly and extremely inefficient - %... search cannot use indexes.

You better switch to the fulltext search engine where you can have a much simpler

 WHERE MATCH(foo) AGAINST ('a b') 
+2
source

InnoDB can get pretty slow when you start getting millions of records. This is largely due to the way it blocks rows when accessing a table.

I would use MyIsam so you can do a FULLTEXT search. Perhaps something like:

  select word from words where match(text) against(word) 

I am not sure about the efficiency, but you do not need to use the logic that you spoke about, I do not think.

EDIT:

My code really needs to have more than one pass to work, as the second parameter should really be all the words. You can use the FOR loop inside SQL to populate, I suppose, but I will have to think about the code for this. The cursor or stored procedure will probably do the trick.

I agree with the other answer, however, that you need to use a FULLTEXT search.

0
source

All Articles