How to quickly search for book titles?

I have a database of about 200,000 books. I want to give my users a way to quickly find a book by its title. Some titles have a prefix such as A, THE, etc., and some contain numbers, so a search for 12 should match books with "12", "twelve" and "dozen" in the title. This will work through AJAX, so the database query needs to be very fast.

I assume that most users will search using a few words from the title, so I'm thinking of splitting all the titles into words and creating a separate database table that maps words to titles. However, I'm afraid this won't give the best results. For example, a book's title may consist of 2 or 3 commonly used words, and I could get back a list of books with longer titles that contain all 2-3 words, with the one I'm looking for lost like a needle in a haystack. In addition, searching for a book with many words in the title may slow down the query because of the large number of OR clauses.

Basically, I'm looking for a way to:

  • quickly find results.
  • sort them by relevance.

I guess I'm not the first person to need something like this, and I'd rather not reinvent the wheel.

PS I am currently using MySQL, but if necessary I could switch to something else.

+8
algorithm search
5 answers

Keep it simple. Create an index on the title field and use pattern matching. You can't get much faster than that, because your bottleneck is not the string matching itself but the number of rows you have to match against the title.

I just came up with another idea. You say that some words can be written in different ways, like 12, twelve, a dozen. Instead of building a query with every interpretation, why not store the different interpretations of each title in a separate table, one-to-many with the books table? Then you can GROUP BY book_id to get unique book titles.

Say the book is "A dime in a dozen". In the books table, it will be:

book_id=356 book_title='A dime in a dozen' 

The following will be stored in the titles table:

 titles_id=123 titles_book_id=356 titles_title='A dime in a dozen'
 titles_id=124 titles_book_id=356 titles_title='A dime in a 12'
 titles_id=125 titles_book_id=356 titles_title='A dime in a twelve'

The query for this is:

 SELECT b.book_id, b.book_title FROM books b INNER JOIN titles t ON b.book_id = t.titles_book_id WHERE t.titles_title LIKE '%twelve%' GROUP BY b.book_id
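As a rough, runnable sketch of the two-table idea, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). The table and column names follow the answer; the helper name `find_books` is hypothetical. It matches with LIKE so that the % wildcards take effect.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books  (book_id INTEGER PRIMARY KEY, book_title TEXT);
    CREATE TABLE titles (titles_id INTEGER PRIMARY KEY,
                         titles_book_id INTEGER REFERENCES books(book_id),
                         titles_title TEXT);
    INSERT INTO books  VALUES (356, 'A dime in a dozen');
    INSERT INTO titles VALUES (123, 356, 'A dime in a dozen'),
                              (124, 356, 'A dime in a 12'),
                              (125, 356, 'A dime in a twelve');
""")

def find_books(conn, term):
    """Return (book_id, book_title) for books with a title variant containing term."""
    sql = """
        SELECT b.book_id, b.book_title
        FROM books b
        INNER JOIN titles t ON b.book_id = t.titles_book_id
        WHERE t.titles_title LIKE ?
        GROUP BY b.book_id, b.book_title
    """
    # The term is passed as a bound parameter; % and _ inside it are not escaped here.
    return conn.execute(sql, (f"%{term}%",)).fetchall()

print(find_books(conn, "twelve"))  # the book is found through its 'twelve' variant
print(find_books(conn, "dime"))    # all three variants match, GROUP BY returns one row
```

Note that a pattern with a leading % usually cannot use an ordinary B-tree index, so this form of matching scans the variants table.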

Inserts now become more involved, but the variants can be generated outside the database and inserted in one go.
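Generating the variants outside the database could look roughly like this. It is only a sketch: the `NUMBER_GROUPS` table and the function name `title_variants` are made up for illustration, and a real synonym table would be far larger.

```python
import re

# Hypothetical, deliberately tiny synonym groups (an assumption for this example).
NUMBER_GROUPS = [
    ("12", "twelve", "dozen"),
]

def title_variants(title):
    """Return the title plus copies with each known number word swapped for its synonyms."""
    variants = {title}
    for group in NUMBER_GROUPS:
        for word in group:
            # Match the word only as a whole word, ignoring case.
            pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
            if pattern.search(title):
                for replacement in group:
                    variants.add(pattern.sub(replacement, title))
    return sorted(variants)

# Rows ready for a single bulk insert into the titles table (book_id 356 as above).
rows = [(356, variant) for variant in title_variants("A dime in a dozen")]
for row in rows:
    print(row)
```

The resulting rows can then be sent to the database in one batch, for example with a DB-API `executemany` call.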

+1

One solution that could easily satisfy your data volume and speed requirements is the Redis key-value store. As I see it, you can keep your approach of mapping titles to keywords and store them in the form:

keyword: set of book titles

Redis already has a built-in set data type that you can use.

Then, to get the titles of books containing all the search keywords, you can use the SINTER command, which computes the set intersection for you.

Everything is done in memory, so response times are very fast. In addition, if you want to persist your index, Redis has several different persistence / caching mechanisms.
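To illustrate the keyword-to-set idea without needing a running Redis server, here is a sketch that uses plain Python sets as a stand-in. Adding a title to each keyword's set plays the role of SADD, and intersecting the sets plays the role of SINTER; the function names and sample titles are assumptions for the example.

```python
from collections import defaultdict

# keyword -> set of book titles, mirroring one Redis set per keyword.
index = defaultdict(set)

def add_title(title):
    """Add the title to the set of every keyword it contains (like SADD per keyword)."""
    for word in title.lower().split():
        index[word].add(title)

def search(query):
    """Intersect the sets of all query keywords (what SINTER would return)."""
    words = query.lower().split()
    if not words:
        return set()
    # .get avoids creating empty sets for unknown keywords.
    return set.intersection(*(index.get(word, set()) for word in words))

for sample in ["The Old Man and the Sea", "The Sea Wolf", "Old Yeller"]:
    add_title(sample)

print(search("old sea"))
```

With a real Redis instance, each keyword would be a key holding a set, and the intersection would be computed server-side.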

+1

Maybe you should take a look at Apache Lucene. It is a high-performance, Java-based information retrieval library.
You would create an IndexWriter and index all your titles, and you can add parameters (look at the class) that link each entry back to the real book. When searching, you need an IndexReader and an IndexSearcher, and you call search() on the searcher. Look at the sample in src/demo and at: http://lucene.apache.org/java/2_4_0/demo2.html

Using information retrieval techniques makes the indexing process longer, but each search no longer has to go through most of the titles, so overall you can expect better search performance. Also, by choosing a good analyzer, you can ignore words such as "the", "a", ...
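The following is not Lucene's API or its real scoring. It is a toy sketch of the underlying idea: an inverted index, a trivial analyzer that drops stop words, and a length-normalized overlap score so that short titles are not buried under longer ones. The stop list, class name, and sample titles are all assumptions for illustration.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the"}  # assumed stop list, for illustration only

def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

class TitleIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of titles containing it
        self.titles = {}                  # id -> original title
        self.lengths = {}                 # id -> number of distinct indexed terms

    def add(self, book_id, title):
        terms = set(analyze(title))
        self.titles[book_id] = title
        self.lengths[book_id] = len(terms)
        for term in terms:
            self.postings[term].add(book_id)

    def search(self, query):
        """Return titles containing every query term, best-covered titles first."""
        terms = set(analyze(query))
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Score = share of the title's terms covered by the query (1.0 = exact match).
        scored = [(len(terms) / self.lengths[i], self.titles[i]) for i in ids]
        return [title for score, title in sorted(scored, key=lambda s: (-s[0], s[1]))]

idx = TitleIndex()
idx.add(1, "The War")
idx.add(2, "War and Peace")
idx.add(3, "The Art of War")
print(idx.search("the war"))
```

Because each lookup touches only the posting sets of the query terms, search cost depends on how many titles share those terms rather than on the total number of titles.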

+1

Using SOUNDEX is the best way, I think.

 SELECT id, title FROM products AS p WHERE p.title SOUNDS LIKE 'Shaw'; -- This will also match 'Saw' etc.

For the best performance, precompute the SOUNDEX value of your titles and store it in a new column. You can compute it with SOUNDEX('Hello').

Usage example:

 UPDATE `books` SET `soundex_title` = SOUNDEX(title); 
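
To show what a precomputed phonetic code buys you, here is a sketch of the classic four-character American Soundex in Python, plus a dictionary that plays the role of the `soundex_title` column. This is an assumption-laden illustration: MySQL's SOUNDEX() is documented as returning arbitrarily long strings, so its output can differ from this fixed-length version. The sample words are made up.

```python
from collections import defaultdict

# Letter -> digit table of classic Soundex; vowels, h, w and y have no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(word):
    """Classic four-character American Soundex of an ASCII word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate letters with equal codes
        code = CODES.get(ch, "")  # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]

# Precomputed code -> titles lookup, analogous to an indexed soundex_title column.
by_code = defaultdict(list)
for title in ["Saw", "Shaw", "Robert"]:
    by_code[soundex(title)].append(title)

print(by_code[soundex("Shaw")])
```

Keep in mind that Soundex works per word and was designed for names, so for multi-word titles it would likely need to be applied word by word.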
+1

Apache Lucene with Solr is definitely a very good option for your problem.

You can hook Solr/Lucene up to index your MySQL database directly. Here is a simple tutorial on connecting a MySQL database to Lucene/Solr: http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/

Here are the advantages and difficulties of using Lucene-Solr instead of MySQL full-text search: http://jayant7k.blogspot.com/2006/05/mysql-fulltext-search-versus-lucene.html

+1