How to identify a typo in a product search and suggest possible corrections?

Given a very large database of product names, how would you detect possible typos in user searches and suggest possible corrections (for example, the way Google presents them)?

For example:

The user enters "fork handels" and clicks "search".

They get back:

Search results. Did you mean "fork handles"?

3 answers

There are several approaches to this problem:

  • Keep a table of the most common misspellings in your database. For a list of common spelling mistakes, see here.
  • Use an algorithm based on edit distance. In information theory and computer science, the edit distance between two strings is the minimum number of operations needed to transform one into the other. There are several different algorithms for defining or computing this metric; see the Wikipedia article on the Levenshtein algorithm.
  • If you use Lucene for full-text search, here is a good article that shows how to implement the "Did you mean" feature.
  • If you think of this feature as simple spelling correction, here are some good, very short implementations in several languages: How to write a spelling corrector
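A minimal Python sketch of the first two bullets combined, assuming the product names have already been split into a set of lowercase words. `COMMON_MISSPELLINGS`, `suggest` and the `max_distance=2` cutoff are hypothetical choices for illustration; `levenshtein` is the standard dynamic-programming computation of Levenshtein distance. It scans the whole vocabulary for every unknown word, which is fine for a demo but would be too slow for a very large catalogue without an index.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical table of known misspellings -> correct spelling.
COMMON_MISSPELLINGS = {"handels": "handles", "recieve": "receive"}


def suggest(query: str, vocabulary: set[str], max_distance: int = 2) -> str | None:
    """Return a corrected query, or None if nothing was changed."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        if word in COMMON_MISSPELLINGS:
            fix = COMMON_MISSPELLINGS[word]
        else:
            # Brute-force scan; sorted() makes ties resolve alphabetically.
            best = min(sorted(vocabulary), key=lambda v: levenshtein(word, v), default=None)
            if best is not None and levenshtein(word, best) <= max_distance:
                fix = best
            else:
                fix = word  # nothing close enough: leave the word alone
        corrected.append(fix)
        changed = changed or fix != word
    return " ".join(corrected) if changed else None
```

With `vocabulary = {"fork", "handle", "handles", "knife", "spoon"}`, `suggest("fork handels", vocabulary)` is answered from the misspellings table, while a typo the table does not know, such as `"fork hnadles"`, falls back to the nearest word by edit distance; both return `"fork handles"`, and a query whose words are all known returns `None`.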

You can use a phonetic algorithm like Soundex to find matches that sound similar.

PostgreSQL has a module called fuzzystrmatch whose documentation shows examples of using Soundex, Levenshtein, Metaphone and Double Metaphone.
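To illustrate how a phonetic key groups similar-sounding words, here is a small Python implementation of American Soundex. It is not PostgreSQL's code, just the same idea. `build_phonetic_index` and `phonetic_matches` are hypothetical helpers: they bucket catalogue words by their code so that a misspelled query word can be looked up in its bucket.

```python
from collections import defaultdict

# American Soundex digit groups; vowels, y, h and w get no digit.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
            if len(code) == 4:
                break
        previous = digit  # a vowel resets this, so a repeated digit after it counts again
    return code.ljust(4, "0")


def build_phonetic_index(names):
    """Group catalogue words (lowercased) by their Soundex code."""
    index = defaultdict(set)
    for name in names:
        index[soundex(name)].add(name.lower())
    return index


def phonetic_matches(query_word, index):
    """Catalogue words that share the query word's Soundex code, sorted."""
    return sorted(index.get(soundex(query_word), ()))
```

Both "handels" and "handles" encode to H534, so the misspelling lands in the right bucket. Soundex is coarse (it keeps the first letter plus at most three consonant digits), so in practice a bucket is a candidate list that still needs ranking, for example by edit distance.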


I am sure I read that Google keeps a list of what users retype when a search returns no results. You could store a mapping from those failed queries to the retyped ones (say, when the retyped string differs by only a letter).
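A minimal sketch of that idea, assuming you already know which searches returned nothing and which query the same user typed next (linking the two, e.g. by session, is not shown). `ReformulationLog`, `record`, `suggest` and the `min_count=2` threshold are hypothetical names and choices. A pair is only counted when the retyped query differs by one inserted, deleted or substituted character, or by one swap of adjacent characters, so "fork handels" followed by "fork handles" qualifies.

```python
from collections import Counter, defaultdict
from typing import Optional


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = j = 0
    edited = False
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        if edited:
            return False
        edited = True
        if len(a) == len(b):
            i += 1  # substitution: skip the differing character in both strings
        j += 1  # insertion: skip the extra character in the longer string
    return True


def is_adjacent_swap(a: str, b: str) -> bool:
    """True if b is a with exactly two neighbouring characters swapped."""
    if len(a) != len(b):
        return False
    diffs = [k for k in range(len(a)) if a[k] != b[k]]
    return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])


class ReformulationLog:
    """Hypothetical store: failed query -> counts of the retyped queries that followed it."""

    def __init__(self) -> None:
        self.followups = defaultdict(Counter)

    def record(self, failed_query: str, next_query: str) -> None:
        # Only count pairs that look like a typo fix, not a brand-new search.
        if failed_query == next_query:
            return
        if within_one_edit(failed_query, next_query) or is_adjacent_swap(failed_query, next_query):
            self.followups[failed_query][next_query] += 1

    def suggest(self, query: str, min_count: int = 2) -> Optional[str]:
        """Most frequent retyped query, if it was seen at least min_count times."""
        counts = self.followups.get(query)
        if not counts:
            return None
        best, count = counts.most_common(1)[0]
        return best if count >= min_count else None
```

Requiring a minimum count keeps a single user's unrelated follow-up from turning into a suggestion; the threshold would need tuning against real traffic.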

