Search for documents using partial words

I am looking for a document search engine (e.g. Xapian, Whoosh, Lucene, Solr, Sphinx or others) that is able to search for partial terms.

For example, when searching for the term "brit", the search engine should return documents containing "britney" or "britain" or, in general, any document containing a word that matches *brit*.

Tangentially, I noticed that most engines use TF-IDF (Term frequency-Inverse document frequency) or its derivatives, which are based on full terms, not partial terms. Are there any other methods that have been successfully implemented besides TF-IDF for document search?

1 answer

With Lucene, you can do this in several ways:

1.) You can use wildcard queries such as *brit* (you will need to configure your query parser to allow leading wildcards, which are disabled by default).

2.) You can create an additional field containing N-grams of all terms. This will make the index larger, but in many cases searching will be faster.

3.) You can use fuzzy search to handle typos in the query, for example when someone typed britnei but wanted to find britney.
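To make the N-gram idea in option 2 more concrete, here is a minimal sketch in plain Python. It is not Lucene code and does not use any Lucene API; the function names (ngrams, build_index, search), the trigram size, and the sample documents are all assumptions made for illustration. It indexes every character trigram of every word, looks up the trigrams of the query, and then verifies the surviving candidates with a plain substring check.

```python
from collections import defaultdict


def ngrams(term, n=3):
    """Return the set of character n-grams of a term.

    Terms shorter than n are returned whole, as a single "gram".
    """
    term = term.lower()
    if len(term) < n:
        return {term}
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def build_index(docs, n=3):
    """Map each n-gram to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for gram in ngrams(word, n):
                index[gram].add(doc_id)
    return index


def search(index, docs, query, n=3):
    """Return sorted ids of documents with a word containing the query."""
    query = query.lower()
    if len(query) < n:
        # Queries shorter than one n-gram cannot use the index: scan everything.
        candidates = set(docs)
    else:
        # A matching document must contain every n-gram of the query.
        candidates = set.intersection(
            *(index.get(gram, set()) for gram in ngrams(query, n))
        )
    # The n-grams may come from different words, so verify each candidate.
    return sorted(
        doc_id
        for doc_id in candidates
        if any(query in word for word in docs[doc_id].lower().split())
    )


# Hypothetical sample data, for illustration only.
docs = {
    1: "Britney Spears tour",
    2: "Great Britain history",
    3: "British Columbia",
    4: "bright ideas",
}
index = build_index(docs)
print(search(index, docs, "brit"))  # [1, 2, 3]
```

The trade-off is the one described above: the index stores many more entries than a plain term index, but a partial-word query becomes a handful of set lookups instead of a scan over every term.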

For wildcard queries and fuzzy searches, see the query syntax documentation.
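As a rough illustration of what wildcard and fuzzy matching do to a term list, here is a small standard-library Python sketch. It is not how Lucene implements these queries internally; the helper names (wildcard_match, fuzzy_match, edit_distance) and the sample terms are assumptions for this example. Wildcards are handled with fnmatch, and fuzzy matching uses a plain Levenshtein edit distance.

```python
from fnmatch import fnmatchcase


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def wildcard_match(terms, pattern):
    """Return the terms matching a shell-style pattern such as '*brit*'."""
    return [t for t in terms if fnmatchcase(t, pattern)]


def fuzzy_match(terms, query, max_edits=1):
    """Return the terms within max_edits edits of the query."""
    return [t for t in terms if edit_distance(t, query) <= max_edits]


# Hypothetical term list, for illustration only.
terms = ["britney", "britain", "british", "bright", "celebrity"]

print(wildcard_match(terms, "*brit*"))
# ['britney', 'britain', 'british', 'celebrity']
print(fuzzy_match(terms, "britnei"))
# ['britney']
```

Note that the leading wildcard also matches "celebrity", since "brit" occurs in the middle of that word. In Lucene's query syntax, the corresponding queries would be written as *brit* and britnei~.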
