Solr Autocomplete Using EdgeNGrams

I am implementing autocomplete with Solr using EdgeNGrams. When a user searches for employee names, suggestions should appear as they type, similar to a Google search. It works well for some searches.

schema.xml file:

    <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
      </analyzer>
    </fieldType>

    <field name="title" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
    <field name="empname" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" />
    <field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
    <copyField source="empname" dest="autocomplete_text"/>
    <copyField source="title" dest="autocomplete_text"/>

  http://local:8080/test/suggest/?q=michael 

Result:

    <?xml version="1.0" encoding="UTF-8" ?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
      </lst>
      <result name="response" numFound="0" start="0" />
      <lst name="spellcheck">
        <lst name="suggestions">
          <lst name="michael">
            <int name="numFound">9</int>
            <int name="startOffset">0</int>
            <int name="endOffset">7</int>
            <arr name="suggestion">
              <str>michael bolton</str>
              <str>michael foret</str>
              <str>michael houser</str>
              <str>michael o'brien</str>
              <str>michael penn</str>
              <str>michael row your boat ashore</str>
              <str>michael tilson thomas</str>
              <str>michael w. smith</str>
              <str>michael w. smith featuring andrae crouch</str>
            </arr>
          </lst>
          <str name="collation">michael bolton</str>
        </lst>
      </lst>
    </response>

That works well. But when I search for michael f:

 http://local:8080/test/suggest/?q=michael f

I get an answer like:

    <?xml version="1.0" encoding="UTF-8" ?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
      </lst>
      <result name="response" numFound="0" start="0" />
      <lst name="spellcheck">
        <lst name="suggestions">
          <lst name="michael">
            <int name="numFound">9</int>
            <int name="startOffset">0</int>
            <int name="endOffset">7</int>
            <arr name="suggestion">
              <str>michael bolton</str>
              <str>michael foret</str>
              <str>michael houser</str>
              <str>michael o'brien</str>
              <str>michael penn</str>
              <str>michael row your boat ashore</str>
              <str>michael tilson thomas</str>
              <str>michael w. smith</str>
              <str>michael w. smith featuring andrae crouch</str>
            </arr>
          </lst>
          <lst name="f">
            <int name="numFound">10</int>
            <int name="startOffset">8</int>
            <int name="endOffset">9</int>
            <arr name="suggestion">
              <str>f**k the facts</str>
              <str>fairest lord jesus</str>
              <str>fatboy slim</str>
              <str>ffh</str>
              <str>fiona apple</str>
              <str>foo fighters</str>
              <str>frank sinatra</str>
              <str>frans bauer</str>
              <str>franz ferdinand</str>
              <str>françois rauber</str>
            </arr>
          </lst>
          <str name="collation">michael bolton f**k the facts</str>
        </lst>
      </lst>
    </response>

When I search for michael f, I expect to get only michael foret. Instead, I also get suggestions that simply begin with f. Is there something wrong with my Solr configuration?

1 answer

I wrote [an old link] about how to implement auto-suggest with Solr, and about some of the questions you need to ask yourself to make the right choice. In short, the out-of-the-box approaches are:

  • Facet Prefix
  • Ngrams
  • TermsComponent
  • Suggester

Each of them has its own advantages and limitations, so I would advise you to read the article.

If you are looking for a more comprehensive and flexible solution, at the cost of some additional work, you can also read this article.

If you have already decided to use NGrams then, going by your examples, you can index your employee names using EdgeNGramFilterFactory with minGramSize 1, and then search against that field to produce the suggestions. On the client side you will need some JavaScript.
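
As a rough sketch of that NGram approach, the field type below reuses the name edgytext from the question and only stock Solr analysis factories. The assumption here is that the edge n-grams are generated at index time, while the query-time analyzer leaves the typed prefix as a single lowercased token. The gram sizes are simply the ones from the question, not a recommendation.

```xml
<!-- Sketch only: EdgeNGram applied at index time, not at query time -->
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <!-- Keep the whole name as one token, e.g. "michael foret" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index every leading prefix: m, mi, mic, ..., "michael f", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- The typed text is matched as-is against the indexed prefixes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, the typed text has to reach the analyzer as one string, for example by quoting it in a regular search such as autocomplete_text:"michael f", so that it is compared against the indexed prefix "michael f" rather than being split into michael and f. Note that with maxGramSize 15, typed prefixes longer than 15 characters would not match anything.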
