Find short words with SOLR

I use SOLR with NGramTokenizerFactory to create search tokens for word substrings.

The NGramTokenizer is configured with a minimum gram size of 3.

This means that I can search for, say, "unb" and match the word "unbelievable".

However, I have a problem with short words like "I" and "in". They are not indexed by SOLR (I suspect it is due to NGramTokenizer), and therefore I cannot search for them.

I do not want to reduce the minimum word length to 1 or 2, as this creates a huge search index. But I would like SOLR to include whole words whose length is already below this minimum.

How can I do this?

/ Karsten

2 answers

First of all, find out why your words are not indexed by Solr, using the "Analysis Tool":

http://localhost:8080/solr/admin/analysis.jsp

Enter the field and the text you are searching for, and see which step of the analyzer drops your short term. I suggest doing this because you said you only "suspect" the cause, and you need to be sure which part of the analysis chain is filtering your data.

Then why not simply copy the text into a second field that does not use this analyzer?

Your terms will then be indexed twice, so both the exact words and the n-grams are searchable. The cost is that you have to query two different fields.
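The two-field idea might look roughly like this in schema.xml. This is only a sketch: the field names `body` and `body_exact` and the type names `text_ngram` and `text_exact` are made up for illustration, `text_ngram` stands for whatever n-gram field type you already have, and the tokenizer and lowercase filter in `text_exact` are assumptions rather than anything taken from the question.

```xml
<!-- Hypothetical type: whole words only, no n-gram step, so "I" and "in" survive -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "text_ngram" is assumed to be your existing n-gram field type -->
<field name="body" type="text_ngram" indexed="true" stored="true"/>
<field name="body_exact" type="text_exact" indexed="true" stored="false"/>

<!-- copyField copies the raw input of "body" before any analysis is applied -->
<copyField source="body" dest="body_exact"/>
```

A query would then have to cover both fields, for example `body:unb OR body_exact:in`.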

Hope this helps you a bit.


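Alternatively, you can keep short words in the same field by padding them before the n-gram filters in the Solr schema.

For example, for two-character words: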

  <!-- Keep small words safe from the n-gram filter -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{2})$" replacement=" $1"/>

  <!-- Do the n-gramming -->
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  <filter class="solr.ReverseStringFilterFactory"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  <filter class="solr.ReverseStringFilterFactory"/>

  <!-- Remove the padding spaces -->
  <filter class="solr.TrimFilterFactory"/>

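If you change minGramSize, adjust the padding pattern and the number of padding spaces to match; otherwise short words will still be dropped by the NGram filters.

To cover one-character words as well, add a further PatternReplaceFilterFactory ahead of the n-gram filters: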

<!-- Protect single characters! (Two spaces) -->
<filter class="solr.PatternReplaceFilterFactory" pattern="^(.{1})$" replacement="  $1"/>
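Put together, the chain could sit in a field type along these lines. Treat it as a sketch under assumptions: the type name `text_substring`, the whitespace tokenizer, the lowercase filter, and the separate query analyzer are not from the answer above; only the padding, n-gram, reverse, and trim filters are.

```xml
<!-- Hypothetical field type; name, tokenizer and lowercase filter are assumptions -->
<fieldType name="text_substring" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>

    <!-- Pad 1- and 2-character tokens up to 3 characters (minGramSize) -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{1})$" replacement="  $1"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{2})$" replacement=" $1"/>

    <!-- Prefixes, then prefixes of the reversed prefixes, i.e. substrings -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
    <filter class="solr.ReverseStringFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
    <filter class="solr.ReverseStringFilterFactory"/>

    <!-- Strip the padding so the short words are indexed as typed -->
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the query term is matched as a whole -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain a padded token such as " in" is exactly three characters long, so each n-gram filter emits it once and unchanged, and the trim filter turns it back into "in". You can confirm the result for your own schema with the analysis page mentioned in the first answer.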