Solr: strip punctuation before indexing

I have a problem stripping punctuation from the Solr index. When a punctuation mark immediately follows a word, that word is not indexed properly.

For example: if we index “hello, John”, this asset will not be found with the keyword “hello”, while there is no problem if we remove the comma after the word “hello”.

Is there any FilterFactory that would strip the punctuation? Any ideas?

Thank you, Bogdan.

+5
3 answers

This can be done with WordDelimiterFilterFactory. Set generateWordParts="1".

PatternTokenizerFactory is another option.
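
A minimal sketch of how the WordDelimiterFilterFactory suggestion might look in schema.xml. The field type name text_punct, the choice of WhitespaceTokenizerFactory, and every attribute other than generateWordParts are assumptions for illustration, not part of the answer above. Newer Solr releases deprecate this filter in favor of WordDelimiterGraphFilterFactory, so check which one your version provides.

```xml
<!-- Hypothetical field type; the name "text_punct" is illustrative. -->
<fieldType name="text_punct" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "hello," reaches the filter as a single token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts="1" emits the word parts found between delimiters,
         so the token "hello," yields "hello". -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because this filter also splits on punctuation inside a token, a value such as "wi-fi" would be indexed as "wi" and "fi" with these settings.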

+6

You can use solr.PatternReplaceFilterFactory for this:

<filter class="solr.PatternReplaceFilterFactory"
    pattern="^\p{Punct}*(.*?)\p{Punct}*$"
    replacement="$1"/>

If you want to keep a leading dollar sign ($), for example, use this instead:

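<!-- "&&" (character class intersection) must be written as "&amp;&amp;" inside an XML attribute -->
<filter class="solr.PatternReplaceFilterFactory"
    pattern="^[\p{Punct}&amp;&amp;[^$]]*(.*?)\p{Punct}*$"
    replacement="$1"/>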
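
For context, here is a sketch of where such a filter might sit in a complete field type. The name text_nopunct and the surrounding tokenizer and lowercase filter are assumptions for illustration. The sketch assumes a whitespace tokenizer because that is a setup in which a token like "hello," keeps its comma and needs cleaning.

```xml
<!-- Hypothetical field type; the name "text_nopunct" is illustrative. -->
<fieldType name="text_nopunct" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> without a type attribute is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strip leading and trailing punctuation from each token: "hello," becomes "hello". -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^\p{Punct}*(.*?)\p{Punct}*$"
            replacement="$1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Punctuation inside a token, as in "wi-fi", is left untouched by this pattern, since only the leading and trailing runs are matched outside the capture group.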
+6

Use PatternReplaceFilterFactory:

    <!-- remove punctuation -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

...   

0