Solr doesn't highlight some words

I configured Solr 4.10 (and also 5.3) with highlighting enabled. It works fine for most words, but I found a few words that Solr refuses to highlight: it returns the matching documents, but some of the matched words are not highlighted.

What can cause such an effect?

solrconfig.xml

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="defType">edismax</str>
    <str name="bf">product(concount)</str>
    <str name="df">text bio text_syn text_syn_other</str>
    <str name="qf"> text^25 bio^16 text_syn^8 text_syn_other^3 </str>
    <str name="hl">on</str>
    <str name="hl.fl">text bio text_syn text_syn_other</str>
    <str name="hl.preserveMulti">true</str>
    <str name="hl.encoder">html</str>
    <str name="f.text.hl.fragsize">100</str>
    <str name="hl.snippets">20</str>
    <arr name="components">
      <str>highlight</str>
    </arr>
  </lst>
</requestHandler>

schema.xml

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn_other" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_other.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="text_syn" type="text_en_syn" indexed="true" stored="false" multiValued="true" />
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="false" multiValued="true" />
<field name="text_exact" type="string" indexed="true" stored="false" multiValued="false" />
<field name="bio" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="bio_exact" type="string" indexed="true" stored="false" multiValued="false" />
<field name="concount" type="long" indexed="true" stored="true" multiValued="false" />
<field name="concount_exact" type="long" indexed="true" stored="false" multiValued="false" />

<copyField source="text" dest="text_syn"/>
<copyField source="bio" dest="text_syn"/>
<copyField source="text" dest="text_syn_other"/>
<copyField source="bio" dest="text_syn_other"/>

For the request http://localhost:8983/solr/select?q=senior I received documents containing the word senior, but in the highlighting section of the Solr response the word was not highlighted.


UPDATE 1: I found that the word senior appears in my synonyms_abbr.txt file, in the line senior,lead. When I commented out this line, or swapped the words to lead,senior, the word senior surprisingly started being highlighted. Any ideas?


UPDATE 2: Words from synonyms.txt and synonyms_other.txt are highlighted normally, but words from synonyms_abbr.txt behave strangely. For example, with the line lead,head,senior in synonyms_abbr.txt:

  • the requests http://localhost:8983/solr/select?q=senior and http://localhost:8983/solr/select?q=head do not highlight any word,
  • the request http://localhost:8983/solr/select?q=lead highlights not only the word lead, but also head and senior.
3 answers

From your UPDATE 2, it is clear that only the first word in lead,head,senior is actually used for synonym matching and highlighting.

If you look at the documentation on the Solr wiki, https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters, the expand parameter is described as having exactly this effect:

The synonyms parameter names an external file defining the synonyms. If ignoreCase is true, matching will lowercase before checking equality. If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.

The wiki also gives an example:

 # If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
 ipod, i-pod, i pod => ipod, i-pod, i pod
 # If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
 ipod, i-pod, i pod => ipod

This matches the behavior you are observing. You should either change the synonym filter definitions in schema.xml to use expand="true", or rewrite the lines in your synonym file as explicit mappings.
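As a minimal sketch of the first option, the text_en field type from the question could be changed as shown here. Only the expand attribute of the index-time synonym filter differs from the original schema; whether expanding synonyms at index time suits your data is an assumption you would need to verify.

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- expand="true": each word of a synonym line is indexed,
         instead of all words being reduced to the first one in the line -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <!-- query analyzer left exactly as in the question -->
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With this setting, a line such as lead,head,senior should behave like the explicit mapping lead,head,senior => lead,head,senior, so a document containing senior is indexed under all three terms.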

In addition, since these synonym filters run at index time, you will need to re-index your documents for the change to take effect.


Some of your fields are not stored, so their contents cannot be returned or highlighted. Because they are indexed, they are still searchable. Change your schema to stored="true" for all the fields you want to highlight.

 <field name="text_syn" type="text_en_syn" indexed="true" stored="true" multiValued="true" />
 <field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="true" multiValued="true" />

Looking at your configuration, I assume highlighting currently works on the bio and text fields?


Could you add both senior,lead and lead,senior to the synonyms_abbr.txt file, and then try the highlighter again?
