Solr doesn't highlight some words

Question

I configured Solr 4.10 (and also 5.3) with highlighting. It works fine for most words, but I found a few words that Solr refuses to highlight: it returns the matching documents but does not highlight those words in them.

What can cause such an effect?

solrconfig.xml

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="wt">json</str>
        <str name="indent">true</str>
        <str name="defType">edismax</str>
        <str name="bf">product(concount)</str>
        <str name="df">text bio text_syn text_syn_other</str>
        <str name="qf"> text^25 bio^16 text_syn^8 text_syn_other^3 </str>
        <str name="hl">on</str>
        <str name="hl.fl">text bio text_syn text_syn_other</str>
        <str name="hl.preserveMulti">true</str>
        <str name="hl.encoder">html</str>
        <str name="f.text.hl.fragsize">100</str>
        <str name="hl.snippets">20</str>
        <arr name="components">
          <str>highlight</str>
        </arr>
      </lst>
    </requestHandler>

schema.xml

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_en_syn_other" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_other.txt" ignoreCase="true" expand="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />
    <field name="text_syn" type="text_en_syn" indexed="true" stored="false" multiValued="true" />
    <field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="false" multiValued="true" />
    <field name="text_exact" type="string" indexed="true" stored="false" multiValued="false" />
    <field name="bio" type="text_en" indexed="true" stored="true" multiValued="false" />
    <field name="bio_exact" type="string" indexed="true" stored="false" multiValued="false" />
    <field name="concount" type="long" indexed="true" stored="true" multiValued="false" />
    <field name="concount_exact" type="long" indexed="true" stored="false" multiValued="false" />

    <copyField source="text" dest="text_syn"/>
    <copyField source="bio" dest="text_syn"/>
    <copyField source="text" dest="text_syn_other"/>
    <copyField source="bio" dest="text_syn_other"/>

For the request http://localhost:8983/solr/select?q=senior I received documents containing the word senior, but in the highlighting section of the Solr response the word was not highlighted.

UPDATE 1: I found that the word senior appears in my synonyms_abbr.txt file, on the line senior,lead. When I commented out this line, or swapped the words to lead,senior, the word senior, surprisingly, started being highlighted. Any ideas?

UPDATE 2: Words from synonyms.txt and synonyms_other.txt are generally highlighted, but words from synonyms_abbr.txt behave strangely. For example, with the line lead,head,senior in synonyms_abbr.txt:

- the requests http://localhost:8983/solr/select?q=senior and http://localhost:8983/solr/select?q=head highlight nothing;
- the request http://localhost:8983/solr/select?q=lead highlights not only the word lead, but also head and senior.

highlight solr

Mher Oct 20 '15 at 11:59

3 answers

Some of your fields are not stored, so their contents cannot be returned. Because they are indexed, they are still searchable. Change your schema to set stored="true" on every field you want to highlight:

    <field name="text_syn" type="text_en_syn" indexed="true" stored="true" multiValued="true" />
    <field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="true" multiValued="true" />

Looking at your configuration, I assume highlighting does work on the bio and text fields?

ilinca Oct 23 '15 at 13:58

Could you add both senior,lead and lead,senior to the synonyms_abbr.txt file and then try the highlighter again?

user155806 Oct 29 '15 at 11:43

vvs · Accepted Answer · 2015-10-30T07:27:30+0000

From your UPDATE 2, it is clear that only the first word in lead,head,senior is actually used for synonym matching and highlighting.

The SolrWiki page https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters describes the effect of the expand parameter:

The synonyms parameter specifies an external file that defines synonyms. If ignoreCase is true, matching will lowercase before checking equality. If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.

The page also gives examples:

    # If expand==true, "ipod, i-pod, i pod" is equivalent
    # to the explicit mapping:
    ipod, i-pod, i pod => ipod, i-pod, i pod
    # If expand==false, "ipod, i-pod, i pod" is equivalent
    # to the explicit mapping:
    ipod, i-pod, i pod => ipod

This matches the behavior you are observing. So you need to either change the synonym filter definitions in schema.xml to use expand="true", or change your synonym file to use explicit mappings.
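A minimal sketch of the expand="true" option, applied to the text_en type from the question's schema.xml. Only the expand attribute on the index-time SynonymFilterFactory changes; everything else is copied from the question. With expansion, a line such as lead,head,senior should index all three terms at the same position, so a query for any one of them can match the stored text field and the highlighter can mark the original word. This has not been tested against 4.10 or 5.3, and the same one-attribute change would presumably apply to text_en_syn and text_en_syn_other if those files show the same symptom.

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- expand="true": each word on a line such as lead,head,senior is indexed
         together with all of its equivalents instead of being collapsed
         to the first word in the list -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <!-- query analyzer unchanged from the question -->
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

One consequence of this design: because lead, head and senior share a position in the index, a query for lead will still highlight head and senior in the text, which is usually the intended effect of synonyms.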

Also, since these analyzers run at index time, you need to re-index your documents for the change to take effect.
