I configured Solr 4.10 (and also 5.3) with highlighting enabled. It works fine for most words, but I found a few words that cannot be highlighted: Solr returns the matching documents, but does not highlight these words in them.
What can cause such an effect?
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="defType">edismax</str>
    <str name="bf">product(concount)</str>
    <str name="df">text bio text_syn text_syn_other</str>
    <str name="qf">text^25 bio^16 text_syn^8 text_syn_other^3</str>
    <str name="hl">on</str>
    <str name="hl.fl">text bio text_syn text_syn_other</str>
    <str name="hl.preserveMulti">true</str>
    <str name="hl.encoder">html</str>
    <str name="f.text.hl.fragsize">100</str>
    <str name="hl.snippets">20</str>
    <arr name="components">
      <str>highlight</str>
    </arr>
  </lst>
</requestHandler>
schema.xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn_other" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_other.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="text_syn" type="text_en_syn" indexed="true" stored="false" multiValued="true" />
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="false" multiValued="true" />
<field name="text_exact" type="string" indexed="true" stored="false" multiValued="false" />
<field name="bio" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="bio_exact" type="string" indexed="true" stored="false" multiValued="false" />
<field name="concount" type="long" indexed="true" stored="true" multiValued="false" />
<field name="concount_exact" type="long" indexed="true" stored="false" multiValued="false" />

<copyField source="text" dest="text_syn"/>
<copyField source="bio" dest="text_syn"/>
<copyField source="text" dest="text_syn_other"/>
<copyField source="bio" dest="text_syn_other"/>
For the request http://localhost:8983/solr/select?q=senior I received documents containing the word senior, but in the highlighting section of the Solr response the word was not highlighted.
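To see which stored field, if any, produces a highlight for a given word, the request can be made more explicit. The sketch below only builds the URL and does not contact Solr. `hl`, `hl.fl` and `debugQuery` are standard Solr request parameters; the helper name `build_debug_url` and the choice of `text` and `bio` (the two stored fields in the schema above) are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the request shown in the question.
BASE_URL = "http://localhost:8983/solr/select"


def build_debug_url(word, fields=("text", "bio")):
    """Build a select URL that asks for highlighting on the given fields
    and for debug output (hypothetical helper, not part of Solr)."""
    params = {
        "q": word,
        "hl": "on",
        "hl.fl": ",".join(fields),  # restrict highlighting to stored fields
        "debugQuery": "true",       # include the parsed query in the response
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for word in ("senior", "head", "lead"):
        print(build_debug_url(word))
```

Comparing the parsed query in the debug output with the highlighting section for each word may show whether the match comes from a field that cannot be highlighted.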
UPDATE 1: I found out that the word senior appears in my synonyms_abbr.txt file, in the line senior,lead. When I commented out this line, or swapped the words to lead,senior, the word senior surprisingly started being highlighted. Any ideas?
UPDATE 2: Words from synonyms.txt and synonyms_other.txt are highlighted normally, but words from synonyms_abbr.txt behave strangely. For example, with the line lead,head,senior in synonyms_abbr.txt:
- the requests http://localhost:8983/solr/select?q=senior and http://localhost:8983/solr/select?q=head do not highlight the word;
- the request http://localhost:8983/solr/select?q=lead highlights not only the word lead, but also head and senior.
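The pattern in UPDATE 2 can be reproduced with a toy model. It is a sketch and not Solr itself: it assumes that `expand="false"` rewrites every term on a synonym line to the first term on that line at index time, while the query analyzer applies no synonyms. Stemming and the pattern tokenizer are replaced by a plain whitespace split, and all function names are hypothetical. Whether this is the actual cause here is an assumption, and the model may not account for the behaviour described in UPDATE 1.

```python
def build_synonym_map(lines):
    """Map every term on a line to the first term of that line
    (assumed behaviour of expand="false")."""
    mapping = {}
    for line in lines:
        terms = [t.strip().lower() for t in line.split(",") if t.strip()]
        for term in terms:
            mapping[term] = terms[0]
    return mapping


def index_tokens(text, synonyms):
    """Index-time analysis: lowercase, then apply the synonym map."""
    return [synonyms.get(tok, tok) for tok in text.lower().split()]


def query_token(word):
    """Query-time analysis: lowercase only, no synonym filter."""
    return word.lower()


def highlighted(text, query, synonyms):
    """Return the original words whose indexed token equals the query token."""
    original = text.split()
    indexed = index_tokens(text, synonyms)
    q = query_token(query)
    return [orig for orig, tok in zip(original, indexed) if tok == q]


if __name__ == "__main__":
    synonyms = build_synonym_map(["lead,head,senior"])
    doc = "senior developer and head of the lead team"
    print(highlighted(doc, "senior", synonyms))  # []
    print(highlighted(doc, "head", synonyms))    # []
    print(highlighted(doc, "lead", synonyms))    # ['senior', 'head', 'lead']
```

In this model the stored `text` field contains only the token lead for all three words, so a query for senior or head finds nothing to highlight there even if the document still matches through `text_syn` or `text_syn_other`.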