Solr: fieldNorm differs per document, with no document boost

I want my search results sorted by score, which they are, but the score is not being calculated the way I expect. It is not necessarily wrong, just different from what I expected, and I don't know why. My goal is to remove whatever is changing the score.

If I perform a search that matches two objects (where ObjectA is expected to score higher than ObjectB), ObjectB is returned first.

Say, for this example, that my query is one term: "apples."

ObjectA Title: "Apples - Apples" (2/3 terms)
ObjectA description: "There were apples in apples, and now apples have eaten all apples by apples!" (6/18)

ObjectB Title: "Apples Are Great" (1/3 terms)
ObjectB description: "There were apples in apples, and now the apples have all gone bad on apples!" (4/18)

The title field has no boost (or rather, a boost of 1), and the description field has a boost of 0.8. I have not specified a document boost, either in solrconfig.xml or in the query I am sending. If there is another way to specify a document boost, it is possible that I am unaware of it.

After analyzing the explain printout, it looks like ObjectA is correctly calculating a higher score than ObjectB, as I want, except for one difference: ObjectB's title fieldNorm is always higher than ObjectA's.


Here is the explain printout. Note that the title field is mditem5_tns and the description field is mditem7_tns:

ObjectB:
1.3327172 = (MATCH) sum of:
  1.0352166 = (MATCH) max plus 0.1 times others of:
    0.9766194 = (MATCH) weight(mditem5_tns:appl in 0), product of:
      0.53929156 = queryWeight(mditem5_tns:appl), product of:
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.8109303 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
        1.0 = tf(termFreq(mditem5_tns:appl)=1)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        1.0 = fieldNorm(field=mditem5_tns, doc=0)
    0.58597165 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
      0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
        0.8 = boost
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.3581977 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
        2.0 = tf(termFreq(mditem7_tns:appl)=4)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.375 = fieldNorm(field=mditem7_tns, doc=0)
  0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
    0.999001 = 1000.0/(1.0*float(1)+1000.0)
    1.0 = boost
    0.2977981 = queryNorm

ObjectA:
1.2324848 = (MATCH) sum of:
  0.93498427 = (MATCH) max plus 0.1 times others of:
    0.8632177 = (MATCH) weight(mditem5_tns:appl in 0), product of:
      0.53929156 = queryWeight(mditem5_tns:appl), product of:
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.6006513 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
        1.4142135 = tf(termFreq(mditem5_tns:appl)=2)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.625 = fieldNorm(field=mditem5_tns, doc=0)
    0.7176658 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
      0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
        0.8 = boost
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.6634457 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
        2.4494898 = tf(termFreq(mditem7_tns:appl)=6)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.375 = fieldNorm(field=mditem7_tns, doc=0)
  0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
    0.999001 = 1000.0/(1.0*float(1)+1000.0)
    1.0 = boost
    0.2977981 = queryNorm
2 answers

The problem is caused by the stemmer. It expands "apples are apples" into a longer token stream, thereby making the field longer. Since document B contains only one term that the stemmer expands, its field stays shorter than document A's.

This results in different fieldNorms.


fieldNorm is computed from three components: the index-time boost on the field, the index-time boost on the document, and the length of the field. Assuming you are not supplying any index-time boosts, the difference must come from the field length.
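
As a rough illustration of how these three components combine, the sketch below assumes Lucene's classic default similarity, where lengthNorm is 1/sqrt(number of tokens in the field). The function names are hypothetical and are not Lucene API. The real stored norm is additionally rounded into a single byte, so the values explain prints (such as 0.625 or 0.375) are coarser than what this computes.

```python
import math


def length_norm(num_tokens: int) -> float:
    """Length normalization as in the classic default similarity (assumed):
    fewer tokens in the field give a larger value."""
    return 1.0 / math.sqrt(num_tokens)


def field_norm(num_tokens: int, field_boost: float = 1.0, doc_boost: float = 1.0) -> float:
    """Unrounded fieldNorm: document boost * field boost * lengthNorm.
    Hypothetical helper; Lucene also compresses this value into one byte."""
    return doc_boost * field_boost * length_norm(num_tokens)


if __name__ == "__main__":
    # With no index-time boosts, only the token count matters.
    for tokens in (1, 2, 3, 6):
        print(tokens, round(field_norm(tokens), 4))
```

With both boosts left at 1.0, a one-token field gets the maximum norm of 1.0, and every extra token (including any extra tokens emitted by the analyzer) pushes the norm down.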

Thus, since lengthNorm is higher for shorter field values, for B to have a higher fieldNorm for the title, it must have fewer tokens in the title than A.

See the following pages for a detailed explanation of Lucene's scoring:

http://lucene.apache.org/java/2_4_0/scoring.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
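
To see how much the title fieldNorm alone moves the ranking, the arithmetic of the explain output in the question can be replayed. This is only a sketch that re-multiplies the factors explain prints (tf as the square root of termFreq, and the "max plus 0.1 times others" combination). The function names are made up for illustration, and the FunctionQuery part is left out because it is identical for both documents.

```python
import math

# Values copied from the explain output in the question.
IDF = 1.8109303         # idf(docFreq=3, maxDocs=9)
QUERY_NORM = 0.2977981  # queryNorm


def clause_score(term_freq: int, field_norm: float, boost: float = 1.0) -> float:
    """weight(...) = queryWeight * fieldWeight, as laid out by explain."""
    query_weight = boost * IDF * QUERY_NORM
    field_weight = math.sqrt(term_freq) * IDF * field_norm
    return query_weight * field_weight


def max_plus_tie(scores, tie: float = 0.1) -> float:
    """'max plus 0.1 times others': best clause plus tie * the remaining clauses."""
    best = max(scores)
    return best + tie * (sum(scores) - best)


# Title clause (no boost) and description clause (boost 0.8) for each object.
object_b = max_plus_tie([clause_score(1, 1.0), clause_score(4, 0.375, boost=0.8)])
object_a = max_plus_tie([clause_score(2, 0.625), clause_score(6, 0.375, boost=0.8)])

# Hypothetical case: ObjectA's title given the same fieldNorm as ObjectB's (1.0).
object_a_equal_norm = max_plus_tie([clause_score(2, 1.0), clause_score(6, 0.375, boost=0.8)])

if __name__ == "__main__":
    print("ObjectB:", object_b)                         # about 1.0352, as in explain
    print("ObjectA:", object_a)                         # about 0.9350, as in explain
    print("ObjectA, equal title norm:", object_a_equal_norm)
```

With the title norms as indexed, ObjectB wins; with equal title norms, ObjectA's higher term frequency puts it ahead. If length normalization is not wanted at all, Solr's schema allows omitNorms="true" on a field or field type, which takes effect only after reindexing.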
