Solr score normalization

Is there a way to find out whether the first result in a Solr response is an exact match for my query? For example, I am searching for documents containing the words "iphone 6s 64GB gold".

I get 3 results:

1) The first result contains the words "iphone 6s 64GB" and has a score of 187.86491

2) The second result contains the words "iphone 6s" and has a score of 170.36568

3) The third result contains the word "iphone" and has a score of 136.68152

When I normalize the scores, I get these new scores:

1) score 1.0 2) score 0.92 3) score 0.66

The problem is that the first result gets a score of 1.0 only because it has the highest Solr score, which does not confirm that it is an exact match. In my opinion it should be around 0.5, because it is not an exact match. I want to know whether my results are relevant or not and keep only the “most relevant” ones, for example only results with a score > 0.6. But I can’t do that now, because 0.6 does not indicate real relevance.
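To illustrate the problem, here is a minimal sketch that assumes the normalization simply divides each score by the top score. The question does not say which method produced its figures, so the values printed here may differ from them. The helper name normalize_by_max is made up for this example.

```python
def normalize_by_max(scores):
    """Divide every score by the largest one, so the top hit always maps to 1.0."""
    top = max(scores)
    return [s / top for s in scores]


# Raw Solr scores taken from the question.
scores = [187.86491, 170.36568, 136.68152]
normalized = normalize_by_max(scores)
print([round(s, 2) for s in normalized])

# Even a result set that contains only the weakest document reports 1.0,
# so the normalized value says nothing about how well the query was matched.
only_weakest = normalize_by_max([136.68152])
print(only_weakest)
```

Whatever the best document is, it ends up at 1.0, so a fixed cutoff such as 0.6 only compares documents with each other inside one result list.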

2 answers

There is no such thing as “real relevance”, which is why Solr does not normalize the top score to 1.0. Documents can only be considered more or less relevant based on the parameters you give Solr (for example, how individual fields are weighted against each other). What would “60% relevance” really mean? Scores are (usually) not comparable between queries, and they vary with the contents of the index (if a new document containing the same terms is indexed, the scores for an earlier query can drop when you run it again).

If you want to prioritize exact matches, add a field with a KeywordTokenizer and a LowerCaseFilter and boost that field higher (via qf=). If case matters, use a StrField instead (which will only give you exact, case-sensitive matches) and boost that field higher.
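As a rough sketch of the qf idea, the request parameters could be assembled like this. The field names name and name_exact and the boost value 10 are assumptions for illustration only; name_exact stands for a field defined in your schema as described above.

```python
from urllib.parse import urlencode


def build_exact_boost_params(user_query, exact_boost=10):
    """Build edismax parameters that weight the exact-match field higher."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Hypothetical fields: "name" is the normally analyzed field,
        # "name_exact" is the keyword-tokenized (or StrField) copy.
        "qf": f"name name_exact^{exact_boost}",
    }


params = build_exact_boost_params("iphone 6s 64GB gold")
query_string = urlencode(params)
print(query_string)
```

Whether a multi-word query can match the keyword-tokenized field as a whole also depends on how the query parser splits the input on whitespace (the sow parameter in newer Solr versions), so check this against your own setup.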

If you want all terms to be present, use q.op=AND, which will not return any hits unless every term is present. If you want finer-grained matching, use the mm parameter to say exactly how many terms must match (which you can express as a percentage, per range of term counts, and so on).
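The effect of mm on the three documents from the question can be modelled in a few lines. This is a simplified model, not Solr's implementation: it only handles a positive integer or a positive percentage (rounded down, as the Solr documentation describes), and the document contents are assumed from the question.

```python
def required_matches(mm, num_terms):
    """Simplified reading of mm: a positive integer or a positive percentage."""
    if mm.endswith("%"):
        # Percentages are rounded down to a whole number of terms.
        return (int(mm[:-1]) * num_terms) // 100
    # An integer larger than the number of terms is capped in this model.
    return min(int(mm), num_terms)


def passes(mm, query_terms, doc_terms):
    """True if the document contains at least the required number of query terms."""
    matched = sum(1 for term in query_terms if term in doc_terms)
    return matched >= required_matches(mm, len(query_terms))


query = ["iphone", "6s", "64gb", "gold"]
docs = {
    "doc1": {"iphone", "6s", "64gb"},
    "doc2": {"iphone", "6s"},
    "doc3": {"iphone"},
}

results = {}
for mm in ("100%", "75%", "2"):
    results[mm] = [name for name, terms in docs.items() if passes(mm, query, terms)]
    print(mm, results[mm])
```

With mm=100% (the same effect as q.op=AND here) none of the three documents is returned, because none contains "gold"; with mm=75% only the first one survives.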

These settings apply when you use the dismax or edismax query parser, which it sounds like you are doing from your question.


To do what you ask for (setting aside why you want to do this), you could:

  • Use highlighting to return what was matched in each document.
  • Compare the query string with the highlighted fragments and check whether it is a complete match.

Cautions:

  • If you use stemming, etc., an exact match may only mean a match on part of a term. So you cannot just use a plain string comparison; you first need to run both the query string and the fragments through their respective analysis chains (the query string through query analysis, the fragments through index analysis).
  • Depending on the type of highlighting, you may need certain features enabled on your fields.
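The comparison described above could look roughly like this. It is only a sketch: analyze() is a stand-in (lowercase plus whitespace split) for your real query and index analysis chains, the response dictionary is hand-written sample data in the shape of Solr's highlighting section with the default <em> tags, and the field name "name" is hypothetical. It checks that every query term was highlighted, not that the terms appear in the same order.

```python
import re

# Text wrapped in the default highlighting tags.
EM_TERM = re.compile(r"<em>(.*?)</em>")


def analyze(text):
    """Stand-in for the Solr analysis chain: lowercase and split on whitespace."""
    return text.lower().split()


def is_exact_match(query, fragments):
    """True if every analyzed query term appears among the highlighted terms."""
    highlighted = set()
    for fragment in fragments:
        for hit in EM_TERM.findall(fragment):
            highlighted.update(analyze(hit))
    return all(term in highlighted for term in analyze(query))


# Hand-written sample data, not real Solr output.
response = {
    "highlighting": {
        "doc1": {"name": ["Apple <em>iPhone</em> <em>6s</em> <em>64GB</em> silver"]},
        "doc2": {"name": ["Apple <em>iPhone</em> <em>6s</em> <em>64GB</em> <em>gold</em>"]},
    }
}

query = "iphone 6s 64GB gold"
matches = {
    doc_id: is_exact_match(query, fields.get("name", []))
    for doc_id, fields in response["highlighting"].items()
}
print(matches)
```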