How does Lucene's score depend on the relative position of the query terms?

I use WhitespaceAnalyzer as a query analyzer. If I have 2 documents:

 | text | abc |
 | text | bac |

text is a field.

Now the index structure looks something like this:

 | Term | in document |
 | a    | abc / bac   |
 | b    | abc / bac   |
 | c    | abc / bac   |
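Whether term order can influence the score depends on the index keeping term positions, not only the list of documents that contain each term. As a rough illustration (a toy model in plain Java, not Lucene's real index format), assume the two documents are split on whitespace into the tokens a b c and b a c. The class and method names below are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy positional inverted index: term -> (document name -> positions of the term).
// Illustration only; this is not how Lucene stores its index on disk.
final class PositionalIndexSketch {
    private final Map<String, Map<String, List<Integer>>> postings = new TreeMap<>();

    // Split on whitespace (roughly what WhitespaceAnalyzer does) and record each position.
    void add(String docName, String text) {
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docName, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Positions of a term inside one document; empty if the term does not occur there.
    List<Integer> positions(String term, String docName) {
        Map<String, List<Integer>> byDoc =
                postings.getOrDefault(term, Collections.<String, List<Integer>>emptyMap());
        return byDoc.getOrDefault(docName, Collections.<Integer>emptyList());
    }

    public static void main(String[] args) {
        PositionalIndexSketch index = new PositionalIndexSketch();
        index.add("abc", "a b c"); // assumed tokenization of the first document
        index.add("bac", "b a c"); // assumed tokenization of the second document
        for (String term : new String[] {"a", "b", "c"}) {
            System.out.println(term + " -> abc" + index.positions(term, "abc")
                    + ", bac" + index.positions(term, "bac"));
        }
    }
}
```

Both documents contain the same three terms, so a table without positions cannot tell them apart. The recorded positions are what a position-aware query can work with.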

And I have a query:

 | text | abc | 

How can I get a higher score for abc and a lower one for bac?

Does Lucene support calculating scores based on relative position?

I found that this helps:

 PhraseQuery phraseQuery = new PhraseQuery();
 phraseQuery.setSlop(1);

This way, the two documents receive different scores.
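One caveat on the slop value: the PhraseQuery Javadoc notes that swapping two words counts as two moves, so the slop has to be at least 2 for a reordered phrase to match. The sketch below is a simplified model of that distance in plain Java, not Lucene code. It assumes every query term occurs exactly once in the document and that the documents tokenize to a b c and b a c. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of sloppy phrase matching, assuming each query term
// occurs exactly once in the document. Not Lucene's implementation.
final class SloppyMatchSketch {

    // For each query term, compute (position in document - position in query).
    // The match distance is the spread (max - min) of those shifted positions.
    // Returns -1 if a query term is missing from the document.
    static int matchDistance(List<String> query, List<String> doc) {
        if (query.isEmpty()) {
            return 0;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < query.size(); offset++) {
            int pos = doc.indexOf(query.get(offset));
            if (pos < 0) {
                return -1;
            }
            int shifted = pos - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // A document matches when its distance fits within the allowed slop.
    static boolean matches(List<String> query, List<String> doc, int slop) {
        int distance = matchDistance(query, doc);
        return distance >= 0 && distance <= slop;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("a", "b", "c");
        List<String> abc = Arrays.asList("a", "b", "c");
        List<String> bac = Arrays.asList("b", "a", "c");
        System.out.println(matchDistance(query, abc)); // 0
        System.out.println(matchDistance(query, bac)); // 2
        System.out.println(matches(query, bac, 1));    // false
        System.out.println(matches(query, bac, 2));    // true
    }
}
```

Under this model, a slop of 1 still separates the two documents, but by excluding bac from the results rather than by ranking it lower. A slop of 2 or more would let both match at different distances.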

More details: http://www.blogjava.net/tangzurui/archive/2008/09/22/230357.html

I also came across another related question on Stack Overflow.

2 answers

It depends on the type of query you use. Some queries score higher when the terms of the phrase appear in the correct order (for example, "New York" versus "York New"). According to Lucene's documentation, you can use the scoring explanation to see why abc gets a higher score than bac.

Scoring is very much dependent on the way documents are indexed, so it is important to understand indexing (see the Apache Lucene Getting Started Guide and the Lucene file formats before continuing with this section). It is also assumed that readers know how to use the Searcher.explain(Query query, int doc) functionality, which can go a long way in explaining why a score is returned.

http://lucene.apache.org/core/3_6_2/scoring.html

UPD: To store the positions of terms, see Field.TermVector if you are using Lucene 3: http://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/document/Field.TermVector.html


The contribution of a phrase match to the score depends on the distance:

  • Highest score for distance = 0 (exact match).
  • The score gets lower as the distance gets higher.

In your case, the query "abc" will match the document "abc" with a distance of 0, which results in a higher score for the phrase. For the document "bac", the distance will be greater than zero, so its score will be lower.

See the source code for the org.apache.lucene.search.SloppyPhraseScorer class for more information.
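To make the relationship between distance and score concrete, here is a small standalone sketch. It assumes the default similarity of Lucene 3.x, where a sloppy match at a given distance contributes 1 / (distance + 1) to the phrase frequency and the term-frequency factor is the square root of that frequency. The class below only mirrors those two formulas. It is not the library class, and a full score has further factors (idf, norms) that are left out.

```java
// Standalone sketch of two formulas from Lucene 3.x DefaultSimilarity.
// The class name is hypothetical; other scoring factors are omitted.
final class SloppyFreqSketch {

    // Contribution of one sloppy phrase match at the given edit distance.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Term-frequency factor applied to the accumulated phrase frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        for (int distance = 0; distance <= 3; distance++) {
            float freq = sloppyFreq(distance);
            System.out.println("distance=" + distance
                    + " sloppyFreq=" + freq
                    + " tf=" + tf(freq));
        }
    }
}
```

An exact match (distance 0) contributes a frequency of 1.0, and each extra unit of distance shrinks that contribution, which is why the document with terms in query order ranks higher when both documents match.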
