Lucene RangeQuery does not filter properly

I use RangeQuery to get all documents whose amount is between 0 and 2. When I execute the query, Lucene also returns documents whose amount is larger than 2. What am I missing here?

Here is my code:

    Term lowerTerm = new Term("amount", minAmount);
    Term upperTerm = new Term("amount", maxAmount);
    RangeQuery amountQuery = new RangeQuery(lowerTerm, upperTerm, true);
    finalQuery.Add(amountQuery, BooleanClause.Occur.MUST);

and here is what goes into my index:

    doc.Add(new Field("amount", amount.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
c# lucene
2 answers

UPDATE: As @basZero noted in a comment, starting with Lucene 2.9 you can add numeric fields to your documents. Just remember to use NumericRangeQuery instead of RangeQuery when searching.

Original answer

Lucene treats numbers as words, so they are ordered lexicographically (character by character):

    0
    1
    12
    123
    2
    22

This means that, for Lucene, 12 is between 0 and 2. If you want a correct numeric range, you need to index the numbers with zero padding and then search for the range [0000 TO 0002]. (The amount of padding required depends on the expected range of values.)
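To make the ordering issue concrete, here is a minimal C# sketch that does not use Lucene at all. It models term comparison with ordinal string comparison, which is an assumption that holds for plain ASCII digits. The names `ZeroPadDemo`, `Pad`, and `InRange` are hypothetical helpers made up for this illustration, and the width of 4 is arbitrary.

```csharp
using System;
using System.Linq;

public static class ZeroPadDemo
{
    // Hypothetical helper: pads a non-negative integer with leading zeros
    // to a fixed width (4 by default, chosen only for this example).
    public static string Pad(int value, int width = 4)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        return value.ToString("D" + width);
    }

    // True when s lies in the inclusive range [lower, upper] under ordinal
    // (character-by-character) comparison, which is how the strings are
    // assumed to be ordered here.
    public static bool InRange(string s, string lower, string upper)
    {
        return string.CompareOrdinal(s, lower) >= 0
            && string.CompareOrdinal(s, upper) <= 0;
    }

    public static void Main()
    {
        int[] amounts = { 0, 1, 12, 123, 2, 22 };

        // Unpadded strings: "12" and "123" sort between "0" and "2".
        var raw = amounts.Select(a => a.ToString())
                         .Where(s => InRange(s, "0", "2"));

        // Padded strings: only 0000, 0001 and 0002 fall in the range.
        var padded = amounts.Select(a => Pad(a))
                            .Where(s => InRange(s, Pad(0), Pad(2)));

        Console.WriteLine(string.Join(" ", raw));    // 0 1 12 123 2
        Console.WriteLine(string.Join(" ", padded)); // 0000 0001 0002
    }
}
```

With the code from the question, the same padding would have to be applied both to `amount.ToString()` at index time and to `minAmount` and `maxAmount` at query time.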

If you have negative numbers, just add another zero for non-negative numbers. (EDIT: this is INCORRECT. See the update below.)

If your numbers have a fractional part, leave it as it is and zero-pad only the integer part.

Example:

    -00002.12
    -00001


    000000
    000001
    000003.1415
    000022

UPDATE: Negative numbers are a bit trickier, since -1 comes before -2 in alphabetical order. This article gives a complete explanation regarding negative numbers, and numbers in general, in Lucene. Basically, you should "encode" the numbers into something whose lexicographic order matches their numeric order.
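As an illustration of what such an encoding can look like, here is a minimal C# sketch for 32-bit integers. It is one possible scheme, not necessarily the one the linked article describes: it shifts every value by a fixed offset so that the whole `int` range becomes non-negative, then zero-pads to 10 digits. `SortableIntDemo`, `Encode`, and `Decode` are hypothetical names used only here.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SortableIntDemo
{
    // Hypothetical helper: maps int.MinValue..int.MaxValue onto
    // 0..4294967295 and pads to 10 digits, so ordinal string order
    // matches numeric order, including for negative values.
    public static string Encode(int value)
    {
        long shifted = (long)value - int.MinValue; // same as value + 2147483648
        return shifted.ToString("D10", CultureInfo.InvariantCulture);
    }

    // Reverses Encode to recover the original value from a stored term.
    public static int Decode(string encoded)
    {
        long shifted = long.Parse(encoded, CultureInfo.InvariantCulture);
        return (int)(shifted + int.MinValue);
    }

    public static void Main()
    {
        int[] values = { 2, -1, 0, -2, 12 };

        // Sort the encoded strings ordinally, then decode them again.
        var sorted = values.Select(Encode)
                           .OrderBy(s => s, StringComparer.Ordinal)
                           .Select(Decode);

        Console.WriteLine(string.Join(" ", sorted)); // -2 -1 0 2 12
    }
}
```

The encoded terms are not human-readable (for example, 0 becomes 2147483648), so the original value would typically be stored in a separate field or decoded on retrieval.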


I created a PHP function that converts numeric values into strings suitable for range searches in Lucene/Solr.

0.5 converts to 10000000000.5
-0.5 converts to 09999999999.5

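    function luceneNumeric($numeric) {
        $negative = $numeric < 0;
        // Shift negative values up so that larger negatives sort later.
        $numeric = $negative ? 10000000000 + $numeric : $numeric;
        // Accept either ',' or '.' as the decimal separator.
        $parts = explode('.', str_replace(',', '.', $numeric));
        // Leading digit: 0 for negative values, 1 for non-negative values.
        $lucene = $negative ? 0 : 1;
        // Zero-pad only the integer part to 10 digits.
        $lucene .= str_pad($parts[0], 10, '0', STR_PAD_LEFT);
        $lucene .= isset($parts[1]) ? '.' . $parts[1] : '';
        return $lucene;
    }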

It seems to work, hope it helps someone!
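Since the question uses C#, here is a rough port of the same idea as a sketch. It assumes `decimal` input with an absolute value below 10,000,000,000, and `LuceneNumericDemo` and `LuceneNumeric` are hypothetical names. Note that `decimal` keeps trailing zeros (1.50 stays "1.50"), so values should be normalized consistently before indexing and querying.

```csharp
using System;
using System.Globalization;

public static class LuceneNumericDemo
{
    // Hypothetical port of the PHP function above. Assumes |value| < 10^10.
    public static string LuceneNumeric(decimal value)
    {
        bool negative = value < 0;

        // Shift negative values up so that larger negatives sort later.
        decimal shifted = negative ? 10000000000m + value : value;

        // Invariant culture guarantees '.' as the decimal separator.
        string[] parts = shifted.ToString(CultureInfo.InvariantCulture).Split('.');

        // Leading digit: 0 for negative values, 1 for non-negative values,
        // followed by the integer part zero-padded to 10 digits.
        string result = (negative ? "0" : "1") + parts[0].PadLeft(10, '0');
        if (parts.Length > 1)
        {
            result += "." + parts[1];
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(LuceneNumeric(0.5m));  // 10000000000.5
        Console.WriteLine(LuceneNumeric(-0.5m)); // 09999999999.5
    }
}
```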
