Lucene.Net: How to add a date filter to search results?

My search is working well, but it tends to return results that are obsolete. My site is much like NerdDinner, where events in the past become irrelevant.

This is how I'm currently indexing. Note: my example is in VB.NET, but I don't mind if answers are given in C#.

    Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex
        Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False)

        Dim doc As Document = New Document
        doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED))
        doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing, "User" & searchableEvent.User.ID, searchableEvent.User.UserName), Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED))
        writer.AddDocument(doc)

        writer.Optimize()
        writer.Close()
        Return True
    End Function

Notice that the index has a "date" field that stores the date of the event.

My search then looks like this:

    ''# code omitted
    Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
    Dim searcher As IndexSearcher = New IndexSearcher(reader)
    Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer())
    Dim query As Query = parser.Parse(q.ToLower)

    ''# We're using 10,000 as the maximum number of results to return
    ''# because I have a feeling that we'll never reach that full amount
    ''# anyways. And if we do, who in their right mind is going to page
    ''# through all of the results?
    Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
    Dim doc As Document = Nothing

    ''# loop through the topDocs and grab the appropriate 10 results based
    ''# on the submitted page number
    While i <= last AndAlso i < topDocs.totalHits
        doc = searcher.Doc(topDocs.scoreDocs(i).doc)
        IDList.Add(doc.[Get]("id"))
        i += 1
    End While
    ''# code omitted

I tried the following, but it did not work (it threw a NullReferenceException).

    While i <= last AndAlso i < topDocs.totalHits
        If Date.Parse(doc.[Get]("date")) >= Date.Today Then
            doc = searcher.Doc(topDocs.scoreDocs(i).doc)
            IDList.Add(doc.[Get]("id"))
            i += 1
        End If
    End While

I also found the following documentation, but I can't make heads or tails of it:

http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html

2 answers

You are linking to the API documentation for Lucene 1.4.3. Lucene.Net is currently at 2.9.2. I think an upgrade is in order.

First, you use Store.YES a lot. Stored fields make your index larger, which can become a performance issue. Your date problem can be solved easily by indexing dates as strings in the format "yyyyMMddHHmmssfff" (that is very high resolution, down to milliseconds). You can lower the resolution to create fewer distinct terms and so reduce the size of the index.

    var dateValue = DateTools.DateToString(searchableEvent.EventDate, DateTools.Resolution.MILLISECOND);
    doc.Add(new Field("date", dateValue, Field.Store.YES, Field.Index.NOT_ANALYZED));
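
To see why a fixed-width string works for range filtering, and what lowering the resolution buys you, here is a minimal sketch that uses only the .NET base class library. It does not call Lucene.Net at all: `ToKey` and `DistinctKeys` are hypothetical helper names, and plain `DateTime.ToString` with the format quoted above stands in for whatever `DateTools` produces internally (an assumption).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DateKeyDemo
{
    // Hypothetical helper: formats a date as a fixed-width key using the given pattern.
    public static string ToKey(DateTime value, string format)
    {
        return value.ToString(format, CultureInfo.InvariantCulture);
    }

    // Counts how many distinct keys a set of dates produces at a given resolution.
    public static int DistinctKeys(IEnumerable<DateTime> dates, string format)
    {
        var keys = new HashSet<string>(StringComparer.Ordinal);
        foreach (var d in dates)
        {
            keys.Add(ToKey(d, format));
        }
        return keys.Count;
    }

    public static void Main()
    {
        var earlier = new DateTime(2010, 11, 2, 20, 49, 16, 0);
        var later = new DateTime(2010, 11, 3, 8, 0, 0, 0);

        string a = ToKey(earlier, "yyyyMMddHHmmssfff"); // "20101102204916000"
        string b = ToKey(later, "yyyyMMddHHmmssfff");   // "20101103080000000"

        // Ordinal string comparison agrees with chronological order,
        // which is what a string range over the field relies on.
        Console.WriteLine(string.CompareOrdinal(a, b) < 0); // True

        // Three events on the same day: millisecond keys are all different,
        // day-resolution keys collapse into a single value.
        var sameDay = new[] { earlier, earlier.AddHours(1), earlier.AddHours(2) };
        Console.WriteLine(DistinctKeys(sameDay, "yyyyMMddHHmmssfff")); // 3
        Console.WriteLine(DistinctKeys(sameDay, "yyyyMMdd"));          // 1
    }
}
```

If you only ever need to hide events from previous days, day resolution is probably enough and keeps the number of distinct terms in the "date" field small.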

Then you apply a filter to your search (the second parameter, where you currently pass Nothing/null).

    var dateValue = DateTools.DateToString(DateTime.Now, DateTools.Resolution.MILLISECOND);
    var filter = FieldCacheRangeFilter.NewStringRange("date",
                     lowerVal: dateValue, includeLower: true,
                     upperVal: null, includeUpper: false);
    var topDocs = searcher.Search(query, filter, 10000);

You could also do this with a BooleanQuery that combines your regular query with a RangeQuery, but that would affect scoring as well (the score is calculated from the query, not from the filter). You may also prefer to leave the query unmodified for simplicity, so that you always know exactly which query is being executed.


You can combine multiple queries with a BooleanQuery. Since Lucene only searches text, note that the date field in your index must be ordered from the most significant to the least significant part of the date, i.e. in ISO 8601 format ("2010-11-02T20:49:16.000000+00:00").

Example:

    Lucene.Net.Index.Term searchTerm = new Lucene.Net.Index.Term("fullText", searchTerms);
    Lucene.Net.Index.Term dateRange = new Lucene.Net.Index.Term("date", "2010*");

    Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);
    Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.WildcardQuery(dateRange);

    Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
    query.Add(termQuery, BooleanClause.Occur.MUST);
    query.Add(dateRangeQuery, BooleanClause.Occur.MUST);

Alternatively, if a wildcard is not precise enough, you can add a RangeQuery instead:

    Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);

    // Note: RangeQuery compares terms as plain text, so the trailing '*'
    // is treated as a literal character here, not as a wildcard.
    Lucene.Net.Index.Term date1 = new Lucene.Net.Index.Term("date", "2010-11-02*");
    Lucene.Net.Index.Term date2 = new Lucene.Net.Index.Term("date", "2010-11-03*");
    Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.RangeQuery(date1, date2, true);

    Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
    query.Add(termQuery, BooleanClause.Occur.MUST);
    query.Add(dateRangeQuery, BooleanClause.Occur.MUST);
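
The ordering requirement mentioned at the start of this answer is easy to check without Lucene. The sketch below is only an illustration using the .NET base class library: `IsChronological` is a hypothetical helper, and the "dd/MM/yyyy" pattern is just an assumed example of a format that puts the least significant part first.

```csharp
using System;
using System.Globalization;

public static class SortOrderDemo
{
    // Hypothetical helper: true when the formatted strings sort in the same
    // order as the dates themselves under plain ordinal text comparison.
    public static bool IsChronological(string format, DateTime earlier, DateTime later)
    {
        string a = earlier.ToString(format, CultureInfo.InvariantCulture);
        string b = later.ToString(format, CultureInfo.InvariantCulture);
        return string.CompareOrdinal(a, b) < 0;
    }

    public static void Main()
    {
        var nov2 = new DateTime(2010, 11, 2);
        var dec1 = new DateTime(2010, 12, 1);

        // Day first: "02/11/2010" sorts after "01/12/2010", so text order is wrong.
        Console.WriteLine(IsChronological("dd/MM/yyyy", nov2, dec1)); // False

        // Year first (ISO 8601 style): text order matches date order.
        Console.WriteLine(IsChronological("yyyy-MM-dd'T'HH:mm:ss", nov2, dec1)); // True
    }
}
```

A range or prefix match over a text field can only behave like a date range when the second case holds.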