How can I use Lucene PriorityQueue when I don't know the maximum size at creation time?

I created a custom collector for Lucene.Net, but I cannot figure out how to order (or page) the results. Each time Collect is called, I can add the result to an internal PriorityQueue, which, as I understand it, is the right way to do this.

I subclassed PriorityQueue, but it requires a size parameter on creation: you must call Initialize in the constructor and pass the maximum size.

However, in the collector, the search engine simply calls Collect when it receives a new result, so I don’t know how many results I have when creating the PriorityQueue. Based on this, I cannot figure out how to make PriorityQueue work.

I understand that I probably missed something simple here ...


PriorityQueue is not a SortedList or SortedDictionary. It is a kind of partial-sort implementation: out of N elements it keeps only the top M, where M is the size of your PriorityQueue. You can add as many elements as you want with InsertWithOverflow, but the queue will only ever contain the top M of them.
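The "top M of N" behaviour can be illustrated without Lucene. The following is a minimal sketch, assuming .NET 6 or later, that uses the framework's own System.Collections.Generic.PriorityQueue as a stand-in for Lucene's queue; the TopM class and its Select method are hypothetical names invented for this illustration, and the eviction step only imitates what InsertWithOverflow does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Lucene.Net: keeps the m largest values
// out of any number of inputs, using a bounded min-heap.
public static class TopM
{
    public static List<int> Select(IEnumerable<int> values, int m)
    {
        if (m <= 0) return new List<int>();

        // Min-heap: the smallest value currently retained sits at the root.
        var heap = new PriorityQueue<int, int>();
        foreach (int v in values)
        {
            if (heap.Count < m)
                heap.Enqueue(v, v);           // still room: just add
            else if (v > heap.Peek())
                heap.EnqueueDequeue(v, v);    // full: add v and evict the current minimum
            // otherwise v is not among the top m and is dropped
        }

        // Dequeue yields ascending order; reverse to get best-first.
        var result = new List<int>(heap.Count);
        while (heap.Count > 0) result.Add(heap.Dequeue());
        result.Reverse();
        return result;
    }
}
```

Calling TopM.Select with any number of inputs and m = 3 never holds more than three values in memory at once, which is why the size has to be known up front but the number of hits does not.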

Suppose your search resulted in 1,000,000 hits. Would you return all of them to the user? A better approach is to return the top 10 items (using PriorityQueue(10)); if the user requests the next 10 results, you run the search again with PriorityQueue(20) and return items 11 to 20, and so on. This is the trick most search engines, such as Google, use.
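One detail of this paging scheme is that the queue hands back its lowest-ranked element first, so the items of the requested page are the first ones to come out. Here is a rough sketch of that step, again assuming .NET 6 or later and using the framework's PriorityQueue as a stand-in for the Lucene queue; QueuePaging and TakePage are hypothetical names, and the queue is assumed to hold at most page * pageSize hits with the score as priority.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Lucene.Net.
public static class QueuePaging
{
    // Drains a min-first queue that holds the top (page * pageSize) hits
    // and returns only the requested page (1-based), best hit first.
    public static List<T> TakePage<T>(PriorityQueue<T, int> topHits, int page, int pageSize)
    {
        int skip = (page - 1) * pageSize;               // hits belonging to earlier pages
        int onPage = Math.Max(0, topHits.Count - skip); // may be fewer than pageSize on the last page

        // The lowest-ranked retained hits are dequeued first,
        // and those are exactly the hits of the requested page.
        var result = new List<T>(onPage);
        for (int i = 0; i < onPage; i++) result.Add(topHits.Dequeue());

        result.Reverse();                               // ascending -> best-first
        return result;
    }
}
```

With a queue sized for page 2 of 10 (capacity 20) that received only 15 hits, TakePage returns the five hits ranked 11 to 15 and leaves the first ten in the queue.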

"Every time Commit gets called, I can add the result to an internal PriorityQueue."

I don't see the relationship between Commit and searching, so here is an example that uses PriorityQueue:

    public class CustomQueue : Lucene.Net.Util.PriorityQueue<Document>
    {
        public CustomQueue(int maxSize) : base()
        {
            Initialize(maxSize);
        }

        public override bool LessThan(Document a, Document b)
        {
            //a.GetField("field1")
            //b.GetField("field2");
            return //compare a & b
        }
    }

    public class MyCollector : Lucene.Net.Search.Collector
    {
        CustomQueue _queue = null;
        IndexReader _currentReader;

        public MyCollector(int maxSize)
        {
            _queue = new CustomQueue(maxSize);
        }

        public override bool AcceptsDocsOutOfOrder()
        {
            return true;
        }

        public override void Collect(int doc)
        {
            _queue.InsertWithOverflow(_currentReader.Document(doc));
        }

        public override void SetNextReader(IndexReader reader, int docBase)
        {
            _currentReader = reader;
        }

        public override void SetScorer(Scorer scorer)
        {
        }
    }

    searcher.Search(query, new MyCollector(10)); // First page.
    searcher.Search(query, new MyCollector(20)); // 2nd page.
    searcher.Search(query, new MyCollector(30)); // 3rd page.

EDIT for @nokturnal

    public class MyPriorityQueue<TObj, TComp> : Lucene.Net.Util.PriorityQueue<TObj>
        where TComp : IComparable<TComp>
    {
        Func<TObj, TComp> _KeySelector;

        public MyPriorityQueue(int size, Func<TObj, TComp> keySelector) : base()
        {
            _KeySelector = keySelector;
            Initialize(size);
        }

        public override bool LessThan(TObj a, TObj b)
        {
            return _KeySelector(a).CompareTo(_KeySelector(b)) < 0;
        }

        public IEnumerable<TObj> Items
        {
            get
            {
                int size = Size();
                for (int i = 0; i < size; i++)
                    yield return Pop();
            }
        }
    }

    var pq = new MyPriorityQueue<Document, string>(3, doc => doc.GetField("SomeField").StringValue);
    foreach (var item in pq.Items)
    {
    }

Lucene's PriorityQueue is limited in size because it uses a fixed-size implementation, which is what makes it very fast.

Decide on a reasonable maximum number of results you need to return at a time and use that number. The "waste" when there are only a few results is a small price for the benefit you get.

On the other hand, if you have so many results that you cannot hold them all, how are you going to pass or display them? Keep in mind that this is for "top hits": as you iterate through the results, you reach less and less relevant ones anyway.
