How to combine hits from two separate (not sharded) Lucene indexes

I have two separate indexes with different fields; together they cover all the searchable fields for a document. For example, the first index contains the indexed text of every document, and the second contains the tags for each document.

Note that the example below is a little contrived, as I changed the names of the objects.

Index1: text, document id

Index2: tag-name: "very important", user: "Fred", id

I would like to keep the indexes separate, since it seems wasteful to update one large index every time a user adds or removes a tag.

So far it seems to me that I need to process two search results and combine them manually (in code). Are there any other suggestions?

I am not looking to merge separate or sharded indexes.

+5
3 answers

Lucene has an IndexReader type that supports this layout: ParallelReader.

It can be a little tricky to use, because the Lucene document id of a record must be the same in both indexes. In practice, this means adding documents to both indexes in the same order. I have read that in some cases deleting a document and then optimizing the index can cause Lucene to reassign document ids, but I have not experimented to find out whether this is true. Extra care may be needed if existing entries are modified. If only new entries are added, there should be no problem.

+4

For each document k, the scores from the two indexes can be combined linearly:

score(k) = a*tagscore(k)+b*fulltextscore(k)

where a and b are constants.
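A minimal sketch of the weighted combination above, in plain Java with no Lucene dependency. It assumes each search has already been reduced to a map from a shared document id to a score; the class name `ScoreMerger` and the method `combine` are hypothetical and not part of Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: merges per-document scores
// obtained from two separate searches, keyed by a shared document id.
class ScoreMerger {
    // Computes score(k) = a * tagScore(k) + b * fullTextScore(k).
    // A document missing from one result set contributes 0 for that side.
    static Map<String, Double> combine(Map<String, Double> tagScores,
                                       Map<String, Double> fullTextScores,
                                       double a, double b) {
        Map<String, Double> combined = new HashMap<>();
        for (Map.Entry<String, Double> e : tagScores.entrySet()) {
            combined.merge(e.getKey(), a * e.getValue(), Double::sum);
        }
        for (Map.Entry<String, Double> e : fullTextScores.entrySet()) {
            combined.merge(e.getKey(), b * e.getValue(), Double::sum);
        }
        return combined;
    }
}
```

Sorting the resulting map by value in descending order gives the merged ranking. Raw scores from two different indexes are not necessarily on the same scale, so a and b would probably need tuning by experiment.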

See also Grant Ingersoll's article on findability.

0

A common solution to this problem is a two-step approach. First, a query is run against each index to determine how many documents contain each term. Then the results are aggregated and the query is run again, this time with the aggregated document frequencies sent along with it.

As you can imagine, this will not perform as well as querying a single index, but since nothing is free, I believe this is the trade-off for keeping documents in several indexes.
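The aggregation step of the two-step approach might be sketched as follows, in plain Java with no Lucene dependency. It assumes each index reports its per-term document frequencies as a map and that the indexes hold disjoint sets of documents, so the counts can simply be summed. The class `GlobalTermStats` and its methods are hypothetical names. The `idf` method uses the classic TF-IDF form 1 + ln(numDocs / (docFreq + 1)) for illustration; the weighting actually applied depends on the similarity in use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Lucene class: aggregates per-index term
// statistics so that every index can be scored with the same global values.
class GlobalTermStats {
    // Step 1: sum the document frequency of each term over all indexes.
    // Assumes no document is counted by more than one index.
    static Map<String, Integer> aggregateDocFreq(List<Map<String, Integer>> perIndexDocFreq) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> docFreq : perIndexDocFreq) {
            for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    // Step 2: derive an inverse document frequency from the global counts,
    // to be sent along with the query on the second pass.
    static double idf(int globalDocFreq, int globalNumDocs) {
        return 1.0 + Math.log((double) globalNumDocs / (globalDocFreq + 1));
    }
}
```

On the second pass, each index would score its hits with these global values instead of its local statistics, which makes the scores comparable across indexes.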

0