Best practices for keeping a Solr / Lucene index up to date after a long rebuild

We have a general best-practice / design question about long index rebuilds. The question is not Solr-specific; it applies equally to raw Lucene or any other similar indexing tool / library / black box.

Question

What is the best practice for ensuring a Solr / Lucene index is "fully up to date" after a long rebuild? That is, if users add / modify / delete database records or files (for example, PDFs) during a 12-hour rebuild, how do you guarantee that the rebuilt index "includes" those changes by the time it goes live?

Context

  • A large database and file system (e.g. PDF files) indexed into Solr
  • A multicore Solr instance, where core0 serves searches and all additions / changes / deletions, and core1 is used for the rebuild. Core1 is the "temporary core."
  • Upon completion of the rebuild, we swap core1 with core0, so that searches and updates run against the freshly rebuilt index (a sketch of the swap follows this list)
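For reference, here is a minimal sketch of that swap step. Solr's CoreAdmin API exposes a SWAP action that atomically exchanges two cores, so clients that query core0 start hitting the freshly rebuilt index. The host, port, and core names below are assumptions for a default local install:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CoreSwap {
        public static void main(String[] args) throws Exception {
            // CoreAdmin SWAP atomically exchanges the names of two cores;
            // after this call, requests to "core0" reach the rebuilt index.
            String url = "http://localhost:8983/solr/admin/cores"
                       + "?action=SWAP&core=core0&other=core1";
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(url)).build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }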

Current approach

  • The rebuild process queries the database and / or walks the file system for "all db records" or "all files"
  • "" db/pdf, / . (, "select * from element order by element_id". open-i..e , - , . , " " ( ), .
  • "" : / db, , " "

  • Solr (.. db) //, db/
  • ( ) : .. /pdf ,

  • solr - "" core0 core1

Answer

Simply direct all updates that occur during the rebuild to both core1 (the temporary core being rebuilt) and core0 (the "live" core serving searches). That way core1 already contains those changes when the cores are swapped.
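A minimal SolrJ sketch of that dual-write, assuming HttpSolrClient and local core URLs (the URLs and class name are placeholders, not the asker's code):

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class DualWriter {
        private final SolrClient live =
                new HttpSolrClient.Builder("http://localhost:8983/solr/core0").build();
        private final SolrClient rebuild =
                new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();

        // Mirror every add/update to both cores while the rebuild runs, so
        // core1 already contains the change when the swap happens.
        public void add(SolrInputDocument doc) throws SolrServerException, IOException {
            live.add(doc);
            rebuild.add(doc);
        }

        // Deletes are mirrored the same way.
        public void delete(String id) throws SolrServerException, IOException {
            live.deleteById(id);
            rebuild.deleteById(id);
        }
    }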

Hope this helps!
