Best practices for keeping a Solr / Lucene index up to date after a long rebuild

We have a general best-practice / design question about long index rebuilds. The question is not Solr-specific; it applies equally to raw Lucene or any other similar indexing tool / library / black box.

Question

What is the best practice for ensuring a Solr / Lucene index is "fully up to date" after a long rebuild? That is, if users add / modify / delete database records or files (for example, PDFs) during a 12-hour rebuild, how do you guarantee that the rebuilt index "includes" those changes by the time it goes live?

Context

  • A large database and file system (e.g. PDF files) indexed into Solr
  • A multicore Solr instance, where core0 serves searches and all additions / changes / deletions, and core1 is used for the rebuild. Core1 is the "temporary core."
  • Upon completion of the rebuild, we swap core1 with core0, so that searches and updates run against the freshly rebuilt index (a sketch of the swap follows this list)
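For reference, here is a minimal sketch of that swap step. Solr's CoreAdmin API exposes a SWAP action that atomically exchanges two cores, so clients that query core0 start hitting the freshly rebuilt index. The host, port, and core names below are assumptions for a default local install:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CoreSwap {
        public static void main(String[] args) throws Exception {
            // CoreAdmin SWAP atomically exchanges the names of two cores;
            // after this call, requests to "core0" reach the rebuilt index.
            String url = "http://localhost:8983/solr/admin/cores"
                       + "?action=SWAP&core=core0&other=core1";
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(url)).build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }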

Current approach

  • The rebuild process queries the database and / or walks the file system for "all db records" or "all files"
  • "" db/pdf, / . (, "select * from element order by element_id". open-i..e , - , . , " " ( ), .
  • "" : / db, , " "

  • Solr (.. db) //, db/
  • ( ) : .. /pdf ,

  • solr - "" core0 core1

Answer

Simply direct all updates that occur during the rebuild to both core1 (the temporary core being rebuilt) and core0 (the "live" core serving searches). That way core1 already contains those changes when the cores are swapped.
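A minimal SolrJ sketch of that dual-write, assuming HttpSolrClient and local core URLs (the URLs and class name are placeholders, not the asker's code):

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class DualWriter {
        private final SolrClient live =
                new HttpSolrClient.Builder("http://localhost:8983/solr/core0").build();
        private final SolrClient rebuild =
                new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();

        // Mirror every add/update to both cores while the rebuild runs, so
        // core1 already contains the change when the swap happens.
        public void add(SolrInputDocument doc) throws SolrServerException, IOException {
            live.add(doc);
            rebuild.add(doc);
        }

        // Deletes are mirrored the same way.
        public void delete(String id) throws SolrServerException, IOException {
            live.deleteById(id);
            rebuild.deleteById(id);
        }
    }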

Hope this helps!
