We have a general question about best practice / programming around long-running index rebuilds. This question is not "Solr specific"; it applies equally to raw Lucene or any other similar indexing tool / library / black box.
Question
What is the best practice for ensuring the Solr / Lucene index is "completely up to date" after a long index rebuild? I.e. if users add / modify / delete db records or file-system files (for example, PDFs) during the 12-hour rebuild, how do you guarantee that the rebuilt index "includes" those changes at the very end?
Context
- A large database and file system (e.g. PDFs) indexed in Solr
- A multi-core Solr instance, where core0 serves searches and all additions / changes / deletions, and core1 is used for the "rebuild". Core1 is the "temporary core."
- Upon completion of the rebuild, we swap core1 into core0, so searches and updates go against the freshly rebuilt index (swap sketched below)
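For reference, the swap at the end of the rebuild is done through Solr's CoreAdmin SWAP action. A minimal sketch in Python, assuming Solr runs at localhost:8983 and the cores are named core0 / core1 as above:

```python
import requests

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL


def swap_cores(live_core: str = "core0", rebuild_core: str = "core1") -> None:
    """Swap the freshly rebuilt core into the 'live' core via the CoreAdmin
    SWAP action, so searches and updates now hit the new index."""
    resp = requests.get(
        f"{SOLR}/admin/cores",
        params={"action": "SWAP", "core": rebuild_core, "other": live_core, "wt": "json"},
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    swap_cores()
```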
Current approach
- The rebuild process queries the db and / or walks the file system for "all db entries" or "all files" (a minimal sketch of this loop follows the list)
- "" db/pdf, / . (, "select * from element order by element_id". open-i..e , - , . , " " ( ), .
- "" : / db, , " "
- While the rebuild runs, normal Solr updates (i.e. driven by the db) continue: additions / changes / deletions are applied to the db / files and to core0, but not to the temporary core1
- (Worst case) the rebuilt index contains stale data: e.g. db records / pdf files deleted during the rebuild are still present, and late additions are missing
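To make the gap concrete, here is a minimal sketch of the current "select everything and post it to core1" loop (Python; the element table, its columns, the to_solr_doc mapping, and the sqlite3 driver are all illustrative assumptions). Anything inserted, changed, or deleted after the cursor has passed a given element_id is simply not reflected in core1:

```python
import sqlite3  # stand-in for whatever db driver is actually used

import requests

SOLR_REBUILD = "http://localhost:8983/solr/core1"  # temporary rebuild core (assumed URL)


def to_solr_doc(row) -> dict:
    """Hypothetical mapping of a db row to a Solr document."""
    element_id, title, body = row
    return {"id": element_id, "title": title, "body": body}


def rebuild_index(db_path: str = "app.db", batch_size: int = 1000) -> None:
    conn = sqlite3.connect(db_path)
    cur = conn.execute("select * from element order by element_id")
    batch = []
    for row in cur:  # rows added / changed / deleted behind this cursor are missed
        batch.append(to_solr_doc(row))
        if len(batch) >= batch_size:
            requests.post(f"{SOLR_REBUILD}/update", json=batch, timeout=60).raise_for_status()
            batch = []
    if batch:
        requests.post(f"{SOLR_REBUILD}/update", json=batch, timeout=60).raise_for_status()
    # single commit at the end of the (hours-long) pass
    requests.post(f"{SOLR_REBUILD}/update", params={"commit": "true"}, json=[], timeout=60).raise_for_status()
    conn.close()
```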