Why does my Solr slave index keep growing?

I have a 5-core Solr 1.4 master that replicates to another 5-core Solr instance using Solr replication as described here. All writes are performed against the master, and the slaves periodically replicate from it. This is done using the following sequence:

  • Commit on every master core
  • Replicate on every slave core
  • Optimize every slave core
  • Commit on every slave core

The problem I'm running into is that the slave seems to keep old index files around and takes up more and more disk space. For example, after 3 replications, the data directory of a master core looks like this:

    $ du -sh *
    145M    index

But the data directory of the corresponding slave core looks like this:

    $ du -sh *
    300M    index
    144M    index.20100621042048
    145M    index.20100629035801
    4.0K    index.properties
    4.0K    replication.properties

Here are the contents of index.properties:

    #index properties
    #Tue Jun 29 15:58:13 CDT 2010
    index=index.20100629035801

And replication.properties:

    #Replication details
    #Tue Jun 29 15:58:13 CDT 2010
    replicationFailedAtList=1277155032914
    previousCycleTimeInSeconds=12
    timesFailed=1
    indexReplicatedAtList=1277845093709,1277155253911,1277155032914
    indexReplicatedAt=1277845093709
    replicationFailedAt=1277155032914
    lastCycleBytesDownloaded=150616512
    timesIndexReplicated=3
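The timestamps in replication.properties are Java epoch milliseconds, which makes the replication history hard to read at a glance. The following is a minimal sketch, assuming the file uses only plain key=value lines with # comments (no escapes or line continuations), that parses the values shown above and converts them to UTC datetimes. The helper names (parse_properties, ms_to_utc, timestamps) are hypothetical and not part of Solr.

```python
from datetime import datetime, timezone

# The slave's replication.properties from the question, one entry per line.
REPLICATION_PROPERTIES = """\
#Replication details
#Tue Jun 29 15:58:13 CDT 2010
replicationFailedAtList=1277155032914
previousCycleTimeInSeconds=12
timesFailed=1
indexReplicatedAtList=1277845093709,1277155253911,1277155032914
indexReplicatedAt=1277845093709
replicationFailedAt=1277155032914
lastCycleBytesDownloaded=150616512
timesIndexReplicated=3
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Assumption: no escapes, line continuations, or ':' separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ms_to_utc(ms):
    """Convert an epoch-milliseconds string to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


def timestamps(props, key):
    """Decode a single timestamp or a comma-separated list of them."""
    return [ms_to_utc(v) for v in props.get(key, "").split(",") if v]


if __name__ == "__main__":
    props = parse_properties(REPLICATION_PROPERTIES)
    for when in timestamps(props, "indexReplicatedAtList"):
        print("replicated:", when.isoformat())
    for when in timestamps(props, "replicationFailedAtList"):
        print("failed:    ", when.isoformat())
    mib = int(props["lastCycleBytesDownloaded"]) / 2**20
    print(f"last cycle downloaded {mib:.1f} MiB")
```

Decoded this way, indexReplicatedAt is 20:58:13 UTC on June 29, 2010, which matches the "15:58:13 CDT" comment in the file, and the single failure falls on June 21. The last cycle downloaded about 143.6 MiB, roughly the size of one full copy of the index.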

The solrconfig.xml on this slave contains the default deletion policy:

    [...]
    <mainIndex>
      <unlockOnStartup>false</unlockOnStartup>
      <reopenReaders>true</reopenReaders>
      <deletionPolicy class="solr.SolrDeletionPolicy">
        <str name="maxCommitsToKeep">1</str>
        <str name="maxOptimizedCommitsToKeep">0</str>
      </deletionPolicy>
    </mainIndex>
    [...]

What am I missing?


There is no point in committing and optimizing on the slaves. Since all writes go to the master, that is the only place these operations should be performed.

This may be the cause of your problem: by running extra commits and optimizes on the slaves, you create additional commit points there. But that is only a guess; it would be easier to see what is happening with the complete solrconfig.xml from both the master and the slaves.


Optimizing on a slave doubles the size of the index: during an optimize, new segments are written that replace the original index, merged down to the number of segments requested for the optimize (1 by default), so the old and new copies exist side by side until the optimize finishes. Best practice is to optimize only occasionally rather than on every cycle (from a cron job or similar), and only on the master, never on the slaves; the slaves receive the new segments through replication. You don't need to commit on the slave either: the index reload after replication takes care of making new documents visible there.
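The cycle recommended in the two answers above (commit, or occasionally optimize, only on the master; the slaves just fetch) can be sketched as the list of HTTP requests it would send. This is a minimal sketch that only builds URLs and sends nothing. It assumes the standard /update handler accepts commit=true and optimize=true as request parameters and that the ReplicationHandler is registered at /replication on each slave core, as in the Solr replication wiki setup; the host names and core names are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical base URLs and core names; substitute your own.
MASTER_URL = "http://master:8983/solr"
SLAVE_URLS = ["http://slave:8983/solr"]
CORES = ["core0", "core1", "core2", "core3", "core4"]


def replication_cycle(master_url, slave_urls, cores, optimize=False):
    """Return, in order, the request URLs for one replication cycle.

    Writes (a commit, or an occasional optimize) go only to the master.
    The slaves only pull the new index with command=fetchindex; no commit
    or optimize is ever sent to them.
    """
    urls = []
    for core in cores:
        # An optimize also commits, so one request per master core is enough.
        params = {"optimize": "true"} if optimize else {"commit": "true"}
        urls.append(f"{master_url}/{core}/update?{urlencode(params)}")
    for slave_url in slave_urls:
        for core in cores:
            query = urlencode({"command": "fetchindex"})
            urls.append(f"{slave_url}/{core}/replication?{query}")
    return urls


if __name__ == "__main__":
    # Regular cycle: commit on the master, then fetch on every slave core.
    for url in replication_cycle(MASTER_URL, SLAVE_URLS, CORES):
        print(url)
```

A cron job could call the same function with optimize=True once in a while, for example nightly, instead of optimizing on every cycle.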


I found that the extra index.* directories seem to be left behind when I replicate after a full reload of the master. By "full reload" I mean stopping the master, deleting everything in [core]/data/*, restarting it (at which point Solr creates a new index), indexing all of our documents, and then replicating.

After some additional testing, I found that it is safe to delete the other index* directories (everything except the one named in [core]/data/index.properties). If I'm not comfortable with this workaround, I may instead clear the slave index (stop, delete data/*, start) before the first replication after a full reload of the master.
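The workaround above can be sketched as a dry run that lists the candidate directories without deleting anything. This is a minimal sketch under stated assumptions: the live index is the directory named by the index= line in data/index.properties, or plain "index" when that file is absent; every other directory called "index" or "index.*" is reported as stale. The function names are hypothetical, and since the safety of deleting these directories rests only on the testing described above, it is probably wise to run this only while no replication is in progress and to review the list before removing anything.

```python
import os


def read_index_pointer(data_dir):
    """Return the directory named by index= in data/index.properties, or None."""
    path = os.path.join(data_dir, "index.properties")
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if line.startswith("index="):
                return line.partition("=")[2].strip()
    return None


def stale_index_dirs(data_dir):
    """List index directories in data_dir that are not the live index.

    Assumption: without index.properties, the live index is "index".
    Files such as index.properties are skipped because only directories
    are considered.
    """
    live = read_index_pointer(data_dir) or "index"
    stale = []
    for name in sorted(os.listdir(data_dir)):
        if not os.path.isdir(os.path.join(data_dir, name)):
            continue
        if (name == "index" or name.startswith("index.")) and name != live:
            stale.append(name)
    return stale


if __name__ == "__main__":
    import sys

    # Dry run: print candidates only; deleting them is left to the operator.
    for name in stale_index_dirs(sys.argv[1] if len(sys.argv) > 1 else "data"):
        print(name)
```

For the slave data directory shown in the question, this would report index and index.20100621042048, keeping index.20100629035801, the directory that index.properties points to.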


Source: https://habr.com/ru/post/1314264/

