MongoDB SECONDARY becomes a RESTORE at night

I run the usual MongoDB replica set, consisting of 3 members (member1 in datacenter A, member2 and member3 in datacenter B). member1 is the current PRIMARY, and I add elements 2 and 3 through rs.add() . They perform their initial synchronization and soon become SECONDARY. All is well throughout the day, and the replication delay for both participants is 0 seconds until 2am.

Now: every night at 2 a.m., both members go into RECOVERING state and stop replication altogether, which leads to a delay in clock replication when I look at rs.printSlaveReplicationInfo() in the morning hours. At about 2 a.m., I am not aware of massive inserts or maintenance tasks.

I get the following journal entries in PRIMARY:

 2015-10-09T01:59:38.914+0200 [initandlisten] connection accepted from 192.168.227.209:59905 #11954 (37 connections now open) 2015-10-09T01:59:55.751+0200 [conn11111] warning: Collection dropped or state deleted during yield of CollectionScan 2015-10-09T01:59:55.869+0200 [conn11111] warning: Collection dropped or state deleted during yield of CollectionScan 2015-10-09T01:59:55.870+0200 [conn11111] getmore local.oplog.rs cursorid:1155433944036 ntoreturn:0 keyUpdates:0 numYields:1 locks(micros) r:32168 nreturned:0 reslen:20 134ms 2015-10-09T01:59:55.872+0200 [conn11111] end connection 192.168.227.209:58972 (36 connections now open) 

And, more interestingly, I get the following log entries on both SECONDARY:

 2015-10-09T01:59:55.873+0200 [rsBackgroundSync] repl: old cursor isDead, will initiate a new one 2015-10-09T01:59:55.873+0200 [rsBackgroundSync] replSet syncing to: member1:27017 2015-10-09T01:59:56.065+0200 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from member1:27017 2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet our last optime : Oct 9 01:59:23 5617035b:17f 2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet oldest at member1:27017 : Oct 9 01:59:23 5617035b:1af 2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember 2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet error RS102 too stale to catch up 2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet RECOVERING 

What is also striking is the beginning of the ascent β€œevery time is reset” every night at about 2 in the morning:

 configured oplog size: 990MB log length start to end: 19485secs (5.41hrs) oplog first event time: Fri Oct 09 2015 02:00:33 GMT+0200 (CEST) oplog last event time: Fri Oct 09 2015 07:25:18 GMT+0200 (CEST) now: Fri Oct 09 2015 07:25:26 GMT+0200 (CEST) 

I'm not sure if this somehow correlates with the problem. I'm also curious that such a small delay ( Oct 9 01:59:23 5617035b:17f ↔ Oct 9 01:59:23 5617035b:1af ) allows members to become obsolete.

Could this be a server time issue (VM host) or is it something completely different? (Why does the first oplog event "reload" every night rather than "switch" to a timestamp, for example, NOW minus 24 hours?) What can I do to investigate and avoid?

+7
time mongodb replication database-replication
source share
1 answer

Increasing the size of oplog should solve this problem (according to our comments).

Some links for others that encounter this problem

+2
source share

All Articles