Akka 2.2 ClusterSingletonManager sends HandOverToMe to [None]

I am using the Akka 2.2 contrib ClusterSingletonManager to ensure that there is always exactly one actor of a specific type (the master) in the cluster. However, I have observed some strange behavior (which may well be expected, but I cannot understand why). Whenever the master leaves the cluster and rejoins later, the following sequence of events occurs:
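For reference, this is roughly how I start the singleton manager on each node. It is only a sketch: Master is my own actor, and the props signature is what I recall from the Akka 2.2 contrib documentation (singletonProps takes the optional hand-over data), so check it against your exact 2.2.x version:

 import akka.actor.{ ActorSystem, PoisonPill, Props }
 import akka.contrib.pattern.ClusterSingletonManager

 val system = ActorSystem("ClusterSystem")

 // Manager named "singleton" so the child appears at /user/singleton/master
 system.actorOf(ClusterSingletonManager.props(
   singletonProps = handOverData => Props(classOf[Master], handOverData), // Master is my own actor class
   singletonName = "master",
   terminationMessage = PoisonPill,
   role = None),
   name = "singleton")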

 [INFO] [04/30/2013 17:47:35.805] [ClusterSystem-akka.actor.default-dispatcher-9] [akka://ClusterSystem/system/cluster/core/daemon] Cluster Node [akka.tcp://ClusterSystem@127.0.0.1:2551] - Welcome from [akka.tcp://ClusterSystem@127.0.0.1:2552]
 [INFO] [04/30/2013 17:47:48.703] [ClusterSystem-akka.actor.default-dispatcher-8] [akka://ClusterSystem/user/singleton] Member removed [akka.tcp://ClusterSystem@127.0.0.1:52435]
 [INFO] [04/30/2013 17:47:48.712] [ClusterSystem-akka.actor.default-dispatcher-2] [akka://ClusterSystem/user/singleton] ClusterSingletonManager state change [Start -> BecomingLeader]
 [INFO] [04/30/2013 17:47:49.752] [ClusterSystem-akka.actor.default-dispatcher-9] [akka://ClusterSystem/user/singleton] Retry [1], sending HandOverToMe to [None]
 [INFO] [04/30/2013 17:47:50.850] [ClusterSystem-akka.actor.default-dispatcher-21] [akka://ClusterSystem/user/singleton] Retry [2], sending HandOverToMe to [None]
 [INFO] [04/30/2013 17:47:51.951] [ClusterSystem-akka.actor.default-dispatcher-20] [akka://ClusterSystem/user/singleton] Retry [3], sending HandOverToMe to [None]
 [INFO] [04/30/2013 17:47:53.049] [ClusterSystem-akka.actor.default-dispatcher-3] ...
 [INFO] [04/30/2013 17:48:10.650] [ClusterSystem-akka.actor.default-dispatcher-21] [akka://ClusterSystem/user/singleton] Retry [20], sending HandOverToMe to [None]
 [INFO] [04/30/2013 17:48:11.751] [ClusterSystem-akka.actor.default-dispatcher-4] [akka://ClusterSystem/user/singleton] Timeout in BecomingLeader. Previous leader unknown, removed and no TakeOver request.
 [INFO] [04/30/2013 17:48:11.752] [ClusterSystem-akka.actor.default-dispatcher-4] [akka://ClusterSystem/user/singleton] Singleton manager [akka.tcp://ClusterSystem@127.0.0.1:2551] starting singleton actor
 [INFO] [04/30/2013 17:48:11.754] [ClusterSystem-akka.actor.default-dispatcher-4] [akka://ClusterSystem/user/singleton] ClusterSingletonManager state change [BecomingLeader -> Leader]

Why is it trying to send HandOverToMe to [None]? It takes about 20 seconds (20 attempts) before it becomes the new leader, even though in this particular situation the previous leader was well known...

1 answer

I'm not sure if this fully answers your question, but looking at the source code of ClusterSingletonManager you can see the chain of events that leads to this scenario. The class is implemented with Akka's finite state machine (FSM) support, and the behavior you are seeing is triggered by the state transition from Start -> BecomingLeader. First, look at the Start state:

 when(Start) {
   case Event(StartLeaderChangedBuffer, _) ⇒
     leaderChangedBuffer = context.actorOf(Props[LeaderChangedBuffer].withDispatcher(context.props.dispatcher))
     getNextLeaderChanged()
     stay
   case Event(InitialLeaderState(leaderOption, memberCount), _) ⇒
     leaderChangedReceived = true
     if (leaderOption == selfAddressOption && memberCount == 1)
       // alone, leader immediately
       gotoLeader(None)
     else if (leaderOption == selfAddressOption)
       goto(BecomingLeader) using BecomingLeaderData(None)
     else
       goto(NonLeader) using NonLeaderData(leaderOption)
 }

The part you are interested in is:

   else if (leaderOption == selfAddressOption)
     goto(BecomingLeader) using BecomingLeaderData(None)

To me, this snippet says: "If I am the leader, transition from Start to BecomingLeader with None as the previous-leader option."

Then, if you look at the BecomingLeader state:

 when(BecomingLeader) {
   ...
   case Event(HandOverRetry(count), BecomingLeaderData(previousLeaderOption)) ⇒
     if (count <= maxHandOverRetries) {
       logInfo("Retry [{}], sending HandOverToMe to [{}]", count, previousLeaderOption)
       previousLeaderOption foreach { peer(_) ! HandOverToMe }
       setTimer(HandOverRetryTimer, HandOverRetry(count + 1), retryInterval, repeat = false)
     } else if (previousLeaderOption forall removed.contains) {
       // can't send HandOverToMe, previousLeader unknown for new node (or restart)
       // previous leader might be down or removed, so no TakeOverFromMe message is received
       logInfo("Timeout in BecomingLeader. Previous leader unknown, removed and no TakeOver request.")
       gotoLeader(None)
     } else
       throw new ClusterSingletonManagerIsStuck(
         s"Becoming singleton leader was stuck because previous leader [${previousLeaderOption}] is unresponsive")
 }

This is the block that keeps producing the retry message you see in the log. Note that because previousLeaderOption is None, the foreach never actually sends anything; the manager just burns through maxHandOverRetries timer ticks before giving up and starting the singleton itself. In other words, it keeps waiting for the previous leader to hand over responsibility without knowing who that previous leader was, because the Start transition recorded None as the previous leader. The million-dollar question is: if it does not know who the previous leader is, why keep retrying a hand-over that can never succeed?
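If the roughly 20-second delay is the main pain point, it appears to be governed by the maxHandOverRetries and retryInterval values used in the retry loop quoted above. A minimal sketch of passing smaller values when creating the manager, assuming the Akka 2.2 contrib props overload that exposes these parameters (Master and the values shown are placeholders, so verify the exact parameter list against your 2.2.x source):

 import scala.concurrent.duration._
 import akka.actor.{ PoisonPill, Props }
 import akka.contrib.pattern.ClusterSingletonManager

 // system: an ActorSystem already joined to the cluster
 system.actorOf(ClusterSingletonManager.props(
   singletonProps = handOverData => Props(classOf[Master], handOverData), // hypothetical singleton actor
   singletonName = "master",
   terminationMessage = PoisonPill,
   role = None,
   maxHandOverRetries = 6,      // fewer HandOverRetry ticks before giving up on the unknown previous leader
   maxTakeOverRetries = 3,      // kept below maxHandOverRetries, mirroring the defaults' relationship
   retryInterval = 500.millis), // shorter interval between retries
   name = "singleton")

This would shorten the window during which the new node waits for a hand-over, at the cost of a smaller safety margin if the previous leader is merely slow rather than gone.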
