Distributed System: Leader Election

I am currently working on a distributed system where we need to make some kind of leader election . The problem is that we would like to avoid all computers know each other, but only the leader. Is there a quick way we can use, for example, Broadcast to achieve what we want?

Or do we just need to know at least one to make good leadership choices?

All computers are assumed to be on the same subnet.

Thank you for your help.

+7
source share
4 answers

The problem is that we would like to avoid all computers know each other, but only the leader.

Leadership elections are the problem of choosing the only leader out of many potential candidate candidates. Look at this as having two essential properties: resilience and safety. Here, vitality will mean "most of the time, there is a leader," while security will mean "there is either zero or one leader." Let's look at how we enable this security feature in your example using broadcast.

We choose a simple (broken) algorithm, assuming that each node has a unique identifier. Each node translates its identifier and listens. Upon receipt of a higher identifier than his own, he ceases to participate. If he receives a lower identifier than his own, he sends his own transmissions again. Accepting a synchronous network, the last identifier that everyone receives is the leader identifier. Now imagine the network partition. The protocol will happily continue on both sides of the section, and two leaders will be elected.

This applies to this broken protocol, but also applies to all possible protocols. How do you tell the difference between nodes that you cannot communicate with and nodes that do not exist if you do not know (at least) how many nodes exist? So, there is the first result of security: you need to know how many nodes exist, or you cannot guarantee that there is only one leader.

Now let me relax our security constraint as probabilistic: "there may be zero or more leaders, but most often there is one." This makes the problem acceptable, and the widely used solution is gossip (epidemic protocols). For example, see the Gossip-Style Failure Detection Service , which discusses a variant of this exact problem. This is mainly about probabilistically correct fault detection and enumeration, but if you can do this, you can make probabilistically correct leader choices.

As far as I can tell, you cannot have safe, non-probabilistic election of leaders in common networks without listing at least the participants.

+7
source

As one of the interesting solutions of "distributed mechanics", I saw the last time I recommend the Apache zookeeper project. This is an open source solution, so at least you should be able to get a couple of ideas from there. It is also developing intensively, so perhaps you can reuse it as part of your solution.

ZooKeeper is a centralized service for supporting information configuration, naming, providing distributed synchronization, and providing group services. All these types of services are used in one form or another by distributed applications. Each time they implemented a lot of work, which consists in fixing errors and which are inevitable. Due to the complexity of introducing these types of services, applications initially usually save on them, which makes them fragile when there are changes and difficult to manage. Even when everything is done correctly, different implementations of these services make management difficult when applications are deployed.

+1
source

I would recommend JGroups solve this problem - assuming you are building a system on top of the JVM.

http://www.jgroups.org/

Use LockService to ensure that only 1 node in the cluster is the leader. JGroups can be configured to use Peer Lock or Central Lock - or should work in your case.

See http://withmeta.blogspot.com/2014/01/leader-election-problem-in-elastic.html for implementing Clojure or http://javabender.blogspot.com.au/2012/01/jgroups-lockservice -example.html for Java.

+1
source

A practical solution is to use the database as a β€œmeeting point”.

This solution is very convenient if you are already using SQL DB, all that is required is a new table. If you use a database cluster, you can take advantage of its high availability.

Here is a table that uses my implementation:

CREATE TABLE Lease ( ResourceId varchar(64), Expiration datetime, OwnerId varchar(64), PRIMARY KEY(ResourceId) ); 

The idea is to have a string on a shared resource. Leaders will compete for the same line.

I really like the simplified C # implementation:

 class SqlLease { private ISqlLeaseDal _dal; private string _resourceId; private string _myId; public SqlLease(ISqlLeaseDal dal, string resourceId) { _dal = dal; _resourceId = resourceId; _myId = Guid.NewGuid().ToString(); } class LeaseRow { public string ResourceId {get; set;} public string OwnerId {get; set;} public Datetime Expiration {get; set;} public byte[] RowVersion {get; set;} } public bool TryAcquire(Datetime expiration) { expiration = expiration.ToUniversalTime(); if (expiration < DateTime.UtcNow) return false; try { var row = _dal.FindRow(_resourceId); if (row != null) { if (row.Expiration >= DateTime.UtcNow && row.OwnerId != _myId) { return false; } row.OwnerId = _myId; row.Expiration = expiration; _dal.Update(row); return true; } _dal.Insert(new LeaseRow { ResourceId = _resourceId, OwnerId = _myId, Expiration = expiration, }); return true; } catch (SqlException e) { if (e.Number == 2601 || e.Number == 2627) return false; throw e; } catch (DBConcurrencyException) { return false; } } } 

The ISqlLeaseDal class encapsulates an SQL connection and low table access.

Use a reasonable time frame. Remember that if the current leader fails, the resource will be blocked until the expiration date.

0
source

All Articles