Leader election for a Paxos-based key-value store

I am going to implement a key-value store replicated with Paxos. I would have several nodes, one of which is the primary. This primary node accepts update requests and replicates the values to the replica nodes.

My question is: how is the primary node (the leader) selected? Can I use the Paxos algorithm for that? If so, do you think it is necessary to abstract the Paxos implementation into a single component that can be used not only by the replication unit, but also by the leader election component?

Should I simply pick the node with the smallest ID as the leader? And how can I implement master leasing?

Thanks for any answers.

2 answers

Before turning to the actual question, I would suggest that for a Paxos-like system you do not think of it as a master-slave relationship, but rather as a relationship between equals. Basic Paxos does not even have a leader concept. Multi-Paxos uses a leader as a performance optimization, and electing that leader is part of the protocol.

Multi-Paxos reduces to plain Paxos underneath: there is a prepare phase and an accept phase. The insight of Multi-Paxos is that once a node wins an accept round, it has simultaneously won leader election, and after that the prepare phase is not required from that leader until it discovers that another node has taken over leadership. A minimal sketch of that optimization follows.
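Here is a small, single-process sketch of that idea, assuming hypothetical class and field names and in-memory acceptors (a real prepare would also return any previously accepted values, which this sketch omits):

```python
# Sketch of the Multi-Paxos leader optimization: after one successful Prepare
# round the proposer reuses its ballot for every later log slot and only sends
# Accepts, until an acceptor rejects it (meaning another node took over).

class Acceptor:
    def __init__(self):
        self.promised = (0, 0)          # highest ballot promised so far
        self.accepted = {}              # slot -> (ballot, value)

    def prepare(self, ballot):
        if ballot >= self.promised:
            self.promised = ballot
            return True
        return False

    def accept(self, slot, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted[slot] = (ballot, value)
            return True
        return False

class Proposer:
    def __init__(self, node_id, acceptors):
        self.ballot = (1, node_id)      # (round, node_id) keeps ballots unique
        self.acceptors = acceptors
        self.is_leader = False

    def propose(self, slot, value):
        majority = len(self.acceptors) // 2 + 1
        if not self.is_leader:
            # Phase 1 runs once; winning it doubles as winning leader election.
            if sum(a.prepare(self.ballot) for a in self.acceptors) < majority:
                return False
            self.is_leader = True
        # Phase 2 only: the stable leader skips Prepare for later slots.
        oks = sum(a.accept(slot, self.ballot, value) for a in self.acceptors)
        if oks < majority:
            self.is_leader = False      # a higher ballot exists; re-Prepare next time
            return False
        return True

acceptors = [Acceptor() for _ in range(3)]
leader = Proposer(node_id=1, acceptors=acceptors)
print(leader.propose(0, ("set", "x", 1)))   # Prepare + Accept
print(leader.propose(1, ("set", "y", 2)))   # Accept only, no second Prepare
```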


And now some practical advice. I have many years of experience working on Paxos, Multi-Paxos, and other consensus systems.

First, I suggest using neither Paxos nor Multi-Paxos. Optimizing Paxos systems for good performance while using them correctly is very difficult, especially if you have these kinds of questions. Instead, I would look at an implementation of the Raft protocol.

As both protocols are specified right now, Raft can achieve much better throughput than Multi-Paxos. The Raft authors (and others) suggest that Raft is also easier to understand and implement.

You could also look into using one of the open-source Raft systems. I have no experience with any of them, so I cannot tell you how easy they are to maintain. I have heard, however, about the pain of maintaining Zookeeper instances. (I have also heard complaints about Zookeeper's proof of correctness.)

Next, it has been proven that every consensus protocol can loop forever. Build a timeout mechanism and randomized delays into your system where appropriate; this is how practical engineers get around the theoretical impossibility results.
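A small sketch of the usual workaround, with illustrative numbers: each node waits a randomized timeout before trying to take leadership again, so two nodes are unlikely to keep preempting each other forever.

```python
# Randomized election timeout (Raft-style); the constants are placeholders
# that you would tune to your network's round-trip times.

import random
import time

BASE_TIMEOUT = 0.150     # seconds

def election_timeout():
    # Somewhere between 1x and 2x the base timeout, chosen fresh each time.
    return BASE_TIMEOUT + random.uniform(0, BASE_TIMEOUT)

def should_start_election(last_heartbeat, now=None):
    """Return True if this node has waited long enough to try for leadership."""
    now = time.monotonic() if now is None else now
    return (now - last_heartbeat) > election_timeout()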

Finally, check your throughput needs. If your throughput is high enough, you will need to figure out how to shard across multiple consensus clusters, and that is a whole other ball of wax.
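One common way to do that split, sketched here with assumed cluster names and a simple hash-based routing scheme (not a prescription):

```python
# Route each key to one of several independent consensus groups. Keys are only
# ordered relative to other keys in the same cluster; cross-shard operations
# need an extra protocol (e.g. two-phase commit) on top.

import hashlib

CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]   # each runs its own Paxos/Raft group

def cluster_for(key: str) -> str:
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(CLUSTERS)
    return CLUSTERS[index]

print(cluster_for("user:42"), cluster_for("user:43"))
```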


You can solve this problem by running a parallel Multi-Paxos instance that manages your cluster configuration. Consider a replicated JSON object that is updated via Multi-Paxos and contains the following information (a sketch follows the list):

  • A sequence number
  • The leader's ID
  • The leader's lease expiration timestamp
  • A list of peer IDs
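A sketch of what that replicated configuration object might look like; the field names are illustrative, and the object itself would be the value agreed on by the separate configuration Multi-Paxos instance:

```python
import json
import time

cluster_config = {
    "sequence_number": 12,                    # bumped on every configuration change
    "leader_id": "node-2",
    "lease_expires_at": time.time() + 10.0,   # leader lease expiration (unix seconds)
    "peers": ["node-1", "node-2", "node-3"],
}

print(json.dumps(cluster_config, indent=2))
```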

You can then use a stock Paxos implementation and put all the necessary logic into your network layer (a sketch follows the list):

  • Drop all Prepare and Accept messages received from any node other than the leader until the lease expires.
  • Have the leader preemptively bump the sequence number and renew its lease shortly before the lease expires.
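A sketch of those two network-layer rules, assuming a hypothetical message dict with `type` and `sender` fields and the `cluster_config` object from the previous sketch:

```python
import time

LEASE_DURATION = 10.0    # seconds; illustrative
RENEW_MARGIN = 2.0       # renew this long before the lease runs out

def should_deliver(msg, config, now=None):
    """Drop Prepare/Accept messages from non-leaders while the lease is live."""
    now = time.time() if now is None else now
    lease_active = now < config["lease_expires_at"]
    if msg["type"] in ("prepare", "accept") and lease_active:
        return msg["sender"] == config["leader_id"]
    return True

def maybe_renew_lease(config, my_id, now=None):
    """The leader bumps the sequence number and extends its lease early."""
    now = time.time() if now is None else now
    if my_id == config["leader_id"] and now > config["lease_expires_at"] - RENEW_MARGIN:
        config["sequence_number"] += 1
        config["lease_expires_at"] = now + LEASE_DURATION
        # The updated object would then be proposed through the configuration
        # Multi-Paxos instance so every node sees the renewed lease.
    return config
```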
