What is the difference between ZooKeeper and other distributed key-value stores?

I am new to ZooKeeper and distributed systems and am learning them on my own.

From what I understand so far, ZooKeeper seems to be just a key-value store whose keys are paths and whose values are strings, which is no different from, say, Redis. (And, apparently, we can use separator-delimited paths as keys in Redis too.)

So my question is: what is the significant difference between ZooKeeper and other distributed key-value stores? Why does ZooKeeper use so-called "paths" as keys instead of simple strings?

2 answers

You are comparing ZooKeeper's high-level data model with other key-value stores, but that is not what makes it unique. From a distributed-systems point of view, ZooKeeper differs from many other key-value stores (especially Redis) because it is strongly consistent and can tolerate failures as long as a majority of the cluster is connected. In addition, while the data is held in memory, it is synchronously replicated to a majority of the cluster and backed by disk, so once a write succeeds it is guaranteed not to be lost (barring a missile strike). This makes ZooKeeper very useful for storing small amounts of critical state, such as configuration.
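The rule that a write must reach more than half of the servers can be made concrete with a little arithmetic. The Python sketch below is a toy model, not ZooKeeper code; the function names `quorum_size` and `can_commit` are made up for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def can_commit(ensemble_size: int, reachable: int) -> bool:
    """Toy rule: a write is acknowledged only if a majority can persist it."""
    return reachable >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 5):
        tolerated = n - quorum_size(n)
        print(f"{n} servers: quorum={quorum_size(n)}, tolerated failures={tolerated}")
```

Under this model a 5-server cluster keeps accepting writes with 2 servers down, and a 4-server cluster tolerates no more failures than a 3-server one, which is one reason odd cluster sizes are usually preferred.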

Conversely, Redis is not a distributed system and does not provide the same guarantees as ZooKeeper, and many other key-value stores that are distributed are only eventually consistent. In other words, there is no guarantee that once a value has been written, every other process in the distributed system will see that value.
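To show what that missing guarantee looks like in practice, here is a minimal Python sketch of a store that replicates asynchronously. It is a toy model and does not represent any particular product; the class `AsyncReplicatedStore` and its methods are hypothetical.

```python
class AsyncReplicatedStore:
    """Toy primary/replica pair in which replication lags behind writes."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Happens "later"; until then the replica serves old data.
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read_from_replica(self, key):
        return self.replica.get(key)


if __name__ == "__main__":
    store = AsyncReplicatedStore()
    store.write("config/timeout", "30")
    print(store.read_from_replica("config/timeout"))  # stale: not replicated yet
    store.replicate()
    print(store.read_from_replica("config/timeout"))  # now visible
```

A reader that hits the replica between `write` and `replicate` sees nothing, even though the writer was already told the write succeeded.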

Finally, in addition to a file-system-like interface for storing state, ZooKeeper provides fairly low-level features on top of which more complex problems can be solved. For examples of this, look at Apache Curator. Curator uses ZooKeeper's ephemeral nodes (nodes that disappear when the client that created them disconnects) to build things like locks and leader elections, which are extremely useful for coordinating distributed systems. From this point of view, the ZooKeeper data model and its related features serve as primitives on which higher-level tools for distributed coordination can be built.
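To illustrate how ephemeral nodes can be used to pick a leader, here is a small in-memory Python sketch. It only imitates the idea (each candidate creates an ephemeral, sequentially numbered node, and the lowest number leads); it does not talk to ZooKeeper or Curator, and `ToyCoordinator` and its methods are hypothetical names.

```python
import itertools


class ToyCoordinator:
    """In-memory stand-in for a ZooKeeper ensemble (illustration only)."""

    def __init__(self):
        self._nodes = {}  # path -> id of the session that owns the node
        self._counter = itertools.count()

    def create_ephemeral_sequential(self, session_id, prefix):
        """Create a node tied to a session, with an increasing numeric suffix."""
        path = f"{prefix}{next(self._counter):010d}"
        self._nodes[path] = session_id
        return path

    def close_session(self, session_id):
        """Ephemeral nodes vanish when the session that owns them ends."""
        self._nodes = {p: s for p, s in self._nodes.items() if s != session_id}

    def leader(self, prefix):
        """The session that owns the lowest-numbered candidate node leads."""
        candidates = sorted(p for p in self._nodes if p.startswith(prefix))
        return self._nodes[candidates[0]] if candidates else None


if __name__ == "__main__":
    zk = ToyCoordinator()
    zk.create_ephemeral_sequential("worker-a", "/election/candidate-")
    zk.create_ephemeral_sequential("worker-b", "/election/candidate-")
    print(zk.leader("/election/candidate-"))
    zk.close_session("worker-a")  # simulate a crash or disconnect
    print(zk.leader("/election/candidate-"))
```

When the current leader's session ends, its node disappears and the next candidate takes over without any explicit "release" step, which is why ephemeral nodes are a convenient building block for locks and elections.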


You can compare ZooKeeper with other distributed key-value stores such as etcd and Consul. These tools offer the same benefits as Apache ZooKeeper. The main advantage of ZooKeeper is that it helps avoid deadlocks and race conditions in distributed applications. ZooKeeper is not only a key-value store. It can also be used for service discovery and as a centralized service for maintaining configuration information in a distributed application.

The way ZooKeeper stores its key-value pairs is slightly different from other key-value stores: ZooKeeper uses a znode as the key. The znodes form a tree that looks like a Unix file system and starts with a slash (/). A znode can be persistent or ephemeral. The key-value data is served from RAM. Each znode has its own ACL. ZooKeeper keeps a transaction log and snapshots so that a node can be restored after a crash. It is designed to work as a fault-tolerant, distributed key-value store, so it should be deployed as a cluster. A group of ZooKeeper servers is called a ZooKeeper ensemble. There is one leader server, and the rest are followers. This leader-follower relationship is the result of a leader election among the ZooKeeper servers in the cluster.
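The tree-shaped key space described above can be sketched in a few lines of Python. This is an in-memory toy, not ZooKeeper's implementation; `ZNode` and `ToyTree` are hypothetical names, and the ACL string is a simplified placeholder. It models two rules of the real data model: a node can only be created under an existing parent, and ephemeral nodes cannot have children.

```python
from dataclasses import dataclass, field


@dataclass
class ZNode:
    data: bytes = b""
    ephemeral: bool = False
    # Simplified placeholder for a per-node access control list.
    acl: list = field(default_factory=lambda: ["world:anyone:cdrwa"])


class ToyTree:
    """Keys are slash-separated paths that form a tree rooted at '/'."""

    def __init__(self):
        self.nodes = {"/": ZNode()}

    @staticmethod
    def parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b"", ephemeral=False):
        parent = self.parent(path)
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent].ephemeral:
            raise ValueError("ephemeral nodes cannot have children")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = ZNode(data, ephemeral)

    def children(self, path):
        """List direct children, something a flat key space does not model."""
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self.nodes
            if p != "/" and self.parent(p) == path
        )


if __name__ == "__main__":
    tree = ToyTree()
    tree.create("/app")
    tree.create("/app/config", b"timeout=30")
    tree.create("/app/worker-1", ephemeral=True)
    print(tree.children("/app"))
```

Listing the children of a path is what makes group membership and similar patterns natural: every live worker shows up as a child of one parent node.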

ZooKeeper is mainly used in the HA implementations of the Hadoop NameNode and the YARN ResourceManager, where it takes care of electing the active and standby instances of these daemons. Kafka was also designed to use ZooKeeper to store topic and offset information.

ZooKeeper plays a role similar to the one etcd plays in the Kubernetes control plane, although Kubernetes itself uses etcd.
