Cassandra two-node cluster with redundancy

I set up two servers to run Cassandra by following the documentation on the DataStax website. My current setup is:

1 seed node (set in the yaml on both servers)

On startup, both nodes are up (as verified with nodetool), and both seem to replicate the data correctly. However, I noticed that when I take down the seed node, the other node does not accept connections (neither through the API nor through cqlsh), which is a problem.

My requirement is to have two servers that are perfect replicas of each other, so that if one server is temporarily unavailable (for example, due to a disk failure), the other server can take over until the failed server comes back online.

Given this requirement, I have the following questions:

  • Do I need to configure both nodes as "seed" nodes?
  • How can I make sure everything is replicated on both servers? Does this happen automatically, or is there a setting I need to configure?

Thank you very much in advance,

cassandra
1 answer

Cassandra does not perform master-slave replication. There is no master in Cassandra. Rather, the data is distributed across the cluster, and how it is distributed depends on a number of factors.

Data is stored on nodes in partitions. Remember that Cassandra is a partitioned row store? This is where the partitions come in. All rows of a partition are stored together on one node (and on its replicas). How many replicas there are depends on the replication factor, which is set on the keyspace that contains the table. If the replication factor is 3, each partition of the table (and therefore every row in that partition) is also stored on two additional replicas. It is a way of saying: "I want 3 copies of this data."

Clients can specify a consistency level (CL) for writes. This is the number of replicas that must acknowledge the write for it to succeed. Clients can also specify a CL for reads. Cassandra sends the read request to n = CL replicas and returns the most recently written value as the result of the query.
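As a rough illustration of the read path described above, here is a minimal Python sketch of picking the newest value among the replies from CL replicas. It is a simplified model, not Cassandra's actual implementation; the `ReplicaResponse` type, its field names, and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaResponse:
    """One replica's answer to a read (hypothetical type for illustration)."""
    value: str
    write_timestamp: int  # assumed: larger means written more recently


def resolve_read(responses: List[ReplicaResponse]) -> str:
    """Return the value carrying the newest write timestamp."""
    if not responses:
        raise ValueError("no replica responded")
    newest = max(responses, key=lambda r: r.write_timestamp)
    return newest.value


# Two replicas answer a read at CL = 2; one of them still holds a stale value.
replies = [ReplicaResponse("stale", 100), ReplicaResponse("fresh", 200)]
print(resolve_read(replies))
```

The point of the sketch is only that the coordinator compares what the contacted replicas return and keeps the latest write, which is why the number of replicas contacted matters for consistency.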

By configuring the read and write CLs, you control consistency. If read CL + write CL > replication factor (RF), you get full consistency.
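The rule can be written as a one-line check. This is a small Python sketch with a hypothetical function name, treating each CL as a plain replica count:

```python
def is_fully_consistent(read_cl: int, write_cl: int, rf: int) -> bool:
    """True when every read set must overlap every write set.

    If read_cl + write_cl > rf, any group of read_cl replicas shares
    at least one replica with any group of write_cl replicas, so a
    read always reaches a replica that saw the latest write.
    """
    return read_cl + write_cl > rf


print(is_fully_consistent(read_cl=2, write_cl=2, rf=3))
print(is_fully_consistent(read_cl=1, write_cl=1, rf=3))
```

The first call satisfies the inequality (2 + 2 > 3); the second does not (1 + 1 is not greater than 3), so a read could hit a replica that missed the latest write.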

In terms of fault tolerance, you can tune CL and RF to whatever you need. For example, with RF = 3, read CL = 2 and write CL = 2, you get full consistency and can tolerate one node being down. With RF = 5, read CL = 3 and write CL = 3, you get the same guarantee but can tolerate two nodes being down.
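The arithmetic behind these examples can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that every node that goes down holds a replica of the data in question, which is the worst case:

```python
def max_replicas_down(rf: int, read_cl: int, write_cl: int) -> int:
    """Number of replicas that can be down while reads and writes both succeed.

    An operation at level CL needs CL live replicas out of RF, so the
    stricter of the two levels decides how many failures are tolerated.
    """
    return rf - max(read_cl, write_cl)


print(max_replicas_down(rf=3, read_cl=2, write_cl=2))
print(max_replicas_down(rf=5, read_cl=3, write_cl=3))
```

For the two configurations in the text this gives 1 and 2 respectively.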

A two-node cluster is actually not a good idea. You can set RF = 2 (all data replicated on both nodes), write CL = 2 and read CL = 1. However, this means that if one node is down, you can read but not write. Alternatively, you can set read CL = 2 and write CL = 1, in which case, if one node is down, you can write but not read. Realistically, you should go for at least 5 nodes (4 at the very minimum) with RF = 3. Anything less than that and you are asking for trouble.
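To make the two-node trade-off concrete, here is a small Python sketch that reports which operations remain possible when some replicas are down. The function name and return shape are hypothetical, and the model only counts live replicas against the required CL:

```python
from typing import Dict


def available_operations(rf: int, read_cl: int, write_cl: int,
                         replicas_down: int) -> Dict[str, bool]:
    """Report whether reads and writes can still meet their CL."""
    live = rf - replicas_down
    return {"read": live >= read_cl, "write": live >= write_cl}


# Two nodes, RF = 2, one node down.
print(available_operations(rf=2, read_cl=1, write_cl=2, replicas_down=1))
print(available_operations(rf=2, read_cl=2, write_cl=1, replicas_down=1))
```

With write CL = 2 and read CL = 1, only reads survive the failure; with read CL = 2 and write CL = 1, only writes do. Under this model, no setting with RF = 2 keeps both full consistency and full availability when one node is lost.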
