How to remove a dead node from a Cassandra cluster?

  • I have a 12-node Cassandra cluster on EC2.
  • Due to some kind of failure, we completely lost one of the nodes. I mean the machine no longer exists.
  • So I created a new EC2 instance with a different IP and the same token as the dead node. I also had a backup of that node's data, so it works fine.
  • The problem is that the dead node's IP is still shown as an unreachable node in the describe cluster output.
  • Since this node (EC2 instance) no longer exists, I cannot use nodetool decommission or nodetool disablegossip.

How can I get rid of this unreachable node?

+8
cassandra amazon-ec2 cluster-computing
2 answers

Usually, when you replace a node, you set the new node's token to (failed node's token) - 1 and let it bootstrap. Starting with version 1.0, there is also a flag you can specify at startup to replace a dead node: "cassandra.replace_token=<token>".

Since you have already added a new node with the same token, a few additional steps are needed:

  • Move the new node's token to (failed node's token) - 1 with nodetool move
  • Run nodetool removetoken <failed node token> from one of the nodes that are up
  • Run nodetool cleanup on each node

These are basically the pre-1.0 instructions for replacing a dead node, with an extra token move.
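
The steps above can be sketched as a small dry-run helper. This is only an illustration, not part of the original answer: the function name `replacement_commands` and the sample token are hypothetical, and the script only prints the commands rather than running them. Python is used here because its integers are arbitrary precision, so subtracting 1 from a very large token is exact.

```python
from typing import List


def replacement_commands(failed_token: str) -> List[str]:
    """Return the nodetool commands for the procedure above, in order.

    Assumption: the token is a plain decimal integer. The edge case where
    (token - 1) falls outside the partitioner's range is not handled.
    """
    new_token = int(failed_token) - 1
    return [
        f"nodetool move {new_token}",            # run on the replacement node
        f"nodetool removetoken {failed_token}",  # run on any node that is up
        "nodetool cleanup",                      # run on every node
    ]


if __name__ == "__main__":
    # Hypothetical token value, for illustration only.
    for command in replacement_commands("85070591730234615865843651857942052864"):
        print(command)
```

Printing the commands first makes it easy to check the computed token before touching the ring.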

+5

I had the same problem and resolved it with removenode , which does not require finding and modifying the node token.

First get the node UUID:

 nodetool status
 DN  192.168.56.201  ?          256  13.1%  4fa4d101-d8d2-4de6-9ad7-a487e165c4ac  r1
 DN  192.168.56.202  ?          256  12.6%  e11d219a-0b65-461e-babc-6485343568f8  r1
 UN  192.168.2.91    156.04 KB  256  12.4%  e1a33ed4-d613-47a6-8b3b-325650a2bbd4  RAC1
 UN  192.168.2.92    156.22 KB  256  13.6%  3a4a086c-36a6-4d69-8b61-864ff37d03c9  RAC1
 UN  192.168.2.93    149.6 KB   256  11.3%  20decc72-8d0a-4c3b-8804-cc8bc98fa9e8  RAC1

As you can see, .201 and .202 are dead and on a different network. They were changed to .91 and .92 without a proper decommission and recommission. I was working on setting up the network and made a few mistakes...

Second, remove .201 with the following command:

 nodetool removenode 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac 

(in older versions it was nodetool remove ...)
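
The two steps above (reading the host ID of a down node from nodetool status, then passing it to nodetool removenode) could be scripted roughly as follows. This is a minimal sketch, not something from the original answer: the function name `down_nodes` is hypothetical, and it assumes the column layout shown above, where the host ID is the second-to-last column and the rack is the last one.

```python
from typing import List, Tuple


def down_nodes(status_output: str) -> List[Tuple[str, str]]:
    """Return (address, host_id) pairs for rows whose state code is DN."""
    result = []
    for line in status_output.splitlines():
        fields = line.split()
        # Count from the end of the row: the load column may be one field
        # ("?") or two ("156.04 KB"), so positions from the left vary.
        if len(fields) >= 4 and fields[0] == "DN":
            result.append((fields[1], fields[-2]))
    return result


# Rows copied from the nodetool status output shown above.
SAMPLE = """\
DN  192.168.56.201  ?          256  13.1%  4fa4d101-d8d2-4de6-9ad7-a487e165c4ac  r1
DN  192.168.56.202  ?          256  12.6%  e11d219a-0b65-461e-babc-6485343568f8  r1
UN  192.168.2.91    156.04 KB  256  12.4%  e1a33ed4-d613-47a6-8b3b-325650a2bbd4  RAC1
"""

if __name__ == "__main__":
    for address, host_id in down_nodes(SAMPLE):
        # Only prints the command; review it before running it by hand.
        print(f"nodetool removenode {host_id}  # removes {address}")
```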

But just like nodetool removetoken ..., it blocks ... (see samarth's comment on psanford's answer). However, it has a side effect: it puts the UUID in the list of nodes to be removed. Therefore, we can force the removal using:

 nodetool removenode force 

(in older versions it was nodetool remove ...)

Now the node accepts the command and tells me that it is removing the invalid entry:

RemovalStatus: Removing token (-9136982325337481102). Waiting for replication confirmation from [/192.168.2.91,192.168.2.92].

We can also see that it is exchanging data with the two other nodes that are up, so it takes a little while, but it is still pretty fast.

After that, nodetool status no longer shows the .201 node. I repeated the procedure for .202, and now the status is clean.

After that, you can also run the cleanup as indicated in psanford's answer:

 nodetool cleanup 

The cleanup should be run on all nodes, one at a time, to make sure the change is fully taken into account.
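
Running the cleanup node by node could be automated along these lines. This is a hedged sketch rather than a tested procedure: the helper names `cleanup_command` and `rolling_cleanup` are hypothetical, and it assumes nodetool is installed locally and that each node accepts remote connections via nodetool's `-h` host option.

```python
import subprocess
from typing import Callable, List, Sequence


def cleanup_command(host: str) -> List[str]:
    """Build the nodetool invocation that runs cleanup on one node."""
    return ["nodetool", "-h", host, "cleanup"]


def rolling_cleanup(hosts: Sequence[str], run: Callable = subprocess.run) -> List[str]:
    """Run cleanup on each host in turn and return the hosts that finished."""
    finished = []
    for host in hosts:
        # subprocess.run waits for the command to exit, so the next node
        # starts only after the previous one is done. check=True raises
        # if nodetool exits with a non-zero status, which stops the loop.
        run(cleanup_command(host), check=True)
        finished.append(host)
    return finished


if __name__ == "__main__":
    # Addresses of the live nodes from the nodetool status output above.
    rolling_cleanup(["192.168.2.91", "192.168.2.92", "192.168.2.93"])
```

The `run` parameter exists only so the loop can be exercised without a real cluster, for example by passing a function that records the commands instead of executing them.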

+7