Cassandra NoHostAvailableException while node is still alive

Question

I have two C* 2.0.2 nodes in one DC (with the default settings in cassandra.yaml) and a keyspace with RF = 2. Two clients are connected to this DC using the DataStax Java Driver 1.0.3. The clients read and write data from/to C* with CL = ONE without any errors. But when I shut down one node, both clients get a huge number of exceptions:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)

After this burst of exceptions, the clients continue to work successfully with the other node, which is still alive. What should I do to avoid getting a NoHostAvailableException, given that at least one node is alive at all times and I use CL = ONE?

UPDATE: When I shut down one of the two nodes, I sometimes see the following error in the application log:

 [Reconnection-1] [ERROR] [Control connection] Cannot connect to any host, scheduling retry

Why are both nodes unavailable if I shut down only one of them? The second one is still alive, and I can connect to it using cqlsh.

java cassandra datastax-java-driver

tilex Dec 11 '13 at 0:00

1 answer

Wildfire · Answer 1 · 2013-12-12T18:49:35+0000

If you execute a query with CL = ONE, the driver tries to query only one node. Thus, if the request to this node fails (or the node is unavailable), an exception is raised immediately. This behavior is controlled by the com.datastax.driver.core.policies.RetryPolicy specified when creating the Cluster.

I would say that a RetryPolicy that caps the number of retry attempts would fit your needs. Unfortunately, Cassandra driver 1.0.3 does not bundle one (I'm not sure about future versions). However, it can be implemented as follows:

    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Query;
    import com.datastax.driver.core.policies.RetryPolicy;

    public class MyRetryPolicy implements RetryPolicy {
        final int attempts;

        public MyRetryPolicy(int attempts) {
            this.attempts = attempts;
        }

        @Override
        public RetryDecision onReadTimeout(Query query, ConsistencyLevel cl, int requiredResponses,
                int receivedResponses, boolean dataRetrieved, int nbRetry) {
            // Rethrow once the retry cap is reached; otherwise retry at the same CL.
            return (nbRetry >= attempts) ? RetryDecision.rethrow() : RetryDecision.retry(cl);
        }

        ... <onWriteTimeout & onUnavailable methods with similar implementation>
    }

I'm not sure whether MyRetryPolicy(2) will be sufficient, since I have not dug into the internals of the driver. A retry may well be sent to the same host again. You can try MyRetryPolicy(10); this should at least significantly reduce the number of failures.
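To make the effect of the cap concrete, here is a minimal standalone sketch of the same comparison. It does not use the DataStax driver at all: BoundedRetryDemo, Decision, decide and retriesGranted are hypothetical names invented for illustration, and the sketch assumes that nbRetry counts the retries already performed, starting at 0.

    ```java
    // Standalone model of a capped retry decision; not the driver's API.
    // All names in this class are hypothetical and exist only for illustration.
    final class BoundedRetryDemo {
        enum Decision { RETRY, RETHROW }

        // Same comparison as in MyRetryPolicy: stop once nbRetry reaches the cap.
        static Decision decide(int nbRetry, int attempts) {
            return (nbRetry >= attempts) ? Decision.RETHROW : Decision.RETRY;
        }

        // Counts how many retries are granted when every attempt keeps failing.
        static int retriesGranted(int attempts) {
            int nbRetry = 0;
            while (decide(nbRetry, attempts) == Decision.RETRY) {
                nbRetry++;
            }
            return nbRetry;
        }

        public static void main(String[] args) {
            System.out.println("cap 2  -> retries granted: " + retriesGranted(2));
            System.out.println("cap 10 -> retries granted: " + retriesGranted(10));
        }
    }
    ```

Under that assumption, a cap of 2 grants two retries after the initial attempt, so a query is sent at most three times before the error is rethrown. In the driver itself, the policy would be passed when building the Cluster; as far as I know, Cluster.Builder has a withRetryPolicy method for this.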

If some failures still remain (for example, 1 out of 1000), it might be worth looking at com.datastax.driver.core.ConvictionPolicy, finding where it is used, and continuing your research from there.
