We have been using RabbitMQ successfully for about a year. We recently upgraded to version 2.6.1, because we want to use clustering with replicated message queues.
My testing has turned up some cryptic behavior that smells like a RabbitMQ bug. The test that exposes it runs against a two-node cluster. Both nodes run v2.6.1, both are disc nodes, and both run on Mac OS, though I doubt that last detail is relevant.
I also run Alice, which the test uses to programmatically execute stop_app on one of the nodes. The point of the test is to verify that if the cluster's master node fails and a slave is promoted to take its place, we don't lose messages.
So, the test has a small thread pool whose tasks periodically 1) publish messages and 2) toggle the state of the RabbitMQ master node (stop it if it's running, start it if it's stopped). Other threads consume messages from the queues.
I use publisher confirms, and I also ack consumed messages explicitly (autoAck = false in channel.basicConsume()).
When the master node is stopped, I see producers and consumers catch a ShutdownSignalException. They handle it by reconnecting to the cluster. That works fine: once reconnected, they carry on with their business.
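For reference, the reconnect handling has roughly the following shape. This is a simplified sketch, not our production code: connect() here is a plain Supplier standing in for the real ConnectionFactory.newConnection()/createChannel() calls, and RuntimeException stands in for ShutdownSignalException/IOException.

```java
import java.util.function.Supplier;

public class ReconnectSketch {
    // Retry a connect attempt until it succeeds, up to maxAttempts.
    // In the real code the Supplier wraps the RabbitMQ connection setup;
    // here it is a placeholder so the sketch is self-contained.
    static <T> T connectWithRetry(Supplier<T> connect, int maxAttempts)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.get();
            } catch (RuntimeException e) {   // stand-in for ShutdownSignalException
                last = e;
                Thread.sleep(100L * attempt); // simple linear backoff
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake broker that refuses the first two attempts, then accepts.
        final int[] calls = {0};
        String channel = connectWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("broker not up yet");
            return "channel";
        }, 5);
        System.out.println(channel + " after " + calls[0] + " attempts");
        // prints "channel after 3 attempts"
    }
}
```

The consumers and producers each run this loop when they catch the exception, then resume publishing/consuming on the fresh channel.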
Sometimes I see that a consumer has successfully pulled a message from the broker and is calling channel.basicAck() when it receives this ShutdownSignalException.
Later, after the consumer has reconnected, it pulls the same message again. (The message bodies are tagged with UUIDs, so I know it really is the same one.) This time, when the consumer tries to basicAck() the message, it again gets a ShutdownSignalException, but now with this text: reply-text = PRECONDITION_FAILED - unknown delivery tag 7.
In fact, that is the same delivery tag the broker handed the consumer before the master went down and the consumer reconnected.
Googling suggests that this error means the consumer is trying to ack the same message more than once.
But how can that be? If the first ack succeeded, the message should have been removed from the broker's queue, and the consumer should never see the same message again.
And if the first ack failed, then the consumer shouldn't be faulted for acking the redelivered message.
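For what it's worth, here is my current mental model of delivery tags as a toy simulation (plain Java, no RabbitMQ client; ChannelModel is my own name, not a real API). If tags are scoped to a channel and start over on a fresh channel after reconnect, then acking with a tag remembered from the old channel would naturally come back as "unknown delivery tag". Maybe that's the subtlety I'm missing?

```java
import java.util.HashSet;
import java.util.Set;

public class DeliveryTagSketch {
    // Toy model of a channel: delivery tags are sequential and only
    // meaningful on the channel that issued them.
    static class ChannelModel {
        private long nextTag = 1;
        private final Set<Long> outstanding = new HashSet<>();

        long deliver() {              // broker hands a message to the consumer
            long tag = nextTag++;
            outstanding.add(tag);
            return tag;
        }

        void basicAck(long tag) {     // ack by tag; unknown tag is an error
            if (!outstanding.remove(tag)) {
                throw new IllegalStateException(
                    "PRECONDITION_FAILED - unknown delivery tag " + tag);
            }
        }
    }

    public static void main(String[] args) {
        ChannelModel oldChannel = new ChannelModel();
        long oldTag = 0;
        for (int i = 0; i < 7; i++) {
            oldTag = oldChannel.deliver();   // 7th delivery carries tag 7
        }
        // master goes down before the ack for tag 7 lands; consumer reconnects
        ChannelModel newChannel = new ChannelModel();
        long newTag = newChannel.deliver();  // redelivery arrives with fresh tag 1
        try {
            newChannel.basicAck(oldTag);     // ack with the remembered old tag 7
        } catch (IllegalStateException e) {
            // prints "PRECONDITION_FAILED - unknown delivery tag 7"
            System.out.println(e.getMessage());
        }
        newChannel.basicAck(newTag);         // acking the fresh tag works
    }
}
```

Under that model, the "double ack" interpretation from my Googling would really be "ack with a stale tag", but I don't know whether that's what is actually happening in my test.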
Has anyone seen this before? It smells like a bug in RabbitMQ's replicated queues, but I'm still new to RabbitMQ, so I'm perfectly willing to believe there's a subtlety to working with a clustered broker that I haven't picked up yet!
Thanks, -Steve