Can someone explain what happens behind the scenes in a RabbitMQ cluster with multiple nodes and mirrored queues when a message is published to a slave node?
This blog describes what is happening.
But what happens when I publish to a slave node? Will that node forward the message to the master first?
The message will be routed to the queue master - that is, the node on which the queue was originally declared.
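For reference, classic queue mirroring is configured through a policy rather than per-queue flags. A minimal sketch (the policy name `ha-all` and the match-everything pattern are just examples):

```shell
# Mirror every queue ("^" matches all queue names) across all nodes in the cluster.
rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'

# Verify which policies are in effect:
rabbitmqctl list_policies
```

With `ha-mode: all`, each matching queue gets one master and a mirror on every other node; publishes always flow through the master.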
But how do you deal with a master failure? One of the slaves will be promoted to master, so how do you know which node to connect to?
Again, this is described here. Essentially, you need a separate service that polls RabbitMQ and determines whether each node is up or down. For this, RabbitMQ provides a management API. Your publishing and consuming applications query this service, either directly or through a shared data store, to determine which node is the correct one to publish to or consume from.
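A minimal sketch of such a monitoring service in Python, assuming the default management-plugin endpoint (`http://localhost:15672/api/nodes`) and guest credentials; the helper names are my own:

```python
import base64
import json
import urllib.request

def parse_running_nodes(nodes_json):
    """Given the JSON body of GET /api/nodes, return the names of
    nodes that report themselves as running."""
    return [n["name"] for n in nodes_json if n.get("running")]

def fetch_running_nodes(base_url="http://localhost:15672",
                        user="guest", password="guest"):
    """Poll the RabbitMQ management API for the current set of live nodes."""
    req = urllib.request.Request(f"{base_url}/api/nodes")
    # The management API uses HTTP basic auth.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return parse_running_nodes(json.load(resp))
```

Your apps would call this service (or read its cached result) instead of hard-coding a single broker address.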
Connections to RabbitMQ are long-lived, so if one fails you must recreate it. You must also resend any unconfirmed messages in that case, otherwise you will lose them.
You need to subscribe to connection-shutdown events in order to react to dropped connections, and you will need to build some level of redundancy into the client to ensure that messages are not lost. As above, I suggest introducing a service dedicated to polling RabbitMQ. The client can attempt to publish a message over the last known active connection; if that fails, it can ask the monitoring service for an updated list of the cluster's nodes. Assuming at least one node is up, the client can establish a connection to it and publish the message successfully.
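The failover logic itself is independent of any particular client library; here is a sketch of the retry-across-nodes idea. All names are illustrative, and in practice `publish` would wrap a real AMQP client call (e.g. opening a connection to the node and calling `basic_publish`):

```python
def publish_with_failover(message, nodes, publish, refresh_nodes):
    """Try the last-known node list first; if every node fails,
    ask the monitoring service for a fresh list and retry once."""
    for node_list in (nodes, refresh_nodes()):
        for node in node_list:
            try:
                publish(node, message)
                return node  # success: remember this node for next time
            except ConnectionError:
                continue  # this node is down, try the next one
    raise RuntimeError("no live RabbitMQ node available")
```

The key point is that the stale node list is only refreshed on failure, so the happy path costs nothing extra.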
Even so, messages can still be lost, because they may be in flight when I kill a node.
There are certain edge cases that no amount of redundancy can cover, and this is not specific to RabbitMQ. For example, when a message arrives in a queue, the HA policy invokes a background process to copy it to the mirror nodes. While that copy is in progress, the message exists only on the master; if the master terminates at that exact moment, the message is lost forever. There is nothing that can be done about this. Once we get down to the level of actual bytes moving along the wire, there is a limit to the guarantees we can build.
Therefore, consumer applications will need to deduplicate incoming messages, or process them idempotently.
You can handle this in several ways. For example, setting message-ttl to a relatively low value ensures that duplicate messages will not remain in the queue for long. You can also tag each message with a unique ID and check that ID at the consumer. Of course, this requires storing a cache of processed message IDs to compare incoming messages against; the idea is that if a previously processed message arrives again, its ID will already be in the consumer's cache and the message can be ignored.
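The tag-and-cache idea can be sketched as follows. This is a bounded in-memory cache of seen message IDs; the class and method names are my own, and a production consumer would more likely use a shared store (e.g. Redis with a TTL) so the cache survives restarts:

```python
from collections import OrderedDict

class DeduplicatingConsumer:
    """Ignores redelivered messages by remembering recently seen IDs."""

    def __init__(self, handler, cache_size=10_000):
        self.handler = handler
        self.seen = OrderedDict()  # message_id -> None, oldest first
        self.cache_size = cache_size

    def on_message(self, message_id, body):
        if message_id in self.seen:
            return False  # duplicate: already processed, skip it
        self.handler(body)
        self.seen[message_id] = None
        if len(self.seen) > self.cache_size:
            self.seen.popitem(last=False)  # evict the oldest entry
        return True
```

Note that the cache size bounds memory, which pairs naturally with the low message-ttl: duplicates older than the TTL can no longer arrive, so the cache only needs to cover that window.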
One thing I would emphasize about AMQP and queueing in general is that your infrastructure provides the tools, but not the entire solution. You must fill these gaps according to the needs of your business, and often the best approach is found through trial and error. Hope my suggestions are helpful. I discuss several RabbitMQ design decisions, including the problems you mention, here if you are interested.