How to configure a RabbitMQ cluster with AWS autoscaling

I am trying to move from SQS to RabbitMQ for our messaging service. I want to build a stable, high-availability service, so I am going with a cluster.

Current implementation: I have three EC2 machines with RabbitMQ and the management plugin installed in the AMI, and I explicitly go to each machine and run

sudo rabbitmqctl join_cluster rabbit@<hostnameOfParentMachine>

Everything is set up with an HA policy, and synchronization works. A load balancer with a designated DNS name sits on top. This much is working.

Expected implementation: create a self-scaling clustered environment in which machines that go up or down dynamically join or leave the cluster. What is the best way to achieve this? Please help.

docker amazon-web-services amazon-ec2 cluster-computing rabbitmq
2 answers

I had a similar configuration 2 years ago.

I decided to use Amazon VPC. By default my project had two RabbitMQ instances that were always running and configured as a cluster (called master nodes). The RabbitMQ cluster sat behind an internal Amazon load balancer.

I created an AMI with RabbitMQ and the management plugin configured (called "master-AMI"), and then I set up the autoscaling rules.

When an autoscaling alarm fires, a new instance is launched from the master-AMI. On first boot, this instance runs the following script:

    #!/usr/bin/env python3
    import base64
    import json
    import urllib.request
    from subprocess import call

    if __name__ == '__main__':
        prefix = ''
        # Stop the RabbitMQ application and wipe local state before joining.
        call(["rabbitmqctl", "stop_app"])
        call(["rabbitmqctl", "reset"])
        try:
            _url = 'http://internal-myloadbalamcer-xxx.com:15672/api/nodes'
            print(prefix + 'Get json info from ..' + _url)
            request = urllib.request.Request(_url)
            base64string = base64.b64encode(b'guest:guest').decode('ascii')
            request.add_header("Authorization", "Basic %s" % base64string)
            data = json.load(urllib.request.urlopen(request))
            # If the script gets an error here, you can assume that this is
            # the first machine and exit without handling the error.
            # Remember to add the new machine to the load balancer.
            print(prefix + 'request ok... looking for a running node')
            for r in data:
                if r.get('running'):
                    print(prefix + 'found running node to bind..')
                    print(prefix + 'node name: ' + r.get('name') +
                          ' - running: ' + str(r.get('running')))
                    # Join the first running node reported by the API.
                    call(["rabbitmqctl", "join_cluster", r.get('name')])
                    break
        except Exception:
            print(prefix + 'error during add node')
        finally:
            # Always restart the application, whether or not the join worked.
            call(["rabbitmqctl", "start_app"])

The script uses the HTTP API at http://internal-myloadbalamcer-xxx.com:15672/api/nodes to find the nodes, then picks a running one and joins the new instance to the cluster.

As an HA policy, I decided to use this:

    rabbitmqctl set_policy ha-two "^two\." '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
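If you would rather set the same policy from a boot script than from the shell, the management plugin also accepts it over HTTP. The following is only a sketch of that idea, not what I ran: the helper name build_policy_request, the guest credentials and the http://node:15672 address are placeholders, and it assumes the management plugin's PUT /api/policies/<vhost>/<name> endpoint.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_policy_request(base_url, name, pattern, definition,
                         vhost="/", user="guest", password="guest"):
    """Build (but do not send) a PUT request for /api/policies/<vhost>/<name>."""
    url = "%s/api/policies/%s/%s" % (
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),  # the default vhost "/" becomes "%2F"
        urllib.parse.quote(name, safe=""),
    )
    body = json.dumps({
        "pattern": pattern,
        "definition": definition,
        "apply-to": "queues",
    }).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="PUT")
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-Type", "application/json")
    return request


request = build_policy_request(
    "http://node:15672",  # placeholder management URL
    "ha-two",
    r"^two\.",
    {"ha-mode": "exactly", "ha-params": 2, "ha-sync-mode": "automatic"},
)
# Sending it would be: urllib.request.urlopen(request)
```

The request is only built here, so you can inspect the URL and body before pointing it at a real node.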

Well, joining is "pretty" easy; the problem is removing a node from the cluster.

You cannot remove a node purely on the basis of an autoscaling rule, because its queues may still hold messages that must be consumed.

I decided to periodically run a script on the two master-node instances that:

  • checks the number of messages through the API at http://node:15672/api/queues
  • if the number of messages across all queues is zero, removes the instance from the load balancer and then from the RabbitMQ cluster.
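The check in those two steps could be sketched roughly like this. It is a minimal sketch, not the script I actually ran: the function names, the guest credentials and the node address are placeholders, and it assumes each entry returned by /api/queues carries a numeric "messages" field (queues that have no stats yet are counted as empty).

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user="guest", password="guest"):
    """Return the parsed JSON list from <base_url>/api/queues."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def total_messages(queues):
    """Sum the 'messages' counters; queues without the field count as 0."""
    return sum(q.get("messages", 0) for q in queues)


def safe_to_remove(queues):
    """True only when no queue holds any message."""
    return total_messages(queues) == 0


# Example use on a node (placeholder address):
#   queues = fetch_queues("http://node:15672")
#   if safe_to_remove(queues):
#       ...deregister from the load balancer, then leave the cluster...
```

Only when safe_to_remove returns True would the instance be deregistered from the load balancer and taken out of the cluster, for example with rabbitmqctl reset on the departing node or rabbitmqctl forget_cluster_node from a remaining one.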

This is basically what I did, hope this helps.

[EDIT]

I edited the answer because there is now a plugin that can help.

I suggest looking at the following: https://github.com/rabbitmq/rabbitmq-autocluster

The plugin has been moved to the official RabbitMQ repository and can easily solve this kind of problem.


We recently had a similar problem.

We tried using https://github.com/rabbitmq/rabbitmq-autocluster , but found that it was too complicated for our use case.

I created a Terraform configuration to spin up X RabbitMQ nodes in Y subnets (availability zones) using an Autoscaling Group.

TL;DR: https://github.com/ulamlabs/rabbitmq-aws-cluster

The configuration creates an IAM role to allow nodes to automatically discover all other nodes in the Autoscaling group.
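To give an idea of what such discovery can look like, here is a rough sketch using boto3. It is an illustration only and not taken from the repository above: the function names, the rabbit@<short hostname> naming scheme and the choice of API calls are assumptions, and the instance role would need at least autoscaling:DescribeAutoScalingGroups and ec2:DescribeInstances.

```python
def node_names(reservations, prefix="rabbit"):
    """Turn an EC2 describe_instances 'Reservations' list into node names.

    Assumes each node is named <prefix>@<short private hostname>.
    """
    names = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            dns = instance.get("PrivateDnsName")
            if dns:  # skip instances that have no private DNS name yet
                names.append("%s@%s" % (prefix, dns.split(".")[0]))
    return sorted(names)


def discover_peers(group_name, region):
    """Look up the in-service members of an Auto Scaling group."""
    import boto3  # imported lazily so node_names() works without boto3

    autoscaling = boto3.client("autoscaling", region_name=region)
    ec2 = boto3.client("ec2", region_name=region)
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name])["AutoScalingGroups"]
    ids = [i["InstanceId"] for g in groups for i in g["Instances"]
           if i["LifecycleState"] == "InService"]
    if not ids:
        return []
    reservations = ec2.describe_instances(InstanceIds=ids)["Reservations"]
    return node_names(reservations)
```

A new node could then pass one of the returned names to rabbitmqctl join_cluster, in the same way as the script in the first answer.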
