Asynchronous job processing: a design question - Celery or Twisted?

All: I'm looking for input, guidance, and design ideas. My goal is a simple but reliable way to take an XML payload from an HTTP POST (no problem with that part), parse it, and kick off a relatively lengthy process asynchronously.

The initiated process is processor-intensive and takes about three minutes. I don't expect heavy load at first, but there's a real possibility I'll need to scale horizontally across servers as traffic (hopefully) grows.

I really like the Celery/Django stack for this use: it's very intuitive and has built-in infrastructure for exactly what I need. I started down that path enthusiastically, but soon found that my small 512 MB RAM cloud server had only 100 MB free, and I began to suspect I'd be in trouble once everything went live and ran at full tilt. On top of that, it has quite a few moving parts: RabbitMQ, MySQL, celeryd, lighttpd, and the Django container.

I could certainly upgrade to a bigger server, but I'm hoping to keep my costs to a minimum at this early stage of the project.

As an alternative, I'm considering using Twisted to manage the processes, plus its Perspective Broker for the remote systems if I need them. But it seems to me that, while Twisted is brilliant, I'd be signing up for a lot by going down that road: writing protocols, managing callbacks, tracking job states, and so on. The advantages here are pretty obvious: excellent performance, far fewer moving parts, and a smaller memory footprint (note: I still need to verify the memory part). I'm heavily biased toward Python for this; it's much more pleasant for me than the alternatives :)

I'd really appreciate any perspective on this. I'm worried about starting down the wrong track, since reworking things later under production traffic would be painful.

Matt

+6
python asynchronous django twisted
3 answers

On my system, RabbitMQ runs with pretty reasonable defaults and uses about 2 MB of RAM. Celeryd uses a bit more, but nothing excessive.

In my opinion, the overhead of RabbitMQ and Celery is pretty much negligible compared to the rest of your stack. If you're processing jobs that take several minutes each, those jobs are what will overwhelm your 512 MB server as traffic grows, not RabbitMQ. Starting with RabbitMQ and Celery at least sets you up to scale those jobs out horizontally, so you're definitely on the right track there.
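For illustration, the kind of task you describe could look something like this with Celery. This is only a sketch: the module name, task name, broker URL, and the idea that your payload is the raw request body are all my assumptions, not your actual code.

```python
# tasks.py -- illustrative sketch only; all names are made up.
import xml.etree.ElementTree as ET

def process_payload(xml_text):
    """Parse the POSTed XML, then run the lengthy CPU-bound work."""
    root = ET.fromstring(xml_text)
    # ... the ~three-minute processing would go here ...
    return root.tag

# With Celery this becomes a task, and the Django view enqueues the
# work instead of blocking for three minutes (requires celery and a
# running broker, so shown as comments here):
#
#   from celery import Celery
#   app = Celery("tasks", broker="amqp://localhost")
#   process_payload = app.task(process_payload)
#   ...
#   process_payload.delay(request.body)   # returns immediately
```

The point is just that the view's only job is to validate the POST and enqueue; everything slow happens in celeryd workers you can later move to other machines.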

Of course, you could write your own job control in Twisted, but I don't see it gaining you much. Twisted has good performance, but I wouldn't expect it to outperform RabbitMQ by enough to justify the time, the potential for introducing bugs, and the architectural constraints. Basically, it seems like the wrong place to worry about optimization. Take the time you would have spent re-implementing RabbitMQ and work on shaving 20% or so off those three-minute jobs. Or just spend an extra $20 a month and double your capacity.

+5

I'll answer this as if I were the one doing the project, in the hope that it gives you some ideas.

I'm working on a project that requires a job queue, a web server for the public web application, and several worker clients.

The idea is that the web server runs continuously (no need for a very powerful machine there), while the jobs are handled by the worker clients, which are more powerful machines that you can start and stop at will. The job queue also lives on the same machine as the web application. When a job is inserted into the queue, a process that launches worker clients kicks in and starts the first client. With a load balancer that can start new servers as the load grows, I don't have to worry about managing the number of servers processing jobs from the queue. If there are no jobs in the queue, all the worker clients can be shut down.

I suggest a similar setup. You don't want the long-running tasks to affect the performance of your web application.
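The start/stop decision described above can be sketched as a tiny supervisor function. This is a toy illustration under my own assumptions: the `start_worker`/`stop_worker` callables and the one-worker-per-N-jobs heuristic are made up, not part of any real autoscaling API.

```python
# Toy sketch of the scaling decision: compare queue depth to the
# number of running workers and start or stop machines accordingly.
# All names and the heuristic are illustrative assumptions.

def autoscale(queue_depth, active_workers, start_worker, stop_worker,
              jobs_per_worker=4):
    # Aim for one worker per `jobs_per_worker` queued jobs; with an
    # empty queue every worker can be shut down.
    wanted = -(-queue_depth // jobs_per_worker)  # ceiling division
    while active_workers < wanted:
        start_worker()
        active_workers += 1
    while active_workers > wanted:
        stop_worker()
        active_workers -= 1
    return active_workers
```

A real setup would call something like this periodically from the machine hosting the queue, with `start_worker` booting a cloud instance.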

0

I'll add one more possibility: using Redis. I'm currently using Redis with Twisted: I distribute work to the workers, they do the work and return the results asynchronously.

The Redis list type is very useful for this: http://www.redis.io/commands/rpoplpush

That way you can use the reliable-queue pattern to send work, with a worker process that blocks and waits until it has a new job (a new message in the queue).

You can use several workers in one queue.
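A minimal sketch of that reliable-queue pattern, written against the redis-py client API (the queue names `jobs`/`jobs:processing` are my own illustrative choices, and `client` is assumed to be a `redis.Redis` instance or anything with the same `lpush`/`brpoplpush`/`lrem` methods):

```python
# Reliable queue over Redis lists: BRPOPLPUSH atomically moves a job
# from the pending list to a "processing" list, so a worker crash
# cannot silently lose it. Queue names are illustrative assumptions.

def submit(client, payload):
    # Producer side: push the job onto the pending list.
    client.lpush("jobs", payload)

def work_one(client, handle, timeout=0):
    # Worker side: block up to `timeout` seconds for new work.
    payload = client.brpoplpush("jobs", "jobs:processing", timeout=timeout)
    if payload is None:
        return None  # timed out with no work available
    result = handle(payload)
    # Acknowledge: remove the finished job from the processing list.
    client.lrem("jobs:processing", 1, payload)
    return result

# Usage with redis-py (not run here; requires a local Redis server):
#   import redis
#   client = redis.Redis()
#   submit(client, b"<order id='42'/>")
#   work_one(client, parse_and_process)
```

A recovery process can periodically re-queue anything left stranded in `jobs:processing` by a crashed worker, which is what makes the pattern "reliable".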

Redis has a low memory footprint, but watch the number of pending messages: they increase the memory Redis uses.

0
