Is celery suitable for use with many small distributed systems?

Question


I am writing software that will manage several hundred small systems in the field over an intermittent 3G (or similar) connection.

The home base server will need to send tasks to the systems in the field (for example, "report your status", "update your software"), and the systems in the field will need to send tasks back to the server (for example, "a failure has been detected", "here is some data").

I have spent some time looking at Celery, and it seems to be a perfect fit: a celeryd running at home base could collect tasks for the systems in the field, a celeryd running on each field system could collect tasks for the server, and these tasks could be exchanged as clients become available.

So, is Celery a good fit for this problem? In particular:

Most tasks will be directed to an individual worker (for example, "send the get_status task to system51"). Will this be a problem?
Does it gracefully handle adverse network conditions (for example, connections dying)?
What functionality is available only if RabbitMQ is used as a backend? (I would prefer not to run RabbitMQ on the field systems.)
Is there any other reason why Celery could make my life more difficult if I use it as described?

Thanks!

(It would be valid to suggest that Celery is overkill, but there are other reasons it would make my life easier, so I would like to consider it.)

+6

python celery

David Wolever Oct 3 '10 at 0:13


2 answers

I would probably set up a web service (Django) to accept requests. The web service could validate requests and reject bad ones. Then Celery can just do the work.

This would require the remote devices to poll the web service to see whether their jobs have completed. This may or may not be suitable, depending on what exactly you are doing.
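A minimal sketch of such a polling loop on the device side. `fetch_status` is a hypothetical callable that asks the web service for a job's state; the state names "PENDING", "DONE" and "FAILED" and the interval values are assumptions for illustration only.

```python
import time


def poll_until_done(fetch_status, job_id, interval=30.0, max_polls=20,
                    sleep=time.sleep):
    """Poll ``fetch_status(job_id)`` until it reports a finished job.

    ``fetch_status`` is a hypothetical callable that queries the web
    service and returns a state string such as "PENDING" or "DONE".
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status in ("DONE", "FAILED"):
            return status
        # Wait before asking again; on a 3G link a long interval is cheaper.
        sleep(interval)
    return None  # gave up; the caller decides what to do next
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.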

+1

Seth Oct 3 '10 at 0:47


asksol · Accepted Answer · 2010-10-03T11:30:15+0000

Most tasks will be directed to an individual worker (for example, "send the get_status task to system51"). Will this be a problem?

Not at all. Just create a queue for each worker. For example, say each node listens to a round-robin queue called default, and each node also has its own queue, named after the node:

    (a)$ celeryd -n a.example.com -Q default,a.example.com
    (b)$ celeryd -n b.example.com -Q default,b.example.com
    (c)$ celeryd -n c.example.com -Q default,c.example.com

Routing a task directly to a node is simple:

    >>> get_status.apply_async(args, kwargs, queue="a.example.com")

or by configuration, using a router:

    # Always route "app.get_status" to "a.example.com"
    CELERY_ROUTES = {"app.get_status": {"queue": "a.example.com"}}
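When the target node varies per call, a router class can choose the queue instead of a static mapping. The sketch below assumes the older router interface (a class with a `route_for_task` method that returns a dict of routing options, or `None` to fall through) and assumes callers pass the node name in a hypothetical `node` keyword argument. `NodeRouter` is an illustrative name, not part of Celery.

```python
class NodeRouter(object):
    """Route a task to the queue named after the node it targets.

    Hypothetical sketch: assumes the task is called with the target
    node name in a ``node`` keyword argument (illustrative convention).
    """

    def route_for_task(self, task, args=None, kwargs=None):
        node = (kwargs or {}).get("node")
        if node:
            # Each worker consumes from a queue named after its node.
            return {"queue": node}
        # Returning None lets the next router or the default route apply.
        return None


# In the Celery configuration module (assumed usage):
CELERY_ROUTES = (NodeRouter(),)
```

With this in place, a call that passes `node="a.example.com"` in its keyword arguments would be sent to the `a.example.com` queue, and calls without it would fall back to the default routing.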

Does it gracefully handle adverse network conditions (for example, connections dying)?

The worker gracefully recovers from broker connection failures (at least with RabbitMQ; I'm not sure about all the other backends, but it's easy to test and fix: you only need to add the related exceptions to a list).

For the client, you can always retry sending the task if the connection is down, or you can set up HA with RabbitMQ: http://www.rabbitmq.com/pacemaker.html
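Client-side retrying can be sketched as a small wrapper with exponential backoff. `send` is a hypothetical zero-argument callable (for example, a lambda wrapping `get_status.apply_async(...)`), and the assumption that connection failures surface as `OSError` subclasses should be checked against the broker transport actually in use.

```python
import time


def send_with_retry(send, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call ``send()`` until it succeeds or the attempts run out.

    ``send`` is a hypothetical zero-argument callable, for example
    ``lambda: get_status.apply_async(queue="a.example.com")``.
    """
    for attempt in range(retries):
        try:
            return send()
        except OSError:
            # Assumption: connection failures are raised as OSError
            # subclasses (ConnectionError, socket errors, and so on).
            if attempt == retries - 1:
                raise  # out of attempts; let the caller handle it
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
```

On a link that may be down for hours, the device would likely also need to persist unsent tasks locally; that is outside the scope of this sketch.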

What functionality is available only if RabbitMQ is used as a backend? (I would prefer not to run RabbitMQ on field systems)

Remote control commands are only available with RabbitMQ, and with other backends only "direct" exchanges are supported (not "topic" or "fanout"). This will be supported in Kombu (http://github.com/ask/kombu).

I would seriously reconsider using RabbitMQ. Why do you think it is not a good fit? IMHO I would not look elsewhere for a system like this (except perhaps ZeroMQ, if the system is transient and you do not need message persistence).

Is there another reason celery could make my life difficult if I use it as I described?

I can't think of anything from what you describe above. Since the concurrency model is multiprocessing, it does require some memory (I'm working on adding support for thread pools and eventlet pools, which may help in some cases).

(It would be valid to suggest that Celery is overkill, but there are other reasons it would make my life easier, so I would like to consider it.)

In this case I think you are using the word overkill lightly. It really depends on how much code and how many tests you would need to write without it. I think it is better to improve an existing general solution, and in theory it sounds like it should work well for your application.
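On memory-constrained field devices, one way to limit the cost of the process pool is to cap the number of worker processes. A minimal config sketch, assuming the setting name `CELERYD_CONCURRENCY` from Celery releases of that era; the file name and the value 1 are illustrative choices, not a recommendation from the answer.

```python
# celeryconfig.py on a field system (assumed file name)

# One pool process keeps the memory footprint small,
# at the cost of running tasks one at a time.
CELERYD_CONCURRENCY = 1
```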
