In Django, how to call a subprocess with slow startup time

Suppose you run Django on Linux, and you have a view that needs to return data from a subprocess called cmd, which operates on a file the view creates, for example like this:

    import subprocess
    import tempfile

    from django.http import HttpResponse

    def call_subprocess(request):
        response = HttpResponse()
        with tempfile.NamedTemporaryFile("w") as f:
            f.write(request.GET['data'])  # i.e. some data
            f.flush()
            # cmd operates on f.name and returns output
            p = subprocess.Popen(["cmd", f.name],
                                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            out, err = p.communicate()
        response.write(out)  # would be text/plain...
        return response

Now suppose cmd has a very slow startup time but a very fast run time, and it has no daemon mode. I would like to improve the response time of this view.

I would like to make the whole system run much faster by starting several instances of cmd in a worker pool, having them wait for input, and having call_process ask one of those worker-pool processes to handle the data.

There are really two parts to this:

Part 1. A function that calls cmd while cmd waits for input. This could be done with pipes, i.e.:

    def _run_subcmd(fname):
        p = subprocess.Popen(["cmd", fname],
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate()
        # write 'out' to a tmp file
        o = open("out.txt", "w")
        o.write(out)
        o.close()
        exit()

    def _run_cmd(data):
        f = tempfile.NamedTemporaryFile("w")
        pipe = os.mkfifo(f.name)
        if os.fork() == 0:
            _run_subcmd(f.name)
        else:
            f.write(data)

        r = open("out.txt", "r")
        out = r.read()  # read 'out' from a tmp file
        return out

    def call_process(request):
        response = HttpResponse()
        out = _run_cmd(request.GET['data'])
        response.write(out)  # would be text/plain...
        return response

Part 2. A set of workers running in the background and waiting for data. That is, we want to extend the above so that the subprocesses are already running, e.g. when the Django instance initializes, or when call_process is first called, a collection of these workers is created:

    WORKER_COUNT = 6
    WORKERS = []

    class Worker(object):
        def __init__(self, index):
            self.tmp_file = tempfile.NamedTemporaryFile("w")  # get a tmp file name
            os.mkfifo(self.tmp_file.name)
            self.p = subprocess.Popen(["cmd", self.tmp_file.name],
                                      stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            self.index = index

        def run(self, out_filename, data):
            WORKERS[self.index] = None  # qua-mutex??
            self.tmp_file.write(data)
            if os.fork() == 0:
                # does the child have access to self.p??
                out, err = self.p.communicate()
                o = open(out_filename, "w")
                o.write(out)
                exit()
            self.tmp_file.close()
            WORKERS[self.index] = Worker(self.index)  # replace this one
            return out_filename

        @classmethod
        def get_worker(cls):
            # get the next worker
            # ... static, incrementing index
            pass

There should be some initialization of the workers, for example:

    def init_workers():
        # create WORKER_COUNT workers
        for i in xrange(0, WORKER_COUNT):
            tmp_file = tempfile.NamedTemporaryFile()
            WORKERS.append(Worker(i))

Now, what I have above becomes something like:

    def _run_cmd(data):
        worker = Worker.get_worker()  # this needs to be atomic & lock the worker at Worker.index
        fifo = open(tempfile.NamedTemporaryFile("r").name)  # this stores the output of cmd

        worker.run(fifo.name, data)
        # please ignore the fact that everything will be appended to out.txt ...
        # these will be tmp files too, but named elsewhere.

        out = fifo.read()  # read 'out' from a tmp file
        return out

    def call_process(request):
        response = HttpResponse()
        out = _run_cmd(request.GET['data'])
        response.write(out)  # would be text/plain...
        return response

Now the questions are:

  • Will this work? (I just typed this off the top of my head into StackOverflow, so I'm sure there are problems, but conceptually, will it work?)

  • What problems should I look out for?

  • Are there better alternatives to this? For example, could threads work just as well (it's Debian Lenny Linux)? Are there libraries that handle concurrent worker pools of processes like this?

  • Are there any interactions with Django that I should be aware of?

Thanks for reading! I hope you find this problem as interesting as I do.

Brian

3 answers

It may seem like I'm pushing this product, since this is the second time I've responded with a recommendation for it.

But it looks like what you need is a message queuing service, in particular a distributed message queue.

Here's how it would work (a rough sketch follows the list):

  • Your Django app requests that CMD be run
  • CMD gets added to a queue
  • CMD gets pushed out to several workers
  • It is executed and the results are returned upstream
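For concreteness, here is a minimal sketch of what that flow might look like with Celery, reusing the Task-class API shown in the next answer; the task class, temp-file handling, and view wiring are illustrative assumptions, not something Celery provides out of the box:

    import subprocess
    import tempfile

    from celery.task import Task
    from celery.registry import tasks
    from django.http import HttpResponse

    class RunCmdTask(Task):
        # Hypothetical task: write the posted data to a temp file and run cmd on it.
        def run(self, data):
            with tempfile.NamedTemporaryFile("w") as f:
                f.write(data)
                f.flush()
                p = subprocess.Popen(["cmd", f.name],
                                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
                out, err = p.communicate()
            return out

    tasks.register(RunCmdTask)

    # The Django view hands the work to the queue and waits for the result.
    def call_process(request):
        result = RunCmdTask.delay(request.GET['data'])
        return HttpResponse(result.get())

Each task invocation still launches cmd, but the launch happens in a pool of long-running worker processes outside the request/response cycle, and those workers can live on other machines.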

Most of this code already exists, and you don't have to build your own system.

Have a look at Celery, which was originally built with Django.

http://www.celeryq.org/ http://robertpogorzelski.com/blog/2009/09/10/rabbitmq-celery-and-django/


Issy already mentioned Celery, but since comments don't work well with code samples, I'll reply with an answer instead.

You could try using Celery synchronously with the AMQP result store. You can distribute the actual execution to another process or even another machine. Executing synchronously in Celery is easy, e.g.:

    >>> from celery.task import Task
    >>> from celery.registry import tasks

    >>> class MyTask(Task):
    ...     def run(self, x, y):
    ...         return x * y
    >>> tasks.register(MyTask)

    >>> async_result = MyTask.delay(2, 2)
    >>> retval = async_result.get()  # Now synchronous
    >>> retval
    4

The AMQP result store makes sending back the result very fast, but it is only available in the current development version (in code freeze, to become 0.8.0).
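For reference, switching on the AMQP result store is a settings change along these lines; the exact setting names have varied between Celery versions, so treat these as assumptions to verify against the documentation for the version you install:

    # settings.py (sketch; names vary by Celery version)
    BROKER_HOST = "localhost"        # assumes a RabbitMQ broker running locally
    BROKER_USER = "guest"
    BROKER_PASSWORD = "guest"
    CELERY_RESULT_BACKEND = "amqp"   # return results over AMQP rather than via the database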


How about "daemonizing" the subprocess call using python-daemon or its successor, grizzled?
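A minimal sketch of that idea, assuming python-daemon's DaemonContext: a detached wrapper process sits in a loop, reads requests from a FIFO, runs cmd, and writes the output to another FIFO. The FIFO paths and the one-line-per-request protocol are made up for illustration; python-daemon itself only handles the daemonization:

    import os
    import subprocess

    import daemon  # python-daemon

    # Hypothetical FIFO paths and protocol, purely for illustration.
    REQUEST_FIFO = "/tmp/cmd_requests"
    RESPONSE_FIFO = "/tmp/cmd_responses"

    def serve():
        # Each request is one line containing the path of a file for cmd to process.
        while True:
            with open(REQUEST_FIFO) as requests:
                for fname in requests:
                    p = subprocess.Popen(["cmd", fname.strip()],
                                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
                    out, err = p.communicate()
                    with open(RESPONSE_FIFO, "w") as responses:
                        responses.write(out)

    if __name__ == "__main__":
        for path in (REQUEST_FIFO, RESPONSE_FIFO):
            if not os.path.exists(path):
                os.mkfifo(path)
        with daemon.DaemonContext():
            serve()

On its own this still pays cmd's startup cost per request, so it would need to be combined with something like the pre-started worker pool from the question to get the latency win.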

