How to run programs in parallel in Python

I have a Python script that runs several external commands using the subprocess module. One of these steps takes a long time, so I would like to split it up and run the pieces in parallel. I need to launch them, verify that they have all finished, and then run the next command, which is not parallel. My code looks something like this:

    nproc = 24
    for i in xrange(nproc):
        #Run program in parallel
        pass

    #Combine files generated by the parallel step
    for i in xrange(nproc):
        handle = open('Niben_%s_structures' % (zfile_name), 'w')
        for i in xrange(nproc):
            for zline in open('Niben_%s_file%d_structures' % (zfile_name, i)):
                handle.write(zline)
        handle.close()

    #Run next step
    cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name, zfile_name)
3 answers

For your example, you just want to shell out in parallel; you do not need threads for this.

Use the Popen constructor in the subprocess module: http://docs.python.org/library/subprocess.htm

Collect the Popen instance for each process you spawn, and then call wait() on each one to block until it has finished:

    import subprocess

    procs = []
    for i in xrange(nproc):
        #Run program in parallel
        procs.append(subprocess.Popen(ARGS_GO_HERE))
    for p in procs:
        p.wait()

You can get away with this (as opposed to using the multiprocessing or threading modules) because the processes do not need to interact with each other. You just want the OS to run them in parallel and to be sure they have all finished before you combine the results.
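Here is a slightly fuller sketch of the same idea that also collects the exit codes, so you can tell whether every step succeeded before combining the files. It assumes Python 3 (hence range rather than xrange). The helper name run_parallel and the stand-in commands are made up for illustration; substitute your real command lines.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start every command at once, then wait for all of them to exit."""
    # Popen returns immediately, so all children run side by side.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until that child exits and returns its exit code.
    return [p.wait() for p in procs]


# Hypothetical stand-in commands: each child exits with code 0 or 1.
commands = [
    [sys.executable, "-c", "import sys; sys.exit(%d)" % (i % 2)]
    for i in range(4)
]

exit_codes = run_parallel(commands)
# Indices of the commands that reported failure.
failed = [i for i, code in enumerate(exit_codes) if code != 0]
```

If failed is non-empty, you probably want to stop before running the combine step.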


Parallel operations can also be implemented using several processes in Python. I wrote a blog post on this topic a while ago; you can find it here:

http://multicodecjukebox.blogspot.de/2010/11/parallelizing-multiprocessing-commands.html

Basically, the idea is to use "worker processes" that independently fetch jobs from a queue and then complete those jobs.

Works well in my experience.
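The worker-and-queue idea can be sketched roughly as follows. This is not the code from the blog post. It is a minimal illustration that assumes Python 3 and assumes each job is an external command, so it uses threads as the workers in place of the multiprocessing module (each thread simply blocks while its child process runs). The names worker and run_with_workers and the stand-in commands are hypothetical.

```python
import queue
import subprocess
import sys
import threading


def worker(jobs, results):
    """Fetch (index, command) jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        index, cmd = item
        # Run the external command and record its exit code.
        results[index] = subprocess.call(cmd)


def run_with_workers(commands, nworkers):
    jobs = queue.Queue()
    results = [None] * len(commands)
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(nworkers)
    ]
    for t in threads:
        t.start()
    for item in enumerate(commands):
        jobs.put(item)
    # One sentinel per worker tells it to stop.
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-in commands that do nothing and exit with code 0.
commands = [[sys.executable, "-c", "pass"] for _ in range(6)]
codes = run_with_workers(commands, nworkers=3)
```

Compared with launching everything at once, this caps the number of commands running at the same time at nworkers, which helps when you have more jobs than cores.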


You can do this using threads. This is a very short (and untested) example with a very ugly if-else deciding what each thread actually does, but you can write your own worker classes.

    import threading

    class Worker(threading.Thread):
        def __init__(self, i):
            self._i = i
            # Pass Worker (not threading.Thread) so Thread.__init__ actually runs
            super(Worker, self).__init__()

        def run(self):
            if self._i == 1:
                self.result = do_this()
            elif self._i == 2:
                self.result = do_that()

    threads = []
    nproc = 24
    for i in xrange(nproc):
        #Run program in parallel
        w = Worker(i)
        threads.append(w)
        w.start()

    # Join only after every thread has been started; joining inside the
    # loop above would run the workers one after another
    for w in threads:
        w.join()
    # ...now all threads are done

    #Combine files generated by the parallel step
    for i in xrange(nproc):
        handle = open('Niben_%s_structures' % (zfile_name), 'w')
        ...etc...