Why should I use .wait() with the Python subprocess module?

I am running a Perl script through Python's subprocess module on Linux. The function that runs the script is called several times with varying input.

    import subprocess

    def script_runner(variable_input):
        out_file = open('out_' + variable_input, 'wt')
        error_file = open('error_' + variable_input, 'wt')
        process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                                   stdout=out_file, stderr=error_file)

However, if I run this function, say, twice, the execution of the first process will stop when the second process begins. I can get the desired behavior by adding

 process.wait() 

after calling the script, so I am not completely stuck. However, I want to find out why I cannot start the script process as many times as I need and let it run these calculations in parallel, without waiting for each run to complete.
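For concreteness, the calls look roughly like this (the input strings here are hypothetical placeholders):

    script_runner('run1')
    script_runner('run2')   # without .wait(), the first Perl process appears to stop here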

UPDATE

The culprit was not so exciting: the Perl script used a shared file that was overwritten on each run.

However, the lesson I learned from this was that the garbage collector does not kill the process once it has started: after I sorted out the shared file, leaving out the .wait() made no difference to my script.
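For anyone hitting the same shared-file clash, here is a minimal sketch of one possible workaround, assuming the Perl script writes its shared file relative to its working directory (the per-run directory naming and the absolute-path handling below are my own assumptions, not part of the original setup): give each run its own working directory via Popen's cwd argument.

    import os
    import subprocess

    def script_runner(variable_input):
        # Hypothetical fix: give every run its own working directory so that the
        # Perl script's shared file is not overwritten by a concurrent run.
        work_dir = 'run_' + variable_input
        if not os.path.isdir(work_dir):
            os.makedirs(work_dir)
        script_path = os.path.abspath('script')   # resolve the script before changing cwd
        out_file = open(os.path.join(work_dir, 'out'), 'wt')
        error_file = open(os.path.join(work_dir, 'error'), 'wt')
        return subprocess.Popen(['perl', script_path, 'options'], shell=False,
                                stdout=out_file, stderr=error_file, cwd=work_dir)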

+6
python subprocess
4 answers

If you use Unix and want to run many processes in the background, you can use subprocess.Popen as follows:

x_fork_many.py:

    import subprocess
    import os
    import sys
    import time
    import random
    import gc   # just to test the hypothesis that garbage collection of p = Popen() is causing the problem

    # This spawns many (3) children in quick succession
    # and then reports as each child finishes.
    if __name__ == '__main__':
        N = 3
        if len(sys.argv) > 1:
            # Child process: sleep for a random number of seconds, then exit.
            x = random.randint(1, 10)
            print('{p} sleeping for {x} sec'.format(p=os.getpid(), x=x))
            time.sleep(x)
        else:
            # Parent process: spawn N children without keeping references to the Popen objects.
            for script in range(N):
                args = [sys.executable, sys.argv[0], 'sleep']   # spawn another copy of this script
                p = subprocess.Popen(args)
                gc.collect()
            for i in range(N):
                pid, retval = os.wait()
                print('{p} finished'.format(p=pid))

The result looks something like this:

    % x_fork_many.py
    15562 sleeping for 10 sec
    15563 sleeping for 5 sec
    15564 sleeping for 6 sec
    15563 finished
    15564 finished
    15562 finished

I'm not sure why you get the strange behavior when you don't call .wait(). However, the script above suggests (at least on Unix) that keeping subprocess.Popen(...) objects in a list or set is not required. Whatever the problem is, I don't think it has to do with garbage collection.

PS: Perhaps your Perl scripts conflict with each other in some way, causing one of them to exit with an error when the other starts up. Have you tried launching several instances of the Perl script from the command line?
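If testing from the shell is inconvenient, here is a small sketch of the same check in Python (using the same hypothetical 'perl script options' command line as in the question):

    import subprocess

    # Start two copies of the Perl script at once and wait for both,
    # to see whether concurrent runs interfere with each other.
    procs = [subprocess.Popen(['perl', 'script', 'options']) for _ in range(2)]
    for p in procs:
        p.wait()
        print('pid {0} exited with code {1}'.format(p.pid, p.returncode))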

+2

You need to call wait() in order to "wait" for your Popen process to end.

Since Popen executes the Perl script in the background, if you do not wait(), it will be stopped at the end of the life of your "process" object, which is at the end of script_runner.
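Following that reasoning, one option (a sketch built on the question's code, not on a tested setup) is to return the Popen object, so that the caller keeps it alive and decides when to wait():

    import subprocess

    def script_runner(variable_input):
        out_file = open('out_' + variable_input, 'wt')
        error_file = open('error_' + variable_input, 'wt')
        # Return the handle instead of letting it go out of scope here.
        return subprocess.Popen(['perl', 'script', 'options'], shell=False,
                                stdout=out_file, stderr=error_file)

    p1 = script_runner('a')
    p2 = script_runner('b')
    p1.wait()   # both runs have been started; now wait for each in turn
    p2.wait()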

+1

As ericdupo says, the task is destroyed because you are overwriting the process variable with a new Popen object. Since there are no more references to your previous Popen object, it is destroyed by the garbage collector. You can prevent this by keeping a reference to your objects somewhere, for example in a list:

    import subprocess

    processes = []

    def script_runner(variable_input):
        out_file = open('out_' + variable_input, 'wt')
        error_file = open('error_' + variable_input, 'wt')
        process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                                   stdout=out_file, stderr=error_file)
        # Keep a reference so the Popen object is not garbage collected.
        processes.append(process)

This should be enough to prevent your previous Popen object from being destroyed.

+1

I think you want to do something like this:

    import subprocess

    list_process = []

    def script_runner(variable_input):
        out_file = open('out_' + variable_input, 'wt')
        error_file = open('error_' + variable_input, 'wt')
        process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                                   stdout=out_file, stderr=error_file)
        list_process.append(process)

    # call script_runner several times, then:
    for process in list_process:
        process.wait()

so that your processes run in parallel.
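As a small refinement (a sketch, not part of the answer above), the output files can be kept next to each Popen object and closed once the corresponding run has been waited on:

    import subprocess

    list_process = []

    def script_runner(variable_input):
        out_file = open('out_' + variable_input, 'wt')
        error_file = open('error_' + variable_input, 'wt')
        process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                                   stdout=out_file, stderr=error_file)
        # Keep the file objects too, so they can be closed after the run finishes.
        list_process.append((process, out_file, error_file))

    # call script_runner several times, then:
    for process, out_file, error_file in list_process:
        process.wait()
        out_file.close()
        error_file.close()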

0
