Does the Python subprocess module release the GIL?

When invoking a Linux binary that takes a relatively long time through the Python subprocess module, does it release the GIL?

I want to parallelize some code that calls a binary program from the command line. Is it better to use threads (via threading and a multiprocessing.pool.ThreadPool) or multiprocessing? My assumption is that if subprocess releases the GIL, the threading option is the better choice.

python multithreading subprocess python-multithreading gil
3 answers

When invoking a Linux binary that takes a relatively long time through the Python subprocess module, does it release the GIL?

Yes, it releases the Global Interpreter Lock (GIL) in the calling process.

On POSIX platforms, subprocess offers convenient interfaces on top of the raw primitives fork, execve, and waitpid.

Inspecting the CPython 2.7.9 sources, fork and execve do not release the GIL. However, these calls do not block, so we would not expect the GIL to be released around them.

waitpid, of course, does block, and we can see that its implementation releases the GIL using the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros:

 static PyObject *
 posix_waitpid(PyObject *self, PyObject *args)
 {
     ....
     Py_BEGIN_ALLOW_THREADS
     pid = waitpid(pid, &status, options);
     Py_END_ALLOW_THREADS
     ....

This can also be verified by invoking a long-running program such as sleep from a multi-threaded Python demo script.
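For example, here is a minimal sketch of such a test, using the standard sleep utility (timings are illustrative and assume a POSIX system):

```python
import subprocess
import threading
import time

def run_sleep():
    # Each call blocks in waitpid(); if the GIL were held during the
    # wait, the threads would be forced to run one after another.
    subprocess.call(["sleep", "1"])

start = time.time()
threads = [threading.Thread(target=run_sleep) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Four 1-second sleeps complete in roughly 1 second of wall time,
# not 4, which shows the GIL is released while waiting.
print("elapsed: %.1fs" % elapsed)
```

If the GIL were not released, the elapsed time would be close to the sum of the individual sleeps.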


The GIL does not span multiple processes. subprocess.Popen starts a new process. If that process runs a Python interpreter, it has its own GIL.

You don't need multiple threads (or processes created with multiprocessing) if all you want to do is run several Linux executables in parallel:

 from subprocess import Popen

 # start all processes
 processes = [Popen(['program', str(i)]) for i in range(10)]
 # now all processes run in parallel

 # wait for processes to complete
 for p in processes:
     p.wait()

You can use multiprocessing.pool.ThreadPool to limit the number of concurrently running programs.
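A hedged sketch of that approach, with the sleep command standing in for the real program:

```python
from multiprocessing.pool import ThreadPool
from subprocess import call

def run(i):
    # Placeholder command; substitute your own binary and arguments.
    return call(["sleep", "0.1"])

# At most 4 programs run at the same time; the remaining 6 tasks
# are queued until a worker thread becomes free.
pool = ThreadPool(4)
results = pool.map(run, range(10))
pool.close()
pool.join()
```

Each worker thread blocks in waitpid while its child runs, and since that wait releases the GIL, the pool genuinely runs up to 4 programs in parallel.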


Since subprocess is designed to run an executable file (it is essentially a wrapper around os.fork() and os.execve()), it probably makes sense to use it. You can use subprocess.Popen. Something like:

 import subprocess

 process = subprocess.Popen(["binary"])

This runs as a separate process, so the GIL is not affected. You can then use the Popen.poll() method to check whether the child process has terminated:

 # poll() returns None while the child is still running, so compare
 # against None: a return code of 0 would otherwise be treated as False
 if process.poll() is not None:
     # process has finished its work
     returncode = process.returncode

You just need to make sure you don't call any methods that wait for the process to complete (e.g. Popen.communicate()) if you want to avoid blocking the Python script.
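A minimal sketch of that polling pattern, with sleep as a stand-in for the real binary:

```python
import subprocess
import time

# Placeholder commands; substitute your own binaries.
procs = [subprocess.Popen(["sleep", "0.2"]) for _ in range(3)]
returncodes = []

while procs:
    for p in procs[:]:
        if p.poll() is not None:      # non-blocking check for completion
            returncodes.append(p.returncode)
            procs.remove(p)
    time.sleep(0.05)                  # the script stays free to do other work
```

The loop never blocks on any single child, so the script can interleave other work between polls.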

As mentioned in this answer:

multiprocessing is designed to run functions within an existing (Python) process, with support for more flexible communication among the family of processes. The multiprocessing module provides interfaces and features very similar to threading, while allowing CPython to scale your processing across multiple processors/cores despite the GIL.

So, given your use case, subprocess seems to be the right choice.

