Python Multiprocessing Documentation Example

Question

I am trying to learn Python's multiprocessing module.

The code below is from http://docs.python.org/2/library/multiprocessing.html, under "To show individual process identifiers, here is an extended example:"

    from multiprocessing import Process
    import os

    def info(title):
        print title
        print 'module name:', __name__
        if hasattr(os, 'getppid'):  # only available on Unix
            print 'parent process:', os.getppid()
        print 'process id:', os.getpid()

    def f(name):
        info('function f')
        print 'hello', name

    if __name__ == '__main__':
        info('main line')
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()

What exactly am I looking at? I see that f(name) is called after info('main line') has finished, but that synchronous ordering would be the default anyway. I also see that the process that runs info('main line') is the parent PID of the one that runs f(name), but I am not sure what part of this is "multiprocessing".

Also, for join() the documentation says: "Block the calling thread until the process whose join() method is called terminates." I do not understand what the calling thread would be. In this example, what would join() block?

python multithreading multiprocessing

dman Aug 11 '13 at 5:15

1 answer

torek · Accepted Answer · 2013-08-11T08:09:04+0000

How multiprocessing works, in a nutshell:

Process(), once started with p.start(), spawns (via fork or similar on Unix-like systems) a copy of the original program. (On Windows, which lacks a real fork, this is tricky and requires the special care described in the notes in the module documentation.)
The copy communicates with the original to find out that (a) it is a copy, and (b) it should go off and call the target= function (see below).
At this point, the original and the copy are different and independent, and can run simultaneously.
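
To make the "different and independent" point concrete, here is a minimal sketch. It uses Python 3 syntax (an assumption; the example in the question is Python 2), and the names counter, bump and child are made up for illustration. The child changes a module-level variable, and the parent never sees the change, because each process has its own copy of the program's state.

```python
import os
from multiprocessing import Process

counter = 0  # module-level state; every process has its own copy of it


def bump(n):
    """Add n to this process's copy of counter and return the new value."""
    global counter
    counter += n
    return counter


def child():
    """Runs in the new process: changes counter, but only in the child's copy."""
    bump(5)
    print("child:  pid", os.getpid(), "parent pid", os.getppid(),
          "counter", counter)


if __name__ == "__main__":
    p = Process(target=child)
    p.start()
    p.join()
    # The parent's counter is untouched by what the child did.
    print("parent: pid", os.getpid(), "counter", counter)
```

The parent pid printed by the child should match the pid printed by the parent, which is the same relationship the documentation example shows with info().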

Since they are independent processes, they now have independent Global Interpreter Locks (in CPython), so both can use up to 100% of a CPU on a multi-CPU machine, as long as they do not contend for other lower-level (OS) resources. That is the "multiprocessing" part.

Of course, at some point you need to send data back and forth between these supposedly independent processes, for example to send results from one (or many) worker processes back to a "main" process. (There is the occasional exception where everyone is completely independent, but it is rare ... plus there is the whole start-up sequence itself, kicked off by p.start().) So each Process instance (p, in the example above) has a communication channel to its parent creator and vice versa (it is a symmetric connection). The multiprocessing module uses the pickle module to turn data into strings, the same strings you can store in files with pickle.dump, and sends the data across the channel: "down" to workers to pass arguments and the like, and "up" from workers to send back results.
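
A small sketch of data going "down" and "up". It assumes Python 3 and uses multiprocessing.Queue as the channel for results, which is my choice for illustration and not something the example in the question does. The helper names square_worker and roundtrip are hypothetical. The roundtrip function only shows what pickling does to an object; it is not how the module is wired internally.

```python
import pickle
from multiprocessing import Process, Queue


def square_worker(numbers, results):
    """Runs in the child: compute squares and send them "up" to the parent."""
    results.put([n * n for n in numbers])


def roundtrip(obj):
    """Pickle obj to bytes and unpickle it again, yielding an equal copy."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    results = Queue()
    # args are sent "down" to the child when it starts.
    p = Process(target=square_worker, args=([1, 2, 3], results))
    p.start()
    print(results.get())  # fetch the result before joining
    p.join()

    # What arrives on the other side is a copy, not the original object.
    original = {"name": "bob"}
    copy = roundtrip(original)
    print(copy == original, copy is original)
```

Reading from the queue before calling join() is deliberate: the multiprocessing documentation warns that joining a process which still has items buffered on a queue can block.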

Eventually, once you are done getting results, the worker finishes (by returning from the target= function) and tells the parent it is done. To make sure everything gets closed and cleaned up, the parent should call p.join() to wait for the worker's "I am done" message (actually an OS-level exit on Unix-ish systems). So in the example, what join() blocks is the parent: the main thread of the original program waits there until the child process has exited.
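
Here is a rough sketch of what join() looks like from the parent's side, again assuming Python 3. The names slow_worker and describe are made up; is_alive() and exitcode are real attributes of Process.

```python
import time
from multiprocessing import Process


def slow_worker(seconds):
    """Runs in the child: pretend to work for a while, then return."""
    time.sleep(seconds)


def describe(proc):
    """Summarize a Process's state as seen from the parent."""
    if proc.is_alive():
        return "running"
    if proc.exitcode is None:
        return "not started"
    return "exited with code %d" % proc.exitcode


if __name__ == "__main__":
    p = Process(target=slow_worker, args=(1,))
    print(describe(p))  # nothing has been spawned yet
    p.start()
    print(describe(p))  # the child is sleeping while the parent carries on
    p.join()            # the parent's main thread waits here for the child to exit
    print(describe(p))  # the child has returned from slow_worker
```

Only the process that calls p.join() waits. The child is not blocked by it, and neither are any other workers the parent may have started.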

The example is a bit silly, since the two printed messages take essentially no time at all, so running them "simultaneously" has no measurable gain. But suppose that instead of just printing hello, f computed the first 100,000 digits of π (3.14159...). You could then create another Process, p2, with a different target g that computes the first 100,000 digits of e (2.71828...). These would run independently. The parent could then call p.join() and p2.join() to wait for both to finish (or spawn even more workers to do more work and occupy more CPUs, or even go off and do its own work for a while first).
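
A sketch of that two-worker idea, assuming Python 3. Computing digits of π and e would take more code than is useful here, so count_primes is a made-up CPU-bound stand-in, and the limits are arbitrary.

```python
from multiprocessing import Process


def count_primes(limit):
    """CPU-bound stand-in for real work: count the primes below limit."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def f(limit):
    print("f: primes below", limit, "=", count_primes(limit))


def g(limit):
    print("g: primes below", limit, "=", count_primes(limit))


if __name__ == "__main__":
    p = Process(target=f, args=(50000,))
    p2 = Process(target=g, args=(60000,))
    p.start()
    p2.start()  # both children are now running, each with its own interpreter
    # The parent could do work of its own here before waiting.
    p.join()
    p2.join()
```

On a machine with at least two free CPUs, the two workers should take roughly as long as the slower one alone, not the sum of both. The order of the two printed lines is not guaranteed.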
