If you are on Linux (or any POSIX-compliant system), you can define this array as a global variable. On Linux, multiprocessing starts a new child process with fork() when the fork start method is in effect, which has long been the default there. A newly forked child process automatically shares its memory with the parent for as long as neither side modifies it (copy-on-write).
Since you say "I don't need any locks, since the array (in fact, the matrix) will be read-only", taking advantage of this behavior is a very simple and yet extremely efficient approach: all child processes will access the same data in physical memory when reading this large numpy array.
Do not pass your array to the Process() constructor. Doing so instructs multiprocessing to pickle the data for the child, which would be extremely inefficient or impossible in your case. On Linux, right after fork() the child is an exact copy of the parent and uses the same physical memory, so all you have to do is make sure that the Python variable holding the matrix is accessible from within the target function that you pass to Process(). This can usually be achieved with a global variable.
Code example:
    from multiprocessing import Process
    from numpy import random

    # Created once in the parent; forked children inherit it without copying.
    global_array = random.random(10**4)

    def child():
        # Reads the inherited global; nothing is passed to Process().
        print(sum(global_array))

    def main():
        processes = [Process(target=child) for _ in range(10)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()

    if __name__ == "__main__":
        main()
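The same idea can be sketched with the start method requested explicitly, so that it does not depend on whatever default the platform or Python version picks. This is only a minimal illustration, assuming a POSIX system where the "fork" start method is available; the names run_children and queue are hypothetical and not part of the original answer. Each child reads the inherited global array and reports its sum back through a queue, so the parent can check that every child saw the same data.

```python
import multiprocessing as mp

import numpy as np

# Module-level (global) array, created once in the parent before forking.
global_array = np.arange(10**4, dtype=np.float64)


def child(queue):
    # Reads the inherited global array; the array itself is never pickled.
    queue.put(float(global_array.sum()))


def run_children(n_children=4):
    # Hypothetical helper: explicitly request the "fork" start method
    # (assumption: POSIX only, this context does not exist on Windows).
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    processes = [
        ctx.Process(target=child, args=(queue,)) for _ in range(n_children)
    ]
    for p in processes:
        p.start()
    # Collect one result per child before joining.
    results = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    return results


if __name__ == "__main__":
    print(run_children())
```

Only the small result values travel through the queue; the array is reached through the global name, which is the point of the approach.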
On Windows, which does not support fork(), multiprocessing uses the win32 API call CreateProcess. It creates an entirely new process from any given executable. That is why, on Windows, you have to pickle data for the child if it needs data that the parent created at runtime.
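To make the cost of pickling concrete, here is a rough sketch of what serializing an array involves. It does not start any processes; it only performs the same kind of pickle round trip that would happen if the array were handed to a child as an argument. The helper name pickle_roundtrip is hypothetical, and the matrix size is an arbitrary stand-in for the real data.

```python
import pickle

import numpy as np


def pickle_roundtrip(array):
    # Hypothetical helper: serialize and deserialize the array, as would
    # happen when data is pickled for a child process.
    payload = pickle.dumps(array, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(payload)
    return len(payload), restored


if __name__ == "__main__":
    matrix = np.ones((1000, 1000))  # stand-in for the large read-only matrix
    size, restored = pickle_roundtrip(matrix)
    print("array bytes:  ", matrix.nbytes)
    print("pickled bytes:", size)
    print("shares memory with original:", np.shares_memory(matrix, restored))
```

The serialized payload is at least as large as the array data, and the restored array is an independent copy, so every child that receives the matrix this way pays for its own full copy.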
Jan-Philip Gehrcke, Jul 22 '13 at 11:29