Share a large, read-only NumPy array between multiprocessing processes

I have a 60GB SciPy array (matrix) that I need to share between 5+ multiprocessing Process objects. I have seen numpy-sharedmem and read this discussion on the SciPy list. There seem to be two approaches: numpy-sharedmem, and using multiprocessing.RawArray() with a mapping from NumPy dtypes to ctypes (I've sketched my rough understanding of the RawArray() route below, after the questions). numpy-sharedmem seems to be the way to go, but I have not yet seen a good reference example. I don't need any locks, since the array (actually a matrix) will be read-only. Because of its size, I would like to avoid copying. It sounds like the right method is to create the single copy of the array as a sharedmem array and then pass it to the Process objects? A couple of specific questions:

  • What is the best way to actually pass the sharedmem handles to sub-Process()es? Do I need a queue just to pass one array around? Would a pipe be better? Can I just pass it as an argument to the Process() subclass's __init__ (where I'm assuming it gets pickled)?

  • In the discussion I linked above, there is mention of numpy-sharedmem not being 64-bit safe? I am definitely using some structures that aren't 32-bit addressable.

  • Are there tradeoffs to the RawArray() approach? Slower, buggier?

  • Do I need a ctype-to-dtype mapping for the numpy-sharedmem method?

  • Does anyone have an example of some open-source code doing this? I'm a very hands-on learner, and it's hard to get this working without a good example to look at.

If there's any additional information I can provide to help clarify things for others, please comment and I'll add it. Thanks!

This needs to run on Ubuntu Linux and possibly Mac OS, but portability isn't a big concern.
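For concreteness, here is my rough (untested) understanding of the RawArray() route; worker is just a placeholder name, and the c_double/float64 pairing is the dtype-to-ctype mapping I mean:

    import ctypes
    from multiprocessing import Process, RawArray
    import numpy

    def worker(raw):
        # Re-wrap the shared buffer as a NumPy array; no data is copied.
        arr = numpy.frombuffer(raw, dtype=numpy.float64)
        print arr.sum()

    if __name__ == '__main__':
        # RawArray carries no lock, which is fine for read-only use.
        raw = RawArray(ctypes.c_double, 100)   # c_double maps to numpy.float64
        numpy.frombuffer(raw, dtype=numpy.float64)[:] = numpy.random.rand(100)
        p = Process(target=worker, args=(raw,))
        p.start()
        p.join()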

+70
python numpy shared-memory multiprocessing
Jul 22 '13 at 10:31
5 answers

@Velimir Mlaker gave a great answer. I thought I could add a couple of comments and a tiny example.

(I couldn't find much documentation on sharedmem; these are the results of my own experimentation.)

  • Do you need to pass the handles when the subprocess is starting, or after it has started? If it's just the former, you can simply use the target and args arguments of Process. This is potentially better than using a global variable.
  • From the discussion page you linked, it looks like 64-bit Linux support was added to sharedmem a while back, so it may be a non-issue.
  • I don't know about this one.
  • No. See the example below.

Example

    #!/usr/bin/env python
    from multiprocessing import Process
    import sharedmem
    import numpy

    def do_work(data, start):
        data[start] = 0

    def split_work(num):
        n = 20
        width = n / num
        shared = sharedmem.empty(n)
        shared[:] = numpy.random.rand(1, n)[0]
        print "values are %s" % shared

        processes = [Process(target=do_work, args=(shared, i * width))
                     for i in xrange(num)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()

        print "values are %s" % shared
        print "type is %s" % type(shared[0])

    if __name__ == '__main__':
        split_work(4)

Output

    values are [ 0.81397784 0.59667692 0.10761908 0.6736734 0.46349645 0.98340718 0.44056863 0.10701816 0.67167752 0.29158274 0.22242552 0.14273156 0.34912309 0.43812636 0.58484507 0.81697513 0.57758441 0.4284959 0.7292129 0.06063283]
    values are [ 0. 0.59667692 0.10761908 0.6736734 0.46349645 0. 0.44056863 0.10701816 0.67167752 0.29158274 0. 0.14273156 0.34912309 0.43812636 0.58484507 0. 0.57758441 0.4284959 0.7292129 0.06063283]
    type is <type 'numpy.float64'>

This related question may be helpful.

+23
Jul 30 '13 at 15:54

If you are on Linux (or any POSIX-compliant system), you can define this array as a global variable. multiprocessing uses fork() on Linux when it starts a new child process. A newly spawned child process automatically shares memory with its parent as long as it doesn't change it (copy-on-write).

Since you say "I don't need any locks, since the array (actually a matrix) will be read-only", making use of this behavior would be a very simple and yet extremely efficient approach: all child processes will access the same data in physical memory when reading from this large numpy array.

Don't pass your array to the Process() constructor; that will instruct multiprocessing to pickle the data for the child, which would be extremely inefficient or impossible in your case. On Linux, right after fork() the child is an exact copy of the parent using the same physical memory, so all you need to do is make sure the Python variable containing the matrix is accessible from within the target function that you hand to Process(). This can typically be achieved with a "global" variable.

Code example:

    from multiprocessing import Process
    from numpy import random

    global_array = random.random(10**4)

    def child():
        print sum(global_array)

    def main():
        processes = [Process(target=child) for _ in xrange(10)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()

    if __name__ == "__main__":
        main()

On Windows, which does not support fork(), multiprocessing uses the win32 API call CreateProcess. It creates an entirely new process from any given executable. That is why on Windows you are required to pickle data over to the child if you need data that was created at runtime by the parent.
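If you did need that route (say, for Windows portability) and your data were small enough to copy, a minimal sketch of passing the array through args, where multiprocessing pickles it for the child, might look like this (child and main are just illustrative names):

    from multiprocessing import Process
    from numpy import random

    def child(arr):
        # On Windows, arr arrives as a pickled copy; on Linux it would
        # simply be inherited via fork().
        print sum(arr)

    def main():
        data = random.random(10**4)
        processes = [Process(target=child, args=(data,)) for _ in xrange(10)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()

    if __name__ == "__main__":
        main()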

+29
Jul 22 '13 at 11:29

You might be interested in a small piece of code that I wrote: github.com/vmlaker/benchmark-sharedmem

The only file of interest is main.py. It's a benchmark of numpy-sharedmem: the code simply passes arrays (either numpy or sharedmem) to spawned processes via Pipe. The workers just call sum() on the data. I was only interested in comparing the data communication times between the two implementations.

I also wrote another, more involved piece of code: github.com/vmlaker/sherlock.

There I use the numpy-sharedmem module for real-time image processing with OpenCV: the images are NumPy arrays, as per OpenCV's newer cv2 API. The images, or rather references to them, are shared between processes via a dictionary object created from multiprocessing.Manager (as opposed to using Queue or Pipe). I'm getting great performance improvements compared with using plain NumPy arrays. A boiled-down sketch of the pattern follows.
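This is not the actual sherlock code, just a minimal sketch of the pattern, under the assumption that sharedmem arrays pickle down to a reference to the same shared memory; worker and the frame key are illustrative names:

    from multiprocessing import Manager, Process
    import sharedmem
    import numpy

    def worker(frames, key):
        # Only a reference travels through the Manager proxy; the pixel
        # data itself stays in shared memory.
        image = frames[key]
        print image.sum()

    if __name__ == '__main__':
        manager = Manager()
        frames = manager.dict()                # proxied dict, shared between processes
        image = sharedmem.empty((480, 640))    # shared-memory "image"
        image[:] = numpy.random.rand(480, 640)
        frames['frame0'] = image
        p = Process(target=worker, args=(frames, 'frame0'))
        p.start()
        p.join()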

Pipe vs. Queue:

In my experience, IPC with Pipe is faster than Queue. And that makes sense, since Queue adds locking to make it safe for multiple producers and consumers; Pipe doesn't. But if you only have two processes talking back and forth, it is safe to use Pipe, or, as the docs put it:

... there is no risk of corruption from processes using different ends of the pipe at the same time.
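For instance, a bare two-process exchange over a Pipe looks like this (a minimal sketch with illustrative names):

    from multiprocessing import Process, Pipe
    import numpy

    def worker(conn):
        data = conn.recv()         # blocks until the parent sends the array
        conn.send(data.sum())      # send the result back over the same pipe
        conn.close()

    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p = Process(target=worker, args=(child_conn,))
        p.start()
        parent_conn.send(numpy.random.rand(1000))
        print parent_conn.recv()
        p.join()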

sharedmem safety:

The main issue with the sharedmem module is the possibility of a memory leak upon an ungraceful program exit. This is described in a lengthy discussion here. Although on April 10, 2011 Sturla mentions a fix for the memory leak, I have still experienced leaks since then, using both repos: Sturla Molden's own on GitHub (github.com/sturlamolden/sharedmem-numpy) and Chris Lee-Messer's on Bitbucket (bitbucket.org/cleemesser/numpy-sharedmem).

+19
Jul 28 '13 at 21:45

If your array is that big, you can use numpy.memmap. For example, if you have an array stored on disk, say 'test.array', you can use simultaneous processes to access the data in it even in "writing" mode, but your case is simpler since you only need "reading" mode.

Creating an array:

    import numpy as np

    a = np.memmap('test.array', dtype='float32', mode='w+', shape=(100000, 1000))

Then you can populate this array just like a regular array. For example:

    a[:10, :100] = 1.
    a[10:, 100:] = 2.

The data is written to disk when the variable a is deleted.
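If you want to force the write without discarding the variable, memmap also exposes flush():

    a.flush()   # write any in-memory changes back to disk
    del a       # deleting the variable also flushes and closes the map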

Later, you can use several processes that will access the data in test.array :

    # read-only mode
    b = np.memmap('test.array', dtype='float32', mode='r', shape=(100000, 1000))

    # read-write mode
    c = np.memmap('test.array', dtype='float32', mode='r+', shape=(100000, 1000))
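To tie this back to the question, each reader process can simply open its own read-only map; a minimal sketch (reader is a placeholder name):

    from multiprocessing import Process
    import numpy as np

    def reader(start, stop):
        # Each worker opens its own read-only view of the file;
        # the OS page cache keeps the underlying data shared.
        m = np.memmap('test.array', dtype='float32', mode='r', shape=(100000, 1000))
        print m[start:stop].sum()

    if __name__ == '__main__':
        processes = [Process(target=reader, args=(i * 25000, (i + 1) * 25000))
                     for i in xrange(4)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()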

Related answers:

  • Working with big data in python and numpy, not enough RAM, how to save partial results to disk?

  • Is it possible to map discontiguous data on disk to an array with python?

+11
Jul 26 '13 at 11:26

You may also find it useful to take a look at the documentation for pyro: if you can partition your task appropriately, you could use it to execute different sections on different machines, as well as on different cores of the same machine.

+3
Jul 31 '13 at 5:39


