Is there a good way to transfer a large chunk of data between two python subprocesses without using a disk? Here is a cartoony example of what I hope to accomplish:
```python
import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, get data from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        data.dump('data.pkl')
        sys.stdout.write('data.pkl' + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen(  # call was truncated here; piping stdin/stdout as usual
    [sys.executable, '-c', cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE)
```
This creates a subprocess that generates a numpy array and saves the array to disk. The parent process then loads the array from disk. It works!
The problem is, our hardware can generate data 10 times faster than the disk can read/write. Is there a way to transfer data from one python process to another purely in-memory, maybe even without making a copy of the data? Can I do something like passing-by-reference?
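To make "purely in-memory" concrete: in current Python (3.8+) the stdlib's `multiprocessing.shared_memory` seems to do roughly what I'm after. A minimal sketch (the helper name `shared_roundtrip` is made up, and the second attachment would normally live in the child process, which learns the segment's name over a pipe or queue):

```python
from multiprocessing import shared_memory

def shared_roundtrip(n=1_000_000):
    # Create a named shared-memory segment; a child process would attach by name.
    src = shared_memory.SharedMemory(create=True, size=n)
    # Second attachment to the same segment. In a real setup this line runs
    # in the other process -- same physical pages, so no data ever moves.
    dst = shared_memory.SharedMemory(name=src.name)
    src.buf[:n] = b'\x07' * n                 # "hardware" fills the buffer
    first, last = dst.buf[0], dst.buf[n - 1]  # reader sees it with no copy
    dst.close()
    src.close()
    src.unlink()
    return first, last

print(shared_roundtrip())  # -> (7, 7)
```

Wrapping the segment with `numpy.frombuffer(shm.buf, dtype=numpy.uint8)` would give a zero-copy numpy view on either side.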
My first attempt to transfer data exclusively in memory is pretty lousy:
```python
import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, get data from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        ##Note that this is NFG if there's a '10' in the array:
        sys.stdout.write(data.tostring() + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen(  # call was truncated here; piping stdin/stdout as usual
    [sys.executable, '-c', cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE)
```
It is very slow (much slower than saving to disk) and very, very fragile. There must be a better way!
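At least the fragility seems fixable: instead of relying on a newline sentinel (byte 10 can appear anywhere in the array), each message over the pipe could carry a length prefix. A sketch with made-up `send_msg`/`recv_msg` helpers over a raw `os.pipe`:

```python
import os, struct

def send_msg(fd, payload):
    # 8-byte little-endian length header, then the raw bytes: the reader
    # never has to scan for a delimiter that might occur inside the data.
    data = struct.pack('<Q', len(payload)) + payload
    off = 0
    while off < len(data):
        off += os.write(fd, data[off:])

def recv_msg(fd):
    def read_exact(n):
        chunks = []
        while n > 0:
            chunk = os.read(fd, n)
            if not chunk:
                raise EOFError('pipe closed mid-message')
            chunks.append(chunk)
            n -= len(chunk)
        return b''.join(chunks)
    (size,) = struct.unpack('<Q', read_exact(8))
    return read_exact(size)

r, w = os.pipe()
msg = bytes([0, 10, 255]) * 100   # contains byte 10, which broke readline()
send_msg(w, msg)
print(recv_msg(r) == msg)  # -> True
os.close(r); os.close(w)
```

This still copies the data through the pipe, though, so it only fixes the fragility, not the speed.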
I'm not wedded to the subprocess module, as long as the data-taking process doesn't block the parent application. I briefly tried 'multiprocessing', but haven't gotten far with it yet.
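For what it's worth, here's roughly the shape of what I tried with multiprocessing (the names `crunch` and `run_demo` are made up). Note that a `multiprocessing.Pipe` pickles and copies each buffer, so this alone doesn't solve the bandwidth problem:

```python
import multiprocessing as mp

def crunch(conn):
    # Worker loop: receive a buffer, reply with a reduced result; None quits.
    while True:
        buf = conn.recv()
        if buf is None:
            break
        conn.send(len(buf))   # stand-in for the real per-buffer processing

def run_demo():
    ctx = mp.get_context('fork')      # fork is Unix-only; 'spawn' also works
    parent_end, child_end = ctx.Pipe()
    p = ctx.Process(target=crunch, args=(child_end,))
    p.start()
    parent_end.send(b'\x00' * 1000)   # parent stays free while worker crunches
    result = parent_end.recv()
    parent_end.send(None)             # sentinel: tell the worker to exit
    p.join()
    return result

print(run_demo())  # -> 1000
```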
Background: we have a piece of hardware that generates up to ~2 GB/s of data in a series of ctypes buffers. The python code to handle these buffers has its hands full just dealing with the flood of information. I want to coordinate this flow of information with several other pieces of hardware running simultaneously in a "master" program, without the subprocesses blocking each other. My current approach is to boil the data down a little in the subprocess before saving to disk, but it'd be nice to pass the full monty to the "master" process.
python pass-by-reference numpy subprocess ctypes
Andrew Feb 17 '11 at 19:47