Python multiprocessing takes much longer than single processing

I do some big calculations on three different two-dimensional arrays, in series. The arrays are huge, 25000x25000 each. Each calculation takes significant time, so I decided to run three of them in parallel on three processor cores of the server. I follow the standard multiprocessing pattern: I create two processes and a worker function. Two calculations run through the two processes, and the third runs locally without a separate process. I pass the huge arrays as arguments to the processes, like this:

    p1 = Process(target=Worker, args=(queue1, array1, ...))  # some other params also passed
    p2 = Process(target=Worker, args=(queue2, array2, ...))  # some other params also passed

The Worker function sends back two numpy vectors (1D arrays) in a list that it puts on the queue, for example:

    queue.put([v1, v2])

I do not use multiprocessing.Pool,

but surprisingly I get no speedup; it actually runs about 3 times slower. Is passing the large arrays what takes the time? I cannot figure out what is happening. Should I use shared memory objects instead of passing the arrays?
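For reference, here is a stripped-down sketch of the structure (the real calculation is replaced by a placeholder, the arrays are shrunk here, and some extra parameters are left out; the real arrays are 25000x25000):

    import numpy as np
    from multiprocessing import Process, Queue

    def compute(array):
        # placeholder for the real heavy calculation
        return array.sum(axis=0), array.sum(axis=1)

    def Worker(queue, array):
        v1, v2 = compute(array)
        queue.put([v1, v2])

    if __name__ == '__main__':
        shape = (2500, 2500)          # the real arrays are 25000x25000
        array1 = np.random.random(shape)
        array2 = np.random.random(shape)
        array3 = np.random.random(shape)

        queue1, queue2 = Queue(), Queue()
        p1 = Process(target=Worker, args=(queue1, array1))
        p2 = Process(target=Worker, args=(queue2, array2))
        p1.start()
        p2.start()

        v5, v6 = compute(array3)      # third calculation done in the main process
        v1, v2 = queue1.get()         # get() before join() so the children can exit
        v3, v4 = queue2.get()
        p1.join()
        p2.join()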

I would be grateful if anyone could help.

Thanks.

python arrays numpy process multiprocessing
2 answers

My problem seems to be resolved. I was calling multiprocessing.Pool.map_async from inside a Django module, and my worker function was a method of the class itself. That was the problem. Multiprocessing cannot call a method of a class inside another process, because subprocesses do not share memory with the parent, so there is no live instance of the class inside the subprocess; that is probably why it was never called, as far as I understand. I moved the function out of the class and placed it in the same file, just before the class definition, and it worked. I also got a moderate speedup.

One more thing for people facing the same problem: do not pass large arrays between processes. Pickling and unpickling them takes a long time, and instead of a speedup you get a slowdown. Try to read the arrays inside the subprocess itself.
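A minimal sketch of what I mean (the names here are made up for illustration, and the failing case assumes Python 2 behaviour, where bound methods cannot be pickled):

    from multiprocessing import Pool

    def worker(x):
        # Module-level function: picklable, so Pool can ship it to a subprocess.
        return x * x

    class Analysis(object):
        def run(self):
            p = Pool(3)
            try:
                # Works: the callable handed to the pool is a plain
                # module-level function.
                return p.map(worker, [1, 2, 3])
            finally:
                p.close()
                p.join()

        # My original layout was roughly this, a method inside the class:
        # def worker(self, x):
        #     return x * x
        # On Python 2 a bound method cannot be pickled, so the subprocess
        # never gets a usable callable and the work is never done.

    if __name__ == '__main__':
        print(Analysis().run())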

And if possible, use numpy.memmap arrays; they are quite fast.
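For example, a rough sketch of the "read inside the subprocess" idea combined with memmap (the file name, dtype and shape are placeholders for whatever your data uses, and the calculation is a stand-in):

    import numpy as np
    from multiprocessing import Process, Queue

    def worker(queue, filename, shape):
        # Only the file name and shape are pickled and sent to the subprocess;
        # the big array itself is memory-mapped here, inside the child.
        a = np.memmap(filename, dtype=np.float64, mode='r', shape=shape)
        v1 = np.asarray(a.sum(axis=0))   # placeholder for the real calculation
        v2 = np.asarray(a.sum(axis=1))
        queue.put([v1, v2])

    if __name__ == '__main__':
        shape = (2500, 2500)             # kept small here; the idea is the same
        # write the array to disk once (or reuse a file you already have)
        a = np.memmap('array1.dat', dtype=np.float64, mode='w+', shape=shape)
        a[:] = np.random.random(shape)
        del a                            # flush to disk
        q = Queue()
        p = Process(target=worker, args=(q, 'array1.dat', shape))
        p.start()
        v1, v2 = q.get()                 # drain the queue before join()
        p.join()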


Here is an example using np.memmap and Pool. Note that you can set the number of tasks and the number of worker processes. In this case you have no control over the queue, which is something you could get with multiprocessing.Queue:

    from multiprocessing import Pool
    import numpy as np

    def mysum(array_file_name, col1, col2, shape):
        a = np.memmap(array_file_name, shape=shape, mode='r+')
        a[:, col1:col2] = np.random.random((shape[0], col2 - col1))
        ans = a[:, col1:col2].sum()
        del a        # flush changes and close the memmap
        return ans

    if __name__ == '__main__':
        nop = 1000   # number of chunks (tasks) submitted to the pool
        now = 3      # number of worker processes
        p = Pool(now)
        array_file_name = 'test.array'
        shape = (250000, 250000)
        a = np.memmap(array_file_name, shape=shape, mode='w+')   # create the file
        del a
        cols = [[shape[1] * i // nop, shape[1] * (i + 1) // nop] for i in range(nop)]
        results = []
        for c1, c2 in cols:
            r = p.apply_async(mysum, args=(array_file_name, c1, c2, shape))
            results.append(r)
        p.close()
        p.join()
        final_result = sum([r.get() for r in results])
        print(final_result)

You can get even better results by using shared memory for the parallel processing whenever possible. See this related question (a rough sketch follows the link below):

  • Python multiprocessing shared memory objects
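As a sketch of the shared-memory idea (not taken from the linked question; it uses multiprocessing.Array plus np.frombuffer, with a deliberately small array and a placeholder calculation):

    import numpy as np
    from multiprocessing import Process, Array

    def worker(shared, shape, results, idx):
        # Re-wrap the shared buffer as a numpy array: no copy, and no pickling
        # of the array data when the process starts.
        a = np.frombuffer(shared, dtype=np.float64).reshape(shape)
        results[idx] = float(a.sum())   # placeholder for the real calculation

    if __name__ == '__main__':
        shape = (1000, 1000)            # kept small here; the idea is the same
        shared = Array('d', shape[0] * shape[1], lock=False)
        a = np.frombuffer(shared, dtype=np.float64).reshape(shape)
        a[:] = np.random.random(shape)  # fill the shared array once, in the parent

        results = Array('d', 2, lock=False)   # one result slot per worker
        ps = [Process(target=worker, args=(shared, shape, results, i)) for i in range(2)]
        for p in ps:
            p.start()
        for p in ps:
            p.join()
        print(list(results))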
