There is a module called multiprocessing , which is part of the Python standard library. It spawns worker processes across multiple cores, which is what you want since you'd like to use the other processors. The documentation has an example of using the Pool object; below is an abridged version of it. It calculates the squares of 10 numbers, distributing the workload across worker processes, and prints the result.
Simple work pool
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print(pool.map(f, range(10)))
Output
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
It took a bit more effort to break your problem into the same structure; I had to create some intermediary functions to do it. I don't have numpy installed, so I just used lists and dictionaries in place of your arrays. You can substitute yours back in and try the code.
More complex scenario
from multiprocessing import Pool
import time, pprint

def fun(av, bv):
    time.sleep(0.1)  # pretend this is expensive work
    return (av, bv)

def data_stream(a, b):
    # yield ((i, j), (a[i], b[j])) pairs so each result can be keyed later
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            yield (i, j), (av, bv)

def proxy(args):
    # unpack the (key, arguments) pair and pair the key with fun's result
    return args[0], fun(*args[1])

if __name__ == '__main__':
    a = range(100, 400, 100)
    b = range(100, 400, 100)
    Y = {}
    pool = Pool(processes=4)
    results = pool.map(proxy, data_stream(a, b))
    for k, v in results:
        Y[k] = v
    pprint.pprint(Y)
Output
{(0, 0): (100, 100),
 (0, 1): (100, 200),
 (0, 2): (100, 300),
 (1, 0): (200, 100),
 (1, 1): (200, 200),
 (1, 2): (200, 300),
 (2, 0): (300, 100),
 (2, 1): (300, 200),
 (2, 2): (300, 300)}
Performance
In the example I just put in a dummy 0.1 second delay to simulate hard work. But even in this example, if you start the pool with processes=1 it runs in 0.950 seconds, while with processes=4 it runs in 0.352 seconds. The multiprocessing library can be used in many ways; Pool is just one of them. Study the examples and experiment.
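If you want to reproduce that comparison on your own machine, here is a minimal timing sketch; the loop over pool sizes and the time.time() bookkeeping are my additions, not part of the original example, and the exact numbers will of course vary with your hardware:

from multiprocessing import Pool
import time

def fun(av, bv):
    time.sleep(0.1)  # dummy delay standing in for real work
    return (av, bv)

def data_stream(a, b):
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            yield (i, j), (av, bv)

def proxy(args):
    return args[0], fun(*args[1])

if __name__ == '__main__':
    a = range(100, 400, 100)
    b = range(100, 400, 100)
    for n in (1, 4):
        with Pool(processes=n) as pool:
            start = time.time()
            pool.map(proxy, data_stream(a, b))
            print('processes=%d: %.3f seconds' % (n, time.time() - start))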
One comment below mentioned using the chunksize argument of pool.map to improve performance. It is important to have a general idea of what is happening under the hood in order to reason about performance. Basically, every piece of data you send to the other processes must be pickled, transferred to the worker process, and unpickled there; the result then travels the same way back to the main process. There is overhead to this interprocess communication. Keep this in mind when you experiment.
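As a rough sketch of what that looks like (the input size and the chunksize value below are just illustrative numbers I picked, not taken from the comment), passing chunksize makes pool.map hand each worker a batch of items per task instead of one item at a time, so far fewer pickled messages cross the process boundary:

from multiprocessing import Pool

def f(x):
    # trivially cheap work; with a chunksize of 1 the pickling overhead would dominate
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    # each task sent to a worker now carries 100 items instead of 1
    squares = pool.map(f, range(10000), chunksize=100)
    print(squares[:5])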