I am trying to parallelize some calculations that use numpy with the help of Python's multiprocessing module. Consider this simplified example:
    import time

    import numpy
    from multiprocessing import Pool


    def test_func(i):
        # i is only a task index; the workload is identical for every call.
        a = numpy.random.normal(size=1000000)
        b = numpy.random.normal(size=1000000)
        for _ in range(2000):
            # Each statement allocates a new 1,000,000-element array.
            a = a + b
            b = a - b
            a = a - b
        return 1


    if __name__ == "__main__":
        # Baseline: one call in the parent process.
        t1 = time.time()
        test_func(0)
        single_time = time.time() - t1
        print("Single time:", single_time)

        n_par = 4
        pool = Pool()

        # Run n_par copies of the same task concurrently.
        t1 = time.time()
        results_async = [
            pool.apply_async(test_func, [i])
            for i in range(n_par)]
        results = [r.get() for r in results_async]
        multicore_time = time.time() - t1

        print("Multicore time:", multicore_time)
        print("Efficiency:", single_time / multicore_time)
When I execute it, multicore_time is roughly equal to single_time * n_par, while I would expect it to be close to single_time. Indeed, if I replace the numpy calculations with just time.sleep(10), that is what I get: perfect efficiency. But for some reason it does not work with numpy. Can this be solved, or is it some internal limitation of numpy?
Additional information that may be helpful:
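I am using OSX 10.9.5 and Python 3.4.2 on a Core i7 with 4 cores (the program above uses only about 50% of total CPU time).
When I run this, I see n_par processes in top, each at 100% CPU.
If I replace the numpy array operations with a loop and per-index operations, the efficiency rises significantly (to about 75% for n_par = 4).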
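It looks like the test function you are using is memory bound. That means the run time is limited by how fast the computer can pull the arrays from memory into cache, not by the arithmetic itself. For example, the line a = a + b actually uses 3 arrays: a, b, and a new array that will replace a. Each of these arrays is about 8 MB (10^6 floats * 8 bytes per float). I believe the various i7s have something like 3MB - 8MB of shared L3 cache, so all 3 arrays cannot fit in cache at once. The CPU adds the floats faster than the arrays can be loaded into cache, so most of the time is spent waiting on memory. Because the cache is shared between the cores, spreading the work across several cores gives no speedup.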
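The size argument above can be checked with a few lines of arithmetic. This is only a sketch: the helper name working_set_bytes and the 8 MiB figure in L3_CACHE_BYTES are assumptions made for illustration (the real L3 size depends on the CPU model), and the count of 3 arrays follows the a = a + b case discussed above.

```python
import numpy as np


def working_set_bytes(n_elements, n_arrays=3, dtype=np.float64):
    """Bytes touched when n_arrays arrays of n_elements items are live at once."""
    return n_arrays * n_elements * np.dtype(dtype).itemsize


# Assumed cache size for illustration only; check your own CPU's specification.
L3_CACHE_BYTES = 8 * 1024**2

per_array = working_set_bytes(1_000_000, n_arrays=1)
total = working_set_bytes(1_000_000)  # a, b and the temporary result

print("one array:", per_array / 1e6, "MB")
print("working set:", total / 1e6, "MB")
print("fits in assumed L3:", total <= L3_CACHE_BYTES)
```

With size=1000000 the working set is roughly three times the assumed cache, which is consistent with the arrays being streamed from main memory on every pass.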
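Memory-bound operations are a problem for numpy in general, and the only way I know to deal with them is to use something like cython or numba.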
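One easy thing that should improve efficiency is to do in-place array operations where possible: add(a,b,a) does not create a new array, while a = a + b does. If your for loop over numpy arrays can be rewritten as vector operations, that should be more efficient as well. Another possibility is to use numpy.ctypeslib to enable shared-memory numpy arrays (see fooobar.com/questions/82981/...).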
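As a rough illustration of the in-place suggestion, the loop body from the question can be written with the out= argument of numpy's ufuncs so that no temporary arrays are allocated. The two function names below are made up for this sketch, and whether the in-place version actually scales better across processes depends on the machine, so it is worth timing both.

```python
import numpy as np


def swap_with_temporaries(a, b, n_iter):
    """Loop from the question: every statement allocates a new array."""
    for _ in range(n_iter):
        a = a + b
        b = a - b
        a = a - b
    return a, b


def swap_in_place(a, b, n_iter):
    """Same arithmetic, but results are written into the existing buffers."""
    for _ in range(n_iter):
        np.add(a, b, out=a)       # a <- a + b
        np.subtract(a, b, out=b)  # b <- a - b
        np.subtract(a, b, out=a)  # a <- a - b
    return a, b
```

Note that swap_in_place overwrites its arguments, so pass copies if the original arrays are still needed.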
: . , - .
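I used Intel PCM (Intel® Performance Counter Monitor) to watch how the CPU cache behaves, with the readings displayed in Linux ksysguard. I also limited the machine to 2 processors (2 active).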
Here is the code I tested:
    import time
    import multiprocessing as mp

    import numpy as np


    def somethinglong(b):
        n = 200000
        m = 5000
        shared = np.arange(n)
        for i in np.arange(m):
            0.01 * shared  # result is computed and discarded


    if __name__ == "__main__":
        pool = mp.Pool(2)
        jobs = [() for i in range(8)]
        for i in range(5):
            timei = time.time()
            pool.map(somethinglong, jobs, chunksize=1)
            # for job in jobs:
            #     somethinglong(job)
            print(time.time() - timei)
, -:
, ( ), : 15/8. 2
, ( , ). , -. : 15/15. 2
: (aux = 0.01 * shared) - ( ).