Joblib.Parallel for nested list comprehensions

I have a nested list that looks something like this:

>>> from math import sqrt
>>> nested = [[1, 2], [3, 4, 5]]
>>> [[sqrt(i) for i in j] for j in nested]
[[1.0, 1.4142135623730951], [1.7320508075688772, 2.0, 2.23606797749979]]

Is it possible to parallelize this using the standard joblib approach for embarrassingly parallel for loops? If so, what is the correct syntax for delayed?

As far as I can tell, the docs neither mention nor give any example of nested inputs. I tried several naive implementations, to no avail:

>>> #this syntax fails:
>>> Parallel(n_jobs = 2) (delayed(sqrt)(i for i in j) for j in nested)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 660, in __call__
    self.retrieve()
  File "C:\Python27\lib\site-packages\joblib\parallel.py", line 512, in retrieve
    self._output.append(job.get())
  File "C:\Python27\lib\multiprocessing\pool.py", line 558, in get
    raise self._value
pickle.PicklingError: Can't pickle <type 'generator'>: it's not found as __builtin__.generator
>>> #this syntax doesn't fail, but gives the wrong output:
>>> Parallel(n_jobs = 2) (delayed(sqrt)(i) for i in j for j in nested)
[1.7320508075688772, 1.7320508075688772, 2.0, 2.0, 2.23606797749979, 2.23606797749979]

If this is not possible, I can obviously flatten the list before passing it to Parallel and rebuild it afterwards. However, my actual list is long and each item is huge, so doing this is not ideal.

2 answers

There are two problems with your first attempt: you are handing sqrt a generator, (i for i in j), rather than a number, and generators can't be pickled, which is exactly what the PicklingError is telling you. The second attempt has the for clauses in the wrong order: in a generator expression the first clause's iterable is evaluated immediately, so `for i in j` uses the j left over from your earlier list comprehension (Python 2 leaks comprehension variables), and its elements are then submitted once per iteration of `for j in nested` — hence every value appearing twice. Remember that multiprocessing can only ship picklable objects to the worker processes.

As far as I know, there is no way to hand a nested structure directly to joblib/multiprocessing and get the nesting preserved automatically, so you have to decide yourself how to slice up the work.

Your options are:

Option 1:

Parallelize over the sublists, i.e. hand each job one whole sublist j_n. If the sublists are all roughly the same size, the work is balanced evenly across the workers; if their sizes vary a lot, some workers will finish early and sit idle. Each job also has to pickle an entire sublist, which can be expensive if the items are huge.
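A minimal sketch of this option, under the assumption that joblib is installed; the helper name sqrt_sublist is my own, not from the original answer:

```python
from math import sqrt
from joblib import Parallel, delayed

def sqrt_sublist(sub):
    # one job processes one whole sublist
    return [sqrt(i) for i in sub]

nested = [[1, 2], [3, 4, 5]]
# one delayed call per sublist; the nested structure is preserved for free,
# since each job returns the list for its own sublist
results = Parallel(n_jobs=2)(delayed(sqrt_sublist)(sub) for sub in nested)
```

Note that the job boundaries coincide with the sublist boundaries here, so no reshaping is needed afterwards.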

Option 2:

Flatten the list, parallelize over the individual elements, and restore the nested structure afterwards.

The details depend on whether your structure is regular (i.e. n sublists of m elements each) or irregular:

If the structure is regular, numpy can do the flattening and reshaping for you:

from math import sqrt
from joblib import Parallel, delayed
import numpy as np
# convert to array -- only works well if you have a regular structure!
nested_arr = np.array(nested)
# the shape of the array, for later
shape = nested_arr.shape

# generate an (n*m) linear array from an (n, m) 2D one
linear = nested_arr.ravel()

# run the parallel calculation
results_lin = Parallel(n_jobs = 2) (delayed(sqrt)(e) for e in linear)

# get everything back into shape (Parallel returns a plain list,
# so convert to an array before reshaping):
results = np.array(results_lin).reshape(shape)

If you need fancier iteration over arrays, have a look at np.nditer(). Note, though, that converting to an array only to split it up again for joblib/multiprocessing is somewhat self-defeating. (And if your per-element operation really is as cheap as sqrt, the vectorized np.sqrt(nested_arr) beats any parallel version hands down — see the timings below!)
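To make the vectorized point concrete, here is a small sketch of my own; the 2x2 input is a made-up regular example, since the ragged nested list from the question cannot form a 2D array:

```python
import numpy as np

# a regular structure -- the ragged [[1, 2], [3, 4, 5]] would not work here
regular = [[1.0, 4.0], [9.0, 16.0]]
arr = np.array(regular)

# vectorized: no explicit loop, no pickling, no worker processes,
# and the output keeps the input's shape automatically
res = np.sqrt(arr)
```

Because the shape is preserved, there is nothing to flatten or rebuild at all in the regular case.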

If your sublists have irregular lengths, you can flatten and rebuild by hand:

# store lengths of the sub-lists
structure = [len(e) for e in nested]

# make one linear list
linlist = []
for l in nested:
    linlist.extend(l)

# finally run the parallel computation:
results_lin = Parallel(n_jobs = 2) (delayed(sqrt)(e) for e in linlist)

# ...and bring it all back into shape:
results = []
i = 0

for n in structure:
    results.append(results_lin[i:i+n])
    i += n

This works regardless of the sublist lengths and restores the original structure exactly. Which option is faster depends on your workload: per-job overhead favours fewer, bigger jobs, while load balancing favours many small ones.
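The manual flatten/rebuild above can also be written with itertools; the following is a sketch of mine that uses a plain sequential list comprehension in place of the Parallel call, so the round trip itself is easy to check:

```python
from itertools import chain, islice
from math import sqrt

nested = [[1, 2], [3, 4, 5]]
structure = [len(sub) for sub in nested]

# flatten in one step instead of an explicit extend() loop
linlist = list(chain.from_iterable(nested))

# sequential stand-in for Parallel(n_jobs=2)(delayed(sqrt)(e) for e in linlist)
results_lin = [sqrt(e) for e in linlist]

# rebuild: islice consumes exactly n items per sublist,
# so no manual index bookkeeping is needed
it = iter(results_lin)
results = [list(islice(it, n)) for n in structure]
```

The islice-based rebuild avoids the off-by-one risk of tracking a running offset by hand.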

Is it worth it?

For an operation as cheap as sqrt, you are almost certainly better off converting to an np.array and letting numpy do the work without any parallelism at all. A quick timing comparison makes the point:

In [14]: time resl = [sqrt(e) for e in range(1000000)]
CPU times: user 2.1 s, sys: 194 ms, total: 2.29 s
Wall time: 2.19 s

In [15]: time res = np.sqrt(np.arange(1000000))
CPU times: user 10.4 ms, sys: 0 ns, total: 10.4 ms
Wall time: 10.1 ms

That is a speedup of more than a factor of 200 (216, to be precise) from numpy alone — a gap that multiprocessing with a handful of cores cannot begin to close.


The problem is that the argument you pass to sqrt (namely (i for i in j)) is a generator, not a number. If the sublists themselves are the natural unit of work, define a helper that processes a whole sublist and hand that to delayed:

from math import sqrt
from joblib import Parallel, delayed

def sqrt_n(j):
    return [sqrt(i) for i in j]

Parallel(n_jobs = 2) (delayed(sqrt_n)(j) for j in nested)
