I am trying to use Cython to parallelize an expensive operation that involves creating intermediate multidimensional arrays.
The following very simplified code illustrates what I'm trying to do:
import numpy as np
cimport cython
cimport numpy as np
from cython.parallel cimport prange
from libc.stdlib cimport malloc, free

@cython.boundscheck(False)
@cython.wraparound(False)
def embarrasingly_parallel_example(char[:, :] A):

    cdef unsigned int m = A.shape[0]
    cdef unsigned int n = A.shape[1]
    cdef np.ndarray[np.float64_t, ndim = 2] out = np.empty((m, m), np.float64)
    cdef unsigned int ii, jj
    cdef double[:, :] tmp

    for ii in prange(m, nogil=True):
        for jj in range(m):
There seem to be at least two reasons why this fails to compile:
1. Based on the output of cython -a, creating the typed memoryview here:
cdef double[:, :] tmp = <double[:n, :n] > tmp_carray
seems to involve Python API calls, so I cannot release the GIL to let the outer loop run in parallel.
I was under the impression that typed memoryviews are not Python objects, so a child process ought to be able to create them without first acquiring the GIL. Is this the case?
2. Even if I replace prange(m, nogil=True) with a normal range(m), Cython still doesn't seem to like the presence of a cdef inside the inner loop:
Error compiling Cython file:
------------------------------------------------------------
...
Update
It turns out that the second problem was easily solved by moving
cdef double[:, :] tmp
outside the for loop and just assigning
tmp = <double[:n, :n] > tmp_carray
inside the loop. I still do not quite understand why this is necessary.
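As far as I understand it, Cython gives C variables function scope rather than block scope, so a cdef declaration is rejected inside a loop body while a plain assignment to an already declared name is fine. Here is a minimal serial sketch of that layout. The function scratch_sum and what it computes are made up for illustration and are not part of my real code:

```cython
from libc.stdlib cimport malloc, free

def scratch_sum(Py_ssize_t m, Py_ssize_t n):
    # Hypothetical example: every cdef declaration sits at function level.
    cdef Py_ssize_t ii, kk
    cdef double *tmp_carray
    cdef double[:, :] tmp
    cdef double total = 0.0

    if n < 1:
        raise ValueError("n must be at least 1")

    for ii in range(m):
        tmp_carray = <double *> malloc(n * n * sizeof(double))
        if tmp_carray == NULL:
            raise MemoryError()
        # Fill the scratch buffer with the loop index.
        for kk in range(n * n):
            tmp_carray[kk] = ii
        # Plain assignment (no cdef) is allowed inside the loop.
        tmp = <double[:n, :n]> tmp_carray
        total += tmp[n - 1, n - 1]
        # tmp must not be read again after the buffer is freed.
        free(tmp_carray)
    return total
```

With this layout the declaration happens once and the loop only rebinds tmp to each freshly allocated buffer.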
Now, if I try to use prange, I get the following compilation error:
Error compiling Cython file:
------------------------------------------------------------
...
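For comparison, here is a sketch of the pattern I would fall back on if memoryviews cannot be created without the GIL: each thread gets its own raw C buffer inside a parallel() block, and no memoryview is created inside the loop. This is not a diagnosis of the error above. The function row_sums_of_squares and its computation are hypothetical stand-ins for my real expensive operation, and it assumes a C-contiguous float64 input:

```cython
import numpy as np
from cython.parallel cimport parallel, prange
from libc.stdlib cimport malloc, free, abort

def row_sums_of_squares(double[:, ::1] A):
    # Hypothetical stand-in for the real expensive operation.
    cdef Py_ssize_t m = A.shape[0]
    cdef Py_ssize_t n = A.shape[1]
    cdef double[::1] out = np.zeros(m, dtype=np.float64)
    cdef Py_ssize_t ii, kk
    cdef double *tmp
    cdef double acc

    if m == 0 or n == 0:
        return np.asarray(out)

    with nogil, parallel():
        # Assigned inside the parallel block, so tmp is private to each thread.
        tmp = <double *> malloc(n * sizeof(double))
        if tmp == NULL:
            abort()
        for ii in prange(m):
            # Stage intermediate values in the per-thread scratch buffer.
            for kk in range(n):
                tmp[kk] = A[ii, kk] * A[ii, kk]
            # Plain assignment keeps acc thread-private (not a reduction).
            acc = 0.0
            for kk in range(n):
                acc = acc + tmp[kk]
            out[ii] = acc
        free(tmp)
    return np.asarray(out)
```

The extension has to be built with OpenMP enabled (for example -fopenmp with GCC) for the loop to actually run on multiple threads. Without it the same code should compile and run serially.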
python numpy parallel-processing cython thread-local-storage
ali_m