Allocating intermediate multidimensional arrays in Cython without acquiring the GIL

I am trying to use Cython to parallelize an expensive operation that involves creating intermediate multidimensional arrays.

The following very simplified code illustrates what I'm trying to do:

import numpy as np

cimport cython
cimport numpy as np
from cython.parallel cimport prange
from libc.stdlib cimport malloc, free


@cython.boundscheck(False)
@cython.wraparound(False)
def embarrasingly_parallel_example(char[:, :] A):

    cdef unsigned int m = A.shape[0]
    cdef unsigned int n = A.shape[1]
    cdef np.ndarray[np.float64_t, ndim=2] out = np.empty((m, m), np.float64)
    cdef unsigned int ii, jj
    cdef double[:, :] tmp

    for ii in prange(m, nogil=True):
        for jj in range(m):

            # allocate a temporary array to hold the result of
            # expensive_function_1
            tmp_carray = <double*> malloc((n ** 2) * sizeof(double))

            # a 2D typed memoryview onto tmp_carray
            tmp = <double[:n, :n]> tmp_carray

            # shove the intermediate result in tmp
            expensive_function_1(A[ii, :], A[jj, :], tmp)

            # get the final (scalar) output for this ii, jj
            out[ii, jj] = expensive_function_2(tmp)

            # free the intermediate array
            free(tmp_carray)

    return out


# some silly examples - the actual operation I'm performing is a lot more
# involved
# ------------------------------------------------------------------------

@cython.boundscheck(False)
@cython.wraparound(False)
cdef void expensive_function_1(char[:] x, char[:] y, double[:, :] tmp):
    cdef unsigned int m = tmp.shape[0]
    cdef unsigned int n = x.shape[0]
    cdef unsigned int ii, jj, kk
    for ii in range(m):
        for jj in range(m):
            tmp[ii, jj] = 0
            for kk in range(n):
                tmp[ii, jj] += (x[kk] + y[kk]) * (ii - jj)


@cython.boundscheck(False)
@cython.wraparound(False)
cdef double expensive_function_2(double[:, :] tmp):
    cdef unsigned int m = tmp.shape[0]
    cdef unsigned int ii, jj
    cdef double result = 0
    for ii in range(m):
        for jj in range(m):
            result += tmp[ii, jj]
    return result

There seem to be at least two reasons why this fails to compile:

1. Based on the output of cython -a, the line where the typed memoryview is created:

     cdef double[:, :] tmp = <double[:n, :n]> tmp_carray

    appears to involve Python API calls, so I cannot release the GIL to run the outer loop in parallel.

    I was under the impression that typed memoryviews are not Python objects, so a worker thread should be able to create one without first acquiring the GIL. Is this not the case?

2. Even if I replace prange(m, nogil=True) with a normal range(m), Cython still complains about the cdef inside the inner loop:

  Error compiling Cython file:
  ------------------------------------------------------------
  ...
              # allocate a temporary array to hold the result of
              # expensive_function_1
              tmp_carray = <double*> malloc((n ** 2) * sizeof(double))

              # a 2D typed memoryview onto tmp_carray
              cdef double[:, :] tmp = <double[:n, :n]> tmp_carray
                  ^
  ------------------------------------------------------------

  parallel_allocate.pyx:26:17: cdef statement not allowed here

Update

It turns out that the second problem was easily solved by moving

  cdef double[:, :] tmp 

outside the for loop and just assigning

  tmp = <double[:n, :n]> tmp_carray

inside the loop. I still do not quite understand why this is necessary.

Now, if I try to use prange, I get the following compilation error:

  Error compiling Cython file:
  ------------------------------------------------------------
  ...
              # allocate a temporary array to hold the result of
              # expensive_function_1
              tmp_carray = <double*> malloc((n ** 2) * sizeof(double))

              # a 2D typed memoryview onto tmp_carray
              tmp = <double[:n, :n]> tmp_carray
                 ^
  ------------------------------------------------------------

  parallel_allocate.pyx:28:16: Memoryview slices can only be shared in parallel sections
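One common workaround (a sketch, not tested against this exact code; the _ptr helper names are my own) is to avoid creating a memoryview inside the nogil loop altogether: keep the per-thread buffer as a raw double * and index it manually as a flattened 2D array, since pointer arithmetic needs no GIL:

```cython
import numpy as np
cimport cython
from cython.parallel cimport prange
from libc.stdlib cimport malloc, free

cdef void expensive_function_1_ptr(char[:] x, char[:] y,
                                   double *tmp, unsigned int n) nogil:
    # writes its result into the flat n*n buffer tmp
    cdef unsigned int ii, jj, kk
    for ii in range(n):
        for jj in range(n):
            tmp[ii * n + jj] = 0            # tmp[ii, jj] in row-major layout
            for kk in range(n):
                tmp[ii * n + jj] += (x[kk] + y[kk]) * (ii - jj)

cdef double expensive_function_2_ptr(double *tmp, unsigned int n) nogil:
    cdef unsigned int ii
    cdef double result = 0
    for ii in range(n * n):
        result += tmp[ii]
    return result

@cython.boundscheck(False)
@cython.wraparound(False)
def embarrasingly_parallel_example(char[:, :] A):
    cdef unsigned int m = A.shape[0], n = A.shape[1]
    cdef double[:, :] out = np.empty((m, m))  # allocated while holding the GIL
    cdef unsigned int ii, jj
    cdef double *tmp_carray
    for ii in prange(m, nogil=True):
        # assigned inside the loop body, so tmp_carray is thread-private
        tmp_carray = <double *> malloc(n * n * sizeof(double))
        for jj in range(m):
            expensive_function_1_ptr(A[ii, :], A[jj, :], tmp_carray, n)
            out[ii, jj] = expensive_function_2_ptr(tmp_carray, n)
        free(tmp_carray)
    return np.asarray(out)
```

Note that the helper functions are declared nogil (which the originals would also need to be callable from a prange block), and only indexing of pre-existing memoryviews happens inside the parallel section.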
2 answers

Disclaimer: take everything here with a grain of salt; I am guessing more than I know. You should definitely ask this question on the cython-users mailing list. They are always friendly and quick to respond.

I agree that the Cython documentation is not very clear on this point:

  [...] memoryviews often do not need the GIL:

  cpdef int sum3d(int[:, :, :] arr) nogil: ...

  In particular, you do not need the GIL for memoryview indexing, slicing or transposing. Memoryviews require the GIL for the copy methods (C and Fortran contiguous copies), or when the dtype is object and an object element is read or written.

I take this to mean that passing a memoryview as a function argument, or indexing, slicing or transposing it, does not need the GIL, but creating or copying a memoryview does.
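A minimal sketch of that distinction (my own example, assuming the view is created while the GIL is still held):

```cython
import numpy as np

def demo():
    narr = np.zeros((3, 3))
    cdef double[:, :] view = narr   # creating the view requires the GIL
    cdef int i
    with nogil:
        for i in range(3):
            view[i, i] = 1.0        # indexing an existing view: no GIL needed
        # creating a new view here, e.g. view = <double[:3, :3]> some_ptr,
        # would be rejected, because view construction involves Python API calls
    return np.asarray(view)
```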

Another argument supporting this is that a Cython function can return a memoryview to Python:

 from cython.view cimport array as cvarray
 import numpy as np

 def bla():
     narr = np.arange(27, dtype=np.dtype("i")).reshape((3, 3, 3))
     cdef int[:, :, :] narr_view = narr
     return narr_view

gives:

 >>> import hello
 >>> hello.bla()
 <MemoryView of 'ndarray' at 0x1b03380>

which means that a memoryview is backed by Python-managed (garbage-collected) memory, and creating one therefore needs the GIL. So you cannot create a memoryview inside a nogil section.


Now, regarding the error message

Memoryview slices can only be shared in parallel sections

I think you should read it as: you cannot have thread-private memoryview slices inside a parallel section; they can only be memoryview slices shared between the threads.
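For instance (a sketch of my own), a memoryview slice created before the parallel section and merely indexed inside it compiles fine, because it is shared by all threads:

```cython
import numpy as np
from cython.parallel cimport prange

def fill(int m):
    cdef double[:] shared = np.empty(m)  # created while holding the GIL
    cdef int i
    for i in prange(m, nogil=True):
        shared[i] = 2.0 * i              # all threads index the same shared slice
    return np.asarray(shared)
```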


http://docs.cython.org/src/userguide/external_C_code.html#releasing-the-gil

  Releasing the GIL

  You can release the GIL around a section of code using the with nogil statement:

      with nogil:
          <code to be executed with the GIL released>

  Code in the body of the statement must not manipulate Python objects in any way, and must not call anything that manipulates Python objects without first re-acquiring the GIL. Cython currently does not check this.

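As a minimal sketch of that pattern, the usual approach is to move the hot loop into a cdef function declared nogil and call it inside the block:

```cython
cdef long triangular(long n) nogil:
    # pure C arithmetic: touches no Python objects
    cdef long i, total = 0
    for i in range(n + 1):
        total += i
    return total

def compute(long n):
    cdef long result
    with nogil:
        result = triangular(n)   # safe: the callee is declared nogil
    return result
```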

