Cython numpy accumulation function

I need to implement a function to sum the elements of an array with a variable section length. Thus,

a = np.arange(10) section_lengths = np.array([3, 2, 4]) out = accumulate(a, section_lengths) print out array([ 3., 7., 35.]) 

I tried to implement the implementation in cython here:

https://gist.github.com/2784725

for performance I compare with a pure numpy solution for the case when the section lengths are all the same:

 LEN = 10000 b = np.ones(LEN, dtype=np.int) * 2000 a = np.arange(np.sum(b), dtype=np.double) out = np.zeros(LEN, dtype=np.double) %timeit np.sum(a.reshape(-1,2000), axis=1) 10 loops, best of 3: 25.1 ms per loop %timeit accumulate.accumulate(a, b, out) 10 loops, best of 3: 64.6 ms per loop 

Do you have suggestions for improving productivity?

+7
source share
2 answers

You can try some of the following:

  • In addition to the compiler directive @cython.boundscheck(False) also try adding @cython.wraparound(False)

  • In setup.py script try adding some optimization flags:

    ext_modules = [Extension("accumulate", ["accumulate.pyx"], extra_compile_args=["-O3",])]

  • Take a look at the .html file generated by cython -a accumulate.pyx to see if there are sections that do not have static typing or are heavily dependent on Python C-API calls:

    http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types

  • Add a return at the end of the method. It currently does a bunch of unnecessary error checking in your hard loop in i_el += 1 .

  • Not sure if this will matter, but I try to make the loop counts cdef unsigned int , not just int

You can also compare your code with numpy if section_lengths are unequal, as this will probably require a little more than just sum .

+2
source

In the loop update slot, out[i_bas] is slow, you can create a temporary variable to make it neat, and update out[i_bas] when the creation of the loop slot is completed. The following code will work as fast as the numpy version:

 import numpy as np cimport numpy as np ctypedef np.int_t DTYPE_int_t ctypedef np.double_t DTYPE_double_t cimport cython @cython.boundscheck(False) @cython.wraparound(False) def accumulate( np.ndarray[DTYPE_double_t, ndim=1] a not None, np.ndarray[DTYPE_int_t, ndim=1] section_lengths not None, np.ndarray[DTYPE_double_t, ndim=1] out not None, ): cdef int i_el, i_bas, sec_length, lenout cdef double tmp lenout = out.shape[0] i_el = 0 for i_bas in range(lenout): tmp = 0 for sec_length in range(section_lengths[i_bas]): tmp += a[i_el] i_el+=1 out[i_bas] = tmp 
+1
source

All Articles