Cython numpy accumulation function

Question

Cython numpy accumulation function

I need to implement a function to sum the elements of an array with a variable section length. Thus,

a = np.arange(10) section_lengths = np.array([3, 2, 4]) out = accumulate(a, section_lengths) print out array([ 3., 7., 35.])

I tried to implement the implementation in cython here:

https://gist.github.com/2784725

for performance I compare with a pure numpy solution for the case when the section lengths are all the same:

 LEN = 10000 b = np.ones(LEN, dtype=np.int) * 2000 a = np.arange(np.sum(b), dtype=np.double) out = np.zeros(LEN, dtype=np.double) %timeit np.sum(a.reshape(-1,2000), axis=1) 10 loops, best of 3: 25.1 ms per loop %timeit accumulate.accumulate(a, b, out) 10 loops, best of 3: 64.6 ms per loop

Do you have suggestions for improving productivity?

+7

performance python numpy cython

Andrea Zonca May 24 '12 at 23:04

source share

2 answers

In the loop update slot, out[i_bas] is slow, you can create a temporary variable to make it neat, and update out[i_bas] when the creation of the loop slot is completed. The following code will work as fast as the numpy version:

 import numpy as np cimport numpy as np ctypedef np.int_t DTYPE_int_t ctypedef np.double_t DTYPE_double_t cimport cython @cython.boundscheck(False) @cython.wraparound(False) def accumulate( np.ndarray[DTYPE_double_t, ndim=1] a not None, np.ndarray[DTYPE_int_t, ndim=1] section_lengths not None, np.ndarray[DTYPE_double_t, ndim=1] out not None, ): cdef int i_el, i_bas, sec_length, lenout cdef double tmp lenout = out.shape[0] i_el = 0 for i_bas in range(lenout): tmp = 0 for sec_length in range(section_lengths[i_bas]): tmp += a[i_el] i_el+=1 out[i_bas] = tmp

+1

Hyry May 25 '12 at 2:16

source share

Joshdel · Accepted Answer · 2012-05-25T00:33:48+0000

You can try some of the following:

In addition to the compiler directive @cython.boundscheck(False) also try adding @cython.wraparound(False)
In setup.py script try adding some optimization flags:
ext_modules = [Extension("accumulate", ["accumulate.pyx"], extra_compile_args=["-O3",])]
Take a look at the .html file generated by cython -a accumulate.pyx to see if there are sections that do not have static typing or are heavily dependent on Python C-API calls:
http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types
Add a return at the end of the method. It currently does a bunch of unnecessary error checking in your hard loop in i_el += 1 .
Not sure if this will matter, but I try to make the loop counts cdef unsigned int , not just int

You can also compare your code with numpy if section_lengths are unequal, as this will probably require a little more than just sum .

Cython numpy accumulation function

More articles: