Accessing NumPy Record Array Columns in Cython

I am a relatively experienced Python programmer, but I have not written any C for a very long time, and I am trying to understand Cython. I want to write a Cython function that operates on one column of a NumPy record array (recarray).

The code I have is below.

recarray_func.pyx:

    import numpy as np
    cimport numpy as np

    cdef packed struct rec_cell0:
        np.float32_t f0
        np.int64_t i0, i1, i2

    def sum(np.ndarray[rec_cell0, ndim=1] recarray):
        cdef Py_ssize_t i
        cdef rec_cell0 *cell
        cdef np.float32_t running_sum = 0
        for i in range(recarray.shape[0]):
            cell = &recarray[i]
            running_sum += cell.f0
        return running_sum

At the interpreter prompt:

    array = np.recarray((100, ), names=['f0', 'i0', 'i1', 'i2'], formats=['f4', 'i8', 'i8', 'i8'])
    recarray_func.sum(array)

This simply sums the recarray column f0. It compiles and runs without problems.

My question is: how do I change this so that it can work with any column? In the example above, the column to sum is hard-coded and accessed via dot notation. Is it possible to change the function so that the column to sum is passed in as a parameter?

+6
2 answers

I believe this should be possible with Cython memoryviews. Something along these lines should work (untested code):

    import numpy as np
    cimport numpy as np

    cdef packed struct rec_cell0:
        np.float32_t f0
        np.int64_t i0, i1, i2

    def sum(rec_cell0[:] recview):
        cdef Py_ssize_t i
        cdef np.float32_t running_sum = 0
        for i in range(recview.shape[0]):
            running_sum += recview[i].f0
        return running_sum

Speed can probably be improved by making sure the record array you pass to Cython is contiguous. On the Python (calling) side you can use np.require to enforce this, and the function signature should then change to rec_cell0[::1] recview to tell Cython that the array can be assumed contiguous. And as always, once the code has been tested, turning off the boundscheck, wraparound and nonecheck directives in Cython will most likely give a further speed improvement.
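
To illustrate the calling side, here is a minimal sketch of the np.require step in plain Python. The sample data and the variable names (records, strided, contiguous) are made up for the example; it only shows how a non-contiguous slice becomes C-contiguous before it would be handed to a function declared with rec_cell0[::1].

```python
import numpy as np

# Same field layout as in the question (packed, 28 bytes per record).
rec_dtype = np.dtype({'names': ['f0', 'i0', 'i1', 'i2'],
                      'formats': ['f4', 'i8', 'i8', 'i8']})

records = np.zeros(10, dtype=rec_dtype)
records['f0'] = np.arange(10, dtype=np.float32)

# Taking every second record gives a strided, non-contiguous view.
strided = records[::2]

# np.require returns the input unchanged if it already satisfies the
# requirements, and otherwise makes a copy that does.
contiguous = np.require(strided, requirements=['C'])

print(strided.flags['C_CONTIGUOUS'], contiguous.flags['C_CONTIGUOUS'])
```

Note that np.require may copy the data, so the copy costs time and memory too; it only pays off if the Cython loop gains more than the copy costs.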

+2

What you want requires weak typing, which C does not have. If all your record fields have the same type, you could pull off something like this (disclaimer: I don't have Cython on this machine, so I am coding blind):

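    import numpy as np
    cimport numpy as np

    cdef packed struct rec_cell0:
        np.float32_t f0
        np.int64_t i0, i1, i2

    def sum(np.ndarray[rec_cell0, ndim=1] recarray, colname):
        cdef Py_ssize_t i
        cdef rec_cell0 *cell
        cdef np.float32_t running_sum = 0
        # Byte offset of the requested field inside each record.
        cdef Py_ssize_t loc = recarray.dtype.fields[colname][1]
        for i in range(recarray.shape[0]):
            cell = &recarray[i]
            # Step loc bytes into the record and reinterpret the bytes there.
            # The cast must match the column's real type: np.int64_t is only
            # correct for i0, i1 and i2 in this struct, not for f0.
            running_sum += (<np.int64_t *>(<char *>cell + loc))[0]
        return running_sum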
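
To make the offset lookup concrete, here is a plain-Python sketch of the same idea: dtype.fields[colname] gives the field's dtype and its byte offset inside a record, and the value is then read from the raw bytes at that position. The helper name sum_column_by_offset and the STRUCT_FORMATS table are hypothetical, made up for this illustration, and only cover the two field types used in the question.

```python
import struct

import numpy as np

# Explicit little-endian layout so the struct format codes below match.
rec_dtype = np.dtype({'names': ['f0', 'i0', 'i1', 'i2'],
                      'formats': ['<f4', '<i8', '<i8', '<i8']})

# Hypothetical lookup table: NumPy type string -> struct format code.
STRUCT_FORMATS = {'<f4': '<f', '<i8': '<q'}


def sum_column_by_offset(records, colname):
    """Sum one field by reading it at its byte offset in every record."""
    field_dtype, offset = records.dtype.fields[colname][:2]
    fmt = STRUCT_FORMATS[field_dtype.str]
    raw = records.tobytes()            # contiguous copy of the record bytes
    itemsize = records.dtype.itemsize  # bytes per record (28 here)
    total = 0
    for i in range(len(records)):
        # Same arithmetic as the pointer version: record start + field offset.
        (value,) = struct.unpack_from(fmt, raw, i * itemsize + offset)
        total += value
    return total


records = np.zeros(4, dtype=rec_dtype)
records['f0'] = [0.5, 1.5, 2.5, 3.5]
records['i1'] = [10, 20, 30, 40]

print(records.dtype.fields['i1'][1])        # byte offset of i1
print(sum_column_by_offset(records, 'i1'))
print(sum_column_by_offset(records, 'f0'))
```

In pure NumPy the usual way to do this is simply records[colname].sum(); the byte-level version is only meant to show what the offset means and why the cast type has to match the column's type.
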
+1
