Since memoryviews are convenient and fast, I try to avoid creating NumPy arrays in Cython and instead work with views of the given arrays. Sometimes, however, it cannot be avoided: a new array has to be created rather than an existing one modified in place. In higher-level functions this is not noticeable, but in frequently called subroutines it is. Consider the following function:
    cimport cython
    import numpy as np

    np_float = np.float64    # module-level dtype alias (assumed definition)

    #@cython.profile(False)    # left commented out so the profiler sees this function
    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.nonecheck(False)
    cdef double [:] vec_eq(double [:] v1, int [:] v2, int cond):
        ''' Function output corresponds to v1[v2 == cond] '''
        cdef unsigned int n = v1.shape[0]
        cdef unsigned int n_ = 0    # size of the array to create
        cdef size_t i
        # First pass: count the matching elements
        for i in range(n):
            if v2[i] == cond:
                n_ += 1
        # Create the array for the selection
        cdef double [:] s = np.empty(n_, dtype=np_float)    # Slow line
        # Second pass: copy the selection into the new array
        n_ = 0
        for i in range(n):
            if v2[i] == cond:
                s[n_] = v1[i]
                n_ += 1
        return s
Profiling tells me there is some speed to be gained here, on the np.empty line in particular.
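A small def wrapper like the following (a sketch; py_vec_eq is a made-up name) makes the cdef function callable, and thus timeable, from Python:

    # At the bottom of the same .pyx file: a def wrapper so the
    # cdef function can be called and timed from Python.
    def py_vec_eq(double [:] v1, int [:] v2, int cond):
        return np.asarray(vec_eq(v1, v2, cond))

Timing py_vec_eq with timeit on arrays of realistic size (note that int [:] expects np.intc, usually int32, for v2) should reproduce what the profiler reports about the allocation.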

I could adapt the function, since sometimes, for example, the mean of this vector is computed and sometimes the sum; so I could rewrite it to return the sum or the mean directly. But is there no way to create a memoryview with minimal overhead while determining its size dynamically? Something like this: first create a buffer with malloc, etc., and at the end of the function convert the buffer to a view by passing the pointer and the strides, or so...
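Something along these lines does exist: Cython can coerce a raw C pointer to a typed memoryview, and a cython.view.array can take ownership of the buffer so that free() runs when the view is destroyed. Below is a minimal sketch of the idea; the vec_eq_malloc name and the one-pass over-allocation to n elements are my own choices, not anything from the original function.

    cimport cython
    from cython cimport view
    from libc.stdlib cimport malloc, free
    import numpy as np

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef double [:] vec_eq_malloc(double [:] v1, int [:] v2, int cond):
        ''' Like v1[v2 == cond], but backed by a malloc'ed buffer '''
        cdef unsigned int n = v1.shape[0]
        cdef unsigned int n_ = 0
        cdef size_t i
        # Over-allocate to n doubles so a single pass suffices; the
        # view below is then restricted to the n_ entries actually used.
        cdef double *buf = <double *> malloc(n * sizeof(double))
        if n > 0 and buf == NULL:
            raise MemoryError()
        for i in range(n):
            if v2[i] == cond:
                buf[n_] = v1[i]
                n_ += 1
        if n_ == 0:
            # view.array rejects zero-length shapes, so fall back
            free(buf)
            return np.empty(0, dtype=np.float64)
        # Coerce the pointer to a view (Cython >= 0.17) and hand it
        # ownership of the buffer: free() is called on destruction.
        cdef view.array arr = <double[:n_]> buf
        arr.callback_free_data = free
        return arr

The over-allocation trades temporary memory for skipping the counting pass; whether this actually beats np.empty depends on the selectivity and the allocator, so it needs measuring.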
Edit 1: Maybe for simple cases, adapting the function, e.g. like this, is an acceptable approach. I just added an argument and do the summing / averaging inside. This way I don't need to create an array at all, and inside the function I can easily manage a malloc'ed buffer. It won't get any faster than this, will it?
    # ...
    cdef double vec_eq(double [:] v1, int [:] v2, int cond, opt=0):
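Fleshed out, the adapted function might look like the sketch below; the opt convention (0 for the sum, 1 for the mean) is my guess at the intent, and I typed opt as int so the default argument stays a plain C value:

    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef double vec_eq(double [:] v1, int [:] v2, int cond, int opt=0):
        ''' Sum (opt=0) or mean (opt=1) of v1[v2 == cond], no allocation '''
        cdef unsigned int n = v1.shape[0]
        cdef unsigned int n_ = 0    # number of matching elements
        cdef double acc = 0.0       # running sum
        cdef size_t i
        for i in range(n):
            if v2[i] == cond:
                acc += v1[i]
                n_ += 1
        if opt == 1 and n_ > 0:
            return acc / n_
        return acc

As far as I can tell, fusing the selection and the reduction into a single pass over C-typed data, with no allocation at all, is about as lean as it gets for these two statistics.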