Cython Memoryview as return value

Consider this dummy Cython code:

    #!python
    #cython: boundscheck=False
    #cython: wraparound=False
    #cython: initializedcheck=False
    #cython: cdivision=True
    #cython: nonecheck=False

    import numpy as np

    # iterator function
    cdef double[:] f(double[:] data):
        data[0] *= 1.01
        data[1] *= 1.02
        return data

    # looping function
    cdef double[:] _call_me(int bignumber, double[:] data):
        cdef int ii
        for ii in range(bignumber):
            data = f(data)
        return data

    # helper function to allow calls from Python
    def call_me(bignumber):
        cdef double[:] data = np.ones(2)
        return _call_me(bignumber, data)

Now, if I run cython -a, it shows the return statements in yellow. I do something similar in a very performance-critical program, and according to profiling it really slows my code down. So why does Cython need Python for these return statements? The annotated file gives a hint:

 PyErr_SetString(PyExc_TypeError,"Memoryview return value is not initialized"); 

Surprisingly, a Google search for cython "Memoryview return value is not initialized" yields no results.

1 answer

The slow part is not what you think it is. The slow part is (well, primarily)

 data = f(data) 

Not the f(data) call. The data = assignment.

This assigns a struct, which is defined as

    typedef struct {
        struct __pyx_memoryview_obj *memview;
        char *data;
        Py_ssize_t shape[8];
        Py_ssize_t strides[8];
        Py_ssize_t suboffsets[8];
    } __Pyx_memviewslice;

and the assignment mentioned above compiles to

 __pyx_t_3 = __pyx_f_3cyt_f(__pyx_v_data); 

where __pyx_t_3 is of that type. If this is done in a loop, as it is here, copying the structs takes far longer than executing the trivial body of the function. I timed it in pure C and got similar numbers.
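To put a rough number on that copy, here is a minimal sketch in C. The names `slice_t`, `slice_bytes` and `payload_bytes` are hypothetical stand-ins introduced only for illustration; the struct simply mirrors the layout quoted above, with `ssize_t` assumed in place of `Py_ssize_t`.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t (POSIX); assumed stand-in for Py_ssize_t */

struct memview_obj;     /* opaque, only ever used through a pointer */

/* Hypothetical stand-in with the same layout as __Pyx_memviewslice. */
typedef struct {
    struct memview_obj *memview;
    char *data;
    ssize_t shape[8];
    ssize_t strides[8];
    ssize_t suboffsets[8];
} slice_t;

/* Bytes moved by a single struct assignment such as `data = f(data)`. */
static size_t slice_bytes(void)
{
    return sizeof(slice_t);
}

/* Bytes the body of f actually needs to touch: two doubles. */
static size_t payload_bytes(void)
{
    return 2 * sizeof(double);
}
```

On a typical 64-bit platform the struct holds two pointers plus 24 eight-byte integers, so each assignment moves roughly an order of magnitude more memory than the two doubles the function body modifies. The exact figures depend on the platform and ABI.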

(Edit note: the assignment is primarily a problem because it also prevents the construction of the struct and the other copies from being optimized out.)

However, all this seems silly. The only reason to copy the struct would be if something had changed, but nothing has. memview points to the same place, data points to the same place, and the shape, strides and suboffsets are the same.

The only way I see to avoid the struct copy is to not change anything it references (that is, always return the memoryview that was passed in). That is only possible in cases where returning is pointless anyway, as here. Or you can hack at the C, I guess, like I did. Just don't cry if you break something.
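The difference between the two approaches can be sketched in plain C. Everything here (`slice_t`, `step_by_value`, `step_in_place`, `run_by_value`, `run_in_place`) is a hypothetical illustration rather than Cython's generated code, and `ssize_t` is assumed in place of `Py_ssize_t`. The by-value variant copies the whole struct in and out on every call; the in-place variant passes only a pointer and returns nothing.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t (POSIX); assumed stand-in for Py_ssize_t */

struct memview_obj;     /* opaque, only ever used through a pointer */

/* Hypothetical stand-in with the same layout as __Pyx_memviewslice. */
typedef struct {
    struct memview_obj *memview;
    char *data;
    ssize_t shape[8];
    ssize_t strides[8];
    ssize_t suboffsets[8];
} slice_t;

/* By value: the struct is copied into the call and copied back out. */
static slice_t step_by_value(slice_t s)
{
    double *d = (double *)s.data;
    d[0] *= 1.01;
    d[1] *= 1.02;
    return s;
}

/* In place: only a pointer crosses the call boundary, nothing is returned. */
static void step_in_place(const slice_t *s)
{
    double *d = (double *)s->data;
    d[0] *= 1.01;
    d[1] *= 1.02;
}

/* Wrap a buffer of two doubles in a slice (buf must point to 2 doubles). */
static slice_t make_slice(double *buf)
{
    slice_t s = {0};
    s.data = (char *)buf;
    s.shape[0] = 2;
    s.strides[0] = (ssize_t)sizeof(double);
    return s;
}

static void run_by_value(double *buf, int n)
{
    slice_t s = make_slice(buf);
    for (int i = 0; i < n; i++)
        s = step_by_value(s);   /* struct copy on every iteration */
}

static void run_in_place(double *buf, int n)
{
    slice_t s = make_slice(buf);
    for (int i = 0; i < n; i++)
        step_in_place(&s);      /* no struct copy */
}
```

Both loops leave the same values in the buffer; only the amount of data shuffled per call differs. In Cython, the closest equivalent would presumably be to declare f with a void return type and drop the `data = ` assignment in the loop, since the memoryview already refers to the same buffer.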


Also note that you can make your function nogil, so this cannot have anything to do with dropping back into Python.


EDIT

C's optimizing compiler threw me off slightly. Basically, I removed some assignments and it removed loads of other things. The slow path is essentially this:

    #include <stdio.h>
    #include <sys/types.h>  /* ssize_t */

    struct __pyx_memoryview_obj;

    typedef struct {
        struct __pyx_memoryview_obj *memview;
        char *data;
        ssize_t shape[8];
        ssize_t strides[8];
        ssize_t suboffsets[8];
    } __Pyx_memviewslice;

    /* Returns its argument unchanged, copying the whole struct on the way. */
    static __Pyx_memviewslice __pyx_f_3cyt_f(__Pyx_memviewslice __pyx_v_data) {
        __Pyx_memviewslice __pyx_r = { 0, 0, { 0 }, { 0 }, { 0 } };
        __pyx_r = __pyx_v_data;
        return __pyx_r;
    }

    int main(void) {
        int i;
        __Pyx_memviewslice __pyx_v_data = { 0, 0, { 0 }, { 0 }, { 0 } };

        for (i = 0; i < 10000000; i++) {
            __pyx_v_data = __pyx_f_3cyt_f(__pyx_v_data);
        }
        return 0;
    }

(compiled with no optimizations). I am not a C programmer, so apologies if what I have done is bad in some way that is not directly attributable to the fact that I copied computer-generated code.

I know this doesn't help, but I did my best, okay?
