What is the recommended way to allocate memory for a typed memoryview?

The Cython documentation on typed memoryviews lists three ways of assigning to a typed memoryview:

  1. from a raw C pointer,
  2. from an np.ndarray and
  3. from a cython.view.array.

Suppose no data is passed to my Cython function from outside; instead, I want to allocate memory and return it as an np.ndarray. Which of these options should I choose? Also suppose that the size of the buffer is not a compile-time constant, i.e. I cannot allocate it on the stack and would need malloc for option 1.

The three options would therefore look something like this:

    from libc.stdlib cimport malloc, free
    import numpy as np   # runtime import, needed for np.empty
    cimport numpy as np
    from cython cimport view

    np.import_array()

    def memview_malloc(int N):
        cdef int * m = <int *>malloc(N * sizeof(int))
        cdef int[::1] b = <int[:N]>m
        free(<void *>m)

    def memview_ndarray(int N):
        cdef int[::1] b = np.empty(N, dtype=np.int32)

    def memview_cyarray(int N):
        cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")

What surprises me is that in all three cases Cython generates quite a lot of code for the memory allocation, in particular a call to __Pyx_PyObject_to_MemoryviewSlice_dc_int. This suggests (and I could be wrong; my understanding of Cython’s inner workings is very limited) that it first creates a Python object and then “casts” it into a memoryview, which seems like unnecessary overhead.

A simple benchmark does not show much difference between the three methods, with 2. being the fastest by a small margin.

Which of the three methods is recommended? Or is there another, better option?

Follow-up question: I ultimately want to return the result as an np.ndarray after working with the memoryview inside the function. Is a typed memoryview the best choice, or should I just use the old buffer interface, as below, to create an ndarray in the first place?

 cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32) 
Aug 27 '13 at 10:19
2 answers

Look here for an answer.

The basic idea is that you want cpython.array.array and cpython.array.clone (not cython.array.*):

    from cpython.array cimport array, clone

    # This type is what you want and can be cast to things of
    # the "double[:]" syntax, so no problems there
    cdef array[double] armv, templatemv

    templatemv = array('d')

    # This is fast
    armv = clone(templatemv, L, False)
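For readers without a Cython build at hand, the idea can be approximated in pure Python with the standard-library `array` module, which is the same type that `cpython.array` wraps. This is only a sketch: `allocate_like` is a hypothetical helper, and unlike `clone(template, L, False)` it zero-fills the new array. It shows that the array can be viewed through a typed, contiguous `memoryview` without copying.

```python
from array import array

def allocate_like(template, length):
    """Hypothetical pure-Python stand-in for clone(template, length, True)."""
    # bytes(n) is n zero bytes, so the new array holds `length` zeroed items
    return array(template.typecode, bytes(length * template.itemsize))

template = array('d')             # empty template of C doubles
arr = allocate_like(template, 5)

view = memoryview(arr)            # no copy: the view borrows arr's buffer
view[2] = 1.5                     # writing through the view ...
assert arr[2] == 1.5              # ... changes the underlying array

print(view.format, view.itemsize, view.contiguous, len(arr))
```

In Cython the equivalent view would be declared as `double[::1]`; the pure-Python version only illustrates the ownership model, not the speed.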

EDIT

It turns out that the benchmarks in that thread were rubbish. Here is my set, with my timings:

    # cython: language_level=3
    # cython: boundscheck=False
    # cython: wraparound=False

    import time
    import sys

    from cpython.array cimport array, clone
    from cython.view cimport array as cvarray
    from libc.stdlib cimport malloc, free
    import numpy as numpy
    cimport numpy as numpy

    cdef int loops

    def timefunc(name):
        def timedecorator(f):
            cdef int L, i

            print("Running", name)
            for L in [1, 10, 100, 1000, 10000, 100000, 1000000]:
                # perf_counter() replaces time.clock(), which was removed in Python 3.8
                start = time.perf_counter()
                f(L)
                end = time.perf_counter()
                print(format((end-start) / loops * 1e6, "2f"), end=" ")
                sys.stdout.flush()

            print("μs")
        return timedecorator

    print()
    print("INITIALISATIONS")
    loops = 100000

    @timefunc("cpython.array buffer")
    def _(int L):
        cdef int i
        cdef array[double] arr, template = array('d')

        for i in range(loops):
            arr = clone(template, L, False)

        # Prevents dead code elimination
        str(arr[0])

    @timefunc("cpython.array memoryview")
    def _(int L):
        cdef int i
        cdef double[::1] arr
        cdef array template = array('d')

        for i in range(loops):
            arr = clone(template, L, False)

        # Prevents dead code elimination
        str(arr[0])

    @timefunc("cpython.array raw C type")
    def _(int L):
        cdef int i
        cdef array arr, template = array('d')

        for i in range(loops):
            arr = clone(template, L, False)

        # Prevents dead code elimination
        str(arr[0])

    @timefunc("numpy.empty_like memoryview")
    def _(int L):
        cdef int i
        cdef double[::1] arr
        template = numpy.empty((L,), dtype='double')

        for i in range(loops):
            arr = numpy.empty_like(template)

        # Prevents dead code elimination
        str(arr[0])

    @timefunc("malloc")
    def _(int L):
        cdef int i
        cdef double* arrptr

        for i in range(loops):
            arrptr = <double*> malloc(sizeof(double) * L)
            free(arrptr)

        # Prevents dead code elimination
        # (note: this reads memory that has already been freed)
        str(arrptr[0])

    @timefunc("malloc memoryview")
    def _(int L):
        cdef int i
        cdef double* arrptr
        cdef double[::1] arr

        for i in range(loops):
            arrptr = <double*> malloc(sizeof(double) * L)
            arr = <double[:L]>arrptr
            free(arrptr)

        # Prevents dead code elimination
        # (note: this reads memory that has already been freed)
        str(arr[0])

    @timefunc("cvarray memoryview")
    def _(int L):
        cdef int i
        cdef double[::1] arr

        for i in range(loops):
            arr = cvarray((L,),sizeof(double),'d')

        # Prevents dead code elimination
        str(arr[0])

    print()
    print("ITERATING")
    loops = 1000

    @timefunc("cpython.array buffer")
    def _(int L):
        cdef int i
        cdef array[double] arr = clone(array('d'), L, False)
        cdef double d

        for i in range(loops):
            for i in range(L):
                d = arr[i]

        # Prevents dead-code elimination
        str(d)

    @timefunc("cpython.array memoryview")
    def _(int L):
        cdef int i
        cdef double[::1] arr = clone(array('d'), L, False)
        cdef double d

        for i in range(loops):
            for i in range(L):
                d = arr[i]

        # Prevents dead-code elimination
        str(d)

    @timefunc("cpython.array raw C type")
    def _(int L):
        cdef int i
        cdef array arr = clone(array('d'), L, False)
        cdef double d

        for i in range(loops):
            for i in range(L):
                d = arr[i]

        # Prevents dead-code elimination
        str(d)

    @timefunc("numpy.empty_like memoryview")
    def _(int L):
        cdef int i
        cdef double[::1] arr = numpy.empty((L,), dtype='double')
        cdef double d

        for i in range(loops):
            for i in range(L):
                d = arr[i]

        # Prevents dead-code elimination
        str(d)

    @timefunc("malloc")
    def _(int L):
        cdef int i
        cdef double* arrptr = <double*> malloc(sizeof(double) * L)
        cdef double d

        for i in range(loops):
            for i in range(L):
                d = arrptr[i]

        free(arrptr)

        # Prevents dead-code elimination
        str(d)

    @timefunc("malloc memoryview")
    def _(int L):
        cdef int i
        cdef double* arrptr = <double*> malloc(sizeof(double) * L)
        cdef double[::1] arr = <double[:L]>arrptr
        cdef double d

        for i in range(loops):
            for i in range(L):
                d = arr[i]

        free(arrptr)

        # Prevents dead-code elimination
        str(d)

    @timefunc("cvarray memoryview")
    def _(int L):
        cdef int i
        cdef double[::1] arr = cvarray((L,),sizeof(double),'d')
        cdef double d

        for i in range(loops):
            for i in range(L):
                d = arr[i]

        # Prevents dead-code elimination
        str(d)

Output:

    INITIALISATIONS
    Running cpython.array buffer
    0.100040 0.097140 0.133110 0.121820 0.131630 0.108420 0.112160 μs
    Running cpython.array memoryview
    0.339480 0.333240 0.378790 0.445720 0.449800 0.414280 0.414060 μs
    Running cpython.array raw C type
    0.048270 0.049250 0.069770 0.074140 0.076300 0.060980 0.060270 μs
    Running numpy.empty_like memoryview
    1.006200 1.012160 1.128540 1.212350 1.250270 1.235710 1.241050 μs
    Running malloc
    0.021850 0.022430 0.037240 0.046260 0.039570 0.043690 0.030720 μs
    Running malloc memoryview
    1.640200 1.648000 1.681310 1.769610 1.755540 1.804950 1.758150 μs
    Running cvarray memoryview
    1.332330 1.353910 1.358160 1.481150 1.517690 1.485600 1.490790 μs

    ITERATING
    Running cpython.array buffer
    0.010000 0.027000 0.091000 0.669000 6.314000 64.389000 635.171000 μs
    Running cpython.array memoryview
    0.013000 0.015000 0.058000 0.354000 3.186000 33.062000 338.300000 μs
    Running cpython.array raw C type
    0.014000 0.146000 0.979000 9.501000 94.160000 916.073000 9287.079000 μs
    Running numpy.empty_like memoryview
    0.042000 0.020000 0.057000 0.352000 3.193000 34.474000 333.089000 μs
    Running malloc
    0.002000 0.004000 0.064000 0.367000 3.599000 32.712000 323.858000 μs
    Running malloc memoryview
    0.019000 0.032000 0.070000 0.356000 3.194000 32.100000 327.929000 μs
    Running cvarray memoryview
    0.014000 0.026000 0.063000 0.351000 3.209000 32.013000 327.890000 μs

(The reason for the “iterating” benchmark is that some methods have surprisingly different characteristics in this respect.)

In order of initialization speed:

malloc: It's a harsh world, but it's fast. If you need to allocate a lot of things and want unhindered iteration and indexing performance, this has to be it. But normally a good bet is ...

cpython.array raw C type: Well, damn, it's fast. And it's safe. Unfortunately, it goes through Python to access its data fields. You can avoid that with a wonderful trick:

 arr.data.as_doubles[i] 

which brings it up to the standard speed while removing safety! This makes it a wonderful replacement for malloc, being basically a reference-counted version of it!

cpython.array buffer: Coming in at only three to four times the setup time of malloc, this looks like a wonderful bet. Unfortunately, it has significant overhead (albeit small compared to the boundscheck and wraparound directives). That means it only really competes against the full-safety variants, but it is the fastest of those to initialise. Your choice.

cpython.array memoryview: This is now an order of magnitude slower than malloc to initialise. That's a shame, but it iterates just as fast. This is the standard solution I would suggest unless boundscheck or wraparound are on (in which case cpython.array buffer might be a more compelling tradeoff).

The rest. The only one worth anything is numpy's, due to the many fun methods attached to the objects. That's it, though.
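The "three to four times" and "order of magnitude" figures can be checked against the initialisation timings in the output above. The sketch below copies those numbers into a dictionary and computes each method's slowdown relative to malloc for every size; `ratios_to_malloc` is a helper name made up for this illustration.

```python
# Initialisation timings in microseconds, copied from the output above
# (columns are L = 1, 10, 100, ..., 1000000).
init_us = {
    "malloc": [0.021850, 0.022430, 0.037240, 0.046260, 0.039570, 0.043690, 0.030720],
    "cpython.array raw C type": [0.048270, 0.049250, 0.069770, 0.074140, 0.076300, 0.060980, 0.060270],
    "cpython.array buffer": [0.100040, 0.097140, 0.133110, 0.121820, 0.131630, 0.108420, 0.112160],
    "cpython.array memoryview": [0.339480, 0.333240, 0.378790, 0.445720, 0.449800, 0.414280, 0.414060],
    "numpy.empty_like memoryview": [1.006200, 1.012160, 1.128540, 1.212350, 1.250270, 1.235710, 1.241050],
    "cvarray memoryview": [1.332330, 1.353910, 1.358160, 1.481150, 1.517690, 1.485600, 1.490790],
    "malloc memoryview": [1.640200, 1.648000, 1.681310, 1.769610, 1.755540, 1.804950, 1.758150],
}

def ratios_to_malloc(timings):
    """Return {method: (smallest, largest) slowdown relative to malloc}."""
    base = timings["malloc"]
    result = {}
    for name, row in timings.items():
        per_size = [t / b for t, b in zip(row, base)]
        result[name] = (min(per_size), max(per_size))
    return result

for name, (low, high) in ratios_to_malloc(init_us).items():
    print("%-28s %5.1fx to %5.1fx" % (name, low, high))
```

On these numbers the buffer variant comes out at roughly 2.5 to 4.6 times the malloc setup time, and the cpython.array memoryview at roughly 9.5 to 15.5 times, depending on the size.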

Jan 10 '14 at

As a follow-up to Veedrac's answer: be aware that using the memoryview support of cpython.array with Python 2.7 currently appears to lead to memory leaks. This seems to be a long-standing issue, as it is mentioned on the cython-users mailing list in a post from November 2012. Running Veedrac's benchmark script with Cython 0.22 under both Python 2.7.6 and Python 2.7.9 leads to a large memory leak when initialising a cpython.array through either the buffer or the memoryview interface. No memory leak occurs when running the script with Python 3.4. I have filed a bug report about this on the Cython developers mailing list.
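One rough way to check a given interpreter and Cython combination for this kind of leak is to watch the process's peak resident set size while calling the allocation repeatedly. The sketch below is an assumption-laden illustration, not the procedure used for the report above: `peak_rss_growth` is a hypothetical helper, it relies on the Unix-only `resource` module, and a pure-Python `array` allocation stands in for the compiled function you would actually test.

```python
import gc
import resource  # Unix-only standard-library module
from array import array

def peak_rss_growth(allocate, repeats):
    """Return how much peak RSS grew while calling allocate() `repeats` times.

    The unit is that of ru_maxrss (kilobytes on Linux, bytes on macOS).
    """
    allocate()  # warm-up so that one-off allocations are not counted
    gc.collect()
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for _ in range(repeats):
        allocate()  # the result is dropped immediately
    gc.collect()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return after - before

# Stand-in allocation: 100000 doubles, discarded after every call
growth = peak_rss_growth(lambda: array('d', [0.0]) * 100000, 1000)
print("peak RSS growth: %d" % growth)
```

If the allocation is released correctly, the growth should stay small and should not scale with `repeats`; a value that keeps rising as `repeats` increases points to a leak.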

Apr 17 '15 at 1:13