Given a byte buffer, dtype, shape and strides, how do I create a NumPy ndarray?

I have a buffer, a dtype, a shape and strides. I want to create a NumPy ndarray that reuses the buffer's memory.

There is numpy.frombuffer, which creates a 1D array from the buffer and reuses its memory. However, I'm not sure whether I can easily and safely reshape it and set the strides.
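One way to confirm that np.frombuffer reuses memory rather than copying it is to build the array over a mutable bytearray and then change the bytes. This is a minimal sketch; the buffer contents are made up for illustration.

```python
import numpy as np

# Illustrative mutable buffer: four native-endian int32 values.
buf = bytearray(np.array([1, 2, 3, 4], dtype=np.int32).tobytes())

flat = np.frombuffer(buf, dtype=np.int32)   # 1-D view over buf, no copy
print(flat, flat.flags.owndata)             # [1 2 3 4] False

# Overwrite the first 4 bytes of the buffer (same size, so no resize).
buf[0:4] = np.array([99], dtype=np.int32).tobytes()
print(flat)                                 # [99  2  3  4]
```

The array sees the change because it points at the bytearray's memory; what remains open is how to impose an arbitrary shape and strides on that 1D view.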

There is also the numpy.ndarray constructor, which can take a buffer, but the documentation does not make clear whether it reuses that memory or copies it.

So, will the numpy.ndarray constructor do what I want? If not, what can I use instead?


So now I'm trying to figure out what the numpy.ndarray constructor actually does. The code is here. It uses PyArray_BufferConverter to convert the buffer argument and then calls PyArray_NewFromDescr_int, which can be seen here. When a data pointer is passed in, that function executes fa->flags &= ~NPY_ARRAY_OWNDATA;, i.e. the new array is marked as not owning its data, which suggests the buffer is reused rather than copied.
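The reading of the C source can be checked from Python. The sketch below passes shape, dtype, buffer, offset and strides to numpy.ndarray and confirms that the result does not own its data and that writes go straight through to the buffer. The 8-byte header and the 2x3 layout are illustrative assumptions, not part of the original question.

```python
import numpy as np

# Illustrative buffer: an 8-byte header followed by six int32 values 0..5,
# laid out as a 2x3 C-order matrix.
header = bytes(8)
buf = bytearray(header + np.arange(6, dtype=np.int32).tobytes())

# View the 2x3 data as its 3x2 transpose purely through offset and strides.
t = np.ndarray(shape=(3, 2), dtype=np.int32, buffer=buf,
               offset=8, strides=(4, 12))
print(t.flags.owndata)   # False: the memory belongs to buf
print(t)                 # [[0 3]
                         #  [1 4]
                         #  [2 5]]

t[0, 0] = 42             # writes into buf, since no copy was made
print(np.frombuffer(buf, dtype=np.int32, offset=8)[0])  # 42
```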

3 answers

As mentioned in a comment from @hpaulj, you can accomplish this with the stride_tricks module. You need both np.frombuffer and np.lib.stride_tricks.as_strided:

Collect data from a NumPy array

    In [1]: import numpy as np

    In [2]: x = np.random.random((3, 4)).astype(dtype='f4')

    In [3]: buffer = x.data

    In [4]: dtype = x.dtype

    In [5]: shape = x.shape

    In [6]: strides = x.strides

Recover NumPy Array

    In [7]: xx = np.frombuffer(buffer, dtype)

    In [8]: xx = np.lib.stride_tricks.as_strided(xx, shape, strides)

Check Results

    In [9]: x
    Out[9]:
    array([[ 0.75343359, 0.20676662, 0.83675659, 0.99904215],
           [ 0.37182721, 0.83846378, 0.6888299 , 0.57195812],
           [ 0.39905572, 0.7258808 , 0.88316005, 0.2187883 ]], dtype=float32)

    In [10]: xx
    Out[10]:
    array([[ 0.75343359, 0.20676662, 0.83675659, 0.99904215],
           [ 0.37182721, 0.83846378, 0.6888299 , 0.57195812],
           [ 0.39905572, 0.7258808 , 0.88316005, 0.2187883 ]], dtype=float32)

    In [11]: x.strides
    Out[11]: (16, 4)

    In [12]: xx.strides
    Out[12]: (16, 4)
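The two steps above can be wrapped in a small helper. This is a sketch: `array_from_buffer` is a hypothetical name, and the example buffer holds a 3x4 int32 matrix whose strides are chosen to describe its transpose, to show that non-C-contiguous layouts work too. Note that as_strided does no bounds checking, so wrong strides silently read memory outside the buffer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def array_from_buffer(buffer, dtype, shape, strides):
    """Hypothetical helper: view `buffer` with the given layout, no copy.

    as_strided performs no bounds checking, so the caller must make sure
    that shape and strides stay within the buffer.
    """
    flat = np.frombuffer(buffer, dtype=dtype)   # 1-D view of the whole buffer
    return as_strided(flat, shape=shape, strides=strides)


# A 3x4 C-order int32 matrix; strides (4, 16) describe its 4x3 transpose.
buf = bytearray(np.arange(12, dtype=np.int32).tobytes())
tr = array_from_buffer(buf, np.int32, (4, 3), (4, 16))
print(np.array_equal(tr, np.arange(12).reshape(3, 4).T))  # True
```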

I would stick with frombuffer, because it is intended for exactly this purpose and makes it clear what you are doing. Here is an example:

    In [58]: s0 = b'aaaa'  # a single int32

    In [59]: s1 = b'aaabaaacaaadaaae'  # 4 int32s, each increasing by 1

    In [60]: a0 = np.frombuffer(s0, dtype='>i4', count=1)  # dtype sets the stride

    In [61]: print(a0)
    [1633771873]

    In [62]: a1 = np.frombuffer(s1, dtype='>i4', count=4)

    In [63]: print(a1)
    [1633771874 1633771875 1633771876 1633771877]

    In [64]: a2 = a1.reshape((2, 2))  # reshape, which also sets the strides

    In [65]: print(a2)
    [[1633771874 1633771875]
     [1633771876 1633771877]]

    In [66]: a2 - a0  # do some calculation with the reshaped array
    Out[66]:
    array([[1, 2],
           [3, 4]], dtype=int32)

Is there anything you need that this approach doesn't cover?
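If the strides you were given correspond to a plain C- or Fortran-order layout, frombuffer plus reshape covers it, and the offset/count arguments let you skip surrounding bytes. A minimal sketch, assuming a made-up 4-byte header in front of the data; for arbitrary strides you would still need as_strided or the ndarray constructor.

```python
import numpy as np

buf = bytearray(b'HDR!aaabaaacaaadaaae')   # 4-byte header, then 4 big-endian int32s

a1 = np.frombuffer(buf, dtype='>i4', count=4, offset=4)  # skip the header
a2 = a1.reshape((2, 2))               # C order: strides (8, 4)
af = a1.reshape((2, 2), order='F')    # Fortran order: strides (4, 8)

print(a2.strides, af.strides)         # (8, 4) (4, 8)
print(np.shares_memory(a2, af))       # True: both are views of buf
```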


You can use either method; neither of them makes a copy:

    import numpy as np

    s = b'aaabaaacaaadaaae'
    a1 = np.frombuffer(s, np.int32, 4).reshape(2, 2)
    a2 = np.ndarray((2, 2), np.int32, buffer=s)
    print(a1.flags.owndata, a1.base)  # owndata is False; .base refers back to s
    print(a2.flags.owndata, a2.base)  # owndata is False; .base refers back to s
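One practical difference when the strides come from outside: the ndarray constructor validates shape, offset and strides against the buffer size, while as_strided does not. A small sketch with a deliberately undersized, illustrative 20-byte buffer:

```python
import numpy as np

buf = bytearray(20)  # room for only five int32 values

try:
    # A 3x2 int32 layout with strides (4, 12) needs 24 bytes.
    np.ndarray((3, 2), dtype=np.int32, buffer=buf, strides=(4, 12))
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

print(rejected)  # True
```

With as_strided the same shape and strides would be accepted and the last element would read past the end of the buffer, so the constructor is the safer choice for untrusted layout metadata.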
