When will numpy copy the array when using reshape()?

The numpy.reshape documentation says:

This will be a new view object if possible; otherwise, it will be a copy. Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.

My question is: when will numpy choose to return a new view, and when will it copy the whole array? Are there any general principles that describe the behavior of reshape, or is it just unpredictable? Thanks.

2 answers

The link that @mgillson found seems to address the question of "how do I tell whether it made a copy", but not "how do I predict it" or why it made the copy. As for the test, I like to use A.__array_interface__ .
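The same check can be wrapped in a small helper. This is only a sketch: reshape_is_view is a hypothetical name introduced here for illustration, and instead of comparing data pointers by eye it relies on np.shares_memory, which reports whether two arrays overlap in memory.

```python
import numpy as np


def reshape_is_view(arr, shape, order="C"):
    """Return True if reshaping arr shares memory with arr (no copy made)."""
    reshaped = arr.reshape(shape, order=order)
    return np.shares_memory(arr, reshaped)


x = np.arange(12).reshape(3, 4)   # C-contiguous
y = x.T                           # same buffer, F-contiguous

print(reshape_is_view(x, (4, 3)))             # contiguous input
print(reshape_is_view(y, (3, 4)))             # transposed input, C-order reshape
print(reshape_is_view(y, (3, 4), order="F"))  # transposed input, F-order reshape
```

Only the second call reports a copy: reading the transposed array in C order cannot be described by strides over the original buffer.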

This is most likely to be a problem if you assign values to the reshaped array while also expecting the original to change. And I would be hard pressed to find a case on SO where that was the issue.

A copying reshape will be a bit slower than a non-copying one, but again I can't think of a case where that slowed down the whole code. A copy can also be a problem if you are working with arrays so large that the simplest operation produces a memory error.


After reshaping, the values in the data buffer need to be in a contiguous order, either 'C' or 'F'. For example:

    In [403]: np.arange(12).reshape(3,4,order='C')
    Out[403]:
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])

    In [404]: np.arange(12).reshape(3,4,order='F')
    Out[404]:
    array([[ 0,  3,  6,  9],
           [ 1,  4,  7, 10],
           [ 2,  5,  8, 11]])

It will make a copy if the initial order is so 'messed up' that it can't return values like this. Reshape after transpose may do this (see my example below). So might games with stride_tricks.as_strided . Offhand, those are the only cases I can think of.

    In [405]: x=np.arange(12).reshape(3,4,order='C')

    In [406]: y=x.T

    In [407]: x.__array_interface__
    Out[407]:
    {'version': 3,
     'descr': [('', '<i4')],
     'strides': None,
     'typestr': '<i4',
     'shape': (3, 4),
     'data': (175066576, False)}

    In [408]: y.__array_interface__
    Out[408]:
    {'version': 3,
     'descr': [('', '<i4')],
     'strides': (4, 16),
     'typestr': '<i4',
     'shape': (4, 3),
     'data': (175066576, False)}

y , the transpose, has the same data pointer. The transpose was performed without changing or copying the data; it just created a new object with new shape , strides and flags .

    In [409]: y.flags
    Out[409]:
      C_CONTIGUOUS : False
      F_CONTIGUOUS : True
      ...

    In [410]: x.flags
    Out[410]:
      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      ...

y is order 'F'. Now try reshaping it.

    In [411]: y.shape
    Out[411]: (4, 3)

    In [412]: z=y.reshape(3,4)

    In [413]: z.__array_interface__
    Out[413]:
    {...
     'shape': (3, 4),
     'data': (176079064, False)}

    In [414]: z
    Out[414]:
    array([[ 0,  4,  8,  1],
           [ 5,  9,  2,  6],
           [10,  3,  7, 11]])

z is a copy; its data buffer pointer is different. Its values are not arranged in any way that resembles x or y , no 0,1,2,...

But simply reshaping x does not produce a copy:

    In [416]: w=x.reshape(4,3)

    In [417]: w
    Out[417]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11]])

    In [418]: w.__array_interface__
    Out[418]:
    {...
     'shape': (4, 3),
     'data': (175066576, False)}

Raveling y is the same as y.reshape(-1) ; it produces a copy:

    In [425]: y.reshape(-1)
    Out[425]: array([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])

    In [426]: y.ravel().__array_interface__['data']
    Out[426]: (175352024, False)

Assigning values to a raveled array like this may be the most likely case where a copy will produce an error. For example, x.ravel()[::2]=99 changes every other value of x and y (columns and rows, respectively). But y.ravel()[::2]=0 does nothing, because the assignment goes to the copy.

So reshape after transpose is the most likely copy scenario. I would be happy to explore other possibilities.

edit: y.reshape(-1,order='F')[::2]=0 does change the values of y . With a compatible order, reshape does not produce a copy.
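The lost-write behavior described above can be reproduced in a few lines. This is a minimal sketch that rebuilds x and y the same way as in the session; the variable names before, lost and landed are introduced here only for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = x.T                      # F-contiguous view of x
before = x.copy()            # snapshot to compare against

# ravel() defaults to C order, which forces a copy of y,
# so this write goes to a temporary array and is lost.
y.ravel()[::2] = 0
lost = np.array_equal(x, before)

# With order='F' the reshape matches y's layout and returns a view,
# so this write reaches the shared buffer.
y.reshape(-1, order="F")[::2] = 0
landed = not np.array_equal(x, before)

print(lost, landed)
```

Both flags come out true: the first assignment leaves x untouched, the second one modifies it.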


One answer in @mgillson's link, fooobar.com/questions/120137 / ... , points out that the A.shape=... syntax prevents copying. If it can't change the shape without copying, it raises an error:

    In [441]: y.shape=(3,4)
    ...
    AttributeError: incompatible shape for a non-contiguous array

This is also mentioned in the reshape documentation:

If you want an error to be raised if the data is copied, you should assign the new shape to the shape attribute of the array:
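A minimal sketch of that approach, wrapped in a helper so the original array keeps its own shape. The name reshape_without_copy is hypothetical, and catching AttributeError is an assumption based on the traceback shown above; the exact exception and message may differ between NumPy versions.

```python
import numpy as np


def reshape_without_copy(arr, shape):
    """Return a reshaped view of arr, or raise ValueError if a copy would be needed."""
    view = arr.view()          # new array object on the same data; arr keeps its shape
    try:
        view.shape = shape     # in-place shape change never copies
    except AttributeError as exc:
        raise ValueError(f"cannot reshape to {shape} without copying") from exc
    return view


x = np.arange(12).reshape(3, 4)
w = reshape_without_copy(x, (4, 3))       # fine: x is contiguous
print(np.shares_memory(x, w))

try:
    reshape_without_copy(x.T, (3, 4))     # the transpose would need a copy
except ValueError as err:
    print("refused:", err)
```

Working on arr.view() rather than on arr itself means a successful call does not change the shape of the caller's array.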


SO questions about reshaping after as_strided :

changing the shape of an n-dimensional array without changing the shape

and

View images without copies (2d Moving / Sliding window, Strides, Masked memory structures)
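For the as_strided case mentioned earlier, here is a small sketch of a reshape that has to copy. The sliding-window layout and the variable names are chosen here purely for illustration; as_strided does no bounds checking, so the shape and strides must be picked with care.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6)
step = a.strides[0]          # bytes between consecutive elements of a

# Overlapping sliding windows of length 3: row i starts at a[i].
windows = as_strided(a, shape=(4, 3), strides=(step, step))

# Flattening repeats elements of a, which no single stride can express.
flat = windows.reshape(-1)

print(windows)
print(np.shares_memory(a, windows))  # the windows are a view of a
print(np.shares_memory(a, flat))     # the flattened result is not
```

The windows array holds 12 entries backed by only 6 stored values, so the flattened result needs its own buffer.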

============================

Here is my first cut at translating shape.c/_attempt_nocopy_reshape into Python. It can be run with something like:

 newstrides = attempt_reshape(numpy.zeros((3,4)), (4,3), False) 

    import numpy   # there an np variable in the code
    def attempt_reshape(self, newdims, is_f_order):
        newnd = len(newdims)
        newstrides = numpy.zeros(newnd+1).tolist()  # +1 is a fudge

        self = numpy.squeeze(self)
        olddims = self.shape
        oldnd = self.ndim
        oldstrides = self.strides

        #/* oi to oj and ni to nj give the axis ranges currently worked with */

        oi,oj = 0,1
        ni,nj = 0,1
        while (ni < newnd) and (oi < oldnd):
            print(oi, ni)
            np = newdims[ni];
            op = olddims[oi];

            while (np != op):
                if (np < op):
                    # /* Misses trailing 1s, these are handled later */
                    np *= newdims[nj];
                    nj += 1
                else:
                    op *= olddims[oj];
                    oj += 1

            print(ni,oi,np,op,nj,oj)

            #/* Check whether the original axes can be combined */
            for ok in range(oi, oj-1):
                if (is_f_order) :
                    if (oldstrides[ok+1] != olddims[ok]*oldstrides[ok]):
                        # /* not contiguous enough */
                        return 0;
                else:
                    #/* C order */
                    if (oldstrides[ok] != olddims[ok+1]*oldstrides[ok+1]) :
                        #/* not contiguous enough */
                        return 0;

            # /* Calculate new strides for all axes currently worked with */
            if (is_f_order) :
                newstrides[ni] = oldstrides[oi];
                for nk in range(ni+1,nj):
                    newstrides[nk] = newstrides[nk - 1]*newdims[nk - 1];
            else:
                #/* C order */
                newstrides[nj - 1] = oldstrides[oj - 1];
                #for (nk = nj - 1; nk > ni; nk--) {
                for nk in range(nj-1, ni, -1):
                    newstrides[nk - 1] = newstrides[nk]*newdims[nk];
            nj += 1; ni = nj
            oj += 1; oi = oj

        print(olddims, newdims)
        print(oldstrides, newstrides)

        # * Set strides corresponding to trailing 1s of the new shape.
        if (ni >= 1) :
            print(newstrides, ni)
            last_stride = newstrides[ni - 1];
        else :
            last_stride = self.itemsize # PyArray_ITEMSIZE(self);

        if (is_f_order) :
            last_stride *= newdims[ni - 1];

        for nk in range(ni, newnd):
            newstrides[nk] = last_stride;
        return newstrides

@hoaulj gave a good answer, but there is an error in his implementation of the _attempt_nocopy_reshape function. As the reader may notice, on the 4th line of his code

 newstrides = numpy.zeros(newnd+1).tolist() # +1 is a fudge 

there is a fudge factor. This hack only works in certain situations (and the function breaks on certain inputs). The hack is needed because ni, nj, oi and oj are incremented and set incorrectly at the end of the outermost while loop. The update should read

 ni = nj;nj += 1; oi = oj;oj += 1; 

I think the error arose because in the original code ( on the official numpy github ) the update is implemented as

  ni = nj++; oi = oj++; 

using post-increment, whereas @hoaulj translated it as if it used pre-increment, i.e. ++nj .
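Since Python has no ++ operator, the two C forms have to be spelled out as separate statements, and the order of those statements is what matters. The following sketch contrasts the two translations; the function names are made up here for illustration.

```python
def update_post_increment(nj):
    """Python equivalent of the C statement `ni = nj++;`."""
    ni = nj      # ni receives the old value of nj
    nj += 1      # nj is incremented afterwards
    return ni, nj


def update_pre_increment(nj):
    """Python equivalent of the C statement `ni = ++nj;`."""
    nj += 1      # nj is incremented first
    ni = nj      # ni receives the new value
    return ni, nj


print(update_post_increment(1))  # (1, 2)
print(update_pre_increment(1))   # (2, 2)
```

With the pre-increment form, ni skips one position ahead on every pass of the outer loop, which is why the extra slot in newstrides was needed to avoid an index error.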

For completeness, I am attaching the corrected code below. I hope it clears up any possible confusion.

    import numpy   # there an np variable in the code
    def attempt_reshape(self, newdims, is_f_order):
        newnd = len(newdims)
        newstrides = numpy.zeros(newnd).tolist()  # the +1 fudge is no longer needed

        self = numpy.squeeze(self)
        olddims = self.shape
        oldnd = self.ndim
        oldstrides = self.strides

        #/* oi to oj and ni to nj give the axis ranges currently worked with */

        oi,oj = 0,1
        ni,nj = 0,1
        while (ni < newnd) and (oi < oldnd):
            np = newdims[ni];
            op = olddims[oi];

            while (np != op):
                print(ni,oi,np,op,nj,oj)
                if (np < op):
                    # /* Misses trailing 1s, these are handled later */
                    np *= newdims[nj];
                    nj += 1
                else:
                    op *= olddims[oj];
                    oj += 1

            #/* Check whether the original axes can be combined */
            for ok in range(oi, oj-1):
                if (is_f_order) :
                    if (oldstrides[ok+1] != olddims[ok]*oldstrides[ok]):
                        # /* not contiguous enough */
                        return 0;
                else:
                    #/* C order */
                    if (oldstrides[ok] != olddims[ok+1]*oldstrides[ok+1]) :
                        #/* not contiguous enough */
                        return 0;

            # /* Calculate new strides for all axes currently worked with */
            if (is_f_order) :
                newstrides[ni] = oldstrides[oi];
                for nk in range(ni+1,nj):
                    newstrides[nk] = newstrides[nk - 1]*newdims[nk - 1];
            else:
                #/* C order */
                newstrides[nj - 1] = oldstrides[oj - 1];
                #for (nk = nj - 1; nk > ni; nk--) {
                for nk in range(nj-1, ni, -1):
                    newstrides[nk - 1] = newstrides[nk]*newdims[nk];
            ni = nj;nj += 1;
            oi = oj;oj += 1;

        # * Set strides corresponding to trailing 1s of the new shape.
        if (ni >= 1) :
            last_stride = newstrides[ni - 1];
        else :
            last_stride = self.itemsize # PyArray_ITEMSIZE(self);

        if (is_f_order) :
            last_stride *= newdims[ni - 1];

        for nk in range(ni, newnd):
            newstrides[nk] = last_stride;
        return newstrides

    newstrides = attempt_reshape(numpy.zeros((5,3,2)), (10,3), False)
    print(newstrides)