Numpy Reshape on View Without Copy (2d Moving / Sliding Window, Strides, Masked Memory Structures)

Question

I have an image stored as a 2d numpy array (possibly multi-d).

I can create a view of this array that reflects a 2d sliding window, but when I reshape it so that each row is a flattened window (rows are windows, a column is a pixel in that window), Python makes a full copy. It does this because I am using the typical stride trick and the new shape is not contiguous in memory.

I need this because I am passing entire large images to an sklearn classifier, which accepts 2d matrices and has no batch / partial fit procedure, and the full expanded copy is too large for memory.
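For a sense of scale, here is a minimal sketch of the arithmetic. The image size (4000 x 4000, uint8) and the 9 x 9 window are hypothetical figures chosen for illustration, not values from the question, and the helper name `expanded_nbytes` is made up.

```python
import numpy as np

def expanded_nbytes(shape, window, itemsize, step=(1, 1)):
    """Bytes needed if every window is materialised as its own row."""
    n_rows = (shape[0] - window[0]) // step[0] + 1   # window positions vertically
    n_cols = (shape[1] - window[1]) // step[1] + 1   # window positions horizontally
    return n_rows * n_cols * window[0] * window[1] * itemsize

image_shape = (4000, 4000)   # hypothetical image size
window = (9, 9)              # hypothetical window size
itemsize = np.dtype(np.uint8).itemsize

original = image_shape[0] * image_shape[1] * itemsize
expanded = expanded_nbytes(image_shape, window, itemsize)
print(original, expanded, expanded / original)
```

With a step of 1 the expanded matrix is close to window-area times the size of the image, so under these assumptions the copy is roughly 80 times larger than the original.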

My question is: is there a way to do this without a full copy of the view?

I believe the answer will be either (1) something about strides or numpy memory management that I have overlooked, or (2) some kind of masked memory structure for Python that can emulate a numpy array, even for an external package like sklearn that includes cython.

Working over moving windows of a 2d image in memory is a common task, but the only attempt I know of to account for patches directly is the Vigra project ( http://ukoethe.imtqy.com/vigra/ ).

Thanks for the help.

    >>> import numpy as np
    >>> A = np.arange(9).reshape(3, 3)
    >>> print(A)
    [[0 1 2]
     [3 4 5]
     [6 7 8]]
    >>> xstep = 1; ystep = 1; xsize = 2; ysize = 2
    >>> # Floor division keeps the shape entries integers on Python 3 as well.
    >>> window_view = np.lib.stride_tricks.as_strided(
    ...     A,
    ...     ((A.shape[0] - xsize + 1) // xstep, (A.shape[1] - ysize + 1) // ystep, xsize, ysize),
    ...     (A.strides[0] * xstep, A.strides[1] * ystep, A.strides[0], A.strides[1]))
    >>> print(window_view)
    [[[[0 1]
       [3 4]]

      [[1 2]
       [4 5]]]


     [[[3 4]
       [6 7]]

      [[4 5]
       [7 8]]]]
    >>> np.may_share_memory(A, window_view)
    True
    >>> B = window_view.reshape(-1, xsize * ysize)
    >>> np.may_share_memory(A, B)
    False

python numpy scikit-learn image scikit-image

locallyoptimal Jul 18 '14 at 2:16

1 answer

jasaarim · Answer 1 · 2015-05-24T14:13:41+0000

Your task is impossible using strides alone, but NumPy does support one kind of array that does the job. With strides and masked_array you can create the desired view of your data. However, not all NumPy functions support operations on masked_array, so scikit-learn may not handle them well either.

Let me first take a fresh look at what we are trying to do here. Consider the input data in your example. Fundamentally, the data is just a 1-dimensional array in memory, and it is simpler to think about the strides in those terms. The array only appears to be 2d because we have defined its shape. Using strides, the shape could be defined like this:

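    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    base = np.arange(9)
    isize = base.itemsize  # size of one element in bytes
    # One row is 3 items apart in memory, one column is 1 item apart.
    A = as_strided(base, shape=(3, 3), strides=(3 * isize, isize))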
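The way strides map an index to a position in the 1-dimensional buffer can be checked directly. This is only a sketch built on the same base and A as above; the helper `offset_of` is a hypothetical name introduced here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize
A = as_strided(base, shape=(3, 3), strides=(3 * isize, isize))

def offset_of(arr, index):
    """Byte offset of arr[index] from the start of arr's data."""
    return sum(i * s for i, s in zip(index, arr.strides))

for index in np.ndindex(*A.shape):
    # Element (i, j) of A sits offset // isize items into base.
    assert A[index] == base[offset_of(A, index) // isize]

print(A.strides, offset_of(A, (2, 1)) // isize)
```

Every element of A is found at i * strides[0] + j * strides[1] bytes into base, which is all a strided view can express: one fixed step per axis.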

Now the goal is to set strides on base such that it orders the numbers as in your final array, B. In other words, we are asking for integers a and b such that

    >>> as_strided(base, shape=(4, 4), strides=(a, b))
    array([[0, 1, 3, 4],
           [1, 2, 4, 5],
           [3, 4, 6, 7],
           [4, 5, 7, 8]])

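But this is clearly impossible: a single stride per axis means a constant step between neighbouring elements, whereas the first row 0, 1, 3, 4 needs steps of 1, 2 and 1. The closest view we can achieve is a rolling window over base: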

    >>> C = as_strided(base, shape=(5, 5), strides=(isize, isize))
    >>> C
    array([[0, 1, 2, 3, 4],
           [1, 2, 3, 4, 5],
           [2, 3, 4, 5, 6],
           [3, 4, 5, 6, 7],
           [4, 5, 6, 7, 8]])
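Both claims can be checked numerically. The following is a rough sketch, not a proof: it relies on base being arange(9), so each value equals its own index into base, and the variable names (`target`, `col_steps`, `row_steps`, `keep`) are introduced here for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize

target = np.array([[0, 1, 3, 4],
                   [1, 2, 4, 5],
                   [3, 4, 6, 7],
                   [4, 5, 7, 8]])

# Since each value is its own index into base, differences between
# neighbours are the steps (in items) a view would have to take.
col_steps = np.diff(target, axis=1)   # steps along a row
row_steps = np.diff(target, axis=0)   # steps down a column
print(col_steps[0], row_steps[:, 0])

# A strided view allows only one constant step per axis.
strided_possible = (np.unique(col_steps).size == 1
                    and np.unique(row_steps).size == 1)
print(strided_possible)

# The rolling window C has a constant step of one item on both axes;
# the target is C with row 2 and column 2 dropped.
C = as_strided(base, shape=(5, 5), strides=(isize, isize))
keep = [0, 1, 3, 4]
# Note: indexing with np.ix_ copies the data; it is used here only to compare.
print(np.array_equal(C[np.ix_(keep, keep)], target))
```

The steps required by the target are not constant along either axis, so no pair (a, b) can produce it, while C differs from the target only by one extra row and one extra column.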

But the difference here is that we have extra columns and rows, which we would like to get rid of. So, effectively, we are asking for a rolling window that is not contiguous and that also makes jumps at regular intervals. In this example, we want every third element excluded from the window and a jump over one element after every two rows.

We can describe this as a masked_array:

    >>> mask = np.zeros((5, 5), dtype=bool)
    >>> mask[2, :] = True
    >>> mask[:, 2] = True
    >>> D = np.ma.masked_array(C, mask=mask)

This array contains exactly the data that we want, and it is only a view of the original data. We can confirm that the data is equal:

    >>> D.data[~D.mask].reshape(4, 4)
    array([[0, 1, 3, 4],
           [1, 2, 4, 5],
           [3, 4, 6, 7],
           [4, 5, 7, 8]])

But, as I said at the beginning, it is quite likely that scikit-learn does not understand masked arrays. If it simply converts this to a plain array, the data will be wrong:

    >>> np.array(D)
    array([[0, 1, 2, 3, 4],
           [1, 2, 3, 4, 5],
           [2, 3, 4, 5, 6],
           [3, 4, 5, 6, 7],
           [4, 5, 6, 7, 8]])
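As a quick check of what shares memory with what, here is a small sketch using np.shares_memory on the same arrays. It suggests that the masked array wraps the strided view without copying, whereas extracting only the unmasked values (here with the masked array's compressed() method) allocates a new array.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(9)
isize = base.itemsize
C = as_strided(base, shape=(5, 5), strides=(isize, isize))

mask = np.zeros((5, 5), dtype=bool)
mask[2, :] = True
mask[:, 2] = True
D = np.ma.masked_array(C, mask=mask)

# The masked array still points at the original 9-element buffer ...
print(np.shares_memory(base, D.data))

# ... but pulling out the unmasked values produces an independent array.
B = D.compressed().reshape(4, 4)
print(np.shares_memory(base, B))
```

So the masked array only helps as long as the consumer honours the mask; as soon as the unmasked values are gathered into a regular 2d array, the copy is back.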
