N-dimensional sliding window with Pandas or Numpy

How can I do the equivalent of R's (xts) rollapply(..., by.column = FALSE) using NumPy or pandas? When given a DataFrame, pandas rolling_apply only seems to work column by column, and does not offer a way to pass the full (window size) x (number of columns) block to the function being applied.

    import pandas as pd
    import numpy as np

    xx = pd.DataFrame(np.zeros([10, 10]))
    pd.rolling_apply(xx, 5, lambda x: np.shape(x)[0])

         0    1    2    3    4    5    6    7    8    9
    0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
    1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
    2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
    3  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
    4    5    5    5    5    5    5    5    5    5    5
    5    5    5    5    5    5    5    5    5    5    5
    6    5    5    5    5    5    5    5    5    5    5
    7    5    5    5    5    5    5    5    5    5    5
    8    5    5    5    5    5    5    5    5    5    5
    9    5    5    5    5    5    5    5    5    5    5

What happens is that rolling_apply goes down each column in turn and applies a sliding five-row window to each one, whereas I want every sliding window to be a 5x10 array, in which case I would get a single column of results (not a 2D array).

python arrays numpy pandas r
1 answer

I can't find a way to do a "wide" rolling application in the pandas docs, so I would use NumPy to get a "windowed" view of the array and apply a reduction (or ufunc) to it. Here is an example:

    In [40]: arr = np.arange(50).reshape(10, 5); arr
    Out[40]:
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19],
           [20, 21, 22, 23, 24],
           [25, 26, 27, 28, 29],
           [30, 31, 32, 33, 34],
           [35, 36, 37, 38, 39],
           [40, 41, 42, 43, 44],
           [45, 46, 47, 48, 49]])

    In [41]: win_size = 5

    In [42]: isize = arr.itemsize; isize
    Out[42]: 8

arr.itemsize is 8 here because the default integer dtype on this platform is np.int64 (it would be 4 where the default is int32). You need this value for the following windowed-view idiom:

    In [43]: windowed = np.lib.stride_tricks.as_strided(
        ...:     arr,
        ...:     shape=(arr.shape[0] - win_size + 1, win_size, arr.shape[1]),
        ...:     strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize)); windowed
    Out[43]:
    array([[[ 0,  1,  2,  3,  4],
            [ 5,  6,  7,  8,  9],
            [10, 11, 12, 13, 14],
            [15, 16, 17, 18, 19],
            [20, 21, 22, 23, 24]],

           [[ 5,  6,  7,  8,  9],
            [10, 11, 12, 13, 14],
            [15, 16, 17, 18, 19],
            [20, 21, 22, 23, 24],
            [25, 26, 27, 28, 29]],

           [[10, 11, 12, 13, 14],
            [15, 16, 17, 18, 19],
            [20, 21, 22, 23, 24],
            [25, 26, 27, 28, 29],
            [30, 31, 32, 33, 34]],

           [[15, 16, 17, 18, 19],
            [20, 21, 22, 23, 24],
            [25, 26, 27, 28, 29],
            [30, 31, 32, 33, 34],
            [35, 36, 37, 38, 39]],

           [[20, 21, 22, 23, 24],
            [25, 26, 27, 28, 29],
            [30, 31, 32, 33, 34],
            [35, 36, 37, 38, 39],
            [40, 41, 42, 43, 44]],

           [[25, 26, 27, 28, 29],
            [30, 31, 32, 33, 34],
            [35, 36, 37, 38, 39],
            [40, 41, 42, 43, 44],
            [45, 46, 47, 48, 49]]])
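As a hedged variation on the idiom above, the byte strides can be read from arr.strides instead of being computed from itemsize, which avoids assuming a particular dtype or a contiguous layout. This is only a sketch: the names row_stride and col_stride are introduced here for illustration, and writeable=False is an added safety choice (the rows of the view overlap in memory), not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(50).reshape(10, 5)
win_size = 5

# Bytes to step one row down and one column across in the original array.
row_stride, col_stride = arr.strides

windowed = as_strided(
    arr,
    # (number of windows, rows per window, columns)
    shape=(arr.shape[0] - win_size + 1, win_size, arr.shape[1]),
    # Next window = one row down; next row inside a window = one row down.
    strides=(row_stride, row_stride, col_stride),
    # Assumption: read-only is safer because the windows share memory.
    writeable=False,
)

# Each window should match the corresponding plain slice of arr.
for i in range(windowed.shape[0]):
    assert np.array_equal(windowed[i], arr[i:i + win_size])
```

Comparing each window against the ordinary slice arr[i:i + win_size] is a cheap sanity check, since as_strided itself does not validate the shape or strides it is given.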

Strides are the number of bytes between two adjacent elements along a given axis, so strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize) means: advance 5 elements (one row of arr) when moving from windowed[0] to windowed[1], advance 5 elements again when moving from windowed[0, 0] to windowed[0, 1], and advance a single element when moving along a row. Now you can call any reduction (or ufunc) on the resulting array, for example:

    In [44]: windowed.sum(axis=(1, 2))
    Out[44]: array([300, 425, 550, 675, 800, 925])
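To tie this back to the DataFrame in the question, here is a minimal sketch of one possible way to get a single result column. It assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available as an alternative to hand-written strides, and a pandas version that provides DataFrame.to_numpy. The DataFrame df and the choice to align each result with the last row of its window (mirroring the NaN padding in the rolling_apply output) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical input: the same 10x5 data as above, held in a DataFrame.
df = pd.DataFrame(np.arange(50).reshape(10, 5))
win_size = 5

# Sliding along axis 0 appends the window axis last, so the shape is
# (n_windows, n_columns, win_size) = (6, 5, 5).
windows = sliding_window_view(df.to_numpy(), win_size, axis=0)

# Reduce each full window (all columns at once) to a single number.
totals = windows.sum(axis=(1, 2))

# Align each result with the last row of its window; earlier rows stay NaN.
result = pd.Series(np.nan, index=df.index)
result.iloc[win_size - 1:] = totals
```

Note that the window axis comes last here, so windows[i] is the transpose of arr[i:i + win_size]; this does not matter for a full sum, but it would for a function that depends on orientation.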