Numpy - Averaging multiple columns in a two-dimensional array

Question


I am doing this now with a loop, but there must be a way to accomplish this using numpy functions. My goal is to take a 2D array and average J columns at a time, producing a new array with the same number of rows as the original, but with columns / J columns.

So, I want to take this:

    J = 2  # two columns averaged at a time
    [[1 2 3 4]
     [4 3 7 1]
     [6 2 3 4]
     [3 4 4 1]]

and do this:

    [[1.5 3.5]
     [3.5 4.0]
     [4.0 3.5]
     [3.5 2.5]]

Is there an easy way to accomplish this task? I also need to make sure that no leftover columns are ever left unaveraged. So if, for example, I have an input array with 5 columns and J = 2, I would average the first two columns, and then the last three columns.

Any help you can provide would be great.


python numpy

user1764386 Jan 18 '13 at 14:30


2 answers

Theodros zelleke · Answer 1 · 2013-01-18T14:40:23+0000

 data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)

That is, provided your j divides data.shape[1] evenly.

Example:

    In [40]: data
    Out[40]:
    array([[7, 9, 7, 2],
           [7, 6, 1, 5],
           [8, 1, 0, 7],
           [8, 3, 3, 2]])

    In [41]: data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)
    Out[41]:
    array([[ 8. ,  4.5],
           [ 6.5,  3. ],
           [ 4.5,  3.5],
           [ 5.5,  2.5]])
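For what it's worth, here is a minimal sketch of the same reshape idea wrapped in a function, with the divisibility condition checked explicitly. The name `average_columns` is made up for illustration (it is not a numpy function), and reshaping to three dimensions is just one way to make the grouping explicit.

```python
import numpy as np

def average_columns(data, j):
    """Average every j adjacent columns of a 2D array (hypothetical helper)."""
    data = np.asarray(data)
    rows, cols = data.shape
    if cols % j != 0:
        raise ValueError("j must divide the number of columns evenly")
    # Shape (rows, cols // j, j): the last axis holds one group of j columns.
    return data.reshape(rows, cols // j, j).mean(axis=2)

# The array from the question, averaged two columns at a time.
data = np.array([[1, 2, 3, 4],
                 [4, 3, 7, 1],
                 [6, 2, 3, 4],
                 [3, 4, 4, 1]])
result = average_columns(data, 2)
print(result)
```

On the question's input this should reproduce the 4x2 result the asker wanted; a column count that j does not divide raises an error instead of silently mixing rows.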

acjay · Answer 2 · 2013-01-18T14:40:57+0000

First of all, it looks to me like you're not averaging the columns at all; you're just averaging two data points at a time. It seems to me the best way is to reshape the array so that you have an Nx2 data structure that you can feed directly to mean. You may need to pad it first if the number of columns isn't evenly divisible by j. Then, at the end, just take a weighted average of the padded leftover column and the one before it. Finally, reshape back to the desired shape.

To play off of the example provided by TheodrosZelleke:

    In [1]: data = np.concatenate((data, np.array([[5, 6, 7, 8]]).T), 1)

    In [2]: data
    Out[2]:
    array([[7, 9, 7, 2, 5],
           [7, 6, 1, 5, 6],
           [8, 1, 0, 7, 7],
           [8, 3, 3, 2, 8]])

    In [3]: cols = data.shape[1]

    In [4]: j = 2

    In [5]: # (-cols) % j is the number of zero columns needed; it is 0 when j divides cols
       ...: dataPadded = np.concatenate((data, np.zeros((data.shape[0], (-cols) % j))), 1)

    In [6]: dataPadded
    Out[6]:
    array([[ 7.,  9.,  7.,  2.,  5.,  0.],
           [ 7.,  6.,  1.,  5.,  6.,  0.],
           [ 8.,  1.,  0.,  7.,  7.,  0.],
           [ 8.,  3.,  3.,  2.,  8.,  0.]])

    In [7]: dataAvg = dataPadded.reshape((-1,j)).mean(axis=1).reshape((data.shape[0], -1))

    In [8]: dataAvg
    Out[8]:
    array([[ 8. ,  4.5,  2.5],
           [ 6.5,  3. ,  3. ],
           [ 4.5,  3.5,  3.5],
           [ 5.5,  2.5,  4. ]])

    In [9]: if cols % j:
       ...:     # Both means were divided by j, so multiply by j to get back the sums,
       ...:     # then divide by the real size of the merged group (j + leftovers).
       ...:     dataAvg[:, -2] = (dataAvg[:, -2] + dataAvg[:, -1]) * j / (j + cols % j)
       ...:     dataAvg = dataAvg[:, :-1]
       ...:

    In [10]: dataAvg
    Out[10]:
    array([[ 8.        ,  4.66666667],
           [ 6.5       ,  4.        ],
           [ 4.5       ,  4.66666667],
           [ 5.5       ,  4.33333333]])
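As an alternative that avoids padding altogether, here is a rough sketch that sums each group with `np.add.reduceat` and divides by the real group sizes, so the last group is the plain mean of its j + remainder columns. The function name `average_columns_with_remainder` is my own invention, and the sketch assumes the array has at least j columns.

```python
import numpy as np

def average_columns_with_remainder(data, j):
    """Average j columns at a time; leftover columns join the last group.

    Hypothetical helper; assumes data is 2D with at least j columns.
    """
    data = np.asarray(data, dtype=float)
    rows, cols = data.shape
    if cols < j:
        raise ValueError("need at least j columns")
    n_groups = cols // j
    # Start index of each group; reduceat lets the last group run to the end.
    starts = np.arange(n_groups) * j
    sums = np.add.reduceat(data, starts, axis=1)
    # Every group has j columns, except the last, which also takes the remainder.
    sizes = np.full(n_groups, j)
    sizes[-1] += cols % j
    return sums / sizes

# The 5-column example from above with j = 2: groups are columns 0-1 and 2-4.
data = np.array([[7, 9, 7, 2, 5],
                 [7, 6, 1, 5, 6],
                 [8, 1, 0, 7, 7],
                 [8, 3, 3, 2, 8]])
result = average_columns_with_remainder(data, 2)
print(result)
```

When j divides the column count evenly, the remainder is zero and this reduces to the plain reshape-and-mean approach.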
