NumPy - Averaging multiple columns in a two-dimensional array

I am currently doing this by iterating, but there must be a way to accomplish it with NumPy functions. My goal is to take a 2D array and average J columns at a time, producing a new array with the same number of rows as the original but with (number of columns) / J columns.

So I want to take this:

    J = 2 // two columns averaged at a time

    [[1 2 3 4]
     [4 3 7 1]
     [6 2 3 4]
     [3 4 4 1]]

and produce this:

    [[1.5 3.5]
     [3.5 4.0]
     [4.0 3.5]
     [3.5 2.5]]

Is there an easy way to accomplish this? I also need to make sure I never end up with leftover columns that are not averaged. For example, if the input array has 5 columns and J = 2, I would average the first two columns and then the last three columns.

Any help you can provide would be great.

2 answers
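    data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)

This works only if j divides data.shape[1]. In that case every run of j consecutive elements in the flattened array comes from a single row, so no group straddles two rows.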

Example:

    In [40]: data
    Out[40]:
    array([[7, 9, 7, 2],
           [7, 6, 1, 5],
           [8, 1, 0, 7],
           [8, 3, 3, 2]])

    In [41]: data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)
    Out[41]:
    array([[ 8. ,  4.5],
           [ 6.5,  3. ],
           [ 4.5,  3.5],
           [ 5.5,  2.5]])
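The one-liner above can also be wrapped in a small function that checks the divisibility requirement up front. This is only a sketch: the name `block_mean_columns` is made up for illustration, and it uses a 3-D reshape (rows, groups, j) in place of the two reshapes, which should give the same result.

```python
import numpy as np

def block_mean_columns(data, j):
    """Average every j adjacent columns of a 2-D array (hypothetical helper)."""
    data = np.asarray(data)
    rows, cols = data.shape
    if cols % j != 0:
        raise ValueError(f"{cols} columns cannot be split into groups of {j}")
    # Shape (rows, cols // j, j): the last axis holds one group of j columns.
    return data.reshape(rows, cols // j, j).mean(axis=2)

# The array from the question, averaged two columns at a time.
question_data = np.array([[1, 2, 3, 4],
                          [4, 3, 7, 1],
                          [6, 2, 3, 4],
                          [3, 4, 4, 1]])
result = block_mean_columns(question_data, 2)
print(result)
```

Raising an error when j does not divide the column count is a design choice made here; the remainder case from the question needs separate handling.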

First of all, it looks to me like you are not averaging columns at all; you are simply averaging two data points at a time. I think the best approach is to reshape the array so that you have an Nx2 data structure that you can feed directly to mean. You may need to pad it first if the number of columns is not a multiple of J. Then, at the end, merge the padded remainder column into the one before it with a weighted average. Finally, reshape back to the desired form.

To play off the example provided by TheodrosZelleke:

    In [1]: data = np.concatenate((data, np.array([[5, 6, 7, 8]]).T), 1)

    In [2]: data
    Out[2]:
    array([[7, 9, 7, 2, 5],
           [7, 6, 1, 5, 6],
           [8, 1, 0, 7, 7],
           [8, 3, 3, 2, 8]])

    In [3]: cols = data.shape[1]

    In [4]: j = 2

    In [5]: # pad with zeros up to the next multiple of j (adds nothing if j already divides cols)
       ...: dataPadded = np.concatenate((data, np.zeros((data.shape[0], -cols % j))), 1)

    In [6]: dataPadded
    Out[6]:
    array([[ 7.,  9.,  7.,  2.,  5.,  0.],
           [ 7.,  6.,  1.,  5.,  6.,  0.],
           [ 8.,  1.,  0.,  7.,  7.,  0.],
           [ 8.,  3.,  3.,  2.,  8.,  0.]])

    In [7]: dataAvg = dataPadded.reshape((-1,j)).mean(axis=1).reshape((data.shape[0], -1))

    In [8]: dataAvg
    Out[8]:
    array([[ 8. ,  4.5,  2.5],
           [ 6.5,  3. ,  3. ],
           [ 4.5,  3.5,  3.5],
           [ 5.5,  2.5,  4. ]])

    In [9]: if cols % j:
       ...:     # both means were taken over j values (zeros included), so
       ...:     # multiplying each by j recovers the sums of the real data
       ...:     dataAvg[:, -2] = (dataAvg[:, -2] * j + dataAvg[:, -1] * j) / (j + cols % j)
       ...:     dataAvg = dataAvg[:, :-1]
       ...:

    In [10]: dataAvg
    Out[10]:
    array([[ 8.        ,  4.66666667],
           [ 6.5       ,  4.        ],
           [ 4.5       ,  4.66666667],
           [ 5.5       ,  4.33333333]])
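For comparison, the same result can be reached without padding. The sketch below is a different route from the pad-then-merge session above: it averages the full groups with a reshape and then takes a plain mean over whatever columns remain for the last group. The function name `block_mean_fold_remainder` is hypothetical, and raising an error when there are fewer than j columns is an assumption about the desired behavior.

```python
import numpy as np

def block_mean_fold_remainder(data, j):
    """Average groups of j columns; leftover columns join the last group."""
    data = np.asarray(data, dtype=float)
    rows, cols = data.shape
    n_groups = cols // j
    if n_groups == 0:
        raise ValueError("need at least j columns")
    # Columns covered by the full-size groups that precede the last group.
    head_cols = (n_groups - 1) * j
    head = data[:, :head_cols].reshape(rows, n_groups - 1, j).mean(axis=2)
    # The last group takes everything that is left: j + (cols % j) columns.
    tail = data[:, head_cols:].mean(axis=1, keepdims=True)
    return np.hstack((head, tail))

# The 5-column array used in the session above, with j = 2.
five_cols = np.array([[7, 9, 7, 2, 5],
                      [7, 6, 1, 5, 6],
                      [8, 1, 0, 7, 7],
                      [8, 3, 3, 2, 8]])
folded = block_mean_fold_remainder(five_cols, 2)
print(folded)
```

Here the first output column is the mean of columns 0 and 1, and the second is the mean of columns 2 through 4, which matches the behavior asked for in the question.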