Add a column at the end of a pandas DataFrame containing the average of the preceding columns

I have a DataFrame ave_data that contains the following:

 ave_data
 Time      F7         F8          F9
 00:00:00  43.005593  -56.509746  25.271271
 01:00:00  55.114918  -59.173852  31.849262
 02:00:00  63.990762  -64.699492  52.426017

I want to add another column to this DataFrame containing the average of columns F7, F8 and F9 for each row.

The ave_data DataFrame can change size, because my code later reads from different Excel files, so the method should be general (i.e. always add the column containing the average as the last column of the DataFrame, not as column number 4).

 desired output
 Time      F7         F8          F9         Average
 00:00:00  43.005593  -56.509746  25.271271  4.25
 01:00:00  55.114918  -59.173852  31.849262  9.26
 02:00:00  63.990762  -64.699492  52.426017  17.24
+9
4 answers

You can take a copy of your df with copy(), and then call mean with axis=1 and numeric_only=True, so that the average is calculated per row and non-numeric columns are ignored. Assigning the result to a new column name always adds the column at the end:

 In [68]:
 summary_ave_data = df.copy()
 summary_ave_data['average'] = summary_ave_data.mean(numeric_only=True, axis=1)
 summary_ave_data

 Out[68]:
                  Time         F7         F8         F9    average
 0 2015-07-29 00:00:00  43.005593 -56.509746  25.271271   3.922373
 1 2015-07-29 01:00:00  55.114918 -59.173852  31.849262   9.263443
 2 2015-07-29 02:00:00  63.990762 -64.699492  52.426017  17.239096
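As a self-contained sketch of the approach above, the snippet below rebuilds the sample frame from the question and wraps the two lines in a small helper. The helper name add_row_average is hypothetical (it is not a pandas function), and storing Time as plain strings is an assumption made only to keep the example short.

```python
import pandas as pd


def add_row_average(df, name="average"):
    """Return a copy of df with the row-wise mean of its numeric columns appended.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    # axis=1 averages across the columns of each row;
    # numeric_only=True skips non-numeric columns such as Time.
    out[name] = out.mean(axis=1, numeric_only=True)
    return out


# Sample data from the question (Time kept as strings: an assumption).
ave_data = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

result = add_row_average(ave_data)
print(result)

# A wider frame (extra made-up column F10) still gets the average as its last column.
wider = ave_data.assign(F10=[1.0, 2.0, 3.0])
wider_result = add_row_average(wider)
print(list(wider_result.columns))
```

Because a new column name is always appended on the right, the helper does not need to know how many data columns the frame has.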
+11

For @LaangeHaare or anyone else who is curious: I just tested it, and the copy in the accepted answer seems unnecessary (maybe I'm missing something ...)

So you can simplify this to:

 df['average'] = df.mean(numeric_only=True, axis=1) 

I would have added this as a comment, but I don't have enough reputation.

+4

In general, if you want to use specific columns, you can use:

 df['average'] = df[['F7','F8']].mean(axis=1) 

where axis=1 denotes a row-wise operation (the column values in each row are used to calculate that row's value in the "average" column)

Then you can sort by this column:

 df.sort_values(by='average',ascending=False, inplace=True) 

where inplace=True applies the sort to the DataFrame itself instead of returning a sorted copy.
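Since the question asks for something that keeps working when the number of columns changes, here is a rough sketch that picks the columns by name pattern instead of listing them. It assumes the data columns are always named F followed by digits, which the question does not guarantee, and it sorts into a new variable rather than using inplace=True.

```python
import pandas as pd

# Sample data from the question (Time kept as strings: an assumption).
df = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

# Select every column whose name looks like F<number> (assumed naming scheme).
f_cols = df.filter(regex=r"^F\d+$").columns

# Row-wise mean over just those columns, appended as the last column.
df["average"] = df[f_cols].mean(axis=1)

# Without inplace=True, sort_values returns a new sorted frame
# and leaves df in its original row order.
ranked = df.sort_values(by="average", ascending=False)
print(ranked)
```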

+1

df.assign exists specifically for this purpose. It returns a copy, which avoids modifying the original DataFrame and/or raising SettingWithCopyWarning. It works as follows:

 data_with_ave = ave_data.assign(average = ave_data.mean(axis=1, numeric_only=True)) 

This function can also create multiple columns at the same time:

 data_with_ave = ave_data.assign(
     average=ave_data.mean(axis=1, numeric_only=True),
     median=ave_data.median(axis=1, numeric_only=True)
 )

Starting with pandas 0.23 (on Python 3.6 and later), you can even refer to a newly created column to create another:

 data_with_ave = ave_data.assign(
     average=ave_data.mean(axis=1, numeric_only=True),
     isLarge=lambda df: df['average'] > 10
 )
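To make the behaviour of assign concrete, the following sketch runs it on the sample data from the question (Time stored as strings, which is an assumption) and checks that the original frame is left untouched. The threshold of 10 is taken from the snippet above.

```python
import pandas as pd

# Sample data from the question (Time kept as strings: an assumption).
ave_data = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

data_with_ave = ave_data.assign(
    # Row-wise mean of the numeric columns only.
    average=ave_data.mean(axis=1, numeric_only=True),
    # The lambda receives the frame built so far, so it can already see 'average'.
    isLarge=lambda df: df["average"] > 10,
)

print(data_with_ave)
# assign returned a new frame; ave_data still has only its original columns.
print(list(ave_data.columns))
```

The lambda form matters here: a plain expression such as ave_data['average'] > 10 would fail, because the original frame has no 'average' column.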
0