Pandas: How to sum columns based on conditional values of other columns?

Question

I have the following pandas DataFrame.

    import pandas as pd
    df = pd.read_csv('filename.csv')
    print(df)

          dog         A         B         C
    0    dog1  0.787575  0.159330  0.053095
    1   dog10  0.770698  0.169487  0.059815
    2   dog11  0.792689  0.152043  0.055268
    3   dog12  0.785066  0.160361  0.054573
    4   dog13  0.795455  0.150464  0.054081
    5   dog14  0.794873  0.150700  0.054426
    ..   ....
    8   dog19  0.811585  0.140207  0.048208
    9    dog2  0.797202  0.152033  0.050765
    10  dog20  0.801607  0.145137  0.053256
    11  dog21  0.792689  0.152043  0.055268
    ....

I create a new column by summing the columns "A", "B", "C" as follows:

    df['total_ABC'] = df[["A", "B", "C"]].sum(axis=1)

Now I would like to do this on a conditional basis, i.e. if "A" < 0.78, then create a new summed column df['smallA_sum'] = df[["A", "B", "C"]].sum(axis=1). Otherwise, the value must be zero.

How to create conditional expressions like this?

My thought would be to use

 df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))

However, this does not work, and I cannot specify the axis.

How to create a column based on the values of other columns?

Similarly, for each row where df['dog'] == 'dog2', I would like to create a dog2_sum column, i.e.

  df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))

but my approach is wrong.


python pandas conditional dataframe

ShanZhengYang Jun 21 '16 at 14:45


1 answer

Edchum · Accepted Answer · 2016-06-21T14:47:36+0000

The following should work. Here we select only the rows of df where the condition is satisfied and sum them. Assigning the result to a new column leaves NaN in the rows where the condition is not satisfied, so we then call fillna on the new column:

    In [67]:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
    df

    Out[67]:
              A         B         C
    0  0.197334  0.707852 -0.443475
    1 -1.063765 -0.914877  1.585882
    2  0.899477  1.064308  1.426789
    3 -0.556486 -0.150080 -0.149494
    4 -0.035858  0.777523 -0.453747

    In [73]:
    # Sum A and B only for rows where A > 0; other rows become NaN on assignment
    df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
    # Assign the filled column back (an inplace fillna on df['total'] may act on a copy in recent pandas)
    df['total'] = df['total'].fillna(0)
    df

    Out[73]:
              A         B         C     total
    0  0.197334  0.707852 -0.443475  0.905186
    1 -1.063765 -0.914877  1.585882  0.000000
    2  0.899477  1.064308  1.426789  1.963785
    3 -0.556486 -0.150080 -0.149494  0.000000
    4 -0.035858  0.777523 -0.453747  0.000000
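To make the NaN step in the mask-and-fill approach concrete, here is a minimal sketch using the question's condition (A < 0.78) and columns A, B, C. The three-row frame is a hypothetical sample built from the first rows shown in the question, not the asker's full file, and the names partial and has_nan are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample: the first three rows shown in the question.
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog11"],
    "A": [0.787575, 0.770698, 0.792689],
    "B": [0.159330, 0.169487, 0.152043],
    "C": [0.053095, 0.059815, 0.055268],
})

# Sum only the rows where A < 0.78; the result keeps just those index labels.
partial = df.loc[df["A"] < 0.78, ["A", "B", "C"]].sum(axis=1)

# Assignment aligns on the index, so rows absent from `partial` become NaN.
df["smallA_sum"] = partial
has_nan = df["smallA_sum"].isna().tolist()

# Replace the NaN entries with zero and assign the result back.
df["smallA_sum"] = df["smallA_sum"].fillna(0)
```

Only the row for dog10 satisfies the condition here, so it is the only row that receives a non-zero sum.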

Another approach is to call where on the result of sum. It takes the condition and the value to use in rows where the condition is not met:

    In [75]:
    df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
    df

    Out[75]:
              A         B         C     total
    0  0.197334  0.707852 -0.443475  0.905186
    1 -1.063765 -0.914877  1.585882  0.000000
    2  0.899477  1.064308  1.426789  1.963785
    3 -0.556486 -0.150080 -0.149494  0.000000
    4 -0.035858  0.777523 -0.453747  0.000000
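Applied to the two conditions in the question, the where approach might look like the sketch below. The three-row frame is a hypothetical sample taken from rows shown in the question, and row_sum and apply_sum are names introduced here for illustration. The last statement is a row-wise apply shown only for comparison: the question's lambda would need both an else branch and axis=1 to run, and it is typically slower than the vectorized form.

```python
import pandas as pd

# Hypothetical sample: three rows shown in the question.
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog2"],
    "A": [0.787575, 0.770698, 0.797202],
    "B": [0.159330, 0.169487, 0.152033],
    "C": [0.053095, 0.059815, 0.050765],
})

# Compute the row sum once, then keep it only where each condition holds.
row_sum = df[["A", "B", "C"]].sum(axis=1)
df["smallA_sum"] = row_sum.where(df["A"] < 0.78, 0)
df["dog2_sum"] = row_sum.where(df["dog"] == "dog2", 0)

# Row-wise equivalent of the first condition, for comparison only:
# the conditional expression needs an explicit else, and apply needs axis=1.
apply_sum = df.apply(
    lambda row: row["A"] + row["B"] + row["C"] if row["A"] < 0.78 else 0,
    axis=1,
)
```

Computing row_sum once and reusing it keeps each new column to a single where call, whatever the condition is.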
