Pandas: How to sum columns based on conditional values of other columns?

I have the following pandas DataFrame.

    import pandas as pd

    df = pd.read_csv('filename.csv')
    print(df)

          dog         A         B         C
    0    dog1  0.787575  0.159330  0.053095
    1   dog10  0.770698  0.169487  0.059815
    2   dog11  0.792689  0.152043  0.055268
    3   dog12  0.785066  0.160361  0.054573
    4   dog13  0.795455  0.150464  0.054081
    5   dog14  0.794873  0.150700  0.054426
    .. ....
    8   dog19  0.811585  0.140207  0.048208
    9    dog2  0.797202  0.152033  0.050765
    10  dog20  0.801607  0.145137  0.053256
    11  dog21  0.792689  0.152043  0.055268
    ....

I create a new column by summing the columns "A" , "B" , "C" as follows:

    df['total_ABC'] = df[["A", "B", "C"]].sum(axis=1)

Now I would like to do this conditionally: if "A" < 0.78, the new column df['smallA_sum'] should hold df[["A", "B", "C"]].sum(axis=1); otherwise, the value should be zero.

How can I create a conditional expression like this?

My thought would be to use

 df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78)) 

However, this does not work, and I cannot specify the axis.

How can I create a column based on the values of other columns?

Similarly, I would like to do this only for rows where df['dog'] == 'dog2', creating a dog2_sum column, i.e.

  df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2')) 

but my approach is wrong.

python pandas conditional dataframe
1 answer

The following should work. Here we select only the rows of df where the condition is satisfied and sum them; assigning the result to a new column leaves NaN in the rows where the condition is not satisfied, so we call fillna on the new column:

    In [67]: df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
             df
    Out[67]:
              A         B         C
    0  0.197334  0.707852 -0.443475
    1 -1.063765 -0.914877  1.585882
    2  0.899477  1.064308  1.426789
    3 -0.556486 -0.150080 -0.149494
    4 -0.035858  0.777523 -0.453747

    In [73]: df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
             df['total'] = df['total'].fillna(0)
             df
    Out[73]:
              A         B         C     total
    0  0.197334  0.707852 -0.443475  0.905186
    1 -1.063765 -0.914877  1.585882  0.000000
    2  0.899477  1.064308  1.426789  1.963785
    3 -0.556486 -0.150080 -0.149494  0.000000
    4 -0.035858  0.777523 -0.453747  0.000000
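Applied to the columns from the question, the masking idea might look like the sketch below. The three rows are copied from the question's printout, and the 0.78 threshold and the smallA_sum column name come from the question as well. The sketch assigns the result of fillna back to the column, which avoids relying on in-place modification of a selected column.

```python
import pandas as pd

# Three rows copied from the question's printout (dog1, dog10, dog12).
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog12"],
    "A": [0.787575, 0.770698, 0.785066],
    "B": [0.159330, 0.169487, 0.160361],
    "C": [0.053095, 0.059815, 0.054573],
})

# Sum only the rows where A < 0.78; index alignment on assignment
# leaves NaN in the rows that were not selected.
df["smallA_sum"] = df.loc[df["A"] < 0.78, ["A", "B", "C"]].sum(axis=1)

# Replace the NaN in the non-matching rows with zero.
df["smallA_sum"] = df["smallA_sum"].fillna(0)

print(df)
```

Only the dog10 row has A below 0.78, so it is the only row that keeps a non-zero sum.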

Another approach is to call where on the result of sum; its second argument is the value to use where the condition is not met:

    In [75]: df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
             df
    Out[75]:
              A         B         C     total
    0  0.197334  0.707852 -0.443475  0.905186
    1 -1.063765 -0.914877  1.585882  0.000000
    2  0.899477  1.064308  1.426789  1.963785
    3 -0.556486 -0.150080 -0.149494  0.000000
    4 -0.035858  0.777523 -0.453747  0.000000
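For the two conditions in the question (A < 0.78 and dog == 'dog2'), the where approach could be sketched as follows. The rows are copied from the question's printout; row_sum and smallA_apply are names introduced here for illustration only. The last statement shows one way the question's apply attempt could be made to run, assuming a row-wise apply is acceptable: the conditional expression gets an else branch and axis=1 is passed to apply itself. It is generally slower than the vectorised where call on large frames.

```python
import pandas as pd

# Three rows copied from the question's printout (dog10, dog2, dog20).
df = pd.DataFrame({
    "dog": ["dog10", "dog2", "dog20"],
    "A": [0.770698, 0.797202, 0.801607],
    "B": [0.169487, 0.152033, 0.145137],
    "C": [0.059815, 0.050765, 0.053256],
})

# Row-wise sum computed once and reused for both conditions.
row_sum = df[["A", "B", "C"]].sum(axis=1)

# Keep the sum where the condition holds, use 0 everywhere else.
df["smallA_sum"] = row_sum.where(df["A"] < 0.78, 0)
df["dog2_sum"] = row_sum.where(df["dog"] == "dog2", 0)

# Row-wise equivalent of the question's attempt: the conditional
# expression needs an else branch, and axis=1 is an argument of apply.
df["smallA_apply"] = df.apply(
    lambda row: row["A"] + row["B"] + row["C"] if row["A"] < 0.78 else 0,
    axis=1,
)

print(df)
```

Any boolean Series aligned with the frame can serve as the condition, so string comparisons on the dog column work the same way as numeric comparisons on A.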