Python pandas dataframe create new column from other column cells

I have a dataframe like this ...

        a_return  b_return   bc_ratio  instrument_holding
    0        NaN       NaN  -0.165286                   a
    1   0.996474  1.013166  -0.164637                   a
    2   0.997730  0.993540  -0.170058                   a
    3   1.024294  1.024318  -0.184530                   a
    4   1.019071  1.047297  -0.148644                   a
    5   0.992243  1.008210  -0.188752                   a
    6   1.010331  1.039020  -0.098413                   a
    7   0.989542  0.991899   0.025051                   b
    8   1.005197  1.002527  -0.025051                   b
    9   0.990755  1.002352  -0.099800                   a
    10  1.006241  0.998375  -0.078643                   b

I want to add a column named "log_ret" where the value from "a_return" or "b_return" is used based on the value in the column "instrument_holding". Like this...

        a_return  b_return   bc_ratio  instrument_holding   log_ret
    0        NaN       NaN  -0.165286                   a       NaN
    1   0.996474  1.013166  -0.164637                   a  0.996474
    2   0.997730  0.993540  -0.170058                   a  0.997730
    3   1.024294  1.024318  -0.184530                   a  1.024294
    4   1.019071  1.047297  -0.148644                   a  1.019071
    5   0.992243  1.008210  -0.188752                   a  0.992243
    6   1.010331  1.039020  -0.098413                   a  1.010331
    7   0.989542  0.991899   0.025051                   b  0.991899
    8   1.005197  1.002527  -0.025051                   b  1.002527
    9   0.990755  1.002352  -0.099800                   a  0.990755
    10  1.006241  0.998375  -0.078643                   b  0.998375

As you can see, if "instrument_holding" is "a", "log_ret" takes the value from "a_return", and if "instrument_holding" is "b", "log_ret" takes the value from "b_return".

I thought it could be done like this ...

 df["log_ret"] = df[df["instrument_holding"] + "_return"] 

This does not work. Thanks for any suggestions!

python pandas dataframe
3 answers
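  • Use map to turn the values in instrument_holding into column names (a becomes a_return, b becomes b_return).
  • Use lookup to pick, for each row, the value from the column named that way.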

    df.assign(
        log_return=df.lookup(df.index, df.instrument_holding.map('{}_return'.format)))

        a_return  b_return   bc_ratio  instrument_holding  log_return
    0        NaN       NaN  -0.165286                   a         NaN
    1   0.996474  1.013166  -0.164637                   a    0.996474
    2   0.997730  0.993540  -0.170058                   a    0.997730
    3   1.024294  1.024318  -0.184530                   a    1.024294
    4   1.019071  1.047297  -0.148644                   a    1.019071
    5   0.992243  1.008210  -0.188752                   a    0.992243
    6   1.010331  1.039020  -0.098413                   a    1.010331
    7   0.989542  0.991899   0.025051                   b    0.991899
    8   1.005197  1.002527  -0.025051                   b    1.002527
    9   0.990755  1.002352  -0.099800                   a    0.990755
    10  1.006241  0.998375  -0.078643                   b    0.998375
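
Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the lookup call only runs on older versions. A minimal sketch of the same per-row selection using pd.factorize and NumPy indexing follows. The sample frame is a four-row subset of the question's data, and the variable names (labels, codes, uniques, values) are chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

# Four rows taken from the question's data (original rows 0, 1, 2 and 7).
df = pd.DataFrame({
    "a_return": [np.nan, 0.996474, 0.997730, 0.989542],
    "b_return": [np.nan, 1.013166, 0.993540, 0.991899],
    "instrument_holding": ["a", "a", "a", "b"],
})

# Column label to read for each row, e.g. "a" -> "a_return".
labels = df["instrument_holding"].map("{}_return".format)

# codes[i] is the position of row i's label within uniques.
codes, uniques = pd.factorize(labels)

# Keep only the referenced columns, in the same order as uniques.
values = df.reindex(columns=uniques).to_numpy()

# Pick exactly one cell per row: (row i, column codes[i]).
df["log_ret"] = values[np.arange(len(df)), codes]
print(df)
```

Restricting the array to the referenced columns keeps it numeric, so log_ret ends up as a float column rather than object dtype.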

One possibility is to use np.where with the condition that instrument_holding equals "a". It returns the value from the a_return column where the condition is true, and the value from the b_return column otherwise.

Then use DataFrame.assign to add the result as a new column named log_ret.

 df.assign(log_ret=np.where(df.instrument_holding == 'a', df.a_return, df.b_return)) 
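
For reference, a self-contained version of this approach might look like the sketch below. The sample frame is a four-row subset of the question's data, and the name result is introduced here for illustration. Two things are worth keeping in mind: assign returns a new DataFrame rather than modifying df, and np.where falls back to b_return for every row whose holding is not "a", so this form assumes there are exactly two instruments.

```python
import numpy as np
import pandas as pd

# Four rows taken from the question's data (original rows 0, 1, 2 and 7).
df = pd.DataFrame({
    "a_return": [np.nan, 0.996474, 0.997730, 0.989542],
    "b_return": [np.nan, 1.013166, 0.993540, 0.991899],
    "instrument_holding": ["a", "a", "a", "b"],
})

# Element-wise choice: a_return where the holding is "a", b_return elsewhere.
chosen = np.where(df["instrument_holding"] == "a", df["a_return"], df["b_return"])

# assign leaves df untouched and returns a copy with the extra column.
result = df.assign(log_ret=chosen)
print(result)
```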


Use apply. It is not the fanciest way, but it is very flexible.

    def select(row):
        if row['instrument_holding'] == 'a':
            return row['a_return']
        else:
            return row['b_return']

    df['log_ret'] = df.apply(select, axis=1)
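
If more instruments are added later, the if/else chain grows with them. One possible variation, sketched below under the assumption that every holding x has a matching x_return column, builds the column name from the row itself. This is essentially the row-wise form of what the question attempted. The function name pick_return is made up for this example, and the sample frame is a four-row subset of the question's data.

```python
import numpy as np
import pandas as pd

# Four rows taken from the question's data (original rows 0, 1, 2 and 7).
df = pd.DataFrame({
    "a_return": [np.nan, 0.996474, 0.997730, 0.989542],
    "b_return": [np.nan, 1.013166, 0.993540, 0.991899],
    "instrument_holding": ["a", "a", "a", "b"],
})

def pick_return(row):
    # Read the column named after this row's holding, e.g. "b" -> "b_return".
    return row[f"{row['instrument_holding']}_return"]

# axis=1 passes each row to pick_return as a Series.
df["log_ret"] = df.apply(pick_return, axis=1)
print(df)
```

A holding without a matching column raises a KeyError here, which may be preferable to silently falling through to a default branch.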