Roll back / shift in a pandas DataFrame apply function

If I have the following DataFrame:

  date       A     B    M     S
 20150101    8     7    7.5   0
 20150101    10    9    9.5   -1
 20150102    9     8    8.5   1
 20150103    11    11   11    0
 20150104    11    10   10.5  0
 20150105    12    10   11    -1
 ...

I want to create a new column whose value follows these rules:

  • if S < 0, cost = (M-B).shift(1) * S
  • if S > 0, cost = (M-A).shift(1) * S
  • if S == 0, cost = 0

I am currently using the following function:

def cost(df):
    if df[3]<0:
        return np.roll((df[2]-df[1]),1)*df[3]
    elif df[3]>0:
        return np.roll((df[2]-df[0]),1)*df[3]
    else:
        return 0
df['cost']=df.apply(cost,axis=0)

Is there another way to do this? Can I somehow use the pandas shift function in a custom function? Thanks.

2 answers

Applying a user-defined function this way is usually expensive, since you lose the speed advantage of vectorization. Instead, how about using the NumPy version of the ternary operator:

import numpy as np

# Nested np.where: the first branch handles S < 0, the second S > 0, else 0.
np.where(df[3] < 0,
    np.roll((df[2]-df[1]),1)*df[3],
    np.where(df[3] > 0,
        np.roll((df[2]-df[0]),1)*df[3],
        0))

(Of course, you would assign the result to df['cost'].)
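As a minimal sketch of the nested np.where idea on the sample rows from the question, the version below makes two assumptions that are not in the answer: the columns are addressed by name (A, B, M, S) rather than by position, and Series.shift(1) is used in place of np.roll. The two differ on the first row: np.roll wraps the last value around to the front, while shift leaves a NaN there.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'date': [20150101, 20150101, 20150102, 20150103, 20150104, 20150105],
    'A': [8, 10, 9, 11, 11, 12],
    'B': [7, 9, 8, 11, 10, 10],
    'M': [7.5, 9.5, 8.5, 11.0, 10.5, 11.0],
    'S': [0, -1, 1, 0, 0, -1],
})

# Previous row's differences; the first row has no predecessor, so it is NaN.
prev_m_minus_b = (df['M'] - df['B']).shift(1)
prev_m_minus_a = (df['M'] - df['A']).shift(1)

# Outer np.where handles S < 0, inner np.where handles S > 0, else 0.
df['cost'] = np.where(df['S'] < 0, prev_m_minus_b * df['S'],
                      np.where(df['S'] > 0, prev_m_minus_a * df['S'], 0.0))
print(df)
```

The NaN in the first row never reaches the result here, because S is 0 in that row and the final fallback value is selected instead.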


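np.where(condition, A, B) is the NumPy equivalent of

A if condition else B

np.select(conditions, choices) is a generalization of np.where, useful when there are more than two choices.

So, for instance, you could use np.select like this: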

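import numpy as np
import pandas as pd
df = pd.read_table('data', sep=r'\s+')
# Pull out the columns first, so that S exists before the conditions use it.
M, A, B, S = [df[col] for col in 'MABS']
conditions = [S < 0, S > 0]
choices = [(M-B).shift(1)*S, (M-A).shift(1)*S]
# Rows matching neither condition (S == 0) get the default.
df['cost'] = np.select(conditions, choices, default=0)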

       date   A   B     M  S  cost
0  20150101   8   7   7.5  0   0.0
1  20150101  10   9   9.5 -1  -0.5
2  20150102   9   8   8.5  1  -0.5
3  20150103  11  11  11.0  0   0.0
4  20150104  11  10  10.5  0   0.0
5  20150105  12  10  11.0 -1  -0.5
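
To make the relationship between np.where and np.select concrete, here is a small sketch on a toy array. The array s and the chosen output values are made up for illustration and do not come from the question's data.

```python
import numpy as np

s = np.array([-2, 0, 3])  # hypothetical values, for illustration only

# np.where is an element-wise "A if condition else B".
two_way = np.where(s < 0, -1, 1)

# np.select takes a list of conditions and a matching list of choices;
# the first condition that holds wins, and `default` covers the rest.
three_way = np.select([s < 0, s > 0], [-1, 1], default=0)

# The same three-way result written as nested np.where calls.
nested = np.where(s < 0, -1, np.where(s > 0, 1, 0))

print(two_way, three_way, nested)
```

With more than two branches, the np.select form tends to stay flat and readable, whereas np.where needs one extra level of nesting per additional condition.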