Speed up iloc solution in pandas DataFrame

I have the following DataFrame:

    import pandas as pd

    dates = pd.date_range('20150101', periods=4)
    df = pd.DataFrame({'A': [5, 10, 3, 4]}, index=dates)
    df.loc[:, 'B'] = 0
    df.loc[:, 'C'] = 0
    df.iloc[0, 1] = 10
    df.iloc[0, 2] = 3
    print(df)

    Out[69]:
                 A   B  C
    2015-01-01   5  10  3
    2015-01-02  10   0  0
    2015-01-03   3   0  0
    2015-01-04   4   0  0

I want to implement the following logic for columns B and C:

  • B(k+1) = B(k) - A(k+1)
  • C(k+1) = B(k) + A(k+1)

I can do this using the following code:

    for i in range(1, df.shape[0]):
        df.iloc[i, 1] = df.iloc[i-1, 1] - df.iloc[i, 0]
        df.iloc[i, 2] = df.iloc[i-1, 1] + df.iloc[i, 0]
    print(df)

This gives:

                 A   B   C
    2015-01-01   5  10   3
    2015-01-02  10   0  20
    2015-01-03   3  -3   3
    2015-01-04   4  -7   1

This is the result I am looking for. The problem is that when I apply this to a DataFrame with a large amount of data, it runs slowly. Very slowly. Is there a better way to achieve this?

+8
python pandas dataframe
4 answers

Recursive computations like this can be hard to vectorize. numba usually handles them well. If you need to redistribute your code, cython might be the better choice, as it produces regular C extensions without any additional dependencies.

    In [88]: import numba
        ...: import numpy as np

    In [89]: @numba.jit(nopython=True)
        ...: def logic(a, b, c):
        ...:     N = len(a)
        ...:     out = np.zeros((N, 2), dtype=np.int64)
        ...:     for i in range(N):
        ...:         if i == 0:
        ...:             out[i, 0] = b[i]
        ...:             out[i, 1] = c[i]
        ...:         else:
        ...:             out[i, 0] = out[i-1, 0] - a[i]
        ...:             out[i, 1] = out[i-1, 0] + a[i]
        ...:     return out

    In [90]: logic(df.A.values, df.B.values, df.C.values)
    Out[90]:
    array([[10,  3],
           [ 0, 20],
           [-3,  3],
           [-7,  1]], dtype=int64)

    In [91]: df[['B','C']] = logic(df.A.values, df.B.values, df.C.values)
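If numba is not an option, running the same loop over plain NumPy arrays usually already avoids most of the per-cell overhead of `df.iloc`. A minimal sketch of that idea, not part of the original answer: the helper name `logic_numpy` is hypothetical, and it assumes column A holds integers and that the first values of B and C are given.

```python
import numpy as np
import pandas as pd


def logic_numpy(a, b0, c0):
    """Run the recurrence on a NumPy array instead of DataFrame cells.

    Hypothetical helper: `a` is column A as an array, `b0` and `c0` are
    the given first values of B and C.
    """
    n = len(a)
    out = np.zeros((n, 2), dtype=np.int64)
    if n == 0:
        return out
    out[0, 0] = b0
    out[0, 1] = c0
    for i in range(1, n):
        out[i, 0] = out[i - 1, 0] - a[i]  # B(k+1) = B(k) - A(k+1)
        out[i, 1] = out[i - 1, 0] + a[i]  # C(k+1) = B(k) + A(k+1)
    return out


# Same sample data as in the question.
dates = pd.date_range("20150101", periods=4)
df = pd.DataFrame(
    {"A": [5, 10, 3, 4], "B": [10, 0, 0, 0], "C": [3, 0, 0, 0]},
    index=dates,
)

res = logic_numpy(df["A"].to_numpy(), df["B"].iloc[0], df["C"].iloc[0])
df["B"] = res[:, 0]
df["C"] = res[:, 1]
print(df)
```

The loop is still interpreted Python, so for very long columns the numba version or a vectorized approach should remain faster.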

Edit: As shown in other answers, this problem can actually be vectorized, which you probably should use.

+2

The trick for vectorization is to rewrite everything as cumsums.
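To see why cumulative sums are enough, one can unroll the recurrence from the question. A short derivation, writing $A_k$, $B_k$, $C_k$ for the values in row $k$, with row 0 holding the given starting values:

```latex
\begin{aligned}
B_k &= B_{k-1} - A_k = B_0 - \sum_{j=1}^{k} A_j, \qquad k \ge 1, \\
C_k &= B_{k-1} + A_k = B_k + 2A_k, \qquad k \ge 1.
\end{aligned}
```

So B needs only its starting value and a running sum of A, and C follows from B row by row.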

    In [11]: x = df["A"].shift(-1).cumsum().shift().fillna(0)

    In [12]: x
    Out[12]:
    2015-01-01     0
    2015-01-02    10
    2015-01-03    13
    2015-01-04    17
    Name: A, dtype: float64

    In [13]: df["B"].cumsum() - x
    Out[13]:
    2015-01-01    10
    2015-01-02     0
    2015-01-03    -3
    2015-01-04    -7
    dtype: float64

    In [14]: df["B"].cumsum() - x + 2 * df["A"]
    Out[14]:
    2015-01-01    20
    2015-01-02    20
    2015-01-03     3
    2015-01-04     1
    dtype: float64

Note: the first value of C is a special case, so you need to set it back to 3 afterwards.
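The same idea can be packaged into one function. This is a minimal sketch rather than the answer's original code: the helper name `fill_b_c` is hypothetical, and it assumes a non-empty frame with integer columns A, B, C in which only the first row of B and C holds meaningful starting values.

```python
import numpy as np
import pandas as pd


def fill_b_c(df):
    """Return a copy of df with B and C filled in via cumulative sums.

    Hypothetical helper: assumes df is non-empty and that the first row
    of B and C contains the given starting values.
    """
    a = df["A"].to_numpy()
    b0 = df["B"].iloc[0]
    c0 = df["C"].iloc[0]

    # Running sum A(1) + ... + A(k); the entry for row 0 is 0.
    running = np.cumsum(a) - a[0]

    b = b0 - running  # B(k) = B(0) - sum of A(1..k)
    c = b + 2 * a     # C(k) = B(k-1) + A(k) = B(k) + 2*A(k)
    c[0] = c0         # the first value of C is given, not computed

    out = df.copy()
    out["B"] = b
    out["C"] = c
    return out


# Same sample data as in the question.
dates = pd.date_range("20150101", periods=4)
df = pd.DataFrame(
    {"A": [5, 10, 3, 4], "B": [10, 0, 0, 0], "C": [3, 0, 0, 0]},
    index=dates,
)
result = fill_b_c(df)
print(result)
```

Working on the underlying arrays keeps the integer dtype, whereas the `shift()`-based version goes through NaN and therefore returns floats.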

+6

This is basically your answer without a for loop:

    # Assign through df.loc to avoid chained indexing on df['B'] and df['C']
    df.loc[df.index[1:], 'B'] = df['B'].iloc[:-1].values - df['A'].iloc[1:].values
    df.loc[df.index[1:], 'C'] = df['B'].iloc[:-1].values + df['A'].iloc[1:].values

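I have not measured the performance, but without a loop it should be faster. Note that the right-hand side reads column B as it was before the assignment, so the recurrence is applied only one step deep: on the example data the last value of B comes out as -4 instead of -7.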

+1

Complete solution:

    df1 = df[:1].copy()  # keep the first row, which holds the given starting values
    df['B'] = df['B'].shift().cumsum()[1:] - df['A'][1:].cumsum()
    df[:1] = df1
    df['C'] = df['B'].shift() + df['A']
    df[:1] = df1
    df

                 A   B   C
    2015-01-01   5  10   3
    2015-01-02  10   0  20
    2015-01-03   3  -3   3
    2015-01-04   4  -7   1
+1