Subtract Aggregate From Pandas Series / Dataframe

Given the following table

       vals
    0    20
    1     3
    2     2
    3    10
    4    20

I am trying to find a clean way in pandas to subtract a total such as 30 from the column, consuming rows from the top until the total is used up, to get the following result:

       vals
    0     0
    1     0
    2     0
    3     5
    4    20

I was wondering whether pandas has a way to do this that takes advantage of its bulk (vectorized) operations rather than looping over every row of the data frame.
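For reference, the row-by-row loop the question wants to avoid might look like the sketch below. The helper name `subtract_from_top` and its `total` parameter are illustrative and not from the original post, and the values are assumed to be non-negative.

```python
import pandas as pd

# The table from the question.
df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})


def subtract_from_top(values, total):
    """Consume `total` from the values, top to bottom (hypothetical helper)."""
    remaining = total
    out = []
    for v in values:
        taken = min(v, remaining)  # never take more than the row holds
        out.append(v - taken)
        remaining -= taken
    return out


result = df.assign(vals=subtract_from_top(df["vals"], 30))
print(result["vals"].tolist())  # [0, 0, 0, 5, 20]
```

This gives a plain baseline against which any vectorized answer can be checked.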

3 answers
  • determine where the cumsum is greater than or equal to 30
  • zero out the rows where it is not
  • reassign the first row that reaches 30 as the cumsum minus 30

    c = df.vals.cumsum()
    m = c.ge(30)
    i = m.idxmax()
    n = df.vals.where(m, 0)
    n.loc[i] = c.loc[i] - 30
    df.assign(vals=n)

       vals
    0     0
    1     0
    2     0
    3     5
    4    20
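The three steps above can be wrapped in a small function. This is only a sketch under assumptions: the name `subtract_total` and its `total` parameter are made up for illustration, the values are assumed non-negative, and the index is assumed unique. The early return covers a case the snippet above does not handle: when `total` exceeds the column sum, the mask is all `False` and `idxmax` would simply point at the first row.

```python
import pandas as pd


def subtract_total(vals: pd.Series, total) -> pd.Series:
    """Subtract `total` from `vals`, consuming rows from the top (hypothetical helper)."""
    c = vals.cumsum()
    m = c.ge(total)              # True from the first row where the running sum reaches `total`
    if not m.any():              # `total` exceeds the column sum: every row is used up
        return vals.where(m, 0)
    i = m.idxmax()               # label of the first True row
    n = vals.where(m, 0)         # zero out the rows that are consumed entirely
    n.loc[i] = c.loc[i] - total  # remainder left in the crossing row
    return n


df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
print(subtract_total(df["vals"], 30).tolist())   # [0, 0, 0, 5, 20]
print(subtract_total(df["vals"], 100).tolist())  # [0, 0, 0, 0, 0]
```

Because `where` returns a new Series, the input column is left unchanged.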

The same approach, using NumPy:

    import numpy as np

    v = df.vals.values
    c = v.cumsum()
    m = c >= 30
    i = m.argmax()
    n = np.where(m, v, 0)
    n[i] = c[i] - 30
    df.assign(vals=n)

       vals
    0     0
    1     0
    2     0
    3     5
    4    20

The timing

    %%timeit
    v = df.vals.values
    c = v.cumsum()
    m = c >= 30
    i = m.argmax()
    n = np.where(m, v, 0)
    n[i] = c[i] - 30
    df.assign(vals=n)

    10000 loops, best of 3: 168 µs per loop

    %%timeit
    c = df.vals.cumsum()
    m = c.ge(30)
    i = m.idxmax()
    n = df.vals.where(m, 0)
    n.loc[i] = c.loc[i] - 30
    df.assign(vals=n)

    1000 loops, best of 3: 853 µs per loop
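The timings above come from the five-row example, where fixed per-call overhead dominates. A rough way to repeat the comparison on a larger frame, outside IPython, is sketched below with the standard `timeit` module. The function names, the row count, and the random data are assumptions for illustration, and absolute numbers will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd


def pandas_version(df, total):
    c = df.vals.cumsum()
    m = c.ge(total)
    i = m.idxmax()
    n = df.vals.where(m, 0)
    n.loc[i] = c.loc[i] - total
    return df.assign(vals=n)


def numpy_version(df, total):
    v = df.vals.values
    c = v.cumsum()
    m = c >= total
    i = m.argmax()
    n = np.where(m, v, 0)  # new array, so the frame's data is not modified
    n[i] = c[i] - total
    return df.assign(vals=n)


# Hypothetical test data: 100,000 positive integers.
rng = np.random.default_rng(0)
big = pd.DataFrame({"vals": rng.integers(1, 50, size=100_000)})
total = int(big.vals.sum() // 2)  # guarantees the threshold is reached

for fn in (pandas_version, numpy_version):
    seconds = min(timeit.repeat(lambda: fn(big, total), number=20, repeat=3))
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```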

Here is a NumPy approach in four lines of code:

    v = df.vals.values
    a = v.cumsum()-30
    idx = (a>0).argmax()+1
    v[:idx] = a.clip(min=0)[:idx]

Sample run:

    In [274]: df  # Original df
    Out[274]:
       vals
    0    20
    1     3
    2     2
    3    10
    4    20

    In [275]: df.iloc[3,0] = 7  # Bringing in some variety

    In [276]: df
    Out[276]:
       vals
    0    20
    1     3
    2     2
    3     7
    4    20

    In [277]: v = df.vals.values
         ...: a = v.cumsum()-30
         ...: idx = (a>0).argmax()+1
         ...: v[:idx] = a.clip(min=0)[:idx]
         ...:

    In [278]: df
    Out[278]:
       vals
    0     0
    1     0
    2     0
    3     2
    4    20
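Note that the snippet above writes into `v`, which can be a view of the column, so `df` itself changes (as `Out[278]` shows). Depending on the pandas version and its copy-on-write setting, that array may instead be read-only, in which case the in-place write raises an error. A variant that leaves `df` untouched is sketched below. The function name `subtract_total_np` is hypothetical, and like the original it assumes `total` is smaller than the column sum.

```python
import pandas as pd


def subtract_total_np(df: pd.DataFrame, total) -> pd.DataFrame:
    """Non-mutating variant of the four-line NumPy approach (hypothetical helper)."""
    v = df["vals"].to_numpy(copy=True)  # private copy, safe to modify
    a = v.cumsum() - total              # running sum minus the amount to remove
    idx = (a > 0).argmax() + 1          # one past the first row with something left over
    v[:idx] = a.clip(min=0)[:idx]       # earlier rows become 0, the crossing row keeps the remainder
    return df.assign(vals=v)


df = pd.DataFrame({"vals": [20, 3, 2, 7, 20]})
out = subtract_total_np(df, 30)
print(out["vals"].tolist())  # [0, 0, 0, 2, 20]
print(df["vals"].tolist())   # [20, 3, 2, 7, 20]
```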
    # A one-liner solution
    df['vals'] = df.assign(res=30 - df.vals.cumsum()).apply(
        # res > 0: row fully consumed; otherwise keep min(vals, cumsum - 30)
        lambda x: 0 if x.res > 0 else x.vals if abs(x.res) > x.vals else abs(x.res),
        axis=1)
    df

    Out[96]:
       vals
    0     0
    1     0
    2     0
    3     5
    4    20
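The row-wise `apply` can also be replaced by a single vectorized expression. For non-negative values, each row keeps `min(vals, max(cumsum - 30, 0))`, which NumPy's element-wise `minimum` and `maximum` express directly. This is a sketch rather than part of the original answers, and the function name `subtract_total_clip` is made up for illustration.

```python
import numpy as np
import pandas as pd


def subtract_total_clip(vals: pd.Series, total) -> pd.Series:
    """Closed-form version: min(vals, max(cumsum - total, 0)) (hypothetical helper)."""
    leftover = np.maximum(vals.cumsum() - total, 0)  # what is left of the running sum after removing `total`
    return np.minimum(vals, leftover)                # a row can never keep more than it started with


df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
print(subtract_total_clip(df["vals"], 30).tolist())  # [0, 0, 0, 5, 20]
```

Unlike the mask-and-reassign approaches, this form needs no special handling when the total exceeds the column sum: every row simply becomes 0.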