Subtract Aggregate From Pandas Series / Dataframe

Question


Given the following table:

       vals
    0    20
    1     3
    2     2
    3    10
    4    20

I am trying to find a clean solution in pandas to subtract a value such as 30 from the column, working down from the first row until the 30 is used up, to produce the following result:

       vals
    0     0
    1     0
    2     0
    3     5
    4    20

I was wondering whether pandas has a way to do this that does not require looping through all the rows of the DataFrame and instead takes advantage of pandas' bulk operations.
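For reference, the intended behavior can be pinned down with a plain loop, which is exactly what the question wants to avoid. This is only a baseline sketch for checking the vectorized answers; the function name `subtract_total_loop` and the variable `remaining` are illustrative and do not come from the original post.

```python
import pandas as pd


def subtract_total_loop(vals, total):
    """Baseline loop (hypothetical helper): consume `total` from the top rows down."""
    remaining = total
    out = []
    for v in vals:
        taken = min(v, remaining)   # how much of the total this row absorbs
        out.append(v - taken)       # what is left in the row afterwards
        remaining -= taken
    return pd.Series(out, index=vals.index, name=vals.name)


df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
result = subtract_total_loop(df["vals"], 30)
print(result.tolist())
```

With the table from the question, the first three rows absorb 25 of the 30, the fourth row absorbs the last 5, and the fifth row is untouched.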

+5

python numpy pandas

jab May 18, '17 at 18:36


3 answers

This uses NumPy and takes four lines of code:

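    v = df.vals.values                # underlying NumPy array of the column
    a = v.cumsum() - 30               # running total minus the amount to subtract
    idx = (a > 0).argmax() + 1        # one past the first row where the total exceeds 30
    v[:idx] = a.clip(min=0)[:idx]     # overwrite those rows in place; negatives become 0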

Sample run:

    In [274]: df # Original df
    Out[274]:
       vals
    0    20
    1     3
    2     2
    3    10
    4    20

    In [275]: df.iloc[3,0] = 7 # Bringing in some variety

    In [276]: df
    Out[276]:
       vals
    0    20
    1     3
    2     2
    3     7
    4    20

    In [277]: v = df.vals.values
         ...: a = v.cumsum()-30
         ...: idx = (a>0).argmax()+1
         ...: v[:idx] = a.clip(min=0)[:idx]
         ...:

    In [278]: df
    Out[278]:
       vals
    0     0
    1     0
    2     0
    3     2
    4    20
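The four lines above update `df` only because `df.vals.values` is a writable view of the column. That may not hold in every pandas version or configuration (for example with Copy-on-Write enabled), so here is a sketch of the same steps that copies the array and writes the result back explicitly. The explicit copy and the assignment back are additions made for illustration, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({"vals": [20, 3, 2, 7, 20]})

v = df["vals"].to_numpy().copy()   # explicit copy, so df is not modified yet
a = v.cumsum() - 30                # running total minus the amount to subtract
idx = (a > 0).argmax() + 1         # one past the first row where the total exceeds 30
v[:idx] = a.clip(min=0)[:idx]      # same overwrite as above, on the copy
df["vals"] = v                     # write the result back explicitly

print(df["vals"].tolist())
```

This trades one extra array copy for not depending on whether the column is exposed as a writable view.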

+4

Divakar May 18, '17 at 19:10


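    # A one-liner solution
    # res is what is left of the 30 after each row; once it goes negative,
    # a row keeps the overshoot (-res), but never more than its own value.
    df['vals'] = df.assign(res=30 - df.vals.cumsum()).apply(lambda x: 0 if x.res >= 0 else min(x.vals, -x.res), axis=1)
    df
    Out[96]:
       vals
    0     0
    1     0
    2     0
    3     5
    4    20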
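The row-wise rule in this one-liner can also be written without `apply`, assuming all values are non-negative: each row keeps the amount by which the running sum overshoots 30, limited below by 0 and above by the row's own value. A minimal sketch, where `total`, `over`, and `remaining` are illustrative names not used in the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
total = 30  # amount to subtract, as in the question

# Overshoot of the running sum past `total`; negative while the total is not yet used up.
over = df["vals"].cumsum() - total

# A row keeps nothing until the total is used up, and never more than its own value.
remaining = np.minimum(df["vals"], over.clip(lower=0))

print(df.assign(vals=remaining)["vals"].tolist())
```

Because it needs no crossing index, this form also returns all zeros when the total is larger than the column's sum.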

0

Allen May 18, '17 at 19:34


piRSquared · Accepted Answer · 2017-05-18T18:50:19+0000

Determine where the cumulative sum is greater than or equal to 30.
Set the rows where it is not to 0.
Reassign the first row where it is to the cumulative sum less 30.

    c = df.vals.cumsum()
    m = c.ge(30)
    i = m.idxmax()
    n = df.vals.where(m, 0)
    n.loc[i] = c.loc[i] - 30
    df.assign(vals=n)

       vals
    0     0
    1     0
    2     0
    3     5
    4    20

The same, but NumPy-fied:

    v = df.vals.values
    c = v.cumsum()
    m = c >= 30
    i = m.argmax()
    n = np.where(m, v, 0)
    n[i] = c[i] - 30
    df.assign(vals=n)

       vals
    0     0
    1     0
    2     0
    3     5
    4    20
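Both versions assume the cumulative sum reaches 30 somewhere in the column. If it never does, `m` is all False, `argmax` (or `idxmax`) falls back to the first position, and the reassignment writes a negative number there. The sketch below wraps the NumPy version in a function with a guard for that case; the function name `subtract_total` and its `total` and `col` parameters are hypothetical and not part of the answer.

```python
import numpy as np
import pandas as pd


def subtract_total(df, total, col="vals"):
    """Hypothetical wrapper around the mask-and-reassign idea, with a guard."""
    v = df[col].to_numpy()
    c = v.cumsum()
    m = c >= total
    n = np.where(m, v, 0)          # rows before the crossing point become 0
    if m.any():                    # only patch the crossing row if one exists
        i = m.argmax()             # first position where cumsum >= total
        n[i] = c[i] - total
    return df.assign(**{col: n})


df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
print(subtract_total(df, 30)["vals"].tolist())
print(subtract_total(df, 100)["vals"].tolist())
```

With a total of 100, which exceeds the column sum of 55, every row is simply zeroed out.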

Timing:

    %%timeit
    v = df.vals.values
    c = v.cumsum()
    m = c >= 30
    i = m.argmax()
    n = np.where(m, v, 0)
    n[i] = c[i] - 30
    df.assign(vals=n)

    10000 loops, best of 3: 168 µs per loop

    %%timeit
    c = df.vals.cumsum()
    m = c.ge(30)
    i = m.idxmax()
    n = df.vals.where(m, 0)
    n.loc[i] = c.loc[i] - 30
    df.assign(vals=n)

    1000 loops, best of 3: 853 µs per loop
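The `%%timeit` cells only run in IPython or Jupyter. Outside of those, the standard-library `timeit` module gives a comparable measurement. The sketch below wraps the two variants in functions; the names `with_numpy` and `with_pandas` are labels chosen here, and no figures are shown because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})


def with_numpy(df):
    v = df.vals.values
    c = v.cumsum()
    m = c >= 30
    i = m.argmax()
    n = np.where(m, v, 0)
    n[i] = c[i] - 30
    return df.assign(vals=n)


def with_pandas(df):
    c = df.vals.cumsum()
    m = c.ge(30)
    i = m.idxmax()
    n = df.vals.where(m, 0)
    n.loc[i] = c.loc[i] - 30
    return df.assign(vals=n)


# Time 1000 calls of each variant and report the mean per call.
for fn in (with_numpy, with_pandas):
    seconds = timeit.timeit(lambda: fn(df), number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```

On a five-row frame the measurement is dominated by fixed per-call overhead, so it is worth repeating on a frame closer to the real data size before drawing conclusions.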
