Source DataSet
    In [2]: import pandas as pd
       ...:
       ...: # Original DataSet
       ...: d = {'A': [1,1,1,1,2,2,2,2,3],
       ...:      'B': ['a','a','a','x','b','b','b','x','c'],
       ...:      'C': [11,22,33,44,55,66,77,88,99],}
       ...:
       ...: df = pd.DataFrame(d)
       ...: df
    Out[2]:
       A  B   C
    0  1  a  11
    1  1  a  22
    2  1  a  33
    3  1  x  44
    4  2  b  55
    5  2  b  66
    6  2  b  77
    7  2  x  88
    8  3  c  99
Given a data frame, I would like to have a flexible, efficient way to reset specific values based on certain conditions in two columns.
Terms:
- in column B: for any row with a value of "x",
- in column C: set the value of these row items to the value of the next row.
Desired Result
    Out[3]:
       A  B   C
    0  1  a  11
    1  1  a  22
    2  1  a  33
    3  1  x  55
    4  2  b  55
    5  2  b  66
    6  2  b  77
    7  2  x  99
    8  3  c  99
I found out that I can accomplish this using iterrows() (see below),
    # Code that produces the above outcome
    for idx, x_row in df[df['B'] == 'x'].iterrows():
        df.loc[idx, 'C'] = df.loc[idx+1, 'C']
    df
but I need to do this many times, and I understand that iterrows() is slow. Are there any better, more pandas-y, broadcast-friendly ways to get the desired result more efficiently?
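For reference, here is a minimal sketch of what a loop-free version might look like, using a boolean mask together with `Series.shift`. It assumes that "next row" means the next row by position (which matches `idx+1` only with the default integer index). The function name `fill_from_next_row` and the choice to leave a trailing "x" row unchanged are assumptions made for illustration. They are not part of the original code, which would raise a `KeyError` in that case.

```python
import pandas as pd


def fill_from_next_row(df, flag_col='B', value_col='C', flag='x'):
    """Hypothetical helper: copy value_col from the following row into flagged rows."""
    out = df.copy()

    # Boolean mask of the rows whose value should be overwritten.
    mask = out[flag_col].eq(flag)

    # shift(-1) lines each row up with the value from the row below it.
    # fill_value keeps the integer dtype; using the last value means a
    # flagged final row simply keeps its own value (an assumption).
    next_vals = out[value_col].shift(-1, fill_value=out[value_col].iloc[-1])

    # Overwrite only the flagged rows; assignment aligns on the index.
    out.loc[mask, value_col] = next_vals[mask]
    return out


df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 2, 3],
    'B': ['a', 'a', 'a', 'x', 'b', 'b', 'b', 'x', 'c'],
    'C': [11, 22, 33, 44, 55, 66, 77, 88, 99],
})

result = fill_from_next_row(df)
print(result)
```

Because the shifted values are computed once from the original column, two consecutive "x" rows would each receive the original value of the row below them, which is also what the forward-iterating loop does.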