Python Pandas groupby forloop & Idxmax

Question

I have a DataFrame that needs to be grouped on three levels, returning the highest value within each group. Each unique combination has a return for every day, and I would like to find the highest ROI along with its details (such as the date).

data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

The result should show that:

 Target - Dish Soap - House had a 5% ROI on 9/17
 Best Buy - CDs - Electronics had a 3% ROI on 9/3

were the highest.

Here is some sample data:

 +----------+-----------+-------------+---------+-----+
 | Company  | Product   | Industry    | Date    | ROI |
 +----------+-----------+-------------+---------+-----+
 | Target   | Dish Soap | House       | 9/17/13 | 5%  |
 | Target   | Dish Soap | House       | 9/16/13 | 2%  |
 | BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
 | BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
 | ...

I'm not sure whether this calls for a for loop or for .ix.

python pandas for-loop

J_Arthur Sep 18 '13 at 18:35

1 answer

unutbu · Answer 1 · 2013-09-18 18:41

If I understand correctly, you could collect the index values in a Series using groupby and idxmax(), and then select those rows from data using loc:

 idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
 data.loc[idx]

Another option is to use reindex:

 data.reindex(idx)

On a (different) DataFrame that I happened to have handy, reindex turned out to be slightly faster:

 In [39]: %timeit df.reindex(idx)
 10000 loops, best of 3: 121 us per loop

 In [40]: %timeit df.loc[idx]
 10000 loops, best of 3: 147 us per loop
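
For reference, here is a minimal, self-contained sketch of the idxmax/loc approach applied to the four sample rows from the question. The DataFrame construction is my own reconstruction of that table, and the conversion step assumes ROI is stored as text such as "5%" (the question does not say); if the column is already numeric, that line can be skipped.

```python
import pandas as pd

# Reconstructed from the sample table in the question (an assumption:
# the real DataFrame may have more rows and different dtypes).
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})

# Assumption: ROI holds strings like "5%". Strip the percent sign and
# convert to float so idxmax compares numbers rather than text.
data["ROI"] = data["ROI"].str.rstrip("%").astype(float)

# For each (Company, Product, Industry) group, get the row label of the
# maximum ROI, then pull those full rows out of the original frame.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()
best = data.loc[idx]

print(best)
```

Because loc returns whole rows, the Date column comes along with each group's highest ROI, so no explicit for loop is needed.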
