Python Pandas groupby, for loop & idxmax

I have a DataFrame that needs to be grouped on three levels, returning the highest value in each group. There is an ROI value every day for each unique combination, and I would like to find the highest ROI and the details of the row it came from.

data.groupby(['Company','Product','Industry'])['ROI'].idxmax() 

The result should show the highest ROI for each group, for example:

 Target - Dish Soap - House had a 5% ROI on 9/17
 BestBuy - CDs - Electronics had a 3% ROI on 9/3

Here is some sample data:

 +----------+-----------+-------------+---------+-----+
 | Company  | Product   | Industry    | Date    | ROI |
 +----------+-----------+-------------+---------+-----+
 | Target   | Dish Soap | House       | 9/17/13 | 5%  |
 | Target   | Dish Soap | House       | 9/16/13 | 2%  |
 | BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
 | BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
 | ...

I am not sure whether this needs a for loop or .ix.

python pandas for-loop
Sep 18 '13 at 18:35
1 answer

If I have understood correctly, you can collect the index labels of the maximum rows in a Series using groupby and idxmax(), and then select those rows from the DataFrame using loc:

 idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
 data.loc[idx]

Another option is to use reindex:

 data.reindex(idx) 
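As a rough, self-contained sketch of both approaches, the snippet below rebuilds the four sample rows from the question. It assumes the first column is meant to be Company and that ROI is stored as percent strings such as "5%", so it converts them to floats first. The names best_loc and best_reindex are only illustrative.

```python
import pandas as pd

# Sample rows taken from the question. Storing ROI as percent strings is an
# assumption about the real data.
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})

# Convert "5%" -> 5.0 so that the values compare numerically.
data["ROI"] = data["ROI"].str.rstrip("%").astype(float)

# One row label per (Company, Product, Industry) group: the row with the top ROI.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# Both calls select the same rows, in group order.
best_loc = data.loc[idx]
best_reindex = data.reindex(idx)

print(best_loc)
```

Each resulting row still carries its Date, which gives the "details" asked for in the question. If ROI is already numeric in the real data, the conversion step can be dropped.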

On a (different) DataFrame I had to hand, reindex turned out to be slightly faster:

 In [39]: %timeit df.reindex(idx)
 10000 loops, best of 3: 121 us per loop

 In [40]: %timeit df.loc[idx]
 10000 loops, best of 3: 147 us per loop
Sep 18 '13 at 18:41