Pandas: group by and select the latest row in each group

How can I group pandas data and select the latest row (by date) from each group?

For example, for data sorted by date:

        id  product        date
    0  220     6647  2014-09-01
    1  220     6647  2014-09-03
    2  220     6647  2014-10-16
    3  826     3380  2014-11-11
    4  826     3380  2014-12-09
    5  826     3380  2015-05-19
    6  901     4555  2014-09-01
    7  901     4555  2014-10-05
    8  901     4555  2014-11-01

Grouping by id or product and choosing the latest row gives:

        id  product        date
    2  220     6647  2014-10-16
    5  826     3380  2015-05-19
    8  901     4555  2014-11-01
python pandas group-by pandas-groupby
4 answers

Use idxmax within groupby to find the index label of the latest date in each group, then slice df with loc:

    df.loc[df.groupby('id').date.idxmax()]

        id  product        date
    2  220     6647  2014-10-16
    5  826     3380  2015-05-19
    8  901     4555  2014-11-01
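As a minimal sketch of this approach, the following rebuilds the sample data and applies idxmax. It assumes the date column is parsed with pd.to_datetime (the question does not state its dtype); the names latest_idx and latest are illustrative only.

```python
import pandas as pd

# Sample data from the question; parsing the dates is an assumption.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# For each id, find the index label of the row holding the maximum date.
latest_idx = df.groupby("id")["date"].idxmax()

# Use those labels to pull the full rows out of the original frame.
latest = df.loc[latest_idx]
print(latest)
```

Because idxmax returns index labels, this relies on the index being unique. If a group contains several rows with the same maximum date, only the first of them is returned.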

You can also use tail with groupby to get the last n values of each group:

    df.sort_values('date').groupby('id').tail(1)

        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
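To illustrate the "last n" part, here is a short sketch with n=2 on the same sample data. The datetime parsing and the variable name last_two are assumptions made for the example.

```python
import pandas as pd

# Sample data from the question; parsing the dates is an assumption.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Sort by date first so that "last" means "latest", then keep the
# final two rows of every id group.
last_two = df.sort_values("date").groupby("id").tail(2)

# tail keeps the rows in the order of the sorted frame; sort_index
# restores the original row order if that is preferred.
print(last_two.sort_index())
```

Unlike an aggregation, tail returns rows with their original index labels, so the result can still be matched back to df.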

To use .tail() as an aggregation method and keep your grouping intact:

    df.sort_values('date').groupby('id').apply(lambda x: x.tail(1))

            id  product        date
    id
    220 2  220     6647  2014-10-16
    826 5  826     3380  2015-05-19
    901 8  901     4555  2014-11-01
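A hedged variant of the same idea: recent pandas releases warn when apply operates on the grouping column, so this sketch selects the non-grouping columns explicitly before calling apply. That selection is a change from the answer's code, and the name latest is illustrative; the id values then appear only in the index rather than also as a column.

```python
import pandas as pd

# Sample data from the question; parsing the dates is an assumption.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Select the value columns explicitly, then take the last row of each
# group. The result is indexed by (id, original row label).
latest = (
    df.sort_values("date")
      .groupby("id")[["product", "date"]]
      .apply(lambda g: g.tail(1))
)
print(latest)

# Rows for a single group can then be looked up by the id level.
print(latest.loc[826])
```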

I had a similar problem and ended up using drop_duplicates rather than groupby.

It seems to work much faster on large data sets compared to other methods suggested above.

    df.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")

        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
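The sketch below runs the drop_duplicates approach on the sample data and checks that it selects the same rows as the idxmax approach. The datetime parsing and the names latest and via_idxmax are assumptions for illustration; no timing is measured here.

```python
import pandas as pd

# Sample data from the question; parsing the dates is an assumption.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# After sorting by date, the last occurrence of each id is its latest row.
latest = df.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")

# Cross-check against the groupby/idxmax approach.
via_idxmax = df.loc[df.groupby("id")["date"].idxmax()]
print(latest.sort_index().equals(via_idxmax))
```

The result comes back in date order rather than id order, so sort_index (or sort_values on id) is needed when the two outputs have to line up row for row.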