Pandas Aggregate / Group Based on Latest Date

I have a DataFrame like the following, where Id is a string and Date is a datetime:

    Id  Date
    1   3-1-2012
    1   4-8-2013
    2   1-17-2013
    2   5-4-2013
    2   10-30-2012
    3   1-3-2013
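To reproduce this, the table can be built as below. This is a minimal sketch that assumes the dates are month-day-year strings and keeps Id as strings, as the question describes.

```python
import pandas as pd

# The question's table; Id kept as strings.
df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": ["3-1-2012", "4-8-2013", "1-17-2013", "5-4-2013", "10-30-2012", "1-3-2013"],
})

# Parse month-day-year strings into datetime64 values so they compare chronologically
# rather than alphabetically.
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")
print(df)
```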

I would like to consolidate the table to show only one row per Id: the row with the most recent date.
Any thoughts on how to do this?

2 answers

You can group by the Id column:

    In [11]: df
    Out[11]:
       Id                 Date
    0   1  2012-03-01 00:00:00
    1   1  2013-04-08 00:00:00
    2   2  2013-01-17 00:00:00
    3   2  2013-05-04 00:00:00
    4   2  2012-10-30 00:00:00
    5   3  2013-01-03 00:00:00

    In [12]: g = df.groupby('Id')

If you're not sure about the order, you can do something along the lines of:

    In [13]: g.agg(lambda x: x.iloc[x.Date.argmax()])
    Out[13]:
                       Date
    Id
    1   2013-04-08 00:00:00
    2   2013-05-04 00:00:00
    3   2013-01-03 00:00:00

which, for each group, grabs the row with the largest (latest) date (that is the argmax part).
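In recent pandas versions, `Series.argmax` returns a position rather than an index label, and `agg` with a plain function is applied column by column, so the lambda above may not run as written. A hedged sketch of the same idea (pick the row holding each group's latest date) uses `idxmax`, which returns index labels, together with `loc`:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime(
        ["3-1-2012", "4-8-2013", "1-17-2013", "5-4-2013", "10-30-2012", "1-3-2013"],
        format="%m-%d-%Y",
    ),
})

# For each Id, idxmax gives the index *label* of the row with the latest Date.
latest_labels = df.groupby("Id")["Date"].idxmax()

# loc selects rows by label, so whole rows are returned whatever the index looks like.
latest = df.loc[latest_labels]
print(latest)
```

If you prefer Id as the index, as in the `Out[13]` display, append `.set_index("Id")`.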

If you knew they were in order, you could take the last (or first) entry:

    In [14]: g.last()
    Out[14]:
                       Date
    Id
    1   2013-04-08 00:00:00
    2   2012-10-30 00:00:00
    3   2013-01-03 00:00:00

(Note: they are not in order here, so in this case it doesn't work! For Id 2, last() returns 2012-10-30 instead of 2013-05-04.)
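When the order is not guaranteed, a common idiom (not part of the answer above) is to sort by Date first, after which taking the last row per group is safe. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime(
        ["3-1-2012", "4-8-2013", "1-17-2013", "5-4-2013", "10-30-2012", "1-3-2013"],
        format="%m-%d-%Y",
    ),
})

# Sort chronologically; groupby keeps the row order within each group,
# so the latest row of every Id ends up last.
latest = df.sort_values("Date").groupby("Id").last()
print(latest)
```

Because `last()` takes the last non-null value column by column, rows with missing values could get mixed together; `groupby("Id").tail(1)` keeps whole rows instead.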


In Hayden's answer, I believe it is better to use x.loc instead of x.iloc, since the index of the DataFrame df may be sparse, i.e. not a contiguous 0, 1, 2, ... range (in which case iloc picks the wrong row or fails).

(I don't have enough reputation on Stack Overflow to post this as a comment on that answer.)
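To illustrate the point, here is a small sketch with a made-up sparse index (labels 10 to 60, chosen only for illustration). `idxmax` returns an index label; `loc` resolves it correctly, while `iloc` treats it as a position that does not exist in the group.

```python
import pandas as pd

# Same data, but with a sparse index, e.g. as left over after filtering rows.
df = pd.DataFrame(
    {
        "Id": ["1", "1", "2", "2", "2", "3"],
        "Date": pd.to_datetime(
            ["3-1-2012", "4-8-2013", "1-17-2013", "5-4-2013", "10-30-2012", "1-3-2013"],
            format="%m-%d-%Y",
        ),
    },
    index=[10, 20, 30, 40, 50, 60],
)

group = df[df["Id"] == "2"]        # rows labelled 30, 40, 50
label = group["Date"].idxmax()     # 40: an index label, not a position

row = group.loc[label]             # label-based lookup finds the 2013-05-04 row
print(row)

# Position 40 does not exist in a 3-row frame, so iloc raises IndexError.
iloc_failed = False
try:
    group.iloc[label]
except IndexError:
    iloc_failed = True
print("iloc failed:", iloc_failed)
```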

