Pandas Aggregate / Group Based on Latest Date

Question

I have a DataFrame like the following, where Id is a string and Date is a datetime:

    Id  Date
    1   3-1-2012
    1   4-8-2013
    2   1-17-2013
    2   5-4-2013
    2   10-30-2012
    3   1-3-2013

I would like to consolidate the table so that it shows only one row per Id, namely the row with the most recent date.
Any thoughts on how to do this?
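For anyone who wants to reproduce the examples in the answers, here is a minimal sketch of building the sample frame. It assumes the dates are month-day-year strings, as values like 1-17-2013 suggest, and parses them with `pd.to_datetime` so that comparisons are chronological rather than alphabetical.

```python
import pandas as pd

# Sample data from the question; Id is a string, dates are month-day-year text.
df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": ["3-1-2012", "4-8-2013", "1-17-2013", "5-4-2013", "10-30-2012", "1-3-2013"],
})

# Convert the strings to datetime64 so that max/sort work on real dates.
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")
print(df)
```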

python-2.7 pandas

Chrisarmrmrong Jun 10 '13 at 17:49

2 answers

Andy Hayden · Answer 1 · 2013-06-10T18:41:35+0000

You can group by the Id column:

    In [11]: df
    Out[11]:
       Id                Date
    0   1 2012-03-01 00:00:00
    1   1 2013-04-08 00:00:00
    2   2 2013-01-17 00:00:00
    3   2 2013-05-04 00:00:00
    4   2 2012-10-30 00:00:00
    5   3 2013-01-03 00:00:00

    In [12]: g = df.groupby('Id')
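If Date is the only other column, as in the sample, the latest date per Id is simply the group maximum, which does not depend on the row order at all. A minimal sketch, assuming Date has already been parsed to datetime:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime(["2012-03-01", "2013-04-08", "2013-01-17",
                            "2013-05-04", "2012-10-30", "2013-01-03"]),
})

g = df.groupby("Id")
# Reduce only the Date column: one value per Id, the latest date in that group.
latest = g["Date"].max()
print(latest)
```

This only returns the dates; if the frame had more columns you wanted to keep, you would need to select whole rows instead.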

If you are not sure the rows are in date order, you can do something along these lines:

    In [13]: g.agg(lambda x: x.iloc[x.Date.argmax()])
    Out[13]:
                       Date
    Id
    1   2013-04-08 00:00:00
    2   2013-05-04 00:00:00
    3   2013-01-03 00:00:00

which, for each group, picks the row with the largest (i.e. latest) date; finding that row is the job of argmax.
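A more direct way to express the same idea, and one that keeps index labels and positions from getting mixed up, is `idxmax` combined with `.loc`. This sketch assumes the sample data with Date already parsed; `idxmax` returns the index label of each group's latest row, and `.loc` then selects those whole rows, so any extra columns come along too.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime(["2012-03-01", "2013-04-08", "2013-01-17",
                            "2013-05-04", "2012-10-30", "2013-01-03"]),
})

# For each Id, the index label of the row holding the latest Date.
idx = df.groupby("Id")["Date"].idxmax()

# Select those complete rows from the original frame.
latest_rows = df.loc[idx]
print(latest_rows)
```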

If you knew the rows were already sorted by date, you could simply take the last (or first) entry of each group:

    In [14]: g.last()
    Out[14]:
                       Date
    Id
    1   2013-04-08 00:00:00
    2   2012-10-30 00:00:00
    3   2013-01-03 00:00:00

(Note: here the rows are not sorted, so this does not work: for Id 2 it returns 2012-10-30 rather than 2013-05-04!)
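One way to make the "take the last entry" approach safe is to sort by Date before grouping. A sketch assuming the same sample data; the `drop_duplicates` variant is included because `GroupBy.last()` takes the last non-null value of each column separately, which can combine values from different rows when some entries are missing, whereas `drop_duplicates(keep="last")` keeps whole original rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime(["2012-03-01", "2013-04-08", "2013-01-17",
                            "2013-05-04", "2012-10-30", "2013-01-03"]),
})

# Sort chronologically so that "last row in each group" means "latest date".
by_date = df.sort_values("Date")

# Option 1: group and take the last entry (result is indexed by Id).
latest = by_date.groupby("Id").last()

# Option 2: keep the last occurrence of each Id as a whole row, then order by Id.
latest_rows = by_date.drop_duplicates("Id", keep="last").sort_values("Id")
print(latest)
print(latest_rows)
```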

head7 · Answer 2 · 2015-03-11T23:54:22+0000

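In Andy Hayden's answer, I believe it is better to use x.loc instead of x.iloc. The index of df is not necessarily a plain 0, 1, 2, ... range (for example after some rows have been filtered out), and since argmax (in the pandas of that time) returned an index label rather than a position, iloc will then pick the wrong row or fail.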

(I don't have enough reputation on Stack Overflow to post this as a comment.)
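To illustrate the point, here is a small sketch with a deliberately non-contiguous index (10, 20, ..., 60, chosen only for the demo). `idxmax` returns index labels, so `.loc` finds the right rows, while passing the same numbers to `.iloc` treats them as positions, which are out of range for a six-row frame.

```python
import pandas as pd

# Sample data with a non-contiguous index, e.g. what is left after filtering rows.
df = pd.DataFrame(
    {
        "Id": ["1", "1", "2", "2", "2", "3"],
        "Date": pd.to_datetime(["2012-03-01", "2013-04-08", "2013-01-17",
                                "2013-05-04", "2012-10-30", "2013-01-03"]),
    },
    index=[10, 20, 30, 40, 50, 60],
)

# idxmax returns index *labels* of the latest row per Id (here 20, 40 and 60).
labels = df.groupby("Id")["Date"].idxmax()

# .loc looks rows up by label, so it works whatever the index looks like.
latest = df.loc[labels]
print(latest)

# .iloc treats the same numbers as positions; 20, 40 and 60 are out of range
# for a 6-row frame, so pandas raises IndexError.
try:
    df.iloc[labels.to_list()]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print("iloc failed:", iloc_failed)
```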
