Groupby: taking the last row of each group, how can I keep NaN?

I have a DataFrame df and I want to take the last row for each CUSIP.

    In [374]: df.head()
    Out[374]:
               CUSIP  COLA  COLB      COLC
    date
    1992-05-08   AAA   238  4256  3.523346
    1992-07-13   AAA   NaN  4677  3.485577
    1992-12-12   BBB   221  5150      3.24
    1995-12-12   BBB   254  5150      3.25
    1997-12-12   BBB   245   NaN      3.25
    1998-12-12   CCC   234  5140   3.24145
    1999-12-12   CCC   223  5120   3.65145

I use:

    df = df.reset_index().groupby('CUSIP').last().reset_index().set_index('date')

I want to get:

               CUSIP  COLA  COLB      COLC
    date
    1992-07-13   AAA   NaN  4677  3.485577
    1997-12-12   BBB   245   NaN      3.25
    1999-12-12   CCC   223  5120   3.65145

Instead, I get:

               CUSIP  COLA  COLB      COLC
    date
    1992-07-13   AAA   238  4677  3.485577
    1997-12-12   BBB   245  5150      3.25
    1999-12-12   CCC   223  5120   3.65145

How do I get last() to return the last row of each group, including any NaN values?
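To reproduce the problem, here is a minimal sketch that rebuilds the sample data by hand. The values are copied from the df.head() output above; the dtypes of the real df are an assumption. It shows that GroupBy.last() fills each column independently from the last non-null value in the group, so AAA's COLA comes back as 238 rather than NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (dtypes are an assumption).
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, np.nan, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, np.nan, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=pd.DatetimeIndex(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
         "1997-12-12", "1998-12-12", "1999-12-12"],
        name="date",
    ),
)

# last() takes the last non-null value of each column separately,
# so the NaN in AAA's COLA is skipped in favour of the earlier 238.
last_nonnull = df.groupby("CUSIP").last()
print(last_nonnull)
```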

Thanks.

1 answer

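last() returns the last non-null value in each column separately, which is why your NaNs are replaced by earlier values. You can get the whole last row instead by using apply rather than last and taking the -1st row (iloc[-1]) of each group: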

    In [11]: df.reset_index().groupby('CUSIP').apply(lambda x: x.iloc[-1]).reset_index(drop=True).set_index('date')
    Out[11]:
               CUSIP  COLA  COLB      COLC
    date
    1992-07-13   AAA   NaN  4677  3.485577
    1997-12-12   BBB   245   NaN  3.250000
    1999-12-12   CCC   223  5120  3.651450

    [3 rows x 4 columns]
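An alternative that this answer does not use, sketched on hand-built sample data copied from the question (an assumption, not the asker's real frame): GroupBy.tail(1) returns the last row of each group as-is, so NaNs are kept and the original date index is preserved without the reset_index/set_index round trip.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (dtypes are an assumption).
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, np.nan, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, np.nan, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=pd.DatetimeIndex(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
         "1997-12-12", "1998-12-12", "1999-12-12"],
        name="date",
    ),
)

# tail(1) keeps the last row of each group intact (NaNs included)
# and keeps the original 'date' index, so no reset_index is needed.
last_rows = df.groupby("CUSIP").tail(1)
print(last_rows)
```

Unlike the apply version, tail(1) returns the rows in their original order rather than sorted by CUSIP; in this sample the two orders happen to coincide.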

In pandas 0.13 (currently a release candidate), a faster and more direct way is to use cumcount:

    In [12]: df[df.groupby('CUSIP').cumcount(ascending=False) == 0]
    Out[12]:
               CUSIP  COLA  COLB      COLC
    date
    1992-07-13   AAA   NaN  4677  3.485577
    1997-12-12   BBB   245   NaN  3.250000
    1999-12-12   CCC   223  5120  3.651450

    [3 rows x 4 columns]
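A self-contained sketch of the cumcount approach, again on sample data rebuilt by hand from the question (an assumption, not the asker's real frame). cumcount(ascending=False) numbers the rows of each group counting back from the end, so the last row of every group gets 0, and the boolean mask keeps exactly those rows with all their values, NaN or not.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (dtypes are an assumption).
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, np.nan, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, np.nan, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=pd.DatetimeIndex(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
         "1997-12-12", "1998-12-12", "1999-12-12"],
        name="date",
    ),
)

# cumcount(ascending=False) numbers rows within each group from the end,
# so the last row of every group is numbered 0.
from_end = df.groupby("CUSIP").cumcount(ascending=False)
last_rows = df[from_end == 0]
print(last_rows)
```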
