Get the first and last value in a group

I have a dataframe df

 import numpy as np
 import pandas as pd
 df = pd.DataFrame(
     np.arange(20).reshape(10, -1),
     [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
      ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
     ['X', 'Y'])

How do I get the first and last rows of each group, grouped by the first index level?

I tried

 df.groupby(level=0).agg(['first', 'last']).stack() 

and received

           X   Y
 a first   0   1
   last    6   7
 b first   8   9
   last   12  13
 c first  14  15
   last   16  17
 d first  18  19
   last   18  19

This is so close to what I want. How can I preserve the level 1 index and get this instead:

       X   Y
 a a   0   1
   d   6   7
 b e   8   9
   g  12  13
 c h  14  15
   i  16  17
 d j  18  19
   j  18  19
+12
python pandas group-by dataframe pandas-groupby
3 answers

Option 1

 def first_last(df):
     # .ix was removed in pandas 1.0; .iloc selects the first and last rows by position
     return df.iloc[[0, -1]]
 df.groupby(level=0, group_keys=False).apply(first_last)



Option 2 - only works if the index is unique

 idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
 df.loc[idx]

Option 3 - per the note below, this only makes sense when there are no NA values

I also abused the agg function. The code below works, but is much uglier.

 df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
     .set_index('level_1', append=True).reset_index(1, drop=True) \
     .rename_axis([None, None])

Note

per @unutbu: agg(['first', 'last']) takes the first and last non-NA values.

I interpret this to mean that the result is then computed column by column, so the first and last values of different columns can come from different rows. In addition, forcing the result to align with index level 1 may not even make sense.

Let's include another test

 df = pd.DataFrame(np.arange(20).reshape(10, -1), [list('aaaabbbccd'), list('abcdefghij')], list('XY'))
 df.loc[tuple('aa'), 'X'] = np.nan

 def first_last(df):
     # .ix was removed in pandas 1.0; .iloc selects the first and last rows by position
     return df.iloc[[0, -1]]
 df.groupby(level=0, group_keys=False).apply(first_last)


 df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
     .set_index('level_1', append=True).reset_index(1, drop=True) \
     .rename_axis([None, None])


Sure enough! The second solution picks up the first valid (non-NA) value in column X. It now makes no sense to force that value to align with index label a.
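The difference can be checked directly. The following is a minimal sketch, assuming a recent pandas release; it rebuilds the test frame as float (my choice, so the NaN assignment needs no upcast) and compares positional selection with agg on group a:

```python
import numpy as np
import pandas as pd

# Same layout as the test frame above, built as float (an assumption of this sketch)
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(10, -1),
    index=[list('aaaabbbccd'), list('abcdefghij')],
    columns=list('XY'),
)
df.loc[('a', 'a'), 'X'] = np.nan

def first_last(group):
    # Positional selection keeps whole rows, NaN included
    return group.iloc[[0, -1]]

by_position = df.groupby(level=0, group_keys=False).apply(first_last)
by_agg = df.groupby(level=0).agg(['first', 'last'])

# First row of group 'a' exactly as stored: X is NaN, Y is 1.0
print(by_position.iloc[0].tolist())
# 'first' skips the NaN and reports 2.0, which belongs to row ('a', 'b')
print(by_agg.loc['a', ('X', 'first')])
```

So with agg the X value comes from row ('a', 'b') while the Y value comes from row ('a', 'a'), which is why no single level 1 label fits the result.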

+14

This may be one of the simpler solutions.

 df.groupby(level=0, as_index=False).nth([0, -1])

       X   Y
 a a   0   1
   d   6   7
 b e   8   9
   g  12  13
 c h  14  15
   i  16  17
 d j  18  19

Hope this helps. (Y)
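One detail worth noting: nth lists the single-row group d once, whereas the output requested in the question lists it twice. A small sketch of the nth approach on the question's frame, next to a head/tail variant (my addition, not part of this answer) that keeps the duplicate:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(10, -1),
    index=[list('aaaabbbccd'), list('abcdefghij')],
    columns=list('XY'),
)

grouped = df.groupby(level=0, as_index=False)

# nth selects rows by position within each group and keeps the original index;
# a one-row group contributes a single row
via_nth = grouped.nth([0, -1])

# head(1) and tail(1) pick the same rows; concatenating them repeats ('d', 'j')
via_head_tail = pd.concat([grouped.head(1), grouped.tail(1)]).sort_index()

print(via_nth)
print(len(via_nth), len(via_head_tail))
```

Which of the two fits depends on whether a one-row group should appear once or twice in the result.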

+3

Please try this:

For the last value: df.groupby('Column_name').nth(-1)

For the first value: df.groupby('Column_name').nth(0)
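As an illustration, here is a short sketch on a hypothetical frame where key plays the role of Column_name (both names are made up for this example). As far as I know, the index of the nth result differs between pandas versions (before 2.0 the group keys become the index, from 2.0 on the original row index is kept), so only the selected values are shown:

```python
import pandas as pd

# Hypothetical data: 'key' stands in for 'Column_name'
frame = pd.DataFrame({
    'key': ['a', 'a', 'b', 'b', 'b'],
    'val': [1, 2, 3, 4, 5],
})

first_rows = frame.groupby('key').nth(0)    # first row of each group
last_rows = frame.groupby('key').nth(-1)    # last row of each group

print(first_rows['val'].tolist())
print(last_rows['val'].tolist())
```

For the frame in the question, which is grouped by an index level rather than a column, the equivalent call would use groupby(level=0) instead of a column name.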

0
