Get the first and last value in a group

Question

I have a DataFrame df:

 import numpy as np
 import pandas as pd

 df = pd.DataFrame(np.arange(20).reshape(10, -1),
                   [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                    ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
                   ['X', 'Y'])

How to get the first and last rows, grouped by the first index level?

I tried:

 df.groupby(level=0).agg(['first', 'last']).stack()

and got:

          X   Y
 a first   0   1
   last    6   7
 b first   8   9
   last   12  13
 c first  14  15
   last   16  17
 d first  18  19
   last   18  19

This is close to what I want. How can I keep the level-1 index values and get this instead:

       X   Y
 a a   0   1
   d   6   7
 b e   8   9
   g  12  13
 c h  14  15
   i  16  17
 d j  18  19
   j  18  19

python pandas group-by dataframe pandas-groupby

Brian Aug 05 '16 at 20:23

3 answers

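This may be one of the simpler solutions:

 df.groupby(level=0, as_index=False).nth([0, -1])

       X   Y
 a a   0   1
   d   6   7
 b e   8   9
   g  12  13
 c h  14  15
   i  16  17
 d j  18  19

Note that group d appears only once: its single row is both the first and the last row, and nth does not repeat it.

Hope this helps.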
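How nth indexes its result changed in pandas 2.0 (it now behaves as a filter that keeps the original row labels), so the exact output above may depend on the version. A version-independent sketch of the same selection, assuming only GroupBy.cumcount, builds a boolean mask that marks each group's first and last row:

```python
import numpy as np
import pandas as pd

# Same frame as in the question
df = pd.DataFrame(
    np.arange(20).reshape(10, -1),
    [list('aaaabbbccd'), list('abcdefghij')],
    list('XY'),
)

g = df.groupby(level=0)
# cumcount() numbers rows 0, 1, 2, ... within each group;
# with ascending=False it counts down, so 0 marks the last row.
is_first = g.cumcount() == 0
is_last = g.cumcount(ascending=False) == 0

# Boolean selection keeps the original (level 0, level 1) labels
result = df[is_first | is_last]
print(result)
```

Like nth([0, -1]), the mask selects a one-row group such as d only once.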

Akarsh jain Aug 08 '18 at 15:31

Please try this:

For the last value ('Column_name' stands for your grouping column):

 df.groupby('Column_name').nth(-1)

For the first value:

 df.groupby('Column_name').nth(0)
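For the question's frame the groups come from index level 0 rather than a column, so the sketch below groups with level=0. It uses GroupBy.head(1) and GroupBy.tail(1) instead of nth, because head and tail return rows with their original index labels, which is what the question asks to keep:

```python
import numpy as np
import pandas as pd

# Same frame as in the question
df = pd.DataFrame(
    np.arange(20).reshape(10, -1),
    [list('aaaabbbccd'), list('abcdefghij')],
    list('XY'),
)

g = df.groupby(level=0)   # group by index level 0 instead of a column name
first_rows = g.head(1)    # first row of each group, original labels kept
last_rows = g.tail(1)     # last row of each group, original labels kept

print(first_rows)
print(last_rows)
```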

nat23dip Jun 23 '19 at 23:30

piRSquared · Accepted Answer · 2016-08-05T20:24:39+0000

Option 1

 def first_last(df):
     # .ix was removed in pandas 1.0; .iloc selects rows by position
     return df.iloc[[0, -1]]

 df.groupby(level=0, group_keys=False).apply(first_last)
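A self-contained run of Option 1 on the question's frame (using .iloc, since .ix no longer exists in current pandas) should reproduce the requested output exactly, including the repeated d/j row: selecting positions 0 and -1 in a one-row group returns that row twice.

```python
import numpy as np
import pandas as pd

# Same frame as in the question
df = pd.DataFrame(
    np.arange(20).reshape(10, -1),
    [list('aaaabbbccd'), list('abcdefghij')],
    list('XY'),
)

def first_last(group):
    # Positional selection of the group's first and last row;
    # in a one-row group both positions hit the same row.
    return group.iloc[[0, -1]]

# group_keys=False keeps the original index instead of prepending the group key
result = df.groupby(level=0, group_keys=False).apply(first_last)
print(result)
```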

Option 2 - only works if the index is unique

 idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
 df.loc[idx]
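To see why Option 2 needs a unique index, here is a sketch on a small hypothetical frame df2 whose label ('a', 'x') occurs twice. idx still names only two labels, but label-based lookup with .loc returns every row carrying a requested label, so more rows come back than intended:

```python
import pandas as pd

# Hypothetical frame: the label ('a', 'x') is duplicated
df2 = pd.DataFrame(
    {'X': [0, 1, 2]},
    index=pd.MultiIndex.from_tuples([('a', 'x'), ('a', 'x'), ('a', 'y')]),
)

# Option 2 applied unchanged
idx = df2.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
picked = df2.loc[idx]

# idx holds two labels, but ('a', 'x') matches two rows
print(len(idx), len(picked))
```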

Option 3 - as the note below explains, this only makes sense when there are no NA values

I also abused the agg function. The code below works, but it is much uglier.

 df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
     .set_index('level_1', append=True).reset_index(1, drop=True) \
     .rename_axis([None, None])

Note

Per @unutbu: agg(['first', 'last']) takes the first and last non-NA values.

I interpreted this to mean that the values are picked column by column. In that case, forcing them to align with index level 1 may not even make sense.
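A minimal illustration of @unutbu's point, on a hypothetical one-group frame df3: first() picks the first non-NA value separately in each column, so its result can mix values from different rows, while head(1) returns the actual first row, NaN included.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one group whose first row has a missing X
df3 = pd.DataFrame({
    'g': ['a', 'a', 'a'],
    'X': [np.nan, 2.0, 4.0],
    'Y': [1, 3, 5],
})
g = df3.groupby('g')

# X comes from row 1 (first non-NA), Y comes from row 0
print(g.first())
# The real first row, with X still NaN
print(g.head(1))
```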

Here is another test:

 df = pd.DataFrame(np.arange(20).reshape(10, -1),
                   [list('aaaabbbccd'), list('abcdefghij')],
                   list('XY'))
 df['X'] = df['X'].astype(float)  # X must be float to hold NaN
 df.loc[tuple('aa'), 'X'] = np.nan

 def first_last(df):
     return df.iloc[[0, -1]]

 df.groupby(level=0, group_keys=False).apply(first_last)

 df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
     .set_index('level_1', append=True).reset_index(1, drop=True) \
     .rename_axis([None, None])

Sure enough, the second solution takes the first valid value in column X, so it no longer makes sense to align that value with the index label a.
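A sketch of the same test that compares Option 1 and Option 3 side by side. It casts X to float before inserting NaN, an assumption added here so the assignment does not rely on silent int-to-float upcasting, which recent pandas versions warn about:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(10, -1),
    [list('aaaabbbccd'), list('abcdefghij')],
    list('XY'),
)
df['X'] = df['X'].astype(float)      # float column so it can hold NaN
df.loc[('a', 'a'), 'X'] = np.nan

# Option 1: positional, whole rows stay together
opt1 = df.groupby(level=0, group_keys=False).apply(lambda g: g.iloc[[0, -1]])

# Option 3: first/last are taken column by column, skipping NaN
opt3 = (df.reset_index(1)
          .groupby(level=0).agg(['first', 'last']).stack()
          .set_index('level_1', append=True)
          .reset_index(1, drop=True)
          .rename_axis([None, None]))

print(opt1.iloc[0])  # row ('a', 'a'): X is NaN, Y is 1
print(opt3.iloc[0])  # labelled ('a', 'a'), but X comes from ('a', 'b')
```

The first row of opt3 is still labelled ('a', 'a'), yet its X value belongs to ('a', 'b'), which is the misalignment described above.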
