Pandas dataframe get the first row of each group

I have a pandas DataFrame as shown below.

    import pandas as pd

    df = pd.DataFrame({
        'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
        'value': ["first","second","second","first","second","first","third","fourth",
                  "fifth","second","fifth","first","first","second","third","fourth","fifth"]
    })

I want to group this by ["id", "value"] and get the first row of each group.

        id   value
    0    1   first
    1    1  second
    2    1  second
    3    2   first
    4    2  second
    5    3   first
    6    3   third
    7    3  fourth
    8    3   fifth
    9    4  second
    10   4   fifth
    11   5   first
    12   6   first
    13   6  second
    14   6   third
    15   7  fourth
    16   7   fifth

Expected Result

    id   value
     1   first
     2   first
     3   first
     4  second
     5   first
     6   first
     7  fourth

I tried the following, but it only gives the first row of the DataFrame. Any help in this regard is appreciated.

    In [25]: for index, row in df.iterrows():
       ....:     df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])
+90
python pandas dataframe
Nov 19 '13 at 9:24
5 answers
    >>> df.groupby('id').first()
         value
    id
    1    first
    2    first
    3    first
    4   second
    5    first
    6    first
    7   fourth

If you need id as a column:

    >>> df.groupby('id').first().reset_index()
       id   value
    0   1   first
    1   2   first
    2   3   first
    3   4  second
    4   5   first
    5   6   first
    6   7  fourth
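As an alternative to calling reset_index() afterwards, groupby accepts as_index=False, which keeps the grouping key as an ordinary column. The snippet below is a minimal sketch on a shortened version of the question's data; the names flat and same are only illustrative.

```python
import pandas as pd

# Shortened version of the question's data (illustrative only)
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3],
    'value': ["first", "second", "second", "first", "second", "first"],
})

# as_index=False keeps the grouping key 'id' as a regular column
flat = df.groupby('id', as_index=False).first()

# Equivalent two-step form: take first(), then move 'id' out of the index
same = df.groupby('id').first().reset_index()

print(flat)
print(same)
```

Both forms should give a frame with columns id and value and a default integer index.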

To get the first n rows of each group, you can use head():

    >>> df.groupby('id').head(2).reset_index(drop=True)
        id   value
    0    1   first
    1    1  second
    2    2   first
    3    2  second
    4    3   first
    5    3   third
    6    4  second
    7    4   fifth
    8    5   first
    9    6   first
    10   6  second
    11   7  fourth
    12   7   fifth
+166
Nov 19 '13 at 9:25

This will give you the second row of each group (the index is zero-based, so nth(0) corresponds to first()):

 df.groupby('id').nth(1) 

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
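A small sketch of nth(1) on made-up data, in case it helps. Groups with fewer than two rows simply contribute nothing. As far as I know, the shape of the result depends on the pandas version (older releases index the result by the group key, while pandas 2.x keeps the original row index), so the example only looks at the value column. The cumcount() filter is an alternative that selects the same rows without relying on nth at all.

```python
import pandas as pd

# Made-up data: group 3 has only one row
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3],
    'value': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Second row of each group (zero-based position 1); group 3 is skipped
second = df.groupby('id').nth(1)
print(second['value'].tolist())

# Alternative: number the rows within each group and filter on position 1
second_alt = df[df.groupby('id').cumcount() == 1]
print(second_alt['value'].tolist())
```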

+35
Mar 18 '16 at 0:03

I would suggest using .nth(0) instead of .first() if you need to get the first row.

The difference between the two is in how they handle NaN: .nth(0) returns the first row of the group regardless of the values in that row, while .first() returns the first non-NaN value in each column.

For example, if your dataset is:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4],
                       'value': ["first","second","third", np.nan,
                                 "second","first","second","third",
                                 "fourth","first","second"]})

    >>> df.groupby('id').nth(0)
        value
    id
    1   first
    2     NaN
    3   first
    4   first

whereas:

    >>> df.groupby('id').first()
         value
    id
    1    first
    2   second
    3    first
    4    first
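The per-column behaviour of first() is easier to see with more than one value column. In the sketch below (made-up data; the column names a and b are only illustrative), first() builds a row for group 1 out of values taken from two different source rows, whereas head(1) returns the actual first row, NaN included.

```python
import numpy as np
import pandas as pd

# Made-up data: group 1 has NaN in a different column on each row
df = pd.DataFrame({
    'id': [1, 1, 2],
    'a':  [np.nan, 2.0, 3.0],
    'b':  [10.0, np.nan, 30.0],
})

g = df.groupby('id')

# first(): for each column, the first non-NaN value in the group.
# Group 1 ends up with a=2.0 (from its second row) and b=10.0 (from its first row).
combined = g.first()
print(combined)

# head(1): the actual first row of each group, with NaN left as it is
actual = g.head(1)
print(actual)
```

So if the rows must stay intact, head(1) or nth(0) is the safer choice; first() is more of a "first available value per column" operation.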
+11
Mar 07 '18 at 9:54

Perhaps this is what you want:

    import pandas as pd

    idx = pd.MultiIndex.from_product([['state1','state2'],
                                      ['county1','county2','county3','county4']])
    df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)

                    pop
    state1 county1   12
           county2   15
           county3   65
           county4   42
    state2 county1   78
           county2   67
           county3   55
           county4   31

    df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)

    Out[29]:
                    pop
    state1 county3   65
           county4   42
           county2   15
    state2 county1   78
           county2   67
           county3   55
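A possibly simpler variant of the same idea, offered as a sketch rather than a drop-in replacement: sort the whole frame once and then take head(3) per group. This selects the same rows, but they come back in the global sort order rather than grouped by state, so the example filters one state at a time to inspect the result. The names top3, state1 and state2 are only illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['state1', 'state2'],
                                  ['county1', 'county2', 'county3', 'county4']])
df = pd.DataFrame({'pop': [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

# Sort once by population, then keep the first 3 rows encountered for each state
top3 = df.sort_values('pop', ascending=False).groupby(level=0).head(3)

# Rows are in global sorted order, so pick out each state to look at its top 3
state1 = top3[top3.index.get_level_values(0) == 'state1']
state2 = top3[top3.index.get_level_values(0) == 'state2']
print(state1['pop'].tolist())
print(state2['pop'].tolist())
```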
+4
Oct 28 '16 at 18:39

If you only need the first row from each group, you can do this with drop_duplicates; note that its default is keep='first'.

    df.drop_duplicates('id')
    Out[1027]:
        id   value
    0    1   first
    3    2   first
    5    3   first
    9    4  second
    11   5   first
    12   6   first
    15   7  fourth
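To make the keep parameter explicit, here is a minimal sketch on made-up data. With keep='first', drop_duplicates on the id column should pick the same rows as groupby('id').head(1), and keep='last' gives the last row of each group instead. The variable names are only illustrative.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'value': ['first', 'second', 'first', 'second', 'third'],
})

# Keep the first occurrence of each id (keep='first' is the default)
deduped = df.drop_duplicates(subset='id', keep='first')

# The groupby form that selects the same rows, original index preserved
first_rows = df.groupby('id').head(1)

# keep='last' selects the last row of each group instead
last_rows = df.drop_duplicates(subset='id', keep='last')

print(deduped)
print(first_rows)
print(last_rows)
```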
+1
Mar 20 '19 at 21:01


