Merge pandas DataFrame rows with same identifier

Let's say I have a pandas DataFrame, for example:

       A  B  id
    0  1  1   0
    1  2  1   0
    2  3  2   1
    3  0  2   1

I want to combine rows with the same identifier so that the other elements in those rows are collected into lists, making the frame above look like this:

            A       B  id
    0  [1, 2]  [1, 1]   0
    1  [3, 0]  [2, 2]   1

since the first two rows share an identifier, as do the last two. Does pandas have a function for this? I know about the pandas groupby command, but I would like the return type to be a DataFrame as well. Thanks.

+6
1 answer

You can do this with groupby, aggregating each group with the tolist method of a pandas Series:

    In [762]: df.groupby('id').agg(lambda x: x.tolist())
    Out[762]:
             A       B
    id
    0   [1, 2]  [1, 1]
    1   [3, 0]  [2, 2]
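For anyone who wants to run this from scratch, here is a minimal sketch. It assumes the frame from the question is rebuilt by hand from the values in its table; the name `merged` is only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (values read off its table).
df = pd.DataFrame({
    "A": [1, 2, 3, 0],
    "B": [1, 1, 2, 2],
    "id": [0, 0, 1, 1],
})

# For each id, collect every remaining column's values into a Python list.
merged = df.groupby("id").agg(lambda x: x.tolist())
print(merged)
```

The lambda receives each column of a group as a Series, so `tolist` turns that group's values into a single list cell.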

The aggregated groupby result is a DataFrame, as you wanted:

    In [763]: df1 = df.groupby('id').agg(lambda x: x.tolist())

    In [764]: type(df1)
    Out[764]: pandas.core.frame.DataFrame

To match the expected result more closely, with id as a regular column instead of the index, you can call reset_index on the result or pass as_index=False to groupby:

    In [768]: df.groupby('id', as_index=False).agg(lambda x: x.tolist())
    Out[768]:
       id       A       B
    0   0  [1, 2]  [1, 1]
    1   1  [3, 0]  [2, 2]

    In [771]: df1.reset_index()
    Out[771]:
       id       A       B
    0   0  [1, 2]  [1, 1]
    1   1  [3, 0]  [2, 2]
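Both variants can be compared side by side in a standalone sketch. The frame is again rebuilt by hand from the question's table, and the names `flat1`, `flat2` and `result` are only illustrative. The last step, reordering the columns to A, B, id as in the question's expected frame, is an extra that the answer above does not show.

```python
import pandas as pd

# Rebuild the frame from the question (values read off its table).
df = pd.DataFrame({
    "A": [1, 2, 3, 0],
    "B": [1, 1, 2, 2],
    "id": [0, 0, 1, 1],
})

# Option 1: keep id as a regular column during the groupby.
flat1 = df.groupby("id", as_index=False).agg(lambda x: x.tolist())

# Option 2: aggregate first, then move the id index back into a column.
flat2 = df.groupby("id").agg(lambda x: x.tolist()).reset_index()

# Optional: put the columns in the order used in the question (A, B, id).
result = flat2[["A", "B", "id"]]
print(result)
```

Selecting columns with a list of names returns a new DataFrame in that order, so `result` has the same layout as the frame asked for in the question.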
+7