Pandas: groupby and aggregate without losing the column that was grouped

I have a pandas DataFrame as shown below. Each Id can have several names and sub-identifiers.

    Id      NAME  SUB_ID
    276956  A     5933
    276956  B     5934
    276956  C     5935
    287266  D     1589

I want to condense the DataFrame so that there is only one row per Id, with all the NAME and SUB_ID values for that Id collected into a single set in that row:

    Id      NAME        SUB_ID
    276956  set(A,B,C)  set(5933,5934,5935)
    287266  set(D)      set(1589)

I tried grouping by Id and then aggregating over all the other columns:

 df.groupby('Id').agg(lambda x: set(x)) 

However, the resulting DataFrame no longer has an Id column. When you iterate over the groups, the identifier comes back as the first element of each tuple, but it seems to be lost once you aggregate. Is there a way to get the result I'm looking for, that is, to group and aggregate without losing the column that was grouped on?

2 answers

If you do not want the grouping column to become the index, groupby accepts the argument as_index=False, which avoids having to reset the index afterwards:

 df.groupby('Id', as_index=False).agg(lambda x: set(x)) 
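A minimal, self-contained sketch of this approach. The DataFrame is rebuilt here from the sample rows in the question, and the name `result` is only illustrative:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# as_index=False keeps the grouping key as an ordinary column
# instead of moving it into the index.
result = df.groupby("Id", as_index=False).agg(lambda x: set(x))

print(result.columns.tolist())
print(result)
```

With the sample data, Id should appear among the columns of `result`, next to the aggregated NAME and SUB_ID sets.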

The column you group by becomes the index of the result. You can simply reset the index to get it back as a column:

    In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
    Out[4]:
           Id       NAME              SUB_ID
    0  276956  {A, C, B}  {5933, 5934, 5935}
    1  287266        {D}              {1589}
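The same steps as a standalone script, for anyone who wants to run them outside an interactive session. This is only a sketch: the DataFrame is rebuilt from the question's sample rows, and the names `grouped` and `flat` are illustrative:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# After a plain groupby/agg, Id lives in the index, not in the columns.
grouped = df.groupby("Id").agg(lambda x: set(x))
print(grouped.index.name)

# reset_index() moves the index back into a regular column
# and replaces it with a default integer index.
flat = grouped.reset_index()
print(flat)
```

The order in which elements are printed inside each set is arbitrary, which is why the output above shows {A, C, B} rather than {A, B, C}.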