I have a pandas DataFrame as shown below. For each Id, I can have several names and sub-IDs.
Id      NAME  SUB_ID
276956  A     5933
276956  B     5934
276956  C     5935
287266  D     1589
I want to condense the DataFrame so that there is only one row per Id, with all the NAME and SUB_ID values for that Id collected into a single set in that row:
Id      NAME        SUB_ID
276956  set(A,B,C)  set(5933,5934,5935)
287266  set(D)      set(1589)
I tried grouping by Id and then aggregating over all the other columns:
df.groupby('Id').agg(lambda x: set(x))
However, the resulting DataFrame does not have an Id column. When you iterate over a groupby, the Id comes back as the first value of each tuple, but I guess that is lost when you aggregate. Is there a way to get the DataFrame I'm looking for? That is, to group and aggregate without losing the column that was grouped on.
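For reference, here is a minimal sketch that reproduces the situation with the sample data above. It assumes the grouped column has not disappeared but has become the index of the result, in which case `reset_index()` turns it back into an ordinary column. The variable names `grouped` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# Same aggregation as in the question: one set per column per Id
grouped = df.groupby("Id").agg(lambda x: set(x))

# "Id" is not among the columns here because it is the index of `grouped`
print(grouped.index.name)

# Move the index back into a regular column
result = grouped.reset_index()
print(result)
```

Passing `as_index=False` to `groupby` is another option that may give the same shape in one step, though I have only sketched the `reset_index()` route here.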
python pandas group-by dataframe
Fizi