I have a DataFrame with a column that holds multiple attributes, separated by commas:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})
   id labels
0   1  a,b,c
1   2    c,a
2   3  d,a,b
(I know this is not an ideal situation, but the data comes from an external source.) I want to turn the multi-attribute column into several columns, one for each label, so that I can treat them as categorical variables. Desired output:
   id      a      b      c      d
0   1   True   True   True  False
1   2   True  False   True  False
2   3   True   True  False   True
I can get the set of all possible attributes ([a, b, c, d]) quite easily, but I can't figure out how to determine whether a given row has a particular attribute without iterating through the rows for each attribute. Is there a better way to do this?
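For concreteness, the per-attribute iteration described above might look roughly like the sketch below. The names `all_labels`, `row_sets`, and `result` are illustrative assumptions and not part of the original data or code. It produces the desired output, but it makes one pass over the rows for every attribute.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

# Collect every distinct attribute across all rows (hypothetical helper name).
all_labels = sorted({label for row in df['labels'] for label in row.split(',')})

# Split each row's string into a set once, so membership tests are exact
# (a plain substring check would break for multi-character labels).
row_sets = df['labels'].apply(lambda s: set(s.split(',')))

result = df[['id']].copy()
for label in all_labels:
    # One full pass over the rows per attribute: the step I would like to avoid.
    result[label] = row_sets.apply(lambda labels: label in labels)

print(result)
```

With only a handful of attributes this is fine, but the cost grows with the number of rows times the number of distinct attributes.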