Pandas: convert lists from one column to several columns

I have a DataFrame in which one column holds multiple attributes, separated by commas:

import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

   id labels
0   1  a,b,c
1   2    c,a
2   3  d,a,b

(I know this is not an ideal situation, but the data is taken from an external source.) I want to turn the column of comma-separated attributes into several columns, one for each label, so that I can treat them as categorical variables. Desired output:

   id     a      b      c      d
0   1  True   True   True  False
1   2  True  False   True  False
2   3  True   True  False   True

I can get the set of all possible attributes ([a,b,c,d]) quite easily, but I can't figure out how to check whether a given row has a particular attribute without iterating over the rows for each attribute. Is there a better way to do this?

1 answer

You can use str.get_dummies, cast the resulting 1s and 0s to boolean with astype, and finally concat the id column back on:

 print(df['labels'].str.get_dummies(sep=',').astype(bool))

       a      b      c      d
 0  True   True   True  False
 1  True  False   True  False
 2  True   True  False   True

 print(pd.concat([df.id, df['labels'].str.get_dummies(sep=',').astype(bool)], axis=1))

    id     a      b      c      d
 0   1  True   True   True  False
 1   2  True  False   True  False
 2   3  True   True  False   True
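If you need this for more than one column, the same steps can be wrapped in a small function. This is only a sketch: the helper name expand_labels and its parameters are made up for illustration, and it assumes the labels contain no stray whitespace around the separator (otherwise "a" and " a" would become separate columns).

```python
import pandas as pd


def expand_labels(df, column="labels", sep=","):
    """Hypothetical helper: replace `column` with one boolean column per label."""
    # str.get_dummies splits each string on `sep` and returns 0/1 indicator columns
    dummies = df[column].str.get_dummies(sep=sep).astype(bool)
    # Drop the original text column and attach the indicators, aligned on the index
    return pd.concat([df.drop(columns=column), dummies], axis=1)


df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})
result = expand_labels(df)
print(result)
```

Because concat aligns on the index, this keeps working when the DataFrame has a non-default index, as long as the index values are unique.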