How to read csv column as dtype list using pandas?

I have a CSV file with 3 columns, in which each row of column 3 contains a list of values, as the following table structure shows:

    Col1,Col2,Col3
    1,a1,"['Proj1', 'Proj2']"
    2,a2,"['Proj3', 'Proj2']"
    3,a3,"['Proj4', 'Proj1']"
    4,a4,"['Proj3', 'Proj4']"
    5,a5,"['Proj5', 'Proj2']"

Whenever I try to read this CSV, Col3 is read as a str object, not as a list. I tried to change the dtype of this column to list, but got an AttributeError, as shown below:

    df = pd.read_csv("inputfile.csv")
    df.Col3.dtype = list

    AttributeError                            Traceback (most recent call last)
    <ipython-input-19-6f9ec76b1b30> in <module>()
    ----> 1 df.Col3.dtype = list

    C:\Python27\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value)
       1953                 object.__setattr__(self, name, value)
       1954             except (AttributeError, TypeError):
    -> 1955                 object.__setattr__(self, name, value)
       1956
       1957     #----------------------------------------------------------------------

    AttributeError: cannot set attribute

It would be great if you could help me with how to do this.

3 answers

You can use literal_eval from the standard-library ast module:

    from ast import literal_eval

    df.Col3 = df.Col3.apply(literal_eval)
    print(df.Col3[0][0])  # prints: Proj1

You can also do this while creating the DataFrame from the CSV, using the converters argument:

    df = pd.read_csv("in.csv", converters={"Col3": literal_eval})
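As a minimal, self-contained sketch of the converters approach, the following reads part of the question's sample data from an in-memory string instead of a file (the io.StringIO buffer and the csv_text name are stand-ins introduced here, not part of the original answer). Note that the column's dtype remains object: pandas has no dedicated list dtype, so only the individual cells become Python lists.

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# Sample rows from the question, held in memory instead of "in.csv".
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
3,a3,"['Proj4', 'Proj1']"
'''

# literal_eval parses each cell's text as a Python literal (here, a list).
df = pd.read_csv(StringIO(csv_text), converters={"Col3": literal_eval})

first = df.Col3[0]
print(type(first).__name__)  # list
print(first[0])              # Proj1
print(df.Col3.dtype)         # object
```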

If you are sure that the format is the same for all rows, stripping the brackets and splitting will be much faster:

    df = pd.read_csv("in.csv", converters={"Col3": lambda x: x.strip("[]").split(", ")})

But each element will keep its surrounding quote characters, so you get "'Proj1'" rather than "Proj1".
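To make that caveat concrete, here is a small sketch that applies the same strip-and-split steps to a single cell from the question's data, outside pandas (the names cell and parts are illustrative only):

```python
# One raw cell value, as pandas hands it to a converter.
cell = "['Proj1', 'Proj2']"

# Strip the surrounding brackets, then split on comma + space.
parts = cell.strip("[]").split(", ")

print(parts)                # ["'Proj1'", "'Proj2'"]
print(parts[0] == "Proj1")  # False: the quotes are still part of the string
```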


Adding a replace call to Cunningham's answer removes the quote characters:

    df = pd.read_csv("in.csv", converters={"Col3": lambda x: x.strip("[]").replace("'", "").split(", ")})
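One edge case may be worth checking, sketched here under the assumption that some cells could hold an empty list written as []: splitting an empty string yields [''] rather than []. The parse_list helper below is a hypothetical name introduced for illustration, not something from the answers.

```python
def parse_list(cell):
    """Turn "['a', 'b']" into ['a', 'b'] without evaluating the text."""
    inner = cell.strip("[]").replace("'", "")
    # "".split(", ") returns [''], so handle an empty cell explicitly.
    return inner.split(", ") if inner else []


print(parse_list("['Proj1', 'Proj2']"))  # ['Proj1', 'Proj2']
print(parse_list("[]"))                  # []
```

It could be passed as converters={"Col3": parse_list}. Like the lambda, it assumes that no element contains an apostrophe or a comma followed by a space; literal_eval is the safer choice when that is not guaranteed.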

See also the related question "pandas - convert a string to a list of strings".


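Try removing the brackets '[' and ']' and the quote characters from the column, then use the string split method to convert it to a list. Pass regex=False so that '[' is treated as a literal character rather than as a regular-expression pattern, and split on ', ' so that no commas stay attached to the elements.

    df['Col3'] = df['Col3'].str.replace(']', '', regex=False)
    df['Col3'] = df['Col3'].str.replace('[', '', regex=False)
    df['Col3'] = df['Col3'].str.replace("'", '', regex=False)
    df['Col3'] = df['Col3'].str.split(', ')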
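The same idea can be sketched as one chain of vectorized .str methods on a column that has already been read as strings. This is an illustrative variant, not the answerer's exact code; the col and parsed names are made up here, and regex=False assumes a pandas version in which Series.str.replace accepts that argument.

```python
import pandas as pd

# Raw strings, as read_csv returns them when no converter is given.
col = pd.Series(["['Proj1', 'Proj2']", "['Proj3', 'Proj2']"])

parsed = (
    col.str.strip("[]")                    # drop the outer brackets
       .str.replace("'", "", regex=False)  # remove quote characters literally
       .str.split(", ")                    # split on comma + space
)

print(parsed[0])
```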