Python pandas read_csv cannot read a quote character that appears inside a quoted field

I am trying to read a two-column CSV file (error.csv) that uses a semicolon as the delimiter and contains double quotes:

    col1;col2
    2016-04-17_22:34:25.126;"Linux; Android"
    2016-04-17_22:34:25.260;"{"g":2}iPhone; iPhone"

And I am trying:

    logs = pd.read_csv('error.csv', na_values="null", sep=';', quotechar='"', quoting=0)

I understand that the problem is the pair of double quotes around "g" inside the already double-quoted field on line 3, but I cannot figure out how to deal with this. Any ideas?

1 answer

You probably need to pre-process the data so that it fits the format the CSV parser expects. I doubt pandas can handle this simply by changing a parameter or two.
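
A minimal sketch of that pre-processing, assuming (as in the split approach in this answer) exactly two columns and a first column that never contains a semicolon: each line is split on its first `;`, the outer quotes are dropped, and `csv.writer` re-quotes the value, doubling any inner quotes, so that `read_csv` with its default `doublequote=True` can parse it. The helper names `unquote` and `to_valid_csv` are hypothetical, and the sample data is inlined from the question instead of being read from error.csv.

```python
import csv
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without error.csv.
RAW = '''col1;col2
2016-04-17_22:34:25.126;"Linux; Android"
2016-04-17_22:34:25.260;"{"g":2}iPhone; iPhone"
'''


def unquote(value):
    """Hypothetical helper: remove one pair of outer double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def to_valid_csv(lines):
    """Hypothetical helper: re-emit the lines as standard, correctly quoted CSV.

    Assumes two columns and a first column that never contains ';'.
    """
    out = io.StringIO()
    # QUOTE_MINIMAL quotes fields containing ';' or '"' and doubles inner quotes.
    writer = csv.writer(out, delimiter=';', quoting=csv.QUOTE_MINIMAL,
                        lineterminator='\n')
    lines = iter(lines)
    writer.writerow(next(lines).strip().split(';'))  # header row
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first, second = line.split(';', maxsplit=1)
        writer.writerow([first, unquote(second)])
    out.seek(0)  # rewind so read_csv starts reading at the beginning
    return out


logs = pd.read_csv(to_valid_csv(RAW.splitlines()), sep=';', na_values='null')
print(logs)
```

With the real file, the same helper can be given the open file handle, e.g. `with open('error.csv') as fh: logs = pd.read_csv(to_valid_csv(fh), sep=';')`. Unlike `strip('"')`, `unquote` removes at most one quote from each end, so a value that itself begins or ends with an inner quote is not over-trimmed.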

If there are only two columns and the first one never contains a semicolon, you can split each line on its first semicolon only:

    import pandas

    records = []
    with open('error.csv', 'r') as fh:
        # first row is a header
        header = next(fh).strip().split(';')
        for rec in fh:
            # split only on the first semicolon
            date, dat = rec.strip().split(';', maxsplit=1)
            # assemble records, removing quotes from the second column
            records.append((date, dat.strip('"')))
    # create a data frame
    df = pandas.DataFrame.from_records(records, columns=header)
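
To see why `maxsplit=1` is enough for the sample rows, compare it with a plain `split(';')` on the two data lines from the question. This is only an illustration on inlined strings; the variable names are hypothetical.

```python
# The two data rows from the question (header omitted).
rows = [
    '2016-04-17_22:34:25.126;"Linux; Android"',
    '2016-04-17_22:34:25.260;"{"g":2}iPhone; iPhone"',
]

naive = [row.split(';') for row in rows]                    # splits on every ';'
first_only = [row.split(';', maxsplit=1) for row in rows]  # splits on the first ';' only

# Same cleanup as the answer's code: strip the outer quotes of the second column.
cleaned = [(date, dat.strip('"')) for date, dat in first_only]

for a, b in zip(naive, first_only):
    print(len(a), a)  # three pieces: the ';' inside the quoted value also splits
    print(len(b), b)  # two pieces: timestamp and the untouched quoted value
```

The naive split produces three pieces per row because of the semicolon inside the quoted value, while the first-only split always yields the two expected columns.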

If you want the first column to contain actual dates rather than strings, you will have to parse them yourself, for example with the datetime module.
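
A minimal sketch of that parsing, assuming every timestamp follows the sample's pattern (date and time joined by `_`, milliseconds after the `.`). It shows the datetime module the answer mentions and, as an alternative not in the original answer, `pandas.to_datetime` for converting a whole column such as `df['col1']` at once.

```python
from datetime import datetime

import pandas as pd

# Format of the sample timestamps, e.g. 2016-04-17_22:34:25.126.
FMT = '%Y-%m-%d_%H:%M:%S.%f'

stamps = ['2016-04-17_22:34:25.126', '2016-04-17_22:34:25.260']

# Standard library: parse one string at a time (%f accepts the 3-digit milliseconds).
parsed = [datetime.strptime(s, FMT) for s in stamps]

# pandas: convert a whole column in one call, e.g. df['col1'] = pd.to_datetime(df['col1'], format=FMT).
col = pd.to_datetime(pd.Series(stamps), format=FMT)
print(col)
```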

