Pandas: problem with quoting in read_csv

I have a file that looks like this:

'colA'|'colB'
'word"A'|'A'
'word'B'|'B'

I want to use pd.read_csv('input.csv',sep='|', quotechar="'"), but I get the following output:

colA    colB
word"A   A
wordB'   B

The last line is incorrect; it should be word'B B. How do I get around this? I have tried several variations, but none of them reads both lines correctly. I need some CSV-reading expertise!

2 answers

I think you need str.strip with apply:

import pandas as pd
import io

temp=u"""'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='|')

df = df.apply(lambda x: x.str.strip("'"))
df.columns = df.columns.str.strip("'")
print (df)
     colA colB
0  word"A    A
1  word'B    B

The source of the problem is that ' is used both as the quote character and as a regular character in the data.
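
To see the ambiguity in isolation, here is a small sketch that uses Python's standard csv module instead of pandas. It assumes the two parsers treat this case the same way, which is consistent with the output shown in the question. The inner quote is taken as the closing quote, and the real closing quote ends up in the data.

```python
import csv
import io

# Same sample as in the question; the last row has a quote inside the field.
data = "'colA'|'colB'\n'word\"A'|'A'\n'word'B'|'B'\n"

rows = list(csv.reader(io.StringIO(data), delimiter="|", quotechar="'"))
for row in rows:
    print(row)
```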

You can avoid this by escaping the inner quote in the file, for example:

'colA'|'colB'
'word"A'|'A'
'word/'B'|'B'

And then use escapechar:

>>> pd.read_csv('input.csv',sep='|',quotechar="'",escapechar="/")
     colA colB
0  word"A    A
1  word'B    B
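
If the file cannot be edited by hand, the escaping could be done in code before parsing. The following is only a sketch under several assumptions: every field is wrapped in single quotes, fields contain no | or line breaks, and / does not otherwise occur in the data. The helper name escape_inner_quotes is hypothetical, and the check at the end uses the standard csv module rather than pandas.

```python
import csv
import io
import re

# A quote counts as "inner" when it has an ordinary character on both sides,
# i.e. it is neither next to the "|" delimiter nor at the start/end of a line.
INNER_QUOTE = re.compile(r"(?<=[^|\n])'(?=[^|\n])")


def escape_inner_quotes(text, escapechar="/"):
    """Prefix every inner single quote with the escape character."""
    return INNER_QUOTE.sub(lambda match: escapechar + "'", text)


raw = "'colA'|'colB'\n'word\"A'|'A'\n'word'B'|'B'\n"
escaped = escape_inner_quotes(raw)

rows = list(
    csv.reader(io.StringIO(escaped), delimiter="|", quotechar="'", escapechar="/")
)
for row in rows:
    print(row)
```

The escaped text could then be passed to pd.read_csv through io.StringIO(escaped), with the same sep, quotechar and escapechar arguments as in the call above.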

Another option is quoting=csv.QUOTE_ALL, although the single quotes then stay in the data:

>>> import pandas as pd
>>> import csv
>>> pd.read_csv('input.csv',sep='|',quoting=csv.QUOTE_ALL)
     'colA' 'colB'
0  'word"A'    'A'
1  'word'B'    'B'
>>>
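
Since the quotes are left in place here, they still have to be removed afterwards. A minimal sketch of one way to do that, with two swapped-in choices that are not part of the answer above: csv.QUOTE_NONE, so that every quote is read as an ordinary character, and a regular expression that removes exactly one leading and one trailing quote (str.strip would remove all of them). It assumes every column is read as strings.

```python
import csv
import io

import pandas as pd

data = "'colA'|'colB'\n'word\"A'|'A'\n'word'B'|'B'\n"

# QUOTE_NONE tells the parser to treat every quote as an ordinary character.
df = pd.read_csv(io.StringIO(data), sep="|", quoting=csv.QUOTE_NONE)

# Remove one quote at the start and one at the end; inner quotes are untouched.
edge_quote = r"^'|'$"
df.columns = df.columns.str.replace(edge_quote, "", regex=True)
df = df.apply(lambda col: col.str.replace(edge_quote, "", regex=True))
print(df)
```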