I am trying to read a CSV file into a pandas DataFrame. However, the CSV contains accented characters. I am using Python 2.7.
I ran into a UnicodeDecodeError because there is an accented character in the first column. I read a bunch of sites, like this SO question about UTF-8 in CSV files, this blog post about news CSV errors, and this blog post about UTF-8 in Python 2.7.
I used the answers I found there to try to modify my code. I initially had:
import pandas as pd
And so on. This worked, but now using "NÍ" and "Nê" as the client name gives an error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 7: invalid continuation byte
I tried changing the line to df = pd.read_csv('MYDATA.csv', encoding='utf-8'), but this gives the same error.
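To make the error message concrete, here is a minimal sketch of what it seems to describe. The sample bytes are hypothetical and not taken from MYDATA.csv. The assumption is that the file was written in a single-byte encoding such as Latin-1, where "ê" is the single byte 0xea, whereas UTF-8 stores "ê" as two bytes. That is only a guess about the file, not something confirmed.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: a header line plus a row containing "Nê",
# stored the way Latin-1 (ISO-8859-1) would write it to disk.
raw = b"name\nN\xea,1\n"

# In UTF-8, "ê" is the two-byte sequence 0xc3 0xaa, not the single byte 0xea.
utf8_e_circumflex = u"\xea".encode("utf-8")

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    # 0xea announces a multi-byte sequence, but the next byte (",") is not a
    # continuation byte, so the decoder gives up at this position.
    utf8_ok = False
    failing_byte = raw[exc.start:exc.end]

# The same bytes decode without error when treated as Latin-1.
decoded = raw.decode("latin-1")
```

If this guess holds for the real file, passing encoding='utf-8' to pd.read_csv would fail in the same way, because the bytes on disk are simply not UTF-8.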
So I tried this, based on the suggestions I found while researching, but it also does not work, and I get the same error.
import pandas as pd
import csv

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]

reader = unicode_csv_reader(open('MYDATA.csv', 'rU'), dialect=csv.reader)
It seems to me that it should not be this difficult to read CSV data into a pandas DataFrame. Does anyone know an easier way?
Edit: It is really strange that if I delete the rows with accented characters, I still get the error
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 960: invalid continuation byte
This is strange, since my test CSV has only 19 rows and 27 columns. But I hope that if I decode the whole CSV as UTF-8, it will fix the problem.
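Since the error still appears after deleting the rows that visibly contain accents, it may help to look at the raw bytes around the reported position. Below is a minimal diagnostic sketch. The helper name find_undecodable and its context parameter are made up for illustration and are not part of pandas or the csv module.

```python
def find_undecodable(raw, encoding="utf-8", context=10):
    """Return (position, surrounding bytes) for the first byte sequence in
    `raw` that fails to decode, or None if everything decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the bytes the decoder rejected.
        lo = max(exc.start - context, 0)
        return exc.start, raw[lo:exc.end + context]
    return None


# Hypothetical sample bytes standing in for the file contents.
sample = b"ab\xd0,cd"
result = find_undecodable(sample, context=2)
```

With the real file, the bytes could be obtained by opening 'MYDATA.csv' in binary mode ('rb') and calling read(). The returned slice should show which cell contains the byte at position 960, which may be a character that does not look accented at first glance.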