How to read Excel Unicode characters using Python

I get an Excel file whose contents I cannot influence. It contains some Unicode characters, such as "á" or "é".

My code has not changed, but I migrated from Eclipse Juno to LiClipse, and at the same time moved to a different Python version (2.6 instead of 2.5). In principle, the win32com package that I use has a working version for it.

When I read the Excel file, my code crashes while fetching the cell values and converting them to strings using str(). The console output is as follows:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 89: ordinal not in range(128)

More specifically, I do the following:

Read Excel:

  xlApp = Dispatch("Excel.Application")

  excel = xlApp.Workbooks.Open(excel_location)

In the inner loop, I retrieve the cell value:

cell_value = self.excel.ActiveSheet.Cells(excel_line + 1, excel_column + 1)

and finally, if I try to convert cell_value to str, it crashes:

print str(cell_value)

Excel , ASCII, . . , , , .

, , Excel, LiClipse 2.6 Python .

, ?


Unicode and UTF-8 handling is messy in Python 2.x, and it changed between 2.4 and 2.7.

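The problem is print: in Python 2.x, print has to convert its argument to a byte string for the output stream. If no suitable encoding is known, ascii is used (that is, only code points 0 to 127 are allowed).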

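The COMObject holds the cell text as Unicode. The result of str is a byte string (values 0 to 255) in Python 2.x, so the text has to be encoded on the way.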

. Python , () UTF-8 (UTF-8 \xe1, , ; ).

, ascii : , .

To find out what you actually have, print its repr(), for example:

s = str(cell_value) # Convert COM -> UTF-8 encoded string
print repr(s) # repr() converts anything to ascii
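A related sketch: the same kind of ASCII-only debug output can be produced directly from a Unicode string, without an implicit conversion. It assumes the cell text is already available as a Unicode string; `to_ascii_debug` and the sample value are hypothetical names for illustration, not part of win32com.

```python
def to_ascii_debug(text):
    """Return an ASCII-only byte string, escaping non-ASCII characters."""
    # 'backslashreplace' substitutes escapes such as \xe1 instead of raising
    return text.encode('ascii', 'backslashreplace')

sample = u'caf\xe1'               # hypothetical cell text ending in "á"
escaped = to_ascii_debug(sample)  # safe to print on an ASCII-only console
```

Because the result contains only ASCII bytes, printing it should not trigger a UnicodeEncodeError regardless of the console encoding.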

If your console supports UTF-8, tell Python to use it:

import sys
import codecs

sys.stdout = codecs.getwriter('utf8')(sys.stdout)
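To see what this wrapper does, here is a minimal sketch that wraps an in-memory byte buffer instead of sys.stdout. The buffer `raw` is only a stand-in chosen for illustration, so the written bytes can be inspected.

```python
import codecs
import io

raw = io.BytesIO()                      # stands in for a byte-oriented stdout
writer = codecs.getwriter('utf8')(raw)  # same kind of wrapper as above
writer.write(u'\xe1')                   # Unicode text goes in ...
written = raw.getvalue()                # ... UTF-8 encoded bytes come out
```

The single character "á" ends up as two bytes in the underlying stream. Note that this wrapping of sys.stdout is a Python 2 idiom; in Python 3, sys.stdout is already a text stream.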

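Check sys.stdout.encoding to see which encoding Python has detected for the console/terminal. In Python 2 it can be None (for example, on Linux when output is redirected), and ascii is used in that case.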



.Cells(row,col) returns a Range object. To get the text of the cell, use one of the following:

cell = xl.ActiveSheet.Cells(1,2).Text

cell = xl.ActiveSheet.Range('B1').Text

The result is a Unicode string. If you need a byte string, convert it explicitly with .encode(encoding), for example:

bytes = cell.encode('utf8')
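The difference between encodings can be checked with the character from the error message. This is a small sketch that does not need Excel; the variable names are illustrative only.

```python
text = u'\xe1'  # "á", the character named in the UnicodeEncodeError

try:
    # This is what an implicit str() conversion attempts in Python 2
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = text.encode('utf8')        # two bytes in UTF-8
latin1_bytes = text.encode('latin-1')   # one byte in Latin-1
```

Encoding to ascii fails, while an explicit encoding that can represent the character succeeds, which is why choosing the encoding yourself avoids the crash.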

Example:

import win32com.client
xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
xl.Workbooks.Open(r'book1.xlsx')
cell = xl.ActiveSheet.Cells(1,2)
cell_value = cell.Text
print repr(cell)
print repr(cell_value)
print cell_value

Output (note that what is displayed depends on what your console/IDE supports):

<win32com.gen_py.Microsoft Excel 14.0 Object Library.Range instance at 0x129909424>
u'\u4e2d\u56fd\u4eba'
中国人

, , - , . , .

, , , @Huan-YuTseng , , , , , .

, , Eclipse Juno ( Pydev - Java, , ) LiClipse ( Eclipse).

LiClipse (1.4.0.201502042042) utf-8. LiClipse, . , , , . , :

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

. @AarongDigulla , .

, LiClipse sys.setdefaultencoding, ... , . , . , - LiClipse ( !)


If the data is stored as 'utf-8 with BOM', Python's utf_8_sig codec decodes it to Unicode, and Excel recognizes that encoding as well.
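For reference, a short sketch of how the utf_8_sig codec differs from plain utf-8 when a BOM is present. The byte string `data` is constructed here for illustration and is not taken from a real file.

```python
import codecs

# UTF-8 byte order mark followed by the UTF-8 encoding of "á"
data = codecs.BOM_UTF8 + u'\xe1'.encode('utf-8')

plain = data.decode('utf-8')      # BOM survives as the character u'\ufeff'
clean = data.decode('utf_8_sig')  # BOM is stripped during decoding
```

With plain utf-8 the BOM remains as an invisible leading character, which can cause confusing mismatches later; utf_8_sig removes it.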
