Should I pass encoding='utf-8' to my Python logging handler?

Suppose I would like to log Unicode strings with Python 2.7. It seems "correct" to pass an encoding parameter to FileHandler.

    # coding=utf-8
    import logging

    logger = logging.getLogger()
    logger.addHandler(logging.FileHandler('my_log.txt', encoding='utf-8'))
    logger.error(u'Pão')
    logger.error('São')

This has a few problems though:

  • It raises a UnicodeDecodeError on the UTF-8 byte string literal 'São'.
  • The output file has LF line endings on Windows when CRLF seems more appropriate.

If I don't pass any encoding at all, I don't have either of these problems. Both lines are written to a UTF-8 file, and I get CRLF line endings. (I think the line-ending problem is due to the file being opened in binary mode when an encoding is specified.)

Since omitting the encoding seems to work better, is there some reason I am missing why I would ever want to pass encoding='utf-8'?

python logging utf-8
1 answer

If you pass an encoding to FileHandler, it uses codecs.open() with that encoding to open the file; otherwise, it uses the regular open(). That is all the encoding is used for.
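The choice described above can be sketched roughly as follows. This is a simplified illustration, not the actual logging source: open_log_stream is a hypothetical helper name, and the temporary file path is only there to make the example self-contained.

```python
import codecs
import os
import tempfile


def open_log_stream(filename, mode='a', encoding=None):
    """Hypothetical helper mirroring the choice described above."""
    if encoding is None:
        # No encoding requested: fall back to the built-in open().
        return open(filename, mode)
    # Encoding requested: codecs.open() encodes Unicode text on write.
    return codecs.open(filename, mode, encoding)


log_path = os.path.join(tempfile.mkdtemp(), 'my_log.txt')

stream = open_log_stream(log_path, encoding='utf-8')
stream.write(u'P\xe3o\n')  # u'Pão', written with an escape to keep the source ASCII
stream.close()

# Read the raw bytes back to see what actually landed on disk.
with open(log_path, 'rb') as f:
    written = f.read()
```

Here written should hold the UTF-8 bytes for the text, with the non-ASCII character encoded as two bytes.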

Remember that Python 2.x is not ideal when it comes to handling bytes and Unicode correctly: implicit encoding and decoding happen at various times, which can trip you up. In most cases you really shouldn't be passing a string like 'São' as bytes: if it is text, you should be working with Unicode objects.
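As a small illustration of why the byte string fails, the sketch below performs by hand the decoding step that Python 2 would attempt implicitly. It assumes the default encoding is ASCII, which is the usual Python 2 setting; the variable names are illustrative only.

```python
# The UTF-8 bytes of 'São', written with escapes so the source stays ASCII.
raw = b'S\xc3\xa3o'

# Decoding with ASCII fails on the first non-ASCII byte, which is the
# error an implicit decode of this byte string would produce.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding explicitly with the right codec yields a proper Unicode object,
# which is safe to hand to a handler that encodes on write.
text = raw.decode('utf-8')
```

Decoding at the boundary where the bytes enter the program, and passing only Unicode objects to the logger, avoids relying on the implicit conversion.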

As for line endings, newlines are normally translated to the platform's line endings by Python's file I/O machinery. But if codecs.open() is used, the underlying file is opened in binary mode, so there is no translation of \n to \r\n, as would normally happen on Windows.
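The lack of newline translation can be checked by inspecting the raw bytes that codecs.open() writes. This is a minimal sketch; the file name and the sample text are made up for the example.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')

# codecs.open() with an encoding opens the underlying file in binary mode,
# so each '\n' is written as a single LF byte on every platform.
with codecs.open(path, 'w', 'utf-8') as f:
    f.write(u'first\nsecond\n')

# Read the file back in binary mode to look at the bytes on disk.
with open(path, 'rb') as f:
    raw = f.read()

has_crlf = b'\r\n' in raw
```

On Windows, the same text written through a plain text-mode open() would normally contain CRLF pairs, whereas has_crlf stays False here.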
