UnicodeEncodeError when using os.listdir

  • OS: Windows 7 64-bit
  • Python 3.1.3

When I run this:

os.listdir("F:\\music") 

I get this error:

 UnicodeEncodeError: 'gbk' codec can't encode character '\xe3' in position 643: illegal multibyte sequence 

os.listdir works with other directories, so the cause is presumably some strangely encoded file or folder name inside F:\music. How can I find the source of this error?

python
2 answers

A UnicodeEncodeError indicates that the failure happens when you print the file names, not inside os.listdir(). If os.listdir() itself had a problem, you would see a UnicodeDecodeError (decode, not encode).
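To make the encode/decode distinction concrete, here is a minimal sketch. The sample string and the byte value are made up for illustration; only the character '\xe3' and the 'gbk' codec come from the question.

```python
# Encoding goes from str to bytes; this is what happens when text is printed.
text = "caf\xe3"  # hypothetical name containing the character from the traceback
try:
    text.encode("gbk")
except UnicodeEncodeError as exc:
    encode_error = exc  # raised because gbk has no byte sequence for '\xe3'

# Decoding goes from bytes to str; this is what reading raw names would involve.
raw = b"\xff"  # hypothetical byte that is never valid in UTF-8
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(encode_error).__name__, "from", encode_error.encoding)
print(type(decode_error).__name__, "from", decode_error.encoding)
```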

Because you pass a Unicode (str) path, os.listdir() returns file names that are already decoded; on Windows the file system stores file names as UTF-16, and Python decodes them without trouble ( sys.getfilesystemencoding() reports which codec Python uses for file names).

However, the Windows console uses a different encoding, in your case gbk, and this codec cannot represent every character that UTF-16 can encode.
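One way to see the two encodings side by side is to ask Python for both. This is only a sketch; report_encodings is a hypothetical helper name, and the values printed depend on the machine and on how the script is started.

```python
import sys

def report_encodings():
    """Return (file system encoding, console output encoding)."""
    fs = sys.getfilesystemencoding()
    # sys.stdout may be replaced by an object without an encoding attribute.
    out = getattr(sys.stdout, "encoding", None)
    return fs, out

fs_encoding, stdout_encoding = report_encodings()
print("file names are decoded with:", fs_encoding)
print("console output is encoded with:", stdout_encoding)
```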

So the line to look at is your print() call. You could use print(filename.encode('gbk', errors='replace')) to print the file names; characters the codec cannot represent are replaced by a question mark (the output is shown as a bytes literal).
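To track down which names cause the error, one option is to try encoding each name yourself. The following is a rough sketch, not a definitive recipe: find_unencodable and safe_text are hypothetical helper names, and 'gbk' is assumed only because it is the codec named in the traceback.

```python
import os

def find_unencodable(names, encoding="gbk"):
    """Return the names that cannot be encoded with the given codec."""
    bad = []
    for name in names:
        try:
            name.encode(encoding)
        except UnicodeEncodeError:
            bad.append(name)
    return bad

def safe_text(name, encoding="gbk"):
    """Round-trip through the codec so unencodable characters become '?'."""
    return name.encode(encoding, errors="replace").decode(encoding)

path = "F:\\music"  # the directory from the question
if os.path.isdir(path):
    for name in find_unencodable(os.listdir(path)):
        # ascii() escapes every non-ASCII character, so this print cannot fail.
        print(ascii(name), "->", safe_text(name))
```

Printing with ascii() shows the exact code points of the offending characters, which makes it easier to locate the file in Explorer and rename it if you want.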

Alternatively, you can use b'F:\\music' as a path and work with raw byte names instead of Unicode.


This is a Unicode problem in the Windows console. You can fix it by installing the win-unicode-console library:

 $ pip install win-unicode-console
 $ edit a.py

 import win_unicode_console
 win_unicode_console.enable()
 print('non-gbk-character Résumé or 欧•亨利 works')

I tested this with Python 3.4 on Chinese Windows 8.
