getdefaultencoding is unrelated to the encoding of the source file or of the terminal. It is the encoding used to implicitly convert byte strings to Unicode strings, and it should always be "ascii" in Python 2.X ("utf8" in Python 3.X).
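As a rough illustration of that separation, the sketch below reads the default encoding and the terminal (stdout) encoding side by side. It is written for Python 3, not the Python 2.X used in the examples in this answer, and the helper name report_encodings is made up for illustration.

```python
import sys

def report_encodings():
    """Collect the implicit-conversion default and the stdout encoding.

    The two values come from different places and need not agree.
    """
    return {
        "default": sys.getdefaultencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
    }

info = report_encodings()
print(info)
```

On a cp936 console the "stdout" entry would typically differ from the "default" entry, and neither says anything about how the script file itself was saved.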
In Python 2.X, your line of code in a script without an encoding declaration produces an error:
SyntaxError: Non-ASCII character '\x87' in file ...
The actual non-ASCII character may differ, but the script will not work without an encoding declaration: Python 2.X requires one in order to use non-ASCII characters in source. The declaration must match the encoding the source file is actually saved in. For instance:
# coding: utf8
value = '國華'
when saved as cp936, produces:
SyntaxError: 'utf8' codec can't decode byte 0x87 in position 9: invalid start byte
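The same failure can be reproduced outside the parser. The following is a minimal Python 3 sketch, assuming the assignment line is stored on disk as cp936 and then decoded as the (wrongly) declared utf8; try_decode and line_on_disk are illustrative names only.

```python
TEXT = "\u570b\u83ef"  # the two characters used in the examples

# The assignment line as its bytes would sit in a cp936-encoded file.
line_on_disk = ("value = '" + TEXT + "'").encode("cp936")

def try_decode(raw, declared):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError as exc:
        return exc

result = try_decode(line_on_disk, "utf8")
print(result)  # the error names the offending byte and its position
```

The failing position is the first byte after the opening quote, because 0x87 cannot start a UTF-8 sequence. Decoding the same bytes with "cp936" succeeds.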
When the encoding is correct, the bytes in a byte string are literally the bytes in the source file, so they contain the encoded bytes of the characters. When Python parses a Unicode string, the bytes are decoded to Unicode using the declared source encoding. Note the difference when printing a UTF-8 byte string and a Unicode string on a cp936 console:
# coding: utf8
value = '國華'
print value,repr(value)
value = u'國華'
print value,repr(value)
Output:
鍦嬭彲 '\xe5\x9c\x8b\xe8\x8f\xaf'
國華 u'\u570b\u83ef'
The byte string contains the three-byte UTF-8 encodings of the two characters, but it is displayed incorrectly because the cp936 terminal does not understand that byte sequence. The Unicode string is printed correctly, and it contains the Unicode code points decoded from the UTF-8 bytes of the source file.
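What the terminal does with those bytes can be approximated in a few lines. This Python 3 sketch assumes a cp936 terminal simply decodes whatever bytes it receives as cp936; the variable names are illustrative.

```python
TEXT = "\u570b\u83ef"  # the two characters used in the examples

utf8_bytes = TEXT.encode("utf8")
print(repr(utf8_bytes), len(utf8_bytes))  # six bytes, three per character

# A cp936 terminal reads the same six bytes as two-byte cp936 sequences.
as_seen_on_cp936 = utf8_bytes.decode("cp936")
print(as_seen_on_cp936 == TEXT)      # the displayed text is not the original
print(len(as_seen_on_cp936))         # number of characters the terminal shows
```

The bytes themselves are never altered; only their interpretation changes, which is why the wrong characters appear.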
Note the difference when declaring and using an encoding that matches the terminal:
# coding: cp936
value = '國華'
print value,repr(value)
value = u'國華'
print value,repr(value)
Output:
國華 '\x87\xf8\xc8A'
國華 u'\u570b\u83ef'
The byte string now contains the two-byte cp936 encodings of the two characters ("A" is equivalent to "\x41"), and it is displayed correctly because the terminal understands the cp936 byte sequence. The Unicode string contains the same two Unicode code points as in the previous example, because the source bytes were decoded to Unicode using the declared source encoding.
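The byte values quoted above can be checked directly. A small Python 3 sketch, with illustrative variable names:

```python
TEXT = "\u570b\u83ef"  # the two characters used in the examples

cp936_bytes = TEXT.encode("cp936")
print(repr(cp936_bytes))             # four bytes, two per character
print(cp936_bytes[3:] == b"\x41")    # the trailing "A" is just byte 0x41

# Decoding with the matching encoding recovers the same code points
# that the UTF-8 source produced.
decoded = cp936_bytes.decode("cp936")
print([hex(ord(ch)) for ch in decoded])
```

Either source encoding leads to the same Unicode string, as long as the declaration matches the bytes in the file.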
If the script has the correct source encoding declaration and uses Unicode strings for text, it displays the correct characters¹ regardless of the terminal encoding². If the terminal does not support a character, it throws a UnicodeEncodeError rather than display the wrong character.
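One way to picture that behaviour: printing a Unicode string amounts to encoding it with the terminal's encoding, which either succeeds or raises. The Python 3 sketch below uses a hypothetical helper, can_display, and a few terminal encodings chosen only as examples.

```python
def can_display(text, terminal_encoding):
    """Return True if text can be encoded for a terminal using this encoding.

    A False result corresponds to the UnicodeEncodeError that printing
    the Unicode string would raise.
    """
    try:
        text.encode(terminal_encoding)
        return True
    except UnicodeEncodeError:
        return False

TEXT = "\u570b\u83ef"  # the two characters used in the examples
for encoding in ("cp936", "utf8", "cp437", "ascii"):
    print(encoding, can_display(TEXT, encoding))
```

There is no silent substitution of a wrong character: the text is either representable in the terminal encoding or the attempt fails.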
Final note: Python 2.X defaults to "ascii" source encoding unless declared otherwise, and it allows non-ASCII characters in byte strings if the declared encoding supports them. Python 3.X defaults to "utf8" source encoding (so be sure to save the file in that encoding or declare otherwise) and does not allow non-ASCII characters in byte strings.
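The Python 3.X restriction on byte strings can be observed by compiling small snippets. This is a sketch only; compiles is an illustrative helper and the snippets are passed as text, so no source file encoding is involved.

```python
def compiles(source):
    """Return True if Python 3 accepts the source text, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

bytes_literal = "value = b'\u570b\u83ef'"   # non-ASCII inside a bytes literal
text_literal = "value = '\u570b\u83ef'"     # same characters in a text string
escaped_bytes = r"value = b'\xe5\x9c\x8b\xe8\x8f\xaf'"  # bytes via escapes

print(compiles(bytes_literal))
print(compiles(text_literal))
print(compiles(escaped_bytes))
```

In Python 3.X, non-ASCII bytes have to be written with escapes or produced by encoding a text string explicitly.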
¹ If the terminal font supports the character.
² If the terminal encoding supports the character.