Python3 adds extra byte when printing hexadecimal values

Question


I noticed a difference between Python 2 and Python 3. Printing the same string produces an extra C2 byte under Python 3. I would expect identical output, and Python 2 behaves as I expected. What am I missing here?

$ python3 -c "print('\x30\xA0\x04\x08')" | xxd
0000000: 30c2 a004 080a     
$ python2 -c "print('\x30\xA0\x04\x08')" | xxd
0000000: 30a0 0408 0a

+4

python python-3.x python-2.x

Kai Feb 10 '15 at 10:13


2 answers

In Python 3, all string literals are Unicode (str) unless they carry a b prefix.

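\xA0 is the code point U+00A0 (no-break space), which UTF-8 encodes as two bytes:

U+00A0 NO-BREAK SPACE (HTML &#160; or &nbsp;): UTF-8 C2 A0

To emit the raw bytes instead, write a bytes literal to sys.stdout.buffer: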

$ python3 -c "import sys; sys.stdout.buffer.write(b'\x30\xA0\x04\x08')" | xxd
0000000: 30a0 0408                                0...
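If the data already exists as a str rather than a bytes literal, one possible approach is to encode it as Latin-1, which maps each code point from U+0000 to U+00FF to the single byte of the same value. This is only a sketch under that assumption: the helper name latin1_bytes is hypothetical, and any character above U+00FF raises UnicodeEncodeError.

```python
def latin1_bytes(text):
    """Return one byte per character; valid only for code points <= U+00FF."""
    # Latin-1 (ISO-8859-1) maps U+0000..U+00FF onto the bytes 0x00..0xFF.
    return text.encode('latin-1')


raw = latin1_bytes('\x30\xA0\x04\x08')
print(raw.hex())  # 30a00408
```

The resulting bytes object can be passed to sys.stdout.buffer.write in the same way as the literal in the command above.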

+6

warvariuc Feb 10 '15 at 10:19

interjay · Accepted Answer · 2015-02-10T10:19:03+0000

Python 3 strings are Unicode, and on your platform Unicode text is printed using the UTF-8 encoding. The UTF-8 encoding of the character U+00A0 is the two bytes 0xC2 0xA0, which is the output you see.

Python 2 strings are byte strings, so they are written to the output byte for byte, with no encoding step.
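The effect can be reproduced in Python 3 without xxd by encoding the string explicitly. This is a small illustration, assuming UTF-8 is the output encoding as in the question; the variable names are made up for the example.

```python
text = '\x30\xA0\x04\x08'    # a str holding four code points

utf8 = text.encode('utf-8')  # what print() emits when stdout uses UTF-8

print(len(text))   # 4
print(len(utf8))   # 5
print(utf8.hex())  # 30c2a00408
```

Only U+00A0 grows to two bytes, because UTF-8 encodes code points below U+0080 as a single byte each. The trailing 0a in the xxd output is the newline that print appends.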
