Why is an en-dash written as "\xe2\x80\x93" in Python?

Specifically, what does each escape in \xe2\x80\x93 do, and why does it take 3 escapes? Trying to decode one of them on its own results in an "unexpected end of data" error.

>>> print(b'\xe2\x80\x93'.decode('utf-8'))
–
>>> print(b'\xe2'.decode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 0: unexpected end of data
1 answer

You have UTF-8 bytes. UTF-8 is a codec, a standard for representing text as machine-readable data. The U+2013 EN DASH codepoint encodes to these 3 bytes when encoded with that codec.
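As a quick check of that claim, here is a minimal sketch that encodes the codepoint and decodes it again. The variable names are only for illustration.

```python
import unicodedata

en_dash = "\u2013"                 # U+2013 as a str: a single codepoint
encoded = en_dash.encode("utf-8")  # the same character as UTF-8 bytes

print(unicodedata.name(en_dash))   # EN DASH
print(len(en_dash), len(encoded))  # 1 3  (one codepoint, three bytes)
print(encoded)                     # b'\xe2\x80\x93'

# Decoding all three bytes together gives the original text back.
print(encoded.decode("utf-8") == en_dash)  # True
```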

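Decoding just one of those bytes as UTF-8 fails because UTF-8 needs all 3 bytes to decode this codepoint. UTF-8 is a variable-width encoding; the \xe2 byte signals that it is the first byte of a 3-byte sequence for a codepoint between U+2000 and U+2FFF (so 2 more bytes are required); that range covers 4096 codepoints.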
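To make the role of the first byte concrete, here is a rough sketch of how a decoder can tell the sequence length from the leading bits. utf8_sequence_length is a hypothetical helper written for this answer, not a standard library function, and it ignores the finer validity rules (overlong forms, the upper limit of Unicode).

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judged by its first byte."""
    if lead_byte & 0b1000_0000 == 0b0000_0000:  # 0xxxxxxx: ASCII, 1 byte
        return 1
    if lead_byte & 0b1110_0000 == 0b1100_0000:  # 110xxxxx: 2-byte sequence
        return 2
    if lead_byte & 0b1111_0000 == 0b1110_0000:  # 1110xxxx: 3-byte sequence
        return 3
    if lead_byte & 0b1111_1000 == 0b1111_0000:  # 11110xxx: 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and can never start a sequence.
    raise ValueError(f"0x{lead_byte:02x} is not a valid UTF-8 lead byte")


print(utf8_sequence_length(0xE2))  # 3

# Every codepoint from U+2000 through U+2FFF starts with the byte 0xE2.
starts_with_e2 = all(
    chr(cp).encode("utf-8")[0] == 0xE2 for cp in range(0x2000, 0x3000)
)
print(starts_with_e2)  # True
```

So b'\xe2' on its own promises two more bytes that never arrive, which is what "unexpected end of data" is reporting.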

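Python represents a bytes value in a form that you can copy back into a Python script or terminal to reproduce the value. Any byte that is not printable ASCII is shown as a \xhh hex escape. The two hex digits encode the integer value of the byte, a number between 0 and 255.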

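Hexadecimal is a convenient way to write bytes, because a byte splits into 2 groups of 4 bits, and each group can be written as a single digit in the range 0 - F.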
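A short sketch of the bytes representation and the hex notation; the names data, high and low are just illustrative.

```python
data = b"\xe2\x80\x93"

# A bytes object is a sequence of integers between 0 and 255.
print(list(data))                  # [226, 128, 147]
print(data.hex())                  # e28093
print(bytes([226, 128, 147]) == data)  # True

# Printable ASCII bytes are shown as characters, not as \xhh escapes.
print(b"\x41\x2d")                 # b'A-'

# Each hex digit stands for one group of 4 bits of the byte.
high, low = 0xE2 >> 4, 0xE2 & 0x0F
print(f"{high:04b} {low:04b}")     # 1110 0010
```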

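\xe2\x80\x93, then, means there are three bytes with the hexadecimal values E2, 80 and 93, or 226, 128 and 147 in decimal. The UTF-8 standard tells a decoder to take the last 4 bits of the first byte and the last 6 bits of each of the second and third bytes (the remaining bits mark what kind of byte it is, which lets a decoder detect errors). Those 4 + 6 + 6 == 16 bits encode the hex value 2013 (0010 000000 010011 in binary).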
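The bit arithmetic can be done by hand. This is only a sketch for this one 3-byte case, not a general decoder.

```python
b1, b2, b3 = b"\xe2\x80\x93"  # iterating over bytes yields integers

# The marker bits: 1110 starts a 3-byte sequence, 10 marks a continuation.
assert b1 >> 4 == 0b1110
assert b2 >> 6 == 0b10 and b3 >> 6 == 0b10

# Keep the last 4 bits of the first byte and the last 6 bits of the others.
code_point = ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)

print(f"{b1 & 0x0F:04b} {b2 & 0x3F:06b} {b3 & 0x3F:06b}")  # 0010 000000 010011
print(hex(code_point))                                      # 0x2013
print(chr(code_point) == b"\xe2\x80\x93".decode("utf-8"))   # True
```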

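Finally, keep in mind the difference between codecs (encodings) and Unicode; UTF-8 is a codec that can handle everything in the Unicode standard, but it is not the same thing as Unicode.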
