Python2.7, what do special characters mean in utf-32 encoding of a unicode string?

Question

I was experimenting with Unicode and Python's encoding methods. I used the special character "‽" and a Chinese character to see how the different UTF encodings represent them, and I got this output:

>>> a = u"‽"
>>> encoded_a = a.encode('utf-32')
>>> a
u'\u203d'
>>> encoded_a
'\xff\xfe\x00\x00= \x00\x00'
>>> b = u"安"
>>> encoded_b = b.encode('utf-32')
>>> b
u'\u5b89'
>>> encoded_b
'\xff\xfe\x00\x00\x89[\x00\x00'

My question is: what do the equals sign and the square bracket mean in the encoded results?

+4

python encoding unicode utf

Yang zheng May 18, '16 at 19:30


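3 answers

"\xff\xfe\x00\x00" is the byte order mark (BOM). Because you did not specify a byte order, Python uses the native one and writes the BOM first, so that a decoder can tell that the data is little-endian UTF-32.

After that come the bytes 3d and 20, which are the code point 203d in little-endian order. 3d is the equals sign in ASCII, and 20 is the space.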
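The byte layout can be checked by hand. The following is a minimal sketch, not code from the answer: `split_utf32_le` is a hypothetical helper name, and the byte order is pinned to little-endian with `utf-32-le` plus an explicit `codecs.BOM_UTF32_LE`, so the result does not depend on the machine's native order.

```python
import codecs
import struct


def split_utf32_le(data):
    # Hypothetical helper: split BOM-prefixed little-endian UTF-32 bytes
    # into the 4-byte BOM and a list of integer code points.
    bom, payload = data[:4], data[4:]
    if bom != codecs.BOM_UTF32_LE:
        raise ValueError("expected a little-endian UTF-32 BOM")
    count = len(payload) // 4
    # "<" means little-endian, "I" is one unsigned 32-bit integer.
    code_points = struct.unpack("<%dI" % count, payload)
    return bom, list(code_points)


# Same bytes as encoded_a in the question, built with an explicit byte order.
encoded_a = codecs.BOM_UTF32_LE + u"\u203d".encode("utf-32-le")
bom, points = split_utf32_le(encoded_a)
print(["%04x" % p for p in points])  # prints ['203d']
```

The four payload bytes 3d 20 00 00 are read as one little-endian integer, which gives back 0x203d, the code point of "‽".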

+2

Ulrich Eckhardt May 18, '16 at 21:41

The first two hexadecimal escapes, \xff\xfe, are the byte order mark (BOM); in UTF-32 they are followed by two zero bytes. Looking at the Python documentation for Unicode, it seems that the other characters you see are the printable ASCII form of the corresponding bytes. One of the examples in the documentation does the same thing as your code and prints the same kind of output:

>>> unistring = u'Hi\n'  # assumed definition, consistent with the output below
>>> unistring.encode('utf-16')
'\xff\xfeH\x00i\x00\n\x00'
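To separate the BOM from the text bytes in that output, one can encode with an explicit byte order, which writes no BOM. This is a small sketch under the assumption that the string in the example is u"Hi\n" (inferred from the output, not stated in the answer).

```python
import codecs

unistring = u"Hi\n"  # assumed value, inferred from the output shown above

# An explicit byte order ("-le") produces the text bytes only, without a BOM.
encoded = unistring.encode("utf-16-le")

# Prepending the little-endian BOM reproduces the bytes from the example.
with_bom = codecs.BOM_UTF16_LE + encoded

print(repr(codecs.BOM_UTF16_LE))
print(len(encoded))  # prints 6
```

Each of the three characters takes two bytes, and "H", "i" and the newline show up in the repr as themselves or as \n because those bytes have a readable form.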

0

K. Erik Wolfe May 18, '16 at 20:15


Mark Ransom · Accepted Answer · 2016-05-18T21:56:49+0000

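When Python shows the repr of a string, every byte in the range \x20 to \x7e is displayed as its ASCII character. The = is \x3d and the [ is \x5b. You may not have noticed it, but there is also a space in the first result, which is \x20.

Here is the full table: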

\x20 ' '    \x21 '!'    \x22 '"'    \x23 '#'
\x24 '$'    \x25 '%'    \x26 '&'    \x27 '''
\x28 '('    \x29 ')'    \x2a '*'    \x2b '+'
\x2c ','    \x2d '-'    \x2e '.'    \x2f '/'
\x30 '0'    \x31 '1'    \x32 '2'    \x33 '3'
\x34 '4'    \x35 '5'    \x36 '6'    \x37 '7'
\x38 '8'    \x39 '9'    \x3a ':'    \x3b ';'
\x3c '<'    \x3d '='    \x3e '>'    \x3f '?'
\x40 '@'    \x41 'A'    \x42 'B'    \x43 'C'
\x44 'D'    \x45 'E'    \x46 'F'    \x47 'G'
\x48 'H'    \x49 'I'    \x4a 'J'    \x4b 'K'
\x4c 'L'    \x4d 'M'    \x4e 'N'    \x4f 'O'
\x50 'P'    \x51 'Q'    \x52 'R'    \x53 'S'
\x54 'T'    \x55 'U'    \x56 'V'    \x57 'W'
\x58 'X'    \x59 'Y'    \x5a 'Z'    \x5b '['
\x5c '\'    \x5d ']'    \x5e '^'    \x5f '_'
\x60 '`'    \x61 'a'    \x62 'b'    \x63 'c'
\x64 'd'    \x65 'e'    \x66 'f'    \x67 'g'
\x68 'h'    \x69 'i'    \x6a 'j'    \x6b 'k'
\x6c 'l'    \x6d 'm'    \x6e 'n'    \x6f 'o'
\x70 'p'    \x71 'q'    \x72 'r'    \x73 's'
\x74 't'    \x75 'u'    \x76 'v'    \x77 'w'
\x78 'x'    \x79 'y'    \x7a 'z'    \x7b '{'
\x7c '|'    \x7d '}'    \x7e '~'

So the two results are really '\xff\xfe\x00\x00\x3d\x20\x00\x00' and '\xff\xfe\x00\x00\x89\x5b\x00\x00'.
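To see every byte as a hexadecimal escape, without the ASCII substitution that repr applies, a small helper is enough. This is only a sketch: `hex_escape` is a hypothetical name, not a Python built-in.

```python
def hex_escape(data):
    # Render every byte as a backslash-x escape, including printable ASCII bytes.
    # bytearray yields integers on both Python 2 and Python 3.
    return "".join("\\x%02x" % byte for byte in bytearray(data))


encoded_a = b"\xff\xfe\x00\x00= \x00\x00"
encoded_b = b"\xff\xfe\x00\x00\x89[\x00\x00"

print(hex_escape(encoded_a))  # prints \xff\xfe\x00\x00\x3d\x20\x00\x00
print(hex_escape(encoded_b))  # prints \xff\xfe\x00\x00\x89\x5b\x00\x00
```

The literals written with "= " and "[" compare equal to the ones written with \x3d\x20 and \x5b; only the display differs.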
