Convert hexadecimal string representation to actual bytes in Python

I need to load the third column of this text file as a hex string:

http://www.netmite.com/android/mydroid/1.6/external/skia/emoji/gmojiraw.txt

 >>> open('gmojiraw.txt').read().split('\n')[0].split('\t')[2]
 '\\xF3\\xBE\\x80\\x80'

How can I open the file so that the third column comes back as the actual bytes, like this:

 '\xF3\xBE\x80\x80' 

I also tried binary mode and hex mode, without success.

+7
python hex
5 answers

You can:

  • Remove the \x prefixes
  • Call .decode('hex') on the resulting string

The code:

 >>> '\\xF3\\xBE\\x80\\x80'.replace('\\x', '').decode('hex')
 '\xf3\xbe\x80\x80'

Pay attention to how the backslashes are interpreted. When the string representation is "\xf3", it means a single-byte string whose byte value is 0xF3. When it is "\\xf3", which is your input, the string consists of 4 characters: \, x, f and 3.
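The .decode('hex') call above exists only in Python 2. As a minimal sketch of the same two steps on Python 3, assuming the column contains nothing but \xNN escapes, bytes.fromhex can stand in for it. The helper name and the sample line are hypothetical; the real file's other columns are not reproduced here.

```python
def escaped_hex_to_bytes(text):
    """Convert literal text such as \\xF3\\xBE into the bytes it spells out."""
    # Step 1: drop every literal backslash-x pair, leaving only hex digits.
    hex_digits = text.replace("\\x", "")
    # Step 2: interpret each pair of hex digits as one byte.
    return bytes.fromhex(hex_digits)


# Hypothetical tab-separated line; only the third column matters here.
line = "col1\tcol2\t\\xF3\\xBE\\x80\\x80"
column = line.split("\t")[2]
raw = escaped_hex_to_bytes(column)

print(len(column), len(raw))  # 16 4
```

The length difference (16 characters of text versus 4 bytes) is the same distinction the answer draws between "\\xf3" and "\xf3".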

+7

Quick answer

 your_string.decode('string_escape')

 >>> a='\\xF3\\xBE\\x80\\x80'
 >>> a.decode('string_escape')
 '\xf3\xbe\x80\x80'
 >>> len(_)
 4
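The string_escape codec is Python 2 only. A rough Python 3 equivalent, offered as a sketch rather than a drop-in replacement, is to round-trip through unicode_escape and latin-1. It assumes every escape in the text stands for a value of at most 0xFF, which holds for \xNN escapes. The function name is hypothetical.

```python
def unescape_to_bytes(text):
    """Interpret backslash escapes in text and return the resulting bytes."""
    # latin-1 maps code points 0-255 one-to-one onto byte values, so the
    # characters produced by unicode_escape become bytes of the same value.
    return text.encode("latin-1").decode("unicode_escape").encode("latin-1")


a = "\\xF3\\xBE\\x80\\x80"
raw = unescape_to_bytes(a)
print(raw, len(raw))
```

Unlike the replace-then-hex approach, this also handles other escapes such as \n or \t if they appear in the column.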

Bonus Information

 >>> u='\uDBB8\uDC03'
 >>> u.decode('unicode_escape')

A few minor notes

Interestingly, I have Python 2.6.4 on Ubuntu Karmic Koala (sys.maxunicode == 1114111) and Python 2.6.5 on Gentoo (sys.maxunicode == 65535). On Ubuntu the result of the unicode_escape decode is \uDBB8\uDC03, and on Gentoo it is u'\U000fe003'; in both cases the length is 2, which is correct. Unless something was fixed between 2.6.4 and 2.6.5, I am impressed that the Gentoo build, which uses 2 bytes per Unicode character, reports the correct character.
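Python 3.3 and later no longer have separate narrow and wide builds, so the behaviour described above cannot be reproduced there directly. As a sketch of how the two escaped surrogates relate to the single code point U+FE003, one option on Python 3 is to pass the decoded text through UTF-16 with the surrogatepass error handler. The helper name is hypothetical.

```python
def merge_surrogates(text):
    """Combine UTF-16 surrogate pairs in text into single code points."""
    # surrogatepass lets lone surrogates be encoded; decoding the UTF-16
    # bytes again joins each high/low pair into one character.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


escaped = "\\uDBB8\\uDC03"
pair = escaped.encode("ascii").decode("unicode_escape")
merged = merge_surrogates(pair)

print(len(merged), hex(ord(merged)))  # 1 0xfe003
```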

+7

If you are using Python 2.6+, ast.literal_eval is a safe alternative to eval:

 >>> from ast import literal_eval
 >>> item='\\xF3\\xBE\\x80\\x80'
 >>> literal_eval("'%s'"%item)
 '\xf3\xbe\x80\x80'
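On Python 3 the same idea would produce a str rather than bytes, so a sketch for that version wraps the text in a bytes literal instead. It assumes the column contains no single quote, which would otherwise break the literal and make literal_eval raise an error.

```python
from ast import literal_eval

item = "\\xF3\\xBE\\x80\\x80"

# Build the source text of a bytes literal, b'...', and let literal_eval
# parse it. Only literals are accepted, so no arbitrary code can run.
raw = literal_eval("b'%s'" % item)

print(raw, len(raw))
```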
+5

If you trust the source, you can use eval('"%s"' % data)

0

After removing the \x prefixes as in Eli's answer, you can simply do the following to get the value as an integer:

 int("F3BE8080",16) 
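Note that int() yields a number, not a byte string. If bytes are what is needed, a possible follow-up step on Python 3 is int.to_bytes; the sketch below assumes big-endian order so that the bytes come out in the same order as the hex digits.

```python
hex_digits = "F3BE8080"

value = int(hex_digits, 16)

# Two hex digits per byte; deriving the length from the digit count keeps
# any leading zero bytes that the integer alone would not remember.
raw = value.to_bytes(len(hex_digits) // 2, "big")

print(value, raw)
```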
0
