Replacing = using '\ x' and then decoding in python

Question

Replacing = using '\ x' and then decoding in python

I got the email subject using python modules and the resulting string

'=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?='

I know the string is encoded in 'utf-8'. Python has a method called strings to decode such strings. But to use the method, I needed to replace the = sign with the \x sign. By manually exchanging and then printing the decoded resulting string, I get the string سلام_کجائی, which is exactly what I want. The question is, how can I make an exchange automatically? The answer seems more complicated than just using functions in strings, such as a replace function.

Below is the code I used after manual control?

 r='\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85_\xDA\xA9\xD8\xAC\xD8\xA7\xD8\xA6\xDB\x8C' print r.decode('utf-8')

I would be grateful for any workable idea.

+6

python utf-8 decode backslash

alexander Mar 24 '13 at 10:05

source share

2 answers

This encoding is known as quoted-printable. There is a Python module for performing encoding and decoding.

You are right that this is just pure quoting of binary strings, so you need to apply UTF-8 decoding afterwards. (Assuming the string is in UTF-8, but it looks right, although I don't know the language.)

 import quopri print quopri.decodestring( "'=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?='" ).decode( "utf-8" )

+4

svk Mar 24 '13 at 22:11

source share

Pavel anossov · Accepted Answer · 2013-03-24T22:12:30+0000

Just decode it from the print quote to get utt8-encoded bytestring:

 In [35]: s = '=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?=' In [36]: s.decode('quoted-printable') Out[36]: '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85_\xda\xa9\xd8\xac\xd8\xa7\xd8\xa6\xdb\x8c?'

Then, if necessary, from utf-8 to unicode:

 In [37]: s.decode('quoted-printable').decode('utf8') Out[37]: u'\u0633\u0644\u0627\u0645_\u06a9\u062c\u0627\u0626\u06cc?'

 In [39]: print s.decode('quoted-printable') سلام_کجائی?

Replacing = using '\ x' and then decoding in python

More articles: