Replacing = with '\x' and then decoding in Python

I extracted an email subject using Python modules, and the resulting string is:

'=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?=' 

I know the string is encoded in UTF-8. Python strings have a decode method for such data, but to use it I had to replace each = sign with \x. After making that replacement by hand and printing the decoded result, I get سلام_کجائی, which is exactly what I want. The question is: how can I make the replacement automatically? The answer seems more complicated than just calling a string function such as replace.

Below is the code I used after the manual replacement:

    r = '\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85_\xDA\xA9\xD8\xAC\xD8\xA7\xD8\xA6\xDB\x8C'
    print r.decode('utf-8')

I would be grateful for any workable idea.

2 answers

Just decode it from quoted-printable to get a UTF-8-encoded bytestring:

    In [35]: s = '=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?='

    In [36]: s.decode('quoted-printable')
    Out[36]: '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85_\xda\xa9\xd8\xac\xd8\xa7\xd8\xa6\xdb\x8c?'

Then, if necessary, decode from UTF-8 to unicode:

    In [37]: s.decode('quoted-printable').decode('utf8')
    Out[37]: u'\u0633\u0644\u0627\u0645_\u06a9\u062c\u0627\u0626\u06cc?'

    In [39]: print s.decode('quoted-printable')
    سلام_کجائی?
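
The session above is Python 2, where str has a decode method. In Python 3 that method is gone from str, so a rough equivalent, assuming the subject is held as bytes, might look like the sketch below. The variable names are only illustrative.

```python
import quopri

# The quoted-printable text from the question, as bytes (Python 3 assumption).
raw = b"=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?="

# Step 1: undo the quoted-printable escaping to recover the raw UTF-8 bytes.
utf8_bytes = quopri.decodestring(raw)

# Step 2: decode those UTF-8 bytes into text.
text = utf8_bytes.decode("utf-8")
print(text)
```

The two steps mirror the two decode calls in the session: first quoted-printable to bytes, then bytes to text.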

This encoding is known as quoted-printable. Python's quopri module performs the encoding and decoding.

You are right that this is just plain escaping of the raw bytes, so you need to apply UTF-8 decoding afterwards. (This assumes the string is UTF-8; it looks right, although I don't know the language.)

    import quopri

    # Undo the quoted-printable escaping, then decode the UTF-8 bytes
    print quopri.decodestring("=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?=").decode("utf-8")
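
The trailing ?= suggests the string may be the tail of an RFC 2047 encoded word, which normally looks like =?charset?Q?...?=. If the full header value is available, the standard library's email.header module can decode it in one go. This is only a sketch in Python 3: the "=?utf-8?Q?" prefix below is an assumption, since the question shows only the end of the subject.

```python
from email.header import decode_header, make_header

# Hypothetical full header value: the "=?utf-8?Q?" prefix is assumed,
# because the question only shows the tail of the subject.
subject = "=?utf-8?Q?=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?="

# decode_header returns a list of (decoded_bytes, charset) pairs.
parts = decode_header(subject)

# make_header combines the pairs; str() yields the decoded text.
text = str(make_header(parts))
print(text)
```

Note that in this header encoding an underscore stands for a space, so the decoded text contains a space where the plain quoted-printable approach keeps the underscore.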
