HeaderParseError in Python

I get a HeaderParseError when I try to parse this line using decode_header() in Python 2.6.5 (and 2.7). Here is the string:

'=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=' 

This line comes from a MIME message containing a JPEG image. Thunderbird can decode the file name (which contains a German umlaut).

    >>> from email.header import decode_header
    >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
        raise HeaderParseError
    email.errors.HeaderParseError
1 answer

There seems to be an incompatibility between the base64 alphabet that Python expects and the one the mail agent used:

    >>> from email.header import decode_header
    >>> a='=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='
    >>> decode_header(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/email/header.py", line 108, in decode_header
        raise HeaderParseError
    email.errors.HeaderParseError
    >>> a1= a.replace('_', '/')
    >>> decode_header(a1)
    [('Anmeldung Netzanschluss S\xfcdring3p.jpg', 'iso-8859-1')]
    >>> print _[0][0].decode(_[0][1])
    Anmeldung Netzanschluss Südring3p.jpg

Python uses the base64 alphabet that the Wikipedia article describes as standard (A-Z, a-z, 0-9, +, /). The same article lists some alternative alphabets, including ones that use the underscore, which is the problem here. However, the value of the underscore is not uniquely defined: it stands for 62 or 63, depending on the alternative.
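To illustrate why the value of the underscore matters, here is a minimal sketch that decodes the base64 payload from the question under both readings. It assumes the payload can be treated in isolation; the variable names are mine. `base64.urlsafe_b64decode` follows the URL-safe alphabet, where `_` stands for 63 (the same value as `/`), while the second variant pretends `_` means 62 (`+`).

```python
import base64

# Base64 payload taken from the encoded-word in the question.
payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="

# Reading 1: URL-safe alphabet, '_' has the value 63 (like '/').
name_63 = base64.urlsafe_b64decode(payload).decode("iso-8859-1")

# Reading 2: an alternative where '_' has the value 62 (like '+').
name_62 = base64.b64decode(payload.replace("_", "+")).decode("iso-8859-1")

print(name_63)  # Anmeldung Netzanschluss Südring3p.jpg
print(name_62)  # Anmeldung Netzanschluss Sìdring3p.jpg
```

Only the first reading yields a plausible German file name, which suggests (but does not prove) that this particular mail agent used the URL-safe alphabet.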

I do not know what Python could do to guess the intent of b0rken mail agents; therefore, I suggest that you do the appropriate replace whenever decode_header fails.
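A rough sketch of that fallback could look like the following. The helper name `decode_header_lenient` and the regular expression are my own, not part of the `email` package, and the mapping (`-` to `+`, `_` to `/`) assumes the sender used the URL-safe alphabet. The replacement is restricted to B-encoded words, because in Q-encoded words an underscore legitimately stands for a space.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

# Matches a base64 ("B") encoded-word: =?charset?B?payload?=
_B_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")


def _urlsafe_to_standard(match):
    # Assumption: the sender used the URL-safe alphabet ('-' is 62, '_' is 63).
    payload = match.group(2).replace("-", "+").replace("_", "/")
    return "=?%s?B?%s?=" % (match.group(1), payload)


def decode_header_lenient(raw):
    """Hypothetical helper: retry decode_header after repairing B-encoded words."""
    try:
        return decode_header(raw)
    except HeaderParseError:
        # Second attempt with the payloads mapped to the standard alphabet.
        return decode_header(_B_WORD.sub(_urlsafe_to_standard, raw))


header = "=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
data, charset = decode_header_lenient(header)[0]
filename = data.decode(charset)
print(filename)
```

If the second attempt fails as well, the HeaderParseError propagates to the caller, so genuinely malformed headers are still reported.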

I call the mail agent "broken" because there is no need to avoid + or / in a message header: this is not a URL, so why not use the standard alphabet?
