Python email.header.decode_header not working for multi-line headers

Question

I am creating a system that reads emails from a Gmail account and retrieves objects using the Python imaplib and email modules. Sometimes emails received from a Hotmail account have line breaks in their headers, for example:

 In [4]: message['From']
 Out[4]: '=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >'

If I try to decode this header, it does nothing:

 In [5]: email.header.decode_header(message['From'])
 Out[5]: [('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >', None)]

However, if I replace the line break and the tab with a space, it works:

 In [6]: email.header.decode_header(message['From'].replace('\r\n\t', ' '))
 Out[6]: [('isabel mar\xc3\xada tocino garc\xc3\xada', 'utf-8'), ('< isatocino22@hotmail.com >', None)]

Is this a bug in decode_header? If not, what other special cases like this should I know about?

+6

python email

José Tomás Tocino Dec 28 '13 at 16:30

2 answers

This error still occurs in some versions of Python 2.7, so you can use the following workaround:

 >>> email.header.decode_header('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >'.replace('\r\n\t', ' '))
 [('isabel mar\xc3\xada tocino garc\xc3\xada', 'utf-8'), ('< isatocino22@hotmail.com >', None)]

This replaces the CRLF and the tab with a single space, after which decode_header parses the header correctly.
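
The same idea can be generalized so that it does not depend on the exact '\r\n\t' sequence. The following is only a minimal sketch and not part of the original answer: the helper name unfold_and_decode and the regular expression are assumptions made for illustration. It replaces any line break followed by spaces or tabs with a single space, passes the result to decode_header, and joins the decoded parts into one text string.

```python
import re
from email.header import decode_header

# Hypothetical pattern: a line break (CRLF or bare LF) followed by the
# spaces or tabs that mark a folded continuation line.
_FOLD_RE = re.compile(r'\r?\n[ \t]+')


def unfold_and_decode(raw_header):
    """Unfold a raw header value, then decode its encoded words.

    Illustrative helper, not a standard library function.
    """
    # Same workaround as above, but for any fold, not only '\r\n\t'.
    unfolded = _FOLD_RE.sub(' ', raw_header)

    parts = []
    for value, charset in decode_header(unfolded):
        # decode_header may return byte strings together with a charset
        # name; a charset of None is treated as ASCII here (an assumption).
        if isinstance(value, bytes):
            value = value.decode(charset or 'ascii', errors='replace')
        parts.append(value)
    return ''.join(parts)
```

For the header in the question, unfold_and_decode(message['From']) should return the decoded display name followed by the address as a single string. Unknown or invalid charset names are not handled in this sketch.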

0

Benjy malca Mar 07 '17 at 17:41

Robᵩ · Accepted Answer · 2013-12-29T02:08:35+0000

This is a bug in decode_header that is present in Python 2.7 and fixed in Python 3.3.

 >>> sys.version_info
 sys.version_info(major=3, minor=3, micro=2, releaselevel='final', serial=0)
 >>> email.header.decode_header('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >')
 [(b'isabel mar\xc3\xada tocino garc\xc3\xada', 'utf-8'), (b'< isatocino22@hotmail.com >', None)]

vs

 >>> sys.version_info
 sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
 >>> email.header.decode_header('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >')
 [('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >', None)]
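
On Python 3, the (bytes, charset) pairs returned by decode_header can be combined into a single text string with email.header.make_header. This is a minimal sketch, assuming a Python 3 version in which the folded header is parsed correctly; the function name header_to_text is hypothetical and chosen only for illustration.

```python
from email.header import decode_header, make_header


def header_to_text(raw_header):
    """Decode a possibly folded, encoded header value into one str.

    Illustrative helper, not a standard library function.
    """
    # decode_header yields (value, charset) pairs; make_header rebuilds a
    # Header object from those pairs, and str() renders it as decoded text.
    return str(make_header(decode_header(raw_header)))
```

Calling header_to_text(message['From']) on the header from the question should give the display name and the address in one string, without any manual handling of bytes and charsets.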
