I am building a system that reads emails from a Gmail account and retrieves them as message objects using Python's imaplib and email modules. Emails sent from a Hotmail account sometimes have line breaks inside their headers, for example:
In [4]: message['From']
Out[4]: '=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >'
If I try to decode this header, decode_header returns it unchanged:
In [5]: email.header.decode_header(message['From'])
Out[5]: [('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=\r\n\t< isatocino22@hotmail.com >', None)]
However, if I replace the line break and the tab with a space, it works:
In [6]: email.header.decode_header(message['From'].replace('\r\n\t', ' '))
Out[6]: [('isabel mar\xc3\xada tocino garc\xc3\xada', 'utf-8'), ('< isatocino22@hotmail.com >', None)]
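A slightly more general form of this workaround might look like the sketch below. It is written for Python 3 (the session above is Python 2), and it assumes that any line break followed by spaces or tabs is header folding that can be collapsed to a single space. The helper name decode_folded_header is made up for illustration, and the sketch also assumes every declared charset is one Python knows how to decode.

```python
import re
from email.header import decode_header

# Assumption: a line break followed by one or more spaces/tabs is header
# folding (not only the exact '\r\n\t' sequence seen above).
_FOLD = re.compile(r'\r?\n[ \t]+')


def decode_folded_header(raw):
    """Hypothetical helper: unfold a header value, then decode its encoded words."""
    unfolded = _FOLD.sub(' ', raw)
    parts = []
    for value, charset in decode_header(unfolded):
        # In Python 3, decode_header may return bytes or str for each chunk.
        if isinstance(value, bytes):
            value = value.decode(charset or 'ascii', errors='replace')
        parts.append(value)
    return ''.join(parts)


raw_from = ('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?='
            '\r\n\t< isatocino22@hotmail.com >')
decoded_from = decode_folded_header(raw_from)
print(decoded_from)
```

Unfolding before decoding keeps the workaround in one place, so every header read from the message can go through the same helper.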
Is this a bug in decode_header? If not, what other special cases like this should I be aware of?