The correct way to decode an incoming email subject (UTF-8)

I am trying to pipe my incoming emails to a PHP script so that I can store them in a database, among other things. I use the MIME email parser class (registration required), although I don't think that matters here.

I have a problem with the email subject. It works well when the subject is in English, but if it uses non-Latin characters, I get something like

=?UTF-8?B?2KLYstmF2KfbjNi0?= 

for a subject like یک دو سه

I decode the subject like this:

    $subject = str_replace('=?UTF-8?B?', '', $subject);
    $subject = str_replace('?=', '', $subject);
    $subject = base64_decode($subject);

It works fine with short subjects of, say, 10-15 characters, but with a longer subject I get only half of the original text, followed by some odd characters at the end.

If the subject is even longer, for example 30 characters, I get nothing at all. Am I doing this right?

Although this question is almost a year old, I found it after running into a similar problem.

I'm not sure why you get odd characters, but maybe you are trying to display them somewhere that doesn't support your encoding.

Here is the code I wrote. It should handle everything except charset conversion, which is a big problem that many libraries handle much better (PHP's mbstring extension, for example).

    class mail
    {
        /**
         * If you change one of these, please check the other for fixes as well
         *
         * @const Pattern to match RFC 2047 charset encodings in mail headers
         */
        const rfc2047header = '/=\?([^ ?]+)\?([BQbq])\?([^ ?]+)\?=/';
        const rfc2047header_spaces = '/(=\?[^ ?]+\?[BQbq]\?[^ ?]+\?=)\s+(?==\?[^ ?]+\?[BQbq]\?[^ ?]+\?=)/';

        /**
         * http://www.rfc-archive.org/getrfc.php?rfc=2047
         *
         * =?<charset>?<encoding>?<data>?=
         *
         * @param string $header
         */
        public static function is_encoded_header($header)
        {
            // eg =?utf-8?q?Re=3a=20Support=3a=204D09EE9A=20=2d=20Re=3a=20Support=3a=204D078032=20=2d=20Wordpress=20Plugin?=
            // eg =?utf-8?q?Wordpress=20Plugin?=
            return preg_match(self::rfc2047header, $header) !== 0;
        }

        public static function header_charsets($header)
        {
            $matches = null;
            if (!preg_match_all(self::rfc2047header, $header, $matches, PREG_PATTERN_ORDER)) {
                return array();
            }
            return array_map('strtoupper', $matches[1]);
        }

        public static function decode_header($header)
        {
            $matches = null;

            /* Repair instances where two or more encodings are together and separated by whitespace
               (strip the whitespace). The lookahead leaves the next encoding unconsumed, so chains
               of three or more encodings are joined as well. */
            $header = preg_replace(self::rfc2047header_spaces, '$1', $header);

            /* Now see if any encodings exist and match them */
            if (!preg_match_all(self::rfc2047header, $header, $matches, PREG_SET_ORDER)) {
                return $header;
            }
            foreach ($matches as $header_match) {
                list($match, $charset, $encoding, $data) = $header_match;
                $encoding = strtoupper($encoding);
                switch ($encoding) {
                    case 'B':
                        $data = base64_decode($data);
                        break;
                    case 'Q':
                        $data = quoted_printable_decode(str_replace("_", " ", $data));
                        break;
                    default:
                        throw new Exception("preg_match_all is busted: didn't find B or Q in encoding $header");
                }
                // This part needs to handle every charset
                switch (strtoupper($charset)) {
                    case "UTF-8":
                        break;
                    default:
                        /* This is where you should handle other character sets! */
                        throw new Exception("Unknown charset in header - time to write some code.");
                }
                $header = str_replace($match, $data, $header);
            }
            return $header;
        }
    }

When run through a script and displayed in a browser using UTF-8, the result is:

آزمایش

You run it like this:

 $decoded = mail::decode_header("=?UTF-8?B?2KLYstmF2KfbjNi0?="); 
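
A likely explanation for the truncated long subjects in the question (my assumption; the raw header was never posted) is that RFC 2047 limits an encoded word to 75 characters, so mail clients split a long subject into several encoded words separated by whitespace or a folded line break. Stripping the delimiters with str_replace() then leaves that whitespace, and any base64 padding, in the middle of the data. Below is a minimal sketch along the same lines as the class above: it drops whitespace between any number of adjacent encoded words and converts non-UTF-8 charsets with iconv() instead of throwing. The function name decode_mime_header is hypothetical, and the sketch assumes the iconv extension is loaded.

```php
<?php
/**
 * Hypothetical helper (not part of any library): decodes every RFC 2047
 * encoded word in a header value and returns the result as UTF-8.
 */
function decode_mime_header(string $header): string
{
    // One encoded word: =?charset?B|Q?data?=
    $word = '=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=';

    // Whitespace (including folded line breaks) between two adjacent
    // encoded words is not part of the text, so drop it. The lookahead
    // does not consume the next word, so chains of 3+ words are joined too.
    $header = preg_replace("/($word)\\s+(?=$word)/", '$1', $header);

    return preg_replace_callback(
        '/=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/',
        static function (array $m): string {
            // Drop an optional RFC 2231 language suffix, e.g. "UTF-8*en".
            $charset = explode('*', $m[1])[0];

            $bytes = strtoupper($m[2]) === 'B'
                ? base64_decode($m[3])
                // Q encoding: "_" means space, "=XX" is a hex-encoded byte.
                : quoted_printable_decode(str_replace('_', ' ', $m[3]));

            if ($bytes === false) {
                return $m[0]; // undecodable data: leave the word untouched
            }
            if (strtoupper($charset) === 'UTF-8') {
                return $bytes;
            }
            // Convert other charsets to UTF-8; keep the raw word if iconv
            // does not recognise the charset (@ silences its warning).
            $converted = @iconv($charset, 'UTF-8', $bytes);
            return $converted === false ? $m[0] : $converted;
        },
        $header
    );
}

echo decode_mime_header('=?UTF-8?B?2KLYstmF?= =?UTF-8?B?2KfbjNi0?='), "\n"; // آزمایش
```

Each encoded word is still decoded on its own, which RFC 2047 allows because a single word must not split a character. For UTF-8 the decoded bytes are concatenated unchanged, so even a non-compliant split inside a multibyte character survives; for other charsets such a split would break the iconv() conversion.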

You can use the mb_decode_mimeheader() function to decode your string.

Use the native PHP function:

 <?php mb_decode_mimeheader($text); ?> 

This function handles UTF-8 as well as ISO-8859-1 strings. I tested it.
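
A short sketch of that call on the subject from the question plus a Latin-1 example. It assumes the mbstring extension is loaded. mb_decode_mimeheader() returns its result in mbstring's internal encoding, so the sketch sets that to UTF-8 explicitly rather than relying on php.ini defaults.

```php
<?php
// mb_decode_mimeheader() converts to the internal encoding; pin it to UTF-8.
mb_internal_encoding('UTF-8');

$subjects = [
    '=?UTF-8?B?2KLYstmF2KfbjNi0?=', // UTF-8, base64 ("B") encoding
    '=?ISO-8859-1?Q?Pr=FCfung?=',   // ISO-8859-1, quoted-printable ("Q") encoding
];

foreach ($subjects as $raw) {
    echo mb_decode_mimeheader($raw), "\n";
}
// Prints:
// آزمایش
// Prüfung
```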

Use the PHP function:

 <?php imap_utf8($text); ?> 

Just to add another way to do this (useful if you don't have the mbstring extension installed but do have iconv):

 iconv_mime_decode($str, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8') 
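
If you have the whole header block rather than just the subject, iconv_mime_decode_headers() takes the same mode and charset arguments and decodes every header at once, returning an array keyed by header name. A minimal sketch, assuming the iconv extension is loaded; the header data is made up for illustration.

```php
<?php
// Hypothetical raw header block, e.g. read from a piped message.
$rawHeaders = "Subject: =?UTF-8?B?2KLYstmF2KfbjNi0?=\n"
    . "From: sender@example.com";

// Parse and decode all headers in one call.
$headers = iconv_mime_decode_headers(
    $rawHeaders,
    ICONV_MIME_DECODE_CONTINUE_ON_ERROR,
    'UTF-8'
);

if ($headers === false) {
    exit("Could not parse headers\n");
}

echo $headers['Subject'], "\n"; // آزمایش
```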

Would the imap_mime_header_decode() function help here?

Today I found myself in a similar situation.

http://www.php.net/manual/en/function.imap-mime-header-decode.php
