Retrieving Email Content

I need to create an application that will extract the VAT numbers that our customers send us for verification. They don’t email anything else. This is for the purpose of creating advanced statistics.

I need the message content without any headers in front of it, that is, just the VAT number and nothing else.

This is my script that creates a list of the last 30 emails:

    <?
    if (!function_exists('imap_open')) {
        die('No function');
    }

    if ($mbox = imap_open(<confidential>)) {
        $output = "";
        $messageCount = imap_num_msg($mbox);
        $x = 1;

        for ($i = 0; $i < 30; $i++) {
            $message_id = ($messageCount - $i);
            $fetch_message = imap_header($mbox, $message_id);
            $mail_content = quoted_printable_decode(imap_fetchbody($mbox,$message_id, 1));
            iconv(mb_detect_encoding($mail_content, mb_detect_order(), true), "UTF-8", $mail_content);

            $output .= "<tr>
                <td>".$x.".</td>
                <td> ".$fetch_message->from[0]->mailbox."@".$fetch_message->from[0]->host." </td>
                <td> ".$fetch_message->date." </td>
                <td> ".$fetch_message->subject." </td>
                <td> <textarea cols=\"40\">".$mail_content."</textarea> </td>
            </tr>";
            $x++;
        }

        $smarty->assign("enquiries", $output);
        $smarty->display("module_mail");
        imap_close($mbox);
    } else {
        print_r(imap_errors());
    }
    ?>

I worked with imap_fetchbody, imap_header, etc. to get the desired content, but it turns out that most email messages have something else (like headers) before the content, for example:

    --=-Dbl2eWTUl0Km+Tj46Ww1
    Content-Type: text/plain;

    ------=_NextPart_001_003A_01D14F7A.F25AB3D0
    Content-Type: text/plain;

    --=-ucRIRGamiKb0Ot1/AkNc
    Content-Type: text/plain;

I need to get rid of everything that comes before the VAT number in the message, but I do not know how to do it. Some emails have these headers and some do not. And since we work with clients from all over Europe, the variety really confuses me and leaves me stuck.

Another problem is that some customers simply copy VAT numbers from different sites, and this means that these VAT numbers are often inserted with the original style (bold / background / color change, etc.). This may be the reason for my PS below.

I would be grateful for any help that would lead me to solve this problem.

Thanks in advance.

P.S. Just for the record: with imap_fetchbody($mbox,$message_id, 1) I have to use 1 to get any content. Changing 1 to anything else results in NO email content being displayed at all. Literally none.

php email imap
2 answers

The part of the message that you describe as "noise" is part of the email format.
In a way, it is as if you were reading the HTML source of a web page.

All of those bits are boundaries. These email elements are similar to tags in HTML: like tags, they open a section and they close it.

So in your case:

    Content-Type: multipart/alternative; boundary="=-Dbl2eWTUl0Km+Tj46Ww1"   // defines the type of email structure and the boundary

    --=-Dbl2eWTUl0Km+Tj46Ww1        // starts a section
    Content-Type: text/plain;       // defines the content type of the section

    // your VAT number is presumably here

    --=-Dbl2eWTUl0Km+Tj46Ww1--      // closes the last section (note the trailing "--")
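To make the structure above concrete, here is a minimal sketch (not a full MIME parser) that splits a raw multipart body on its boundary and returns the first text/plain section. The function name extractPlainTextSection and the sample VAT number GB123456789 are made up for illustration. It assumes PHP 7.1 or later, assumes you already know the boundary string from the Content-Type header, and ignores nested multiparts and transfer encodings.

```php
<?php
// Hypothetical helper, not part of the IMAP extension: returns the first
// text/plain section of a raw multipart body, or null if none is found.
function extractPlainTextSection(string $rawBody, string $boundary): ?string
{
    // Normalise line endings so the header/content split is predictable.
    $rawBody = str_replace("\r\n", "\n", $rawBody);

    // Every section starts with "--" followed by the boundary string.
    $sections = explode('--' . $boundary, $rawBody);

    foreach ($sections as $section) {
        // A blank line separates a section's headers from its content.
        $pieces = explode("\n\n", ltrim($section, "\n"), 2);
        if (count($pieces) < 2) {
            continue; // preamble, or the "--" left over from the closing marker
        }
        [$headers, $content] = $pieces;
        if (stripos($headers, 'Content-Type: text/plain') !== false) {
            return trim($content);
        }
    }
    return null;
}

// Made-up sample message body with a plain-text and an HTML alternative.
$raw = "--=-Dbl2eWTUl0Km+Tj46Ww1\n"
     . "Content-Type: text/plain;\n"
     . "\n"
     . "GB123456789\n"
     . "--=-Dbl2eWTUl0Km+Tj46Ww1\n"
     . "Content-Type: text/html;\n"
     . "\n"
     . "<b>GB123456789</b>\n"
     . "--=-Dbl2eWTUl0Km+Tj46Ww1--\n";

$plain = extractPlainTextSection($raw, '=-Dbl2eWTUl0Km+Tj46Ww1');
echo $plain, "\n"; // prints GB123456789
```

This is only meant to show what the boundary lines are doing. For real mail a dedicated parser is the safer choice.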

Possible solutions

You have at least two options: write your own parser, or use the PECL extension called Mailparse.

Writing a parser by hand:

    // explode() takes the separator first, then the string
    $mail_lines = explode("\n", $mail_content);

    foreach ($mail_lines as $key => $line) {
        // skip most of the headers
        if ($key < 5) {
            continue;
        }

        // skip boundary lines; strpos() returns 0 for a match at the
        // start of the line, so compare with === instead of testing truthiness
        if (strpos($line, "--") === 0) {
            continue;
        }

        // skip Content-* header lines
        if (strpos($line, "Content") === 0) {
            continue;
        }

        // skip empty lines
        if (trim($line) === '') {
            continue;
        }

        ////////////////////////////////////////////////////
        // here you have to insert the logic for the parser
        // and extend the guard clauses
        ////////////////////////////////////////////////////
    }

Mailparse:

Install Mailparse:

    sudo pecl install mailparse

Extract VAT:

    // $mail_content must hold the complete raw message (headers and body)
    $mail = mailparse_msg_create();
    mailparse_msg_parse($mail, $mail_content);
    $struct = mailparse_msg_get_structure($mail);

    foreach ($struct as $st) {
        $section = mailparse_msg_get_part($mail, $st);
        $info = mailparse_msg_get_part_data($section);
        print_r($info); // headers, content type and body offsets of the section
    }

You need to use imap_fetchstructure() to find the text part of the mail.

The following code returns the section number of the text/plain part (for example, "1.1"):

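    function getTextPart($struct) {
        // single-part message: the text is section 1
        if ($struct->type == 0) return "1";

        // multipart message: look through its parts
        if ($struct->type == 1) {
            $num = 1;
            foreach ($struct->parts as $part) {
                // == here; a single = would assign and always be true
                if (($part->type == 0) && ($part->subtype == "PLAIN")) {
                    return $num;
                } else if ($part->type == 1) {
                    // nested multipart: recurse and prefix the section number
                    $found = getTextPart($part);
                    if ($found) return "$num.$found";
                }
                $num++;
            }
        }
        return NULL;
    }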

Usage example:

    if ($imap) {
        $messageCount = imap_num_msg($imap);
        // message numbers start at 1; stop at the mailbox size
        for ($i = 1; $i <= min(30, $messageCount); $i++) {
            $struct = imap_fetchstructure($imap, $i);
            $part = getTextPart($struct);
            if ($part === NULL) continue; // no text/plain part found
            $body = imap_fetchbody($imap, $i, $part);
            print_r($body);
        }
    }
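The body returned by imap_fetchbody() is still transfer-encoded, and pasted VAT numbers may carry markup. Below is a rough sketch of the remaining two steps: undoing the encoding and picking out VAT-like tokens. Both function names are hypothetical. The encoding values 3 (base64) and 4 (quoted-printable) are the ones imap_fetchstructure() reports in a part's encoding property. The regular expression is an assumption, not a validator: it looks for a two-letter prefix followed by 8 to 12 letters or digits with at least one digit, and it will miss numbers written with spaces or dots.

```php
<?php
// Hypothetical helper: undo the transfer encoding that imap_fetchstructure()
// reports for a part in its "encoding" property.
function decodePartBody(string $body, int $encoding): string
{
    switch ($encoding) {
        case 3:  // base64 (ENCBASE64)
            return (string) base64_decode($body);
        case 4:  // quoted-printable (ENCQUOTEDPRINTABLE)
            return quoted_printable_decode($body);
        default: // 7bit, 8bit, binary: nothing to undo
            return $body;
    }
}

// Hypothetical helper: find VAT-like tokens in decoded text.
// The pattern is an assumption for illustration and does not check
// any country's real VAT format.
function findVatNumbers(string $text): array
{
    // Drop markup left over from copy-and-paste, then decode HTML entities.
    $plain = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');

    // Two capital letters, then 8-12 capitals/digits containing a digit.
    preg_match_all('/\b[A-Z]{2}(?=[0-9A-Z]*[0-9])[0-9A-Z]{8,12}\b/', $plain, $matches);
    return $matches[0];
}

// Made-up sample bodies: one quoted-printable with markup, one base64.
$fromQp     = findVatNumbers(decodePartBody("<b>DE123456789</b>=0D=0A", 4));
$fromBase64 = findVatNumbers(decodePartBody(base64_encode('FR12345678901'), 3));

print_r($fromQp);
print_r($fromBase64);
```

In the loop above you would pass $body together with the encoding of the part that getTextPart() selected. Tighten the pattern once you know which countries' formats you actually receive.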