This is how you get the contents of an e-mail, i.e. of an *.eml file. It works fine on Python 2.5 to 2.7. Try it on Python 3; it should work there too.
    import os

    try:
        # Python 3: the file is opened in binary mode, so use the bytes parser.
        from email import message_from_binary_file as message_from_file
    except ImportError:
        # Python 2
        from email import message_from_file

    # Path to directory where attachments will be stored (it must already exist):
    path = "./msgfiles"

    # To have attachments extracted into memory, change behaviour of 2 following functions:

    def file_exists(f):
        """Checks whether extracted file was extracted before."""
        return os.path.exists(os.path.join(path, f))

    def save_file(fn, cont):
        """Saves cont to a file fn"""
        file = open(os.path.join(path, fn), "wb")
        file.write(cont)
        file.close()

    def construct_name(id, fn):
        """Constructs a file name out of messages ID and packed file name"""
        id = id.split(".")
        id = "".join(id[:2])  # also copes with an ID that contains no dot
        return id + "." + fn

    def disqo(s):
        """Removes double or single quotations."""
        s = s.strip()
        if s.startswith("'") and s.endswith("'"):
            return s[1:-1]
        if s.startswith('"') and s.endswith('"'):
            return s[1:-1]
        return s

    def disgra(s):
        """Removes < and > from HTML-like tag or e-mail address or e-mail ID."""
        s = s.strip()
        if s.startswith("<") and s.endswith(">"):
            return s[1:-1]
        return s

    def decode_text(m):
        """Decodes the payload of a text part to a string, using the part's
        charset (UTF-8 if none is declared)."""
        data = m.get_payload(decode=True)
        if data is None:
            return ""
        return data.decode(m.get_content_charset() or "utf-8", "replace")

    def pullout(m, key):
        """Extracts content from an e-mail message.
        This works for multipart and nested multipart messages too.
        m   -- email.Message() or mailbox.Message()
        key -- Initial message ID (some string)
        Returns tuple(Text, Html, Files, Parts)
        Text  -- All text from all parts.
        Html  -- All HTMLs from all parts
        Files -- Dictionary mapping each packed file name to a tuple
                 (saved file name, content ID or None).
        Parts -- Number of parts in original message.
        """
        Html = ""
        Text = ""
        Files = {}
        Parts = 0
        if not m.is_multipart():
            if m.get_filename():  # It's an attachment
                fn = m.get_filename()
                cfn = construct_name(key, fn)
                Files[fn] = (cfn, None)
                if file_exists(cfn):
                    return Text, Html, Files, 1
                save_file(cfn, m.get_payload(decode=True))
                return Text, Html, Files, 1
            # Not an attachment!
            # See where this belongs. Text, Html or some other data:
            cp = m.get_content_type()
            if cp == "text/plain":
                Text += decode_text(m)
            elif cp == "text/html":
                Html += decode_text(m)
            else:
                # Something else!
                # Extract a message ID and a file name if there is one:
                # This is some packed file and name is contained in content-type header
                # instead of content-disposition header explicitly
                cp = m.get("content-type")
                id = m.get("content-id")
                if id is not None:
                    id = disgra(id)
                # Find file name:
                o = cp.find("name=")
                if o == -1:
                    return Text, Html, Files, 1
                ox = cp.find(";", o)
                if ox == -1:
                    ox = None
                o += 5
                fn = cp[o:ox]
                fn = disqo(fn)
                cfn = construct_name(key, fn)
                Files[fn] = (cfn, id)
                if file_exists(cfn):
                    return Text, Html, Files, 1
                save_file(cfn, m.get_payload(decode=True))
            return Text, Html, Files, 1
        # This IS a multipart message.
        # So, we iterate over it and call pullout() recursively for each part.
        for pl in m.get_payload():
            # pl is a new Message object which goes back to pullout
            t, h, f, p = pullout(pl, key)
            Text += t
            Html += h
            Files.update(f)
            Parts += p
        return Text, Html, Files, Parts

    def extract(msgfile, key):
        """Extracts all data from e-mail, including From, To, etc., and returns it as a dictionary.
        msgfile -- A file-like readable object, opened in binary mode
        key     -- Some ID string for that particular Message. Can be a file name or anything.
        Returns dict()
        Keys: from, to, subject, date, text, html, parts[, files]
        Key files will be present only when message contained binary files.
        For more see __doc__ for pullout() and caption() functions.
        """
        m = message_from_file(msgfile)
        From, To, Subject, Date = caption(m)
        Text, Html, Files, Parts = pullout(m, key)
        Text = Text.strip()
        Html = Html.strip()
        msg = {"subject": Subject, "from": From, "to": To, "date": Date,
               "text": Text, "html": Html, "parts": Parts}
        if Files:
            msg["files"] = Files
        return msg

    def caption(origin):
        """Extracts: To, From, Subject and Date from email.Message() or mailbox.Message()
        origin -- Message() object
        Returns tuple(From, To, Subject, Date)
        If message doesn't contain one/more of them, the empty strings will be returned.
        """
        Date = ""
        if "date" in origin:
            Date = origin["date"].strip()
        From = ""
        if "from" in origin:
            From = origin["from"].strip()
        To = ""
        if "to" in origin:
            To = origin["to"].strip()
        Subject = ""
        if "subject" in origin:
            Subject = origin["subject"].strip()
        return From, To, Subject, Date
    # Usage:
    f = open("message.eml", "rb")
    print(extract(f, f.name))
    f.close()
I programmed this for my mailgroup using mailbox, which is why it is so convoluted. It never failed me. Never any junk. If the message is multipart, the output dictionary will contain a key "files" (a sub-dict) with the names of all extracted files that were not text or HTML. That was my way of extracting attachments and other binary data. You can change it in pullout(), or just change the behaviour of file_exists() and save_file().
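To keep attachments in memory instead of writing them to disk, the two helpers could be swapped for versions backed by a dictionary. This is only a minimal sketch: the module-level dict `extracted` is a hypothetical name that is not part of the code above.

```python
# Hypothetical in-memory replacement for the two storage helpers.
# `extracted` is an illustrative name; it maps constructed file name -> bytes.
extracted = {}

def file_exists(f):
    """Checks whether a file with this name was already extracted."""
    return f in extracted

def save_file(fn, cont):
    """Keeps cont (bytes) in memory under the name fn instead of writing it to disk."""
    extracted[fn] = cont
```

With these two definitions in place of the originals, pullout() needs no other change, and the attachment contents can be read back from `extracted` using the constructed names stored under "files".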
construct_name() builds a file name from the message ID and the name of the file packed in the multipart message, if there is one.
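To make the naming scheme concrete, here is a small standalone copy of the joining logic with two sample calls. The IDs and file names are made up for illustration, and this copy expects the ID to contain at least one dot.

```python
def construct_name(id, fn):
    """Joins the first two dot-separated pieces of the message ID,
    then appends the packed file name."""
    pieces = id.split(".")
    return pieces[0] + pieces[1] + "." + fn

# Sample values are invented for illustration only.
print(construct_name("message.eml", "report.pdf"))       # messageeml.report.pdf
print(construct_name("1234.5678.example", "photo.jpg"))  # 12345678.photo.jpg
```

Because the ID is part of the saved name, two messages that both carry a file called report.pdf do not overwrite each other as long as their IDs differ.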
In pullout(), the Text and Html variables are strings. For an online mailgroup it was fine to lump together all the text or HTML packed in a multipart message that was not an attachment.
If you need something more sophisticated, change Text and Html to lists, append each part to them, and join them as needed. Nothing problematic.
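A minimal sketch of the list-based variant, written for Python 3. It is not the pullout() function itself: `collect_bodies` is a hypothetical helper that uses Message.walk() instead of explicit recursion, and the sample message is built only for demonstration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def collect_bodies(m):
    """Returns (texts, htmls): lists holding one decoded string per text part."""
    texts, htmls = [], []
    for part in m.walk():
        if part.is_multipart() or part.get_filename():
            continue  # containers and attachments are not body text
        data = part.get_payload(decode=True)
        if data is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        body = data.decode(part.get_content_charset() or "utf-8", "replace")
        ctype = part.get_content_type()
        if ctype == "text/plain":
            texts.append(body)
        elif ctype == "text/html":
            htmls.append(body)
    return texts, htmls

# Demonstration message with one plain-text and one HTML alternative.
sample = MIMEMultipart("alternative")
sample.attach(MIMEText("hello", "plain"))
sample.attach(MIMEText("<p>hello</p>", "html"))
texts, htmls = collect_bodies(sample)
print(texts, htmls)
```

Keeping the parts separate lets the caller decide later whether to join them, pick only the first one, or prefer HTML over plain text.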
There may be some errors here, because the code was written to work with mailbox.Message(), not email.Message(). I tried it with email.Message() and it worked fine.
You said that you "want to list them all." From where? If you mean a local mailbox kept by some nice open-source mail program, you do this using the mailbox module (for a POP3 server, the standard library has poplib). If you want to list them from other programs, then you have a problem. For example, to get mail out of MS Outlook, you must know how to read OLE2 compound files. Other mail programs rarely call their messages *.eml files, so I think Outlook is exactly what you have in mind. In that case, search PyPI for the olefile module or for compound files, and Google how to extract e-mail from an MS Outlook mailbox file. Or save yourself a mess and just export them from there to some directory. Once you have them as eml files, apply this code.
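As a rough illustration of both routes, the sketch below lists the subjects in an mbox file with the mailbox module and parses every *.eml file found in a directory. It targets Python 3; the function names `list_mbox_subjects` and `parse_eml_dir` are hypothetical, and the paths you pass in depend entirely on your own setup.

```python
import glob
import mailbox
import os
from email import message_from_binary_file

def list_mbox_subjects(mbox_path):
    """Returns the Subject header of every message in an mbox file."""
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [msg["subject"] or "" for msg in box]
    finally:
        box.close()

def parse_eml_dir(directory):
    """Yields (file name, parsed message) for every *.eml file in directory."""
    for name in sorted(glob.glob(os.path.join(directory, "*.eml"))):
        with open(name, "rb") as f:
            yield name, message_from_binary_file(f)
```

Each message yielded by parse_eml_dir() is a regular message object, so it can be handed to the same kind of processing as a message read from a single file.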