Large sample mbox file for testing

To develop a mail client, I need a very large mbox test file containing as many messages as possible, preferably more than 100,000 emails (over 10 GB).

This should be realistic mail data, since I want to test not only performance but also mail filters and search.

Thanks in advance for any pointers on where to find such data.

3 answers

You can collect .mbox text files using a search engine. For example, a Google search for "filetype:mbox pipermail" turns up a lot of .mbox data. Alternatively, "pipermail from" also works as a search string.

Individual .mbox files can be combined:

    cat mboxfile1 > mboxfile
    echo >> mboxfile          # blank line between the two files
    cat mboxfile2 >> mboxfile
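Generalizing those commands to any number of files, a minimal POSIX sh sketch might look like this. `merge_mboxes` and `count_messages` are hypothetical helper names. The count assumes well-formed mbox files, where body lines beginning with "From " have been escaped as ">From ", so only separator lines match. Don't list the output file among the inputs, because it is truncated first.

```shell
#!/bin/sh
# Hypothetical helper: merge_mboxes OUT IN1 IN2 ...
# Concatenates the inputs into OUT, putting an empty line between
# consecutive files (same idea as the cat/echo/cat commands above).
merge_mboxes() {
    out="$1"
    shift
    : > "$out"                      # create or truncate the output file
    first=1
    for f in "$@"; do
        if [ "$first" -eq 0 ]; then
            echo >> "$out"          # blank line before the next file
        fi
        cat "$f" >> "$out"
        first=0
    done
}

# Hypothetical helper: count messages by counting "From " separator
# lines at the start of a line (assumes escaped body lines).
count_messages() {
    grep -c '^From ' "$1"
}
```

For example, `merge_mboxes all.mbox *.mbox` followed by `count_messages all.mbox` shows whether you have reached the 100,000-message mark (pick an output name the glob does not match).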

P.S. The data itself isn't unethical; what matters is what you do with it. Please act ethically!

Other options:

Enron Email Corpus, with 210 GB of messages. The messages come in several email formats, but should be easy to parse.

Enron's email data publicly released as part of the FERC Western Energy Markets study has been converted to standard EDRM formats. The data set consists of 1,227,256 emails with 493,384 attachments spanning 151 custodians. Email is provided in Microsoft PST, IETF MIME, and EDRM XML formats.
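If you want the MIME version in a single mbox file, one rough approach is to prepend a "From " separator line to each message file. This sketch assumes the messages are individual files matching a name pattern (default `*.eml`; adjust it to the data set's actual layout, which is not described here). It writes a placeholder sender and date on every separator line and applies mboxrd-style quoting to body lines that begin with "From ". `append_eml` and `emls_to_mbox` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers: convert a directory of MIME message files into
# one mbox file. Assumes the files match a name pattern (default *.eml).

# Append one message file ($1) to an mbox file ($2).
append_eml() {
    # Envelope separator line; sender and date are placeholders.
    printf 'From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n' >> "$2"
    awk '{
        sub(/\r$/, "")                      # strip CR from CRLF endings
        if ($0 ~ /^>*From /) $0 = ">" $0    # mboxrd quoting of body lines
        print                               # also adds a missing final newline
    }' "$1" >> "$2"
    echo >> "$2"                            # blank line ends the message
}

# Convert every matching file under DIR (sorted by path) into MBOX.
emls_to_mbox() {
    dir="$1"
    mbox="$2"
    pattern="${3:-*.eml}"
    : > "$mbox"
    find "$dir" -type f -name "$pattern" | sort | while IFS= read -r f; do
        append_eml "$f" "$mbox"
    done
}
```

Header lines such as "From: a@example.com" are left alone, since only lines starting with "From " followed by a space are quoted.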

Apache Software Foundation Public Messages Archive (200 GB)

A collection of all Apache Software Foundation publicly accessible mail archives as of July 11, 2011.

This collection contains all public email archives from the ASF's 80+ projects.

Amazon link

Perhaps you could take your own inbox and duplicate it several times. For instance, set up your mail account and copy all the emails several times, either over IMAP or through the file system; which approach works depends on the data format you use.
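If your inbox is already an mbox file, here is a minimal sketch of the duplication idea, assuming POSIX sh and awk. `repeat_mbox` is a hypothetical name. Exact duplicates share the same Message-ID, which some clients use to deduplicate, so this version appends a per-copy suffix to each Message-ID header. Folded Message-ID headers that span several lines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: repeat_mbox SRC COPIES OUT
# Writes COPIES copies of SRC into OUT. In copy number k, each
# Message-ID header gets ".copyk" inserted before its closing ">".
repeat_mbox() {
    src="$1"
    copies="$2"
    out="$3"
    : > "$out"
    k=1
    while [ "$k" -le "$copies" ]; do
        awk -v k="$k" '
            /^From / { hdr = 1 }                # separator: headers follow
            hdr && /^$/ { hdr = 0 }             # blank line ends headers
            hdr && tolower($0) ~ /^message-id:/ {
                sub(/>[ \t]*$/, ".copy" k ">")  # make the ID unique per copy
            }
            { print }
        ' "$src" >> "$out"
        echo >> "$out"                          # separate the copies
        k=$((k + 1))
    done
}
```

For example, `repeat_mbox inbox.mbox 50 big.mbox` builds a file with 50 times as many messages as `inbox.mbox`. The content is still repetitive, so it suits performance tests better than filter or search tests.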
