Extract Original Message from Email

My application receives email from users. A reply sent from Gmail, for example, looks like this:

    This is some new text

    On Sun, Apr 1, 2012 at 3:32 AM, My app <
    4f77ed3860c258a567aeabf8@myapp.com> wrote:

    > Original...
    > message..

Of course, the quoting format varies from client to client.

Right now I look for "4f77ed3860c258a567aeabf8" and discard everything from that point on, because I know which address the reply was sent to. This is not a general solution, but it works for my purposes, except when a line break occurs in the "On ... wrote:" line, as in the example above.

Is there a better, standard way to strip the quoted earlier message from a user's email reply?

3 answers

If you need a fully reliable way to remove everything except the latest reply, compare the new message with the previous one, character by character. If you do not want to write your own diff routine, check out this library:

https://github.com/cemerick/jsdifflib

Or, if you want a lightweight algorithm for this:

http://ejohn.org/projects/javascript-diff-algorithm/
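As a rough illustration of the comparison idea, here is a minimal sketch that works line by line rather than character by character, and uses plain set membership instead of a true diff. It assumes your application still has the text of the message it sent; `normalize` and `stripQuotedOriginal` are hypothetical names, not part of either library above.

```javascript
// Sketch only: drop every line of the reply that also occurs in the message
// being replied to. This is a simplified line comparison, not a real diff.

// Strip leading quote markers ("> > ") and collapse whitespace so that
// re-quoted lines still compare equal to the original lines.
function normalize(line) {
  return line.replace(/^(\s*>)+/, '').replace(/\s+/g, ' ').trim();
}

// `original` is the text the application sent; `reply` is the body it got back.
function stripQuotedOriginal(original, reply) {
  var known = new Set(original.split('\n').map(normalize).filter(Boolean));
  return reply
    .split('\n')
    .filter(function (line) {
      var key = normalize(line);
      // Keep blank lines and any line that was not in the original message.
      return key === '' || !known.has(key);
    })
    .join('\n')
    .trim();
}

var original = 'Original...\nmessage..';
var reply = [
  'This is some new text',
  '',
  'On Sun, Apr 1, 2012 at 3:32 AM, My app wrote:',
  '> Original...',
  '> message..'
].join('\n');

var result = stripQuotedOriginal(original, reply);
console.log(result);
```

Note that the "On ... wrote:" line survives, because the mail client added it and it was never part of the original message, so it still needs separate handling. A line the user happens to type identically to a line of the original would also be dropped.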


There is an npm module called emailreplyparser, a port of GitHub's Ruby library, that does this. As you note, the formats used for this are not standard, so any solution will be rather fragile and imperfect, but whaddayagonnado?

Here is an example where I take the JSON response from the new Gmail API and extract only the new reply text of the message.

    var erp = require('emailreplyparser').EmailReplyParser.read;
    var message = require('./sample_message.json');

    // Decode the body of the first message part.
    var buffer = Buffer.from(message.payload.parts[0].body.data, 'base64');
    var body = buffer.toString();
    // body is the whole message: the new text and the quoted reply portion
    // console.log(body);

    var parsed = erp(body);
    // this has just the text of the reply itself
    console.log(parsed.fragments[0].content);

Note that you may get some interesting fragments if the author interleaves reply text with pieces of the quoted message.
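To show why interleaving matters, here is a small self-contained sketch that groups consecutive lines into quoted and unquoted fragments. It is not how emailreplyparser is implemented and does not use its API; `splitFragments` and the fragment shape (`quoted`, `lines`) are assumptions made up for the illustration.

```javascript
// Sketch only: group consecutive lines into quoted / unquoted fragments.
// A line counts as quoted when it starts with ">" (after optional spaces).
function splitFragments(body) {
  var fragments = [];
  body.split('\n').forEach(function (line) {
    var quoted = /^\s*>/.test(line);
    var last = fragments[fragments.length - 1];
    if (last && last.quoted === quoted) {
      // Same kind as the previous line: extend the current fragment.
      last.lines.push(line);
    } else {
      // Kind changed: start a new fragment.
      fragments.push({ quoted: quoted, lines: [line] });
    }
  });
  return fragments;
}

// A reply where the author answers inline, between quoted lines.
var interleaved = [
  '> Can you come on Monday?',
  'Yes.',
  '> And on Tuesday?',
  'No, sorry.'
].join('\n');

var fragments = splitFragments(interleaved);

// Collect every unquoted fragment, not just the first one.
var replyText = fragments
  .filter(function (f) { return !f.quoted; })
  .map(function (f) { return f.lines.join('\n'); })
  .join('\n');

console.log(fragments.length);
console.log(replyText);
```

With an inline reply like this, the new text is spread over several fragments, so reading only the first fragment would miss part of the answer.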


Please check my code; I think it covers all cases. The repo has an unhandled case: if the message contains more than one reply and the (On <Date> <Email> wrote:) line is split across several lines, it does not work correctly and includes that (On <Date> <Email> wrote:) line as part of the reply.

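    function getReplyOnly(str) {
      str = str || '';
      // A complete "On <Date> <Email> wrote:" header, optionally behind ">" markers.
      var exp = /^(>)*\s*(On\s(\n|.)*wrote:)/m;
      // A line that ends with "wrote:".
      var exp2 = /wrote:$/m;
      // A line that starts with "On", optionally behind ">" markers.
      var exp3 = /^>*\s*On/m;
      var arr = str.split('\n');
      var msg = '';
      var foundEndWrote = false;
      var foundStartOn = false;
      var indexes = [];
      var tempStr = '';

      // Forget everything collected so far: it belonged to a quoted message.
      function clear() {
        tempStr = '';
        indexes = [];
        foundEndWrote = false;
        foundStartOn = false;
      }

      // Walk from the last line up, so a header split over several lines
      // is reassembled in tempStr before it is tested.
      for (var i = arr.length - 1; i >= 0; i--) {
        tempStr = arr[i] + tempStr;
        if (exp2.test(arr[i])) {
          foundEndWrote = true;
        }
        if (exp3.test(arr[i])) {
          foundStartOn = true;
        }
        indexes.push(i);
        if (exp.test(tempStr) && foundEndWrote && foundStartOn) {
          clear();
        }
      }

      // Create the message from the remaining lines, in their original order.
      for (var j = indexes.length - 1; j >= 0; j--) {
        msg += ('\n' + arr[indexes[j]]);
      }
      return msg;
    }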
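For the simpler case of a single reply level, the split-header problem can also be approximated with one regular expression over the whole body. This is a different, shorter technique than the function above, shown only as a sketch: `cutAtAttribution` is a hypothetical name, the limit of three header lines is an assumption, and it only recognizes the English "On ... wrote:" wording.

```javascript
// Sketch only: cut the body at the first "On ... wrote:" line, allowing that
// line to be wrapped over up to three physical lines (an assumed limit).
function cutAtAttribution(body) {
  // "On" at the start of a line, the rest of that line, then lazily up to
  // two more lines, ending in "wrote:" at the end of a line (CR tolerated).
  var header = /^[ \t]*On\b[^\n]*(?:\n[^\n]*){0,2}?wrote:[ \t\r]*$/m;
  var match = header.exec(body);
  // Keep only what precedes the header; leave the body alone if none is found.
  return (match ? body.slice(0, match.index) : body).trim();
}

var sample = [
  'This is some new text',
  '',
  'On Sun, Apr 1, 2012 at 3:32 AM, My app <',
  '4f77ed3860c258a567aeabf8@myapp.com> wrote:',
  '',
  '> Original...',
  '> message..'
].join('\n');

console.log(cutAtAttribution(sample));
```

A reply whose own text has a line starting with "On" followed shortly by a line ending in "wrote:" would be cut too early, so treat this as a heuristic.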