I have a script that processes fields in email headers that represent dates and times. Here are some examples of these lines:
Fri, 10 Jun 2011 11:04:17 +0200 (CEST)
Tue, 1 Jun 2011 11:04:17 +0200
Wed, 8 Jul 1992 4:23:11 -0200
Wed, 8 Jul 1992 4:23:11 -0200 EST
Before I came across the CEST / EST suffixes at the end of some lines, everything worked fine using just datetime.datetime.strptime, as follows:
import datetime

msg['date'] = 'Wed, 8 Jul 1992 4:23:11 -0200'
mail_date = datetime.datetime.strptime(msg['date'][:-6], '%a, %d %b %Y %H:%M:%S')
I tried putting together a regular expression to match the parts of the date string while excluding the time zone information at the end, but I had problems with it (I could not get it to match the colon).
Is a regex the best way to parse all of the above examples? If so, can anyone share a regex that matches them? In the end, I want to have a datetime object.
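To make the goal concrete, here is a rough sketch of the kind of regex-based parsing being asked about. It is not a known-good solution: the pattern DATE_RE and the helper parse_naive are hypothetical names, the pattern is only shaped to fit the four sample lines, and it assumes an English locale for the day and month abbreviations. It captures everything up to the seconds, drops whatever time zone text follows, and returns a naive datetime.

```python
import datetime
import re

# Hypothetical pattern: capture "Day, D Mon YYYY H:MM:SS" and ignore the rest
# of the line (numeric offset, "(CEST)", "EST", ...). A colon needs no
# escaping in a Python regex.
DATE_RE = re.compile(r'^(\w{3}, \d{1,2} \w{3} \d{4} \d{1,2}:\d{2}:\d{2})')


def parse_naive(header_value):
    """Return a naive datetime for the date part of the header, or None."""
    match = DATE_RE.match(header_value.strip())
    if match is None:
        return None
    # The time zone information is discarded, as in the original slicing approach.
    return datetime.datetime.strptime(match.group(1), '%a, %d %b %Y %H:%M:%S')


samples = [
    'Fri, 10 Jun 2011 11:04:17 +0200 (CEST)',
    'Tue, 1 Jun 2011 11:04:17 +0200',
    'Wed, 8 Jul 1992 4:23:11 -0200',
    'Wed, 8 Jul 1992 4:23:11 -0200 EST',
]
parsed = [parse_naive(s) for s in samples]
```

Like the [:-6] slice, this throws the UTC offset away, so the result is only suitable when a naive datetime is acceptable.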