A smart (forgiving) date parser?

I need to transfer a very large dataset from one system to another. One of the source columns supposedly contains a date, but in fact it is a free-form string with no constraints, while the destination system requires dates in yyyy-mm-dd format.

Many, but not all, of the source dates are in yyyymmdd format, so to coerce them into the expected format I do (in Perl):

return "$1-$2-$3" if ($val =~ /(\d{4})[-\/]*(\d{2})[-\/]*(\d{2})/);
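Because that pattern is unanchored, it will happily turn "99999999" into 9999-99-99 or pull eight digits out of a longer number. A minimal sketch of a stricter version, anchored and range-checked, follows; the helper names `normalize_ymd` and `days_in_month` are hypothetical.

```perl
use strict;
use warnings;

# Days in month $m of year $y, using Gregorian leap-year rules.
sub days_in_month {
    my ($y, $m) = @_;
    my @dim  = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    return $m == 2 && $leap ? 29 : $dim[$m - 1];
}

# Hypothetical helper: accept yyyymmdd, yyyy-mm-dd or yyyy/mm/dd and return
# yyyy-mm-dd, or undef if the string is not a plausible calendar date.
sub normalize_ymd {
    my ($val) = @_;
    return unless defined $val;
    # Anchored; \2 requires the same separator (or none) in both places.
    return unless $val =~ m{^\s*(\d{4})([-/]?)(\d{2})\2(\d{2})\s*$};
    my ($y, $m, $d) = ($1, $3, $4);
    return if $m < 1 || $m > 12;
    return if $d < 1 || $d > days_in_month($y, $m);
    return "$y-$m-$d";
}
```

Unlike the original pattern, this rejects mixed separators such as 2004-0315; drop the backreference if the data needs that.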

The trouble starts when the source dates depart from the "standard" yyyymmdd. The goal is to salvage as many dates as possible before giving up. Some example source strings:

3/21/1998, March 2004, 2001, 3/4/97

I could try to match as many of these variants as I can find with a sequence of regular expressions like the one above.
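For the example strings above, such a cascade might look like this minimal sketch. It assumes US month/day order for slash dates, fills a missing day or month with 01, and expands two-digit years with a hypothetical pivot (00-29 becomes 20xx, 30-99 becomes 19xx); all three are assumptions to adjust to the data.

```perl
use strict;
use warnings;

# Month-name prefixes -> month numbers.
my %MONTH;
@MONTH{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = (1 .. 12);

# Hypothetical two-digit-year pivot: 00-29 => 20xx, 30-99 => 19xx.
sub expand_year {
    my ($y) = @_;
    return $y if length($y) == 4;
    return $y < 30 ? 2000 + $y : 1900 + $y;
}

# Format as yyyy-mm-dd after a crude range check (no days-per-month check).
sub ymd {
    my ($y, $m, $d) = @_;
    return if $m < 1 || $m > 12 || $d < 1 || $d > 31;
    return sprintf '%04d-%02d-%02d', $y, $m, $d;
}

# Try each pattern in turn; the first one that matches wins.
sub parse_cascade {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;

    # yyyymmdd, yyyy-mm-dd, yyyy/mm/dd
    return ymd($1, $2, $3) if $s =~ m{^(\d{4})[-/]?(\d{2})[-/]?(\d{2})$};

    # m/d/yyyy or m/d/yy (US order assumed)
    return ymd(expand_year($3), $1, $2)
        if $s =~ m{^(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})$};

    # "March 2004", "Mar 2004", "Sept. 2004"
    if ($s =~ /^([A-Za-z]{3})[a-z]*\.?\s+(\d{4})$/) {
        my $m = $MONTH{ lc $1 };
        return ymd($2, $m, 1) if $m;
    }

    # bare year: "2001"
    return ymd($1, 1, 1) if $s =~ /^(\d{4})$/;

    return;    # give up
}
```

Every new variant found in the data means one more pattern, and the order matters because the first match wins.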

But is there something smarter? Am I reinventing the wheel? Is there a library out there somewhere? Searching for "forgiving date parser" turned up nothing suitable. (Any language is fine.)

5 answers

In the end, I compiled a test set of over 200 date strings that actually appear in the dataset. Some of them are a bit naughty, and some are downright painful (for example, "01010").

I tried every existing Perl module I could find, but the success rate was too low, so I ended up reinventing the wheel and achieved over 98% success.
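A success rate like the 98% quoted here comes from running a parser over a hand-checked list of real inputs. A minimal harness for that kind of measurement might look like this; the stand-in `parse_date` and the `@cases` entries are hypothetical placeholders for the real parser and corpus.

```perl
use strict;
use warnings;

# Stand-in parser (hypothetical); replace with the real one.
# This one only understands compact yyyymmdd.
sub parse_date {
    my ($s) = @_;
    return $s =~ /^(\d{4})(\d{2})(\d{2})$/ ? "$1-$2-$3" : undef;
}

# Hand-checked corpus: [input, expected yyyy-mm-dd]; undef means
# "should be rejected". These entries are placeholders.
my @cases = (
    [ '20040315',   '2004-03-15' ],
    [ '3/21/1998',  '1998-03-21' ],
    [ 'March 2004', '2004-03-01' ],
    [ 'n/a',        undef ],
);

# Returns the fraction of cases where the parser's answer matches.
sub score {
    my ($parser, @corpus) = @_;
    my $ok = 0;
    for my $case (@corpus) {
        my ($input, $want) = @$case;
        my $got  = $parser->($input);
        my $pass = defined $want ? (defined $got && $got eq $want)
                                 : !defined $got;
        if ($pass) { $ok++ }
        else { printf "FAIL %-12s got %s\n", $input, $got // 'undef' }
    }
    return $ok / @corpus;
}

printf "success rate: %.1f%%\n", 100 * score(\&parse_date, @cases);
```

Keeping the corpus fixed makes each new rule measurable: a change that fixes one case but breaks two shows up immediately in the FAIL lines.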

[The author's description of the heuristics, originally a bulleted list, did not survive extraction. The surviving fragments mention the value "13", the values 2010 and 10, the ambiguous date 7/3/2010, and the number 8191.]
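No code accompanies this answer. Purely as a hypothetical illustration of a token-based approach (not the author's implementation), the sketch below splits the string into runs of digits and letters, treats a month name or a four-digit number as unambiguous, uses the fact that a value above 12 cannot be a month, and falls back to a configurable day/month order only when nothing else decides. The names `guess_date` and `check_ymd` and the two-digit-year pivot are assumptions.

```perl
use strict;
use warnings;

my %MONTH;
@MONTH{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = (1 .. 12);

# Format as yyyy-mm-dd after a crude range check.
sub check_ymd {
    my ($y, $m, $d) = @_;
    return if $m < 1 || $m > 12 || $d < 1 || $d > 31;
    return sprintf '%04d-%02d-%02d', $y, $m, $d;
}

# Hypothetical token-based guesser. $prefer ('mdy' or 'dmy') is consulted
# only when neither number can be ruled out as a month.
sub guess_date {
    my ($s, $prefer) = @_;
    $prefer //= 'mdy';

    # Compact yyyymmdd is unambiguous; handle it first.
    return check_ymd($1, $2, $3) if $s =~ /^\s*(\d{4})(\d{2})(\d{2})\s*$/;

    my ($y, $m, $d, @nums);
    for my $t ($s =~ /(\d+|[A-Za-z]+)/g) {    # runs of digits or letters
        if ($t =~ /^[A-Za-z]/) {
            my $mm = $MONTH{ lc(substr($t, 0, 3)) };
            $m = $mm if $mm;                     # unknown words are ignored
        }
        elsif (length($t) == 4 && !defined $y) { $y = $t }   # four digits: year
        else { push @nums, $t }
    }

    # No four-digit year: treat a trailing two-digit number as the year
    # (hypothetical pivot: 00-29 => 20xx, 30-99 => 19xx).
    if (!defined $y && @nums && $nums[-1] =~ /^\d{2}$/) {
        my $yy = pop @nums;
        $y = $yy < 30 ? 2000 + $yy : 1900 + $yy;
    }
    return unless defined $y;                    # give up

    if (defined $m) { $d = @nums ? $nums[0] : 1 }
    elsif (@nums >= 2) {
        my ($p, $q) = @nums[0, 1];
        if    ($p > 12)          { ($d, $m) = ($p, $q) }   # 13+ cannot be a month
        elsif ($q > 12)          { ($m, $d) = ($p, $q) }
        elsif ($prefer eq 'dmy') { ($d, $m) = ($p, $q) }   # truly ambiguous
        else                     { ($m, $d) = ($p, $q) }
    }
    elsif (@nums == 1) { ($m, $d) = ($nums[0], 1) }
    else               { ($m, $d) = (1, 1) }
    return check_ymd($y, $m, $d);
}
```

Nothing in the string itself can settle 7/3/2010: this sketch returns 2010-07-03 with the default 'mdy' and 2010-03-07 with 'dmy', while 21/3/1998 resolves to 1998-03-21 either way.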


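Date::Manip is worth a look: it handles a wide range of formats, but you have to tell it, via Date_Init, whether a date such as 3/4/97 means 3 April or 4 March.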

[Garbled in the source; the surviving fragments mention the examples 3/4/97 and 21/3.]

vinko@mithril:~$ more date.pl
use strict;
use warnings;
use Date::Manip;

my @a;
push @a, "March 2004";
push @a, "2001";
push @a, "3/4/97";
push @a, "21/3/1998";
Date_Init("DateFormat=non-US");
for my $d (@a) {
    print "$d\n";
    print ParseDate($d)."\n";
};
vinko@mithril:~$ perl date.pl
March 2004
2004030100:00:00
2001
2001010100:00:00
3/4/97
1997040300:00:00
21/3/1998
1998032100:00:00
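Since the destination wants yyyy-mm-dd, the ParseDate result can be reformatted with Date::Manip's UnixDate. A minimal sketch, assuming non-US day/month order as in the script above; the `to_iso` helper is hypothetical.

```perl
use strict;
use warnings;
use Date::Manip qw(Date_Init ParseDate UnixDate);

Date_Init("DateFormat=non-US");    # day before month, as in the script above

# Hypothetical helper: free-form string -> yyyy-mm-dd, or undef on failure.
sub to_iso {
    my ($s) = @_;
    my $date = ParseDate($s);
    return unless $date;           # ParseDate returns an empty string on failure
    return UnixDate($date, "%Y-%m-%d");
}

for my $s ("March 2004", "2001", "3/4/97", "21/3/1998") {
    printf "%-10s => %s\n", $s, to_iso($s) // 'unparsed';
}
```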

DateTime::Format::Flexible

[Garbled in the source.]

Like the other DateTime::Format:: modules, it returns a DateTime object.

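Regarding the dates in Vinko's script: the day-first one (21/3/1998) is the tricky case. As with Date::Manip, day-first order has to be requested explicitly; with DateTime::Format::Flexible that is done with (european => 1). [The remainder, mentioning Danbystrom, is garbled in the source.]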
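A minimal sketch of that route, for reference. parse_datetime dies on input it cannot handle, so the call is wrapped in eval; the `flexible_iso` helper name is hypothetical.

```perl
use strict;
use warnings;
use DateTime::Format::Flexible;

# Hypothetical helper: yyyy-mm-dd, or undef if the string is rejected.
# parse_datetime() dies on input it cannot handle, hence the eval.
sub flexible_iso {
    my ($s, $european) = @_;
    my %opt = $european ? (european => 1) : ();
    my $dt  = eval { DateTime::Format::Flexible->parse_datetime($s, %opt) };
    return defined $dt ? $dt->ymd : undef;    # ymd() formats as yyyy-mm-dd
}

printf "%s\n", flexible_iso("21/3/1998", 1) // 'unparsed';
```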


[Garbled in the source; it mentions Perl and .NET.]
