Search for dates in a string

I am looking for a quick way in C# to find all dates in a string (the string is a large text, and I have to scan about 200,000 different lines).

Since there are many ways to write a date (e.g. 12/31/2012 or December 31, 2012, and many more), I use this regular expression, which should cover almost all common ways of writing dates:

string findDates = @"(?:(\d{1,4})- /.- /.)|(?:(\s\d{1,2})\s+(jan(?:uary){0,1}\.{0,1}|feb(?:ruary){0,1}\.{0,1}|mar(?:ch){0,1}\.{0,1}|apr(?:il){0,1}\.{0,1}|may\.{0,1}|jun(?:e){0,1}\.{0,1}|jul(?:y){0,1}\.{0,1}|aug(?:ust){0,1}\.{0,1}|sep(?:tember){0,1}\.{0,1}|oct(?:ober){0,1}\.{0,1}|nov(?:ember){0,1}\.{0,1}|dec(?:ember){0,1}\.{0,1})\s+(\d{2,4}))|(?:(jan(?:uary){0,1}\.{0,1}|feb(?:ruary){0,1}\.{0,1}|mar(?:ch){0,1}\.{0,1}|apr(?:il){0,1}\.{0,1}|may\.{0,1}|jun(?:e){0,1}\.{0,1}|jul(?:y){0,1}\.{0,1}|aug(?:ust){0,1}\.{0,1}|sep(?:tember){0,1}\.{0,1}|oct(?:ober){0,1}\.{0,1}|nov(?:ember){0,1}\.{0,1}|dec(?:ember){0,1}\.{0,1})\s+([0-9]{1,2})[\s,]+(\d{2,4}))";

It is used with RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace. In addition, I tried precompiling the regex to make it even faster.

The problem is that it is very slow (more than 2 seconds for some texts). Is there a better and more efficient way to do this?

Thanks

+5
2 answers

The expression looks fine overall. As others have noted, it is a little verbose, with {0,1} everywhere instead of ? and (?: instead of applying RegexOptions.ExplicitCapture. But neither of those should slow the expression down; changing them only improves readability.

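What may hurt performance is backtracking: the optional parts and the long alternation give the engine many paths to retry when a match fails. You can limit that by making a group atomic ((?>pattern) does not allow backtracking into the group).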

So instead of this month alternation:

 (jan(?:uary){0,1}\.{0,1}|feb(?:ruary){0,1}\.{0,1}|mar(?:ch){0,1}\.{0,1}|apr(?:il){0,1}\.{0,1}|may\.{0,1}|jun(?:e){0,1}\.{0,1}|jul(?:y){0,1}\.{0,1}|aug(?:ust){0,1}\.{0,1}|sep(?:tember){0,1}\.{0,1}|oct(?:ober){0,1}\.{0,1}|nov(?:ember){0,1}\.{0,1}|dec(?:ember){0,1}\.{0,1})\s+(\d{2,4}))

you could write:

 (?>jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|june?|july?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)\.?\s+(\d{2,4}))
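Here is a minimal sketch of how a shortened, atomic month alternation like this might be used from C#. The class name MonthDateFinder, the named groups, and the day/year tail are my own assumptions for illustration, not part of the original expression. RegexOptions.ExplicitCapture is used so that the plain (...) groups do not capture.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds "Month day, year" dates using an atomic month group.
public static class MonthDateFinder
{
    // (?>...) makes the month alternation atomic, so the engine never
    // backtracks into it. ExplicitCapture turns (uary) etc. into
    // non-capturing groups; only the named groups are stored.
    private static readonly Regex MonthDate = new Regex(
        @"\b(?<month>(?>jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|june?|july?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?))\.?\s+(?<day>\d{1,2})[\s,]+(?<year>\d{2,4})\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns the matched text of every month-name date in the input.
    public static List<string> Find(string text)
    {
        var results = new List<string>();
        foreach (Match m in MonthDate.Matches(text))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Calling MonthDateFinder.Find on a line of text returns the matched substrings; the named groups month, day and year are available on each Match if you need the parts separately.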

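Also, the (?:(\d{1,4})- /.- /.) part looks odd and probably does not do what you intended.

\d{1,4} matches one to four digits, but - /.- /. is a sequence of individual tokens rather than a set of separators, and nothing after it matches the remaining numbers. You probably meant something like: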

 \d{1,4}[- /.]\d{1,2}[- /.]\d{1,2}

As Aliostad suggested, once the regex has found candidates you can validate and convert them with DateTime.TryParseExact.
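A rough sketch of that two-step idea: a loose regex picks out candidates, and DateTime.TryParseExact decides whether each one is a real date. The class name NumericDateFinder, the candidate pattern and the list of formats are assumptions made up for this example; you would adjust the formats to whatever your texts actually contain.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: regex finds candidates, TryParseExact validates them.
public static class NumericDateFinder
{
    // Loose candidate pattern: 1-2 digits, separator, 1-2 digits, separator, 4 digits.
    private static readonly Regex Candidate = new Regex(
        @"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{4}\b", RegexOptions.Compiled);

    // Illustrative month-first formats only; extend as needed.
    private static readonly string[] Formats = { "M/d/yyyy", "M-d-yyyy", "M.d.yyyy" };

    // Returns only the candidates that parse as valid calendar dates.
    public static List<DateTime> Find(string text)
    {
        var dates = new List<DateTime>();
        foreach (Match m in Candidate.Matches(text))
        {
            DateTime parsed;
            if (DateTime.TryParseExact(m.Value, Formats, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out parsed))
            {
                dates.Add(parsed);
            }
        }
        return dates;
    }
}
```

The regex can stay simple and fast because it no longer has to reject impossible values such as month 13 or February 29 in a non-leap year; the parser does that.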

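Also, does this have to be a single Regex? Instead of joining every format with |, you could run a few simpler expressions one after another.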

For example, something like these two:

 \b\d{1,2}[- .\\/]\d{1,2}[- .\\/](\d{2}|\d{4})\b
 \b((jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(.|[a-z]{0,10})|\d{1,2})[- .\\/,]\d{1,2}[- .\\/,](\d{2}|\d{4})\b

These are less strict than your original, but the month part accepts variants such as "sept", "sep" and "september".
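A sketch of running several simple expressions instead of one large alternation. The patterns here are simplified variants written for this example, not the exact ones above, and the class name MultiPatternFinder is hypothetical. The whitespace runs are bounded with \s{1,20} as an assumption about how much spacing is reasonable inside a date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: applies each simple pattern in turn and merges the hits.
public static class MultiPatternFinder
{
    private const RegexOptions Options = RegexOptions.Compiled | RegexOptions.IgnoreCase;

    private static readonly Regex[] Patterns =
    {
        // Numeric form, e.g. 12/31/2012 or 31.12.12
        new Regex(@"\b\d{1,2}[- ./]\d{1,2}[- ./](\d{4}|\d{2})\b", Options),
        // Month-name form, e.g. Dec 31, 2012 or Sept. 3, 2011
        new Regex(@"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]{0,6}\.?\s{1,20}\d{1,2},?\s{1,20}(\d{4}|\d{2})\b", Options),
    };

    // Returns all matches from all patterns, ordered by position in the text.
    public static List<string> Find(string text)
    {
        return Patterns
            .SelectMany(p => p.Matches(text).Cast<Match>())
            .OrderBy(m => m.Index)
            .Select(m => m.Value)
            .ToList();
    }
}
```

Each expression stays short enough to read and to profile on its own, so if one of them turns out to be slow on your data you can tell which one.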

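Edit: one more thing about \s+. If the text contains a long run of whitespace, say 20,000 characters, the engine may have to work through all of it before giving up on a match. Bounding the quantifier, e.g. \s{1,20}, limits that cost.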

+3

Second, you could search for something cheap first, such as '\d{1,2}\s*,\s*\d{4}', and only then check whether a month name, Jan(uary)/Feb(ruary)/Mar(ch)/..., comes before it.



+3
