Format for ordinal dates?

Question

Am I missing something? I cannot figure out how to convert the following to Dates:

 ord_dates <- c("September 1st, 2016", "September 2nd, 2016", "September 3rd, 2016", "September 4th, 2016")

?strptime does not appear to list a conversion specification for the ordinal suffix, and the suffix is not handled automatically:

 as.Date(ord_dates, format = c("%B %d, %Y"))
 #[1] NA NA NA NA

Is there a token for skipping characters in the format argument, or is there really no such token?

The best I can come up with (the regex could probably be shorter, but the idea is the same):

 as.Date(gsub("([0-9]+)(st|nd|rd|th)", "\\1", ord_dates), format = "%B %d, %Y")
 # [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"

It seems that data like this should be relatively common; did I miss something?
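The same idea can be wrapped in a small helper. This is only a sketch: parse_ordinal_date is a hypothetical name (not a base R function), and it assumes English month names in the current locale. The lookbehind removes a suffix only when it directly follows a digit, so the "st" in a month name such as "August" is left alone.

```r
# Hypothetical helper: strip English ordinal suffixes, then parse with as.Date().
parse_ordinal_date <- function(x, format = "%B %d, %Y") {
  # Drop "st", "nd", "rd" or "th" only when it immediately follows a digit.
  cleaned <- gsub("(?<=[0-9])(st|nd|rd|th)\\b", "", x, perl = TRUE)
  as.Date(cleaned, format = format)
}

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")
parse_ordinal_date(ord_dates)
# [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"
```

Strings that do not match the format after cleaning still come back as NA, exactly as with a plain as.Date() call.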


date r

MichaelChirico Aug 30 '16 at 21:23


1 answer

thepule · Accepted Answer · 2016-08-30T21:29:17+0000

Enjoy the power of lubridate:

 library(lubridate)
 mdy(ord_dates)
 # [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"

Internally, lubridate has no special conversion specification that handles this. Instead, its format guessing first settles on "%B %dst, %Y", which parses the first element of ord_dates.

It then checks for NAs and repeats the guessing on the remaining elements, arriving at "%B %dnd, %Y" to parse the second element. This continues until no NAs are left (which happens here after 4 iterations), or until the guessing fails to come up with a plausible candidate format.
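The fallback behaviour described above can be imitated in base R. The sketch below is illustrative only and is not lubridate's actual code: parse_with_fallback is a hypothetical name, and the list of candidate formats is supplied by hand rather than guessed from the data.

```r
# Illustrative only: try candidate formats in turn, filling in whatever is
# still NA after the previous attempts.
parse_with_fallback <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))   # start with every element unparsed
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break              # everything parsed, stop early
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")
# Hand-written candidates, one per ordinal suffix.
fmts <- c("%B %dst, %Y", "%B %dnd, %Y", "%B %drd, %Y", "%B %dth, %Y")
parse_with_fallback(ord_dates, fmts)
```

Each pass re-parses only the elements that are still NA, which is why repeated passes cost more than a single regex substitution followed by one as.Date() call.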

As you might expect, this makes lubridate slower. It runs at about half the speed of the base approach using the smart regex suggested by @alistaire above:

 set.seed(109123)
 ord_dates <- sample(
   c("September 1st, 2016", "September 2nd, 2016",
     "September 3rd, 2016", "September 4th, 2016"),
   1e6, TRUE
 )
 library(microbenchmark)
 microbenchmark(times = 10L,
                lubridate = mdy(ord_dates),
                base = as.Date(sub("\\D+,", "", ord_dates), format = "%B %e %Y"))
 # Unit: seconds
 #       expr      min       lq     mean   median       uq      max neval cld
 #  lubridate 2.167957 2.219463 2.290950 2.252565 2.301725 2.587724    10   b
 #       base 1.183970 1.224824 1.218642 1.227034 1.228324 1.229095    10  a

The obvious advantage of lubridate is its concision and flexibility.
