I have a vector of character representations of dates, where the formats are mainly dmY (e.g., 27-09-2013), dmy (e.g., 27-09-13), and occasionally some with b or B months. Thus, parse_date_time in the lubridate package, which "allows the user to specify multiple format orders to handle heterogeneous representations of date and time characters," can be a very useful function for me.
However, it seems that parse_date_time has a problem parsing dmy dates when they occur together with dmY dates. When parsing dmy or dmY alone, or together with some other formats relevant to me, it works fine. This pattern was also noted in a comment on @Peyton's answer here. A quick fix was suggested, but I want to ask whether it can be handled within lubridate.
Here I show some examples where I try to parse dates in dmy format together with some other formats, specifying orders accordingly.
library(lubridate)
# version: lubridate_1.3.0

# regarding how date format is specified in 'orders':
# examples in ?parse_date_time
# parse_date_time(x, "ymd")
# parse_date_time(x, "%y%m%d")
# parse_date_time(x, "%y %m %d")
# these order strings are equivalent and parses the same way
# "Formatting orders might include arbitrary separators. These are discarded"

# dmy date only
parse_date_time(x = "27-09-13", orders = "dmy")
# [1] "2013-09-27 UTC"
# OK

# dmy & dBY
parse_date_time(c("27-09-13", "27 September 2013"), orders = c("dmy", "d BY"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK

# dmy & dbY
parse_date_time(c("27-09-13", "27 Sep 2013"), orders = c("dmy", "db Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK

# dmy & dmY
parse_date_time(c("27-09-13", "27-09-2013"), orders = c("dmy", "dm Y"))
# [1] "0013-09-27 UTC" "2013-09-27 UTC"
# not OK

# does order of the date components matter?
parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y md", "ymd"))
# [1] "2013-09-27 UTC" "0013-09-27 UTC"
# no
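The year 0013 in the failing calls looks like what a four-digit-year format produces when it is applied to a two-digit year. A minimal base R sketch of that effect, assuming base `as.Date()` (which goes through `strptime`, not lubridate's own parser) behaves analogously here:

```r
# Base R only; this illustrates the symptom, not lubridate's internals.
x <- "27-09-13"

# %y reads a two-digit year and maps 13 to 2013
short_year <- as.Date(x, format = "%d-%m-%y")

# %Y accepts the same two digits but takes them literally as year 13
long_year <- as.Date(x, format = "%d-%m-%Y")

# Extract the calendar years to compare the two readings
year_of <- function(d) as.POSIXlt(d)$year + 1900
year_of(short_year)
year_of(long_year)
```

If lubridate ends up applying the %Y format to both elements, that would be consistent with the output shown above, but this is only a guess about the cause.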
What about the select_formats argument? I'm sorry to say it, but I had a hard time understanding this section of the help file. And a search for select_formats on SO gave 0 results. Nevertheless, this part seemed relevant: "By default the formats with most formatting tokens (%) are selected and %Y counts as 2.5 tokens (so that it can have a priority over %y%m)." So I (desperately) tried with some additional dmy dates:
parse_date_time(c("27-09-2013", rep("27-09-13", 10)), orders = c("dmy", "dm Y"))
# not OK. Tried also 100 dmy dates.

# does order in the vector matter?
parse_date_time(c(rep("27-09-13", 10), "27-09-2013"), orders = c("dmy", "dm Y"))
# no
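As far as I understand ?parse_date_time, `select_formats` is a function that receives a named integer vector (names are the formats that matched the training set, values are the number of matches) and returns the formats to use. A hedged sketch of a selector that favours `%y` over `%Y`, the reverse of the documented default. The name `prefer_short_year` and the hand-built `trained` vector are made up for illustration, and whether passing it to parse_date_time resolves the mixed case in lubridate 1.3.0 is untested:

```r
# Hypothetical selector: count the % tokens in each matched format and give
# %y an extra 1.5 so that it outranks %Y (the reverse of the documented default).
prefer_short_year <- function(trained, ...) {
  fmts <- names(trained)
  n_tokens <- nchar(gsub("[^%]", "", fmts))
  score <- n_tokens + 1.5 * grepl("%y", fmts, fixed = TRUE)
  fmts[which.max(score)]
}

# Hand-built stand-in for what the selector would receive
trained <- c("%d-%m-%y" = 1L, "%d-%m-%Y" = 1L)
prefer_short_year(trained)

# Untested assumption: it would be supplied like this
# parse_date_time(c("27-09-13", "27-09-2013"), orders = "dmy",
#                 select_formats = prefer_short_year)
```

The scoring only mirrors the rule quoted from the help file; it does not reproduce lubridate's actual default selector.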
Then I checked how the guess_formats function (also in lubridate) handles dmy together with dmY:
guess_formats(c("27-09-13", "27-09-2013"), c("dmy", "dmY"), print_matches = TRUE)
#                   dmy        dmY
# [1,] "27-09-13"   "%d-%m-%y" ""
# [2,] "27-09-2013" "%d-%m-%Y" "%d-%m-%Y"
# OK
From ?guess_formats: "y also matches Y". From ?parse_date_time: "y* Year without century (00–99 or 0–99). Also matches year with century (Y format)." So I tried:
guess_formats(c("27-09-13", "27-09-2013"), c("dmy"), print_matches = TRUE)
#                   dmy
# [1,] "27-09-13"   "%d-%m-%y"
# [2,] "27-09-2013" "%d-%m-%Y"
# OK
So guess_formats seems able to deal with dmy together with dmY. But how can I tell parse_date_time to do the same? Thanks in advance for any comments or help.
Update: I posted the question on the lubridate issue tracker and received a quick response from @vitoshka: "This is an error."
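Until this is resolved, a quick fix of the kind mentioned earlier can be sketched in base R by splitting the vector on the width of the year. This assumes every element is day-month-year with "-" separators; `parse_dmy_mixed` is a made-up helper name, not part of lubridate:

```r
# Hypothetical helper: parse "dd-mm-yy" and "dd-mm-yyyy" strings in one vector.
parse_dmy_mixed <- function(x) {
  # TRUE where the string ends in a four-digit year
  four_digit <- grepl("-[0-9]{4}$", x)
  out <- rep(as.Date(NA), length(x))
  out[four_digit]  <- as.Date(x[four_digit],  format = "%d-%m-%Y")
  out[!four_digit] <- as.Date(x[!four_digit], format = "%d-%m-%y")
  out
}

parsed <- parse_dmy_mixed(c("27-09-13", "27-09-2013"))
parsed
```

Note that base R's `%y` maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, so two-digit years outside the intended century would need extra handling. Month names (b or B) are not covered by this sketch.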