Is there a reliable way to detect POSIXlt objects representing a time that does not exist due to DST?

I have the following problem: the date column in the data I receive contains date-times that do not exist due to daylight saving time. (For example, 2015-03-29 02:00 does not exist in Central European Time, because the clocks are set directly from 01:59 to 03:00 when DST takes effect on this day.)

Is there a simple and reliable way to determine whether a date-time is valid with respect to daylight saving time?

This is not trivial due to the properties of datetime classes.

    # generating the invalid time as POSIXlt object
    test <- strptime("2015-03-29 02:00", format="%Y-%m-%d %H:%M", tz="CET")

    # the object seems to represent something at least partially reasonable,
    # notice the missing timezone specification though
    test
    # [1] "2015-03-29 02:00:00"

    # strangely enough this object is regarded as NA by is.na
    is.na(test)
    # [1] TRUE

    # which is no surprise if you consider:
    is.na.POSIXlt
    # function (x)
    # is.na(as.POSIXct(x))

    as.POSIXct(test)
    # [1] NA

    # inspecting the interior of my POSIXlt object:
    unlist(test)
    #  sec  min hour mday  mon  year wday yday isdst zone gmtoff
    #  "0"  "0"  "2" "29"  "2" "115"  "0" "87"  "-1"   ""     NA

So the easiest way seems to be to check the isdst field of the POSIXlt object. The help for POSIXt describes the field as follows:

isdst
Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.

Is checking the isdst field safe, in the sense that this field is -1 only if the date is invalid due to DST changes, or can it also be -1 for other reasons?

Version, platform, and locale information

    R.version
    #                _
    # platform       x86_64-w64-mingw32
    # arch           x86_64
    # os             mingw32
    # system         x86_64, mingw32
    # status
    # major          3
    # minor          3.1
    # year           2016
    # month          06
    # day            21
    # svn rev        70800
    # language       R
    # version.string R version 3.3.1 (2016-06-21)
    # nickname       Bug in Your Hair

    Sys.getlocale()
    # [1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
2 answers

The manual states that strptime does not validate whether times exist in a specific time zone given transitions to/from daylight saving time (?strptime). The manual also states that as.POSIXct does perform this check, so, following the manual, you should test the resulting POSIXct object for NA (?as.POSIXct), which identifies a non-existent time, as shown in the example in the question. The result, however, is OS-dependent for times that occur twice in the time zone (?as.POSIXct):

Remember that in most time zones some times do not occur and some occur twice because of transitions to/from "daylight saving" (also known as "summer") time. strptime does not validate such times (it does not assume a specific time zone), but conversion by as.POSIXct will do so.

and

One issue is what happens at transitions to and from DST, for example in the UK

    as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"))
    as.POSIXct(strptime("2010-10-31 01:30:00", "%Y-%m-%d %H:%M:%S"))

are respectively invalid (the clocks went forward at 1:00 GMT to 2:00 BST) and ambiguous (the clocks went back at 2:00 BST to 1:00 GMT). What happens in such cases is OS-specific: one should expect the first to be NA, but the second could be interpreted as either BST or GMT (and common OSes give both possible values).
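
Applied to the times from the question, the NA check suggested by the manual could be sketched as follows. This is only an illustration: flag_na_times is a made-up helper name, not a base R function, and it flags a non-existent time only on platforms where as.POSIXct actually returns NA for it.

```r
# Sketch: flag local times that as.POSIXct() turns into NA.
# `flag_na_times` is a hypothetical helper name, not part of base R.
flag_na_times <- function(x, tz, input_format = "%Y-%m-%d %H:%M") {
  lt <- strptime(x, format = input_format, tz = tz)
  # strptime() leaves the fields NA when a string does not parse at all,
  # so exclude those and keep only NAs introduced by the conversion
  parsed <- !is.na(lt$year)
  parsed & is.na(as.POSIXct(lt))
}

times <- c("2015-03-29 01:00", "2015-03-29 02:00", "2015-03-29 03:00")
na_flags <- flag_na_times(times, tz = "CET")
na_flags  # the middle element is TRUE only where the OS reports NA
```

The first and third elements are valid times and should never be flagged. Whether the second one is flagged depends on the operating system.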


The value of as.POSIXct(test) seems to be platform-dependent, which adds a layer of complexity to finding a reliable method. On my Windows machine (R 3.3.1), as.POSIXct(test) returns NA, as the OP also reports. However, on my Linux platform (same version of R), I get the following:

    times <- c("2015-03-29 01:00", "2015-03-29 02:00", "2015-03-29 03:00")
    test <- strptime(times, format="%Y-%m-%d %H:%M", tz="CET")

    test
    #[1] "2015-03-29 01:00:00 CET"  "2015-03-29 02:00:00 CEST" "2015-03-29 03:00:00 CEST"
    as.POSIXct(test)
    #[1] "2015-03-29 01:00:00 CET"  "2015-03-29 01:00:00 CET"  "2015-03-29 03:00:00 CEST"
    as.character(test)
    #[1] "2015-03-29 01:00:00" "2015-03-29 02:00:00" "2015-03-29 03:00:00"
    as.character(as.POSIXct(test))
    #[1] "2015-03-29 01:00:00" "2015-03-29 01:00:00" "2015-03-29 03:00:00"

What we can rely on is not the actual value of as.POSIXct(test), but the fact that it will differ from test when test is an invalid date/time:

    (as.character(test) == as.character(as.POSIXct(test))) %in% TRUE
    # TRUE FALSE TRUE

I'm not sure that as.character is strictly necessary here, but I include it so as not to run afoul of any other odd behaviour of POSIX objects.
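
The round-trip comparison can be wrapped in a small helper. The following is a minimal sketch, not a tested general solution: is_nonexistent_time is a hypothetical name, and an explicit format string is used instead of as.character (an assumption made here so that both sides are rendered identically). It treats a time as non-existent if it parses but either becomes NA or changes when converted to POSIXct and back to text.

```r
# Hypothetical helper (not part of base R): TRUE where a local time string
# parses but does not survive the POSIXlt -> POSIXct -> text round trip.
is_nonexistent_time <- function(x, tz, input_format = "%Y-%m-%d %H:%M") {
  out_format <- "%Y-%m-%d %H:%M:%S"
  lt <- strptime(x, format = input_format, tz = tz)
  ct <- as.POSIXct(lt)  # may be NA or a shifted time, depending on the OS
  # strings that fail to parse have NA fields; do not report them as DST gaps
  parsed <- !is.na(lt$year)
  # %in% TRUE maps an NA comparison (NA from as.POSIXct) to FALSE
  same <- (format(lt, out_format) == format(ct, out_format)) %in% TRUE
  parsed & !same
}

times <- c("2015-03-29 01:00", "2015-03-29 02:00", "2015-03-29 03:00")
gap_flags <- is_nonexistent_time(times, tz = "CET")
gap_flags
```

For the three times above, only the second element (02:00) should be flagged, whether the platform returns NA or a shifted time for it. Note that this does not detect ambiguous times that occur twice when the clocks go back.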
