R: count days starting at sunset

I analyze time patterns in a complex data set consisting of several environmental variables, as well as data on the activity of various animal species. These data were collected by several experimental setups, and data from each setup was stored once per minute. The project has been working for several years, so my data set is quite large.

The first few lines of one of my datasets look like this:

> head(setup_01) DateTime Film_number unused PIR Wheel Temperature LightOld LightDay LightNight LightUV IDnumbers error mouse shrew vole rat frog rest extra_info odour 1 2015-03-10 12:27:10 x 0 0 13.40 1471.34 -0.97 1331.29 700.42 no error 0 0 0 0 0 0 1 2 2015-03-10 12:28:10 x 0 0 13.43 1471.38 -1.07 1291.11 731.32 no error 0 0 0 0 0 0 1 3 2015-03-10 12:29:10 x 0 0 13.31 1471.24 -1.08 1368.57 1016.02 no error 0 0 0 0 0 0 1 

Since I want to associate these variables with various natural cycles, such as sunrise and sunset, throughout the season, I used the maptools package to calculate the sunrise and sunset times.

 library(maptools) gpclibPermit() #set coordinates crds=c(4.4900,52.1610) # download the sunrise/sunset/etc data setup_01$sunrise=sunriset(matrix(crds,nrow=1),dateTime=as.POSIXct(setup_01$DateTime),POSIXct.out=TRUE,direction="sunrise") setup_01$sunset=sunriset(matrix(crds,nrow=1),dateTime=as.POSIXct(setup_01$DateTime),POSIXct.out=TRUE,direction="sunset") #create a variable that 0 except at sunrise, and one that 0 except at sunset setup_01$sunrise_act=0 setup_01$sunset_act=0 setup_01[abs(unclass(setup_01[,"DateTime"])-unclass(setup_01[,"sunrise"]$time))<30,]$sunrise_act=1 setup_01[abs(unclass(setup_01[,"DateTime"])-unclass(setup_01[,"sunset"]$time))<30,]$sunset_act=1 

Since the behavior of most animals is different, depending on whether it is day or night, I used the sunset / sunrise times for the data to calculate a new variable that is 0 at night and 1 at day:

 #create a variable that 0 at night and 1 at daytime setup_01$daytime=0 setup_01[setup_01[,"DateTime"]>setup_01[,"sunrise"]$time & setup_01[,"DateTime"]<setup_01[,"sunset"]$time,]$daytime=1 

So far so good ... even with maptools you can use the start of civil / navigation / astronomical twilight and dawn instead of sunrise and sunset.

This, however, begins with my problem. I want to quote all the days of my experiment. And instead of increasing the daily counter at midnight , as is usually easy to do, I want to increase the counter of days at sunset (or, possibly, in future experiments, another moving time of the day like sunrise, sea dusk and dawn, ...). Since sunset does not happen at the same time every day, this is not for me - a direct problem to solve.

I just came up with for -loop, which is not very nice to do. In addition, given that I have more than 6 years of data points collected once a minute in several installations, I can sit and watch how the tectonic plates move and R runs a whole bunch of such loops:

 setup_01$day=0 day<-1 for(i in 1:nrow(setup_01)){ setup_01[i,]$day<-day if(setup_01[i,]$sunset_act==1){ day<-day+1 } } 

Besides ugliness and slowness, this code has one big problem: it does not deal with missing values. Sometimes, due to equipment failure, data was not recorded at all for several hours or days. If no data was recorded during sunset, the code above does not increase the daily counter. This means that I need to - in one way or another - include the date and time codes. It is easy to create a days variable from the moment the experiment begins:

 setup_01$daynumber<-as.integer(ceiling(difftime(setup_01$DateTime, setup_01$DateTime[1], units = "days"))) 

Perhaps these numbers can be used, perhaps in combination with rle good rle algorithm .

I used dput to make data for several months from one installation, including several large pieces of missing data, as well as newly created variables (as described in this post and in Heroke's) here .

I was looking for something better, nicer and especially faster, but could not come up with a good trick. I fiddled with a subset of my data framework, but I come to the conclusion that this is probably a stupid approach. I looked at maptools , lubridate and GeoLight . I searched google, qaru and various books like Hadley Wickham, the fantastic Advanced R. All to no avail. Maybe I'm missing something obvious. I hope someone here can help me.

+7
datetime r time-series lubridate maptools
source share
2 answers

I prefer a solution based on pre-calculated tables. This is slower, but I understand that it is understandable. Then I use dplyr to organize the information I need.

Let me show you what I mean. For example, I am creating a sunset time list. Of course, you will need to calculate the actual ones.

 library(dplyr) n.obs=1000 set.seed(10) t0 <- as.POSIXct('2015-03-08 18:00:00') artificial.sunsets <- data.frame(num.day= seq(0,n.obs+35)) %>% mutate(sunset=cumsum(rlnorm(length(num.day))*30)+t0 + 24*3600*num.day) 

artificial.sunsets contains the day number and the exact time of sunset, but may also contain more information about the day.

And some artificial data:

 t0 <- as.POSIXct('2015-03-10 12:27:10') test.data <- data.frame(DateTime=t0+ seq(0, n.obs*24*3600, by=3600), observation=rnorm(24*n.obs+1)) 

You can then find the previous sunset using:

 find.sunset.before <- function(x){ cbind(x,artificial.sunsets %>% filter(sunset < x$DateTime) %>% tail(.,n=1)) } data.with.sunset=test.data %>% rowwise() %>% do(find.sunset.before(.)) %>% ungroup()%>% mutate(rel.time = DateTime-sunset) head(data.with.sunset) 

As a result, the table will contain three more columns: 1) the corresponding day 2) the corresponding sunset time and 3) the time after sunset.

This should be reliable compared to missing measurements, since the numbering of days occurs in another table. You can also easily modify the algorithm to use different times and even apply several.

Update

all this can be done much faster using data.table:

 library(data.table) dt1 <- data.table(artificial.sunsets) dt2 <- data.table(test.data) dt1[,DateTime:=sunset] setkey(dt1, DateTime) setkey(dt2, DateTime) r <- dt1[dt2,roll=TRUE] r[,time.diff:=DateTime-sunset] 

I tried to synchronize it with system.time for 1000 observations - the previous one takes about 1 m, the data.table solution is 0.011 s.

+1
source share

I came up with a solution for the generated 0 and 1 (as you already created them) and it works with runlengths.

  #sunset/sunrise is series of 0 and 1 indicating night and daytime, so solution that works for random sequence #will work for OP dataset set.seed(10) sunset <- c(1,rbinom(20,1,0.5)) #counter needs to be x for sequence of 11111 (day) and 0000(night), and then increase when 0 reappears #counter starts at 1 #intermediate step: number each half-day rle_sunset <- rle(sunset) period <- rep(1:length(rle_sunset$lengths),rle_sunset$lengths) #calculate day so that each two subsequent periods are one day day <- ceiling(period/2) > cbind(sunset,period,day) sunset period day [1,] 1 1 1 [2,] 1 1 1 [3,] 0 2 1 [4,] 0 2 1 [5,] 1 3 2 [6,] 0 4 2 [7,] 0 4 2 [8,] 0 4 2 [9,] 0 4 2 [10,] 1 5 3 [11,] 0 6 3 [12,] 1 7 4 [13,] 1 7 4 [14,] 0 8 4 [15,] 1 9 5 [16,] 0 10 5 [17,] 0 10 5 [18,] 0 10 5 [19,] 0 10 5 [20,] 0 10 5 [21,] 1 11 6 
+3
source share

All Articles