I am analyzing temporal patterns in a complex data set consisting of several environmental variables, as well as data on the activity of various animal species. These data were collected by several experimental setups, and each setup stored its data once per minute. The project has been running for several years, so my data set is quite large.
The first few lines of one of my datasets look like this:
> head(setup_01)
DateTime Film_number unused PIR Wheel Temperature LightOld LightDay LightNight LightUV IDnumbers error mouse shrew vole rat frog rest extra_info odour
1 2015-03-10 12:27:10 x 0 0 13.40 1471.34 -0.97 1331.29 700.42 no error 0 0 0 0 0 0 1
2 2015-03-10 12:28:10 x 0 0 13.43 1471.38 -1.07 1291.11 731.32 no error 0 0 0 0 0 0 1
3 2015-03-10 12:29:10 x 0 0 13.31 1471.24 -1.08 1368.57 1016.02 no error 0 0 0 0 0 0 1
Since I want to associate these variables with various natural cycles, such as sunrise and sunset, throughout the season, I used the maptools package to calculate the sunrise and sunset times.
library(maptools)
gpclibPermit()
# set coordinates
crds = c(4.4900, 52.1610)
# calculate the sunrise/sunset/etc. data
setup_01$sunrise = sunriset(matrix(crds, nrow = 1), dateTime = as.POSIXct(setup_01$DateTime), POSIXct.out = TRUE, direction = "sunrise")
setup_01$sunset = sunriset(matrix(crds, nrow = 1), dateTime = as.POSIXct(setup_01$DateTime), POSIXct.out = TRUE, direction = "sunset")
# create a variable that is 0 except at sunrise, and one that is 0 except at sunset
setup_01$sunrise_act = 0
setup_01$sunset_act = 0
setup_01[abs(unclass(setup_01[, "DateTime"]) - unclass(setup_01[, "sunrise"]$time)) < 30, ]$sunrise_act = 1
setup_01[abs(unclass(setup_01[, "DateTime"]) - unclass(setup_01[, "sunset"]$time)) < 30, ]$sunset_act = 1
Since most animals behave differently by day and by night, I used the sunrise and sunset times to calculate a new variable that is 0 at night and 1 during the day:
# create a variable that is 0 at night and 1 during daytime
setup_01$daytime = 0
setup_01[setup_01[, "DateTime"] > setup_01[, "sunrise"]$time & setup_01[, "DateTime"] < setup_01[, "sunset"]$time, ]$daytime = 1
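As a side note, the same flag can be built directly from the logical comparison, without row-indexed assignment. The sketch below uses a made-up three-row data frame (`toy`, with plain POSIXct `sunrise` and `sunset` columns) rather than the real `setup_01`, where the times sit in the `$time` element returned by `sunriset()`.

```r
tz <- "UTC"

# Hypothetical toy data: three readings on one day with made-up sunrise/sunset times
toy <- data.frame(
  DateTime = as.POSIXct(c("2015-03-10 03:00:00",
                          "2015-03-10 12:00:00",
                          "2015-03-10 21:00:00"), tz = tz),
  sunrise  = as.POSIXct("2015-03-10 06:00:00", tz = tz),
  sunset   = as.POSIXct("2015-03-10 18:00:00", tz = tz)
)

# TRUE/FALSE from the comparison becomes 1/0 via as.integer()
toy$daytime <- as.integer(toy$DateTime > toy$sunrise & toy$DateTime < toy$sunset)
```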
So far so good. maptools even lets you use the start of civil, nautical, or astronomical dusk and dawn instead of sunrise and sunset.
This, however, is where my problem begins. I want to number all the days of my experiment. And instead of increasing the day counter at midnight, which is usually easy to do, I want to increase it at sunset (or possibly, in future experiments, at another moving time of day such as sunrise or nautical dusk and dawn). Since sunset does not happen at the same time every day, this is not, for me, a straightforward problem to solve.
All I came up with is a for loop, which is not a very nice solution. In addition, given that I have more than 6 years of data points collected once a minute by several setups, I could sit and watch the tectonic plates move while R works through a whole bunch of such loops:
setup_01$day = 0
day <- 1
for (i in 1:nrow(setup_01)) {
  setup_01[i, ]$day <- day
  if (setup_01[i, ]$sunset_act == 1) {
    day <- day + 1
  }
}
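For reference, the counting that the loop performs can also be written without a loop. This is only a sketch on a made-up ten-row data frame (`toy` is hypothetical, not the real data), assuming `sunset_act` holds exactly one 1 per recorded sunset: each row's day is 1 plus the number of sunset rows that came before it.

```r
# Hypothetical toy data: a 1 marks each row recorded at sunset
toy <- data.frame(sunset_act = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0))

# Loop version from the post, kept here only for comparison
day_loop <- numeric(nrow(toy))
day <- 1
for (i in seq_len(nrow(toy))) {
  day_loop[i] <- day
  if (toy$sunset_act[i] == 1) day <- day + 1
}

# Vectorized version: shift the flags down by one row, then take a running sum
toy$day <- 1 + cumsum(c(0, head(toy$sunset_act, -1)))
```

On this toy input both versions give the same counter.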
Besides being ugly and slow, the loop has one big problem: it does not deal with missing values. Sometimes, due to equipment failure, no data were recorded for several hours or days. If no data were recorded during a sunset, the loop does not increase the day counter. This means that I need to include the date/time stamps in one way or another. It is easy to create a variable that counts the days since the start of the experiment:
setup_01$daynumber<-as.integer(ceiling(difftime(setup_01$DateTime, setup_01$DateTime[1], units = "days")))
Perhaps these numbers can be used, possibly in combination with the good old rle function.
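One way the calendar-based numbers might be combined with the sunset times is sketched below. It is an illustration under stated assumptions, not a tested solution for the real data: the data frame `toy`, its hourly timestamps, and the made-up sunset times are all hypothetical, and `sunset` stands in for the `$time` element returned by `sunriset()` for each row's own calendar date. The idea is that a row's sunset-based day equals its calendar-day index plus one if the row falls at or after that date's sunset. Because only timestamps are used, a gap that swallows a sunset does not stall the counter.

```r
tz <- "UTC"

# Hypothetical toy data: hourly readings over four days
times <- seq(as.POSIXct("2015-03-10 00:30:00", tz = tz),
             as.POSIXct("2015-03-13 23:30:00", tz = tz), by = "hour")

# Drop a block spanning the sunset of 2015-03-11 to mimic an equipment failure
gap <- times >= as.POSIXct("2015-03-11 12:00:00", tz = tz) &
       times <  as.POSIXct("2015-03-12 06:00:00", tz = tz)
toy <- data.frame(DateTime = times[!gap])

# Calendar-day index of every row, counted from the first row's date
cal_date <- as.Date(toy$DateTime, tz = tz)
day_idx  <- as.integer(cal_date - cal_date[1])

# Made-up sunset for each row's date: 18:00 plus two minutes per day
toy$sunset <- as.POSIXct(paste(cal_date, "18:00:00"), tz = tz) + day_idx * 120

# Rows at or after their date's sunset belong to the next "sunset day"
raw <- day_idx + (toy$DateTime >= toy$sunset)

# Shift so that the first row is day 1
toy$sunset_day <- raw - raw[1] + 1L
```

In this toy example the last reading before the gap (2015-03-11 11:30) is on day 2 and the first reading after it (2015-03-12 06:30) is on day 3, even though no row was recorded at the sunset in between.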
I used dput to make several months of data from one setup available here, including several large stretches of missing data as well as the newly created variables (as described in this post and in Heroke's).
I have been looking for something better, nicer, and above all faster, but could not come up with a good trick. I fiddled with subsetting my data frame, but came to the conclusion that this is probably a stupid approach. I looked at maptools, lubridate, and GeoLight. I searched Google, qaru, and various books, such as Hadley Wickham's fantastic Advanced R. All to no avail. Maybe I am missing something obvious. I hope someone here can help me.