I have a CSV file with timestamps and the types of events that occurred at those times. I want to count how many events of each type occur in each 6-minute interval.
The input data looks like this:

    date,type
    "Sep 22, 2011 12:54:53.081240000","2"
    "Sep 22, 2011 12:54:53.083493000","2"
    "Sep 22, 2011 12:54:53.084025000","2"
    "Sep 22, 2011 12:54:53.086493000","2"
I load and clean the data with this code:
    raw_data <- read.csv('input.csv')
    cured_dates <- c(strptime(raw_data$date, '%b %d, %Y %H:%M:%S', tz="CEST"))
    cured_data <- data.frame(cured_dates, c(raw_data$type))
    colnames(cured_data) <- c('date', 'type')
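A possible variant of the loading step, sketched under two assumptions: the month names are English ("Sep") and the timestamps are UTC. "CEST" is a time-zone abbreviation rather than an Olson name, and R typically treats an unrecognised zone as UTC (depending on the platform, with or without a warning); that may be why 12:54:53 prints as 14:54:53 in a CEST session. If the raw times are really local CEST times, an Olson name such as "Europe/Berlin" would be the usual choice. Using `%OS` instead of `%S` keeps the fractional seconds, and `as.POSIXct()` returns a POSIXct vector directly, so no `c()` wrapper is needed.

```r
# Parse timestamps keeping fractional seconds (base R only).
# Assumptions: English month abbreviations and UTC timestamps.
invisible(Sys.setlocale("LC_TIME", "C"))  # make %b match "Sep" regardless of locale

# The four sample rows from the question, inlined instead of reading input.csv
csv_lines <- c(
  'date,type',
  '"Sep 22, 2011 12:54:53.081240000","2"',
  '"Sep 22, 2011 12:54:53.083493000","2"',
  '"Sep 22, 2011 12:54:53.084025000","2"',
  '"Sep 22, 2011 12:54:53.086493000","2"'
)
raw_data <- read.csv(text = csv_lines, stringsAsFactors = FALSE)

cured_data <- data.frame(
  # %OS (instead of %S) reads the seconds including the fractional part;
  # as.POSIXct() returns POSIXct directly, so no c() wrapper is needed
  date = as.POSIXct(raw_data$date, format = "%b %d, %Y %H:%M:%OS", tz = "UTC"),
  type = raw_data$type
)
str(cured_data)
```

For the real file, `read.csv('input.csv', stringsAsFactors = FALSE)` would replace the inlined `csv_lines`.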
After cleaning, the data looks like this:
    > head(cured_data)
                     date type
    1 2011-09-22 14:54:53    2
    2 2011-09-22 14:54:53    2
    3 2011-09-22 14:54:53    2
    4 2011-09-22 14:54:53    2
    5 2011-09-22 14:54:53    1
    6 2011-09-22 14:54:53    1
I have read a lot of examples for xts and zoo, but for some reason I can't get the hang of them. The output should look something like this:
    date                     type count
    2011-09-22 14:54:00 CEST    1    11
    2011-09-22 14:54:00 CEST    2    19
    2011-09-22 15:00:00 CEST    1     9
    2011-09-22 15:00:00 CEST    2    12
    2011-09-22 15:06:00 CEST    1    23
    2011-09-22 15:06:00 CEST    2    18
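One way to get this long format without zoo is to floor every timestamp to the start of its 6-minute (360-second) interval and then count rows per interval and type with base R's `aggregate()`. This is a minimal sketch, assuming `date` is a POSIXct vector; `count_by_interval` is a hypothetical helper name and the timestamps in the example are made up. Flooring is relative to the Unix epoch, which lines up with clock-time 6-minute boundaries in UTC and in whole-hour offsets such as CEST.

```r
# Count events per fixed-width time bin and event type, using base R only.
# count_by_interval() is a hypothetical helper name, not from any package.
count_by_interval <- function(date, type, width = 360) {
  # Floor each POSIXct timestamp to the start of its `width`-second bin:
  # subtract the seconds elapsed since the last multiple of `width`.
  # POSIXct minus a number stays POSIXct and keeps its time zone.
  bin <- date - as.numeric(date) %% width
  counts <- aggregate(
    list(count = rep(1L, length(type))),  # one 1 per event, summed per group
    by = list(date = bin, type = type),
    FUN = sum
  )
  # aggregate() varies the first grouping column fastest, so re-sort by date, then type
  counts <- counts[order(counts$date, counts$type), ]
  rownames(counts) <- NULL
  counts
}

# Example with made-up timestamps (UTC assumed)
d  <- as.POSIXct("2011-09-22 12:54:53", tz = "UTC") + c(0, 1, 2, 400, 401, 800)
tp <- c(2L, 2L, 1L, 1L, 2L, 1L)
res <- count_by_interval(d, tp)
print(res)
```

Intervals in which no events of a given type occurred simply do not appear in the result; if explicit zero rows are needed, the result could be merged with a full grid of interval starts built with `seq()`.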
zoo's aggregate function looks promising; I found this piece of code:
    # aggregate POSIXct seconds data every 10 minutes
    tt <- seq(10, 2000, 10)
    x <- zoo(tt, structure(tt, class = c("POSIXt", "POSIXct")))
    aggregate(x, time(x) - as.numeric(time(x)) %% 600, mean)
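The key part of that snippet is the grouping argument: `time(x) - as.numeric(time(x)) %% 600` subtracts from each timestamp the seconds elapsed since the last multiple of 600 seconds, i.e. it floors the time to a 10-minute boundary. The same expression works on a plain POSIXct vector in base R, and replacing the averaging step with a counting step gives event counts. A small sketch with 360 seconds and made-up timestamps, using `table()` for a wide result with one column per event type:

```r
# Base R version of the zoo flooring trick: time - (seconds since epoch %% width)
times <- as.POSIXct(c("2011-09-22 12:54:53", "2011-09-22 12:59:59",
                      "2011-09-22 13:00:00", "2011-09-22 13:05:10"), tz = "UTC")
types <- c(2, 1, 2, 2)

# Floor each timestamp to the start of its 6-minute (360-second) interval
floored <- times - as.numeric(times) %% 360
print(floored)

# Count instead of averaging: cross-tabulate interval start against event type
counts <- table(interval = format(floored, "%Y-%m-%d %H:%M:%S"), type = types)
print(counts)
```

Unlike a grouped count, `table()` also reports zero cells for types missing from an interval; `as.data.frame(counts)` turns it back into long format with a `Freq` column.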
Now I'm wondering how to apply this to my use case.
Naively, I tried this:
    > zoo_data <- zoo(cured_data$type, structure(cured_data$time, class = c("POSIXt", "POSIXct")))
    > aggr_data = aggregate(zoo_data$type, time(zoo_data$time), - as.numeric(time(zoo_data$time)) %% 360, count)
    Error in `$.zoo`(zoo_data, type) : not possible for univariate zoo series
I have to admit that I'm not very experienced with R, but I'm trying. :-)
I'm a little lost. Can someone point me in the right direction?
Thanks a lot!

Hi Alex, here's the dput output for a small subset of my data. The full data set has about 80 million rows.
    structure(list(date = structure(c(1316697885, 1316697885, 1316697885,
      1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
      1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
      1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
      1316697885, 1316697885, 1316697885, 1316697885, 1316697885),
      class = c("POSIXct", "POSIXt"), tzone = ""),
      type = c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
      1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L)),
      .Names = c("date", "type"), row.names = c(NA, -23L), class = "data.frame")
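With about 80 million rows, grouping with `aggregate()` (which builds string keys for the groups internally) may be slow and memory-hungry. A sketch of a lighter base R route, assuming the event types are small positive integers (1 and 2 in the sample): map each (interval, type) pair to one integer index and count with `tabulate()` in a single pass. `count_fast` is a hypothetical helper name, and the sample below rebuilds the same 23 rows as the dput output but displays them in UTC (the original has `tzone = ""`, i.e. the session's local zone). All 23 timestamps fall into the same 6-minute interval.

```r
# Faster counting for very large data: one integer index per (bin, type) pair,
# counted with tabulate(). count_fast() is a hypothetical helper name.
# Assumes `type` holds small positive integers (1, 2, ...).
count_fast <- function(date, type, width = 360) {
  bin <- floor(as.numeric(date) / width)   # interval number since the epoch
  bin0 <- min(bin)
  n_bins <- max(bin) - bin0 + 1
  n_types <- max(type)
  # Interval k (0-based from bin0) and type t map to index k * n_types + t
  idx <- (bin - bin0) * n_types + type
  cnt <- tabulate(idx, nbins = n_bins * n_types)
  out <- data.frame(
    date = as.POSIXct((bin0 + rep(seq_len(n_bins) - 1, each = n_types)) * width,
                      origin = "1970-01-01", tz = "UTC"),
    type = rep(seq_len(n_types), times = n_bins),
    count = cnt
  )
  out <- out[out$count > 0, ]              # drop empty (interval, type) pairs
  rownames(out) <- NULL
  out
}

# Same 23 rows as the dput() sample, rebuilt compactly and shown in UTC
sample_data <- data.frame(
  date = as.POSIXct(rep(1316697885, 23), origin = "1970-01-01", tz = "UTC"),
  type = c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L,
           1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L)
)
res <- count_fast(sample_data$date, sample_data$type)
print(res)
```

The index table has one slot per interval and type between the first and last timestamp, so its size grows with the time span covered rather than with the number of rows.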