I have a data frame with three columns, Id, Date and Value, and I want to downsample it by averaging: take each block of 20 consecutive rows, compute the mean of Value over those rows, and add it to a new data frame with the same structure. Date should be the first Date value of each block of 20 rows.
I tried it like this (maybe awful :):
    resample.downsample <- function(data, by=20) {
      i <- 0
      nmax <- nrow(data)
      means <- c()
      while(i < nmax) {
        means <- c(means, mean(subset(data, Id > i & Id <= i+by)$Value))
        i <- i+by
      }
      return (
        data.frame(
          Id = seq(1, length.out=(nmax/by), by=1),
          Date = seq(startDate, length.out=(nmax/by), by=(1/by)),
          Value = means
        )
      )
    }
This works for small data sets, but runs seemingly forever on my real data sets (~ 4,000,000 rows). Any ideas on optimizing this function?
Sample data (input; the output should have the same structure, with classes integer, numeric, POSIXct / POSIXt):
      Value Id                Date
    1   125  1 2011-06-30 22:41:50
    2   127  2 2011-06-30 22:41:50
    3   126  3 2011-06-30 22:41:50
    4   123  4 2011-06-30 22:41:50
    5   130  5 2011-06-30 22:41:50
    6   131  6 2011-06-30 22:41:50
    7   128  7 2011-06-30 22:41:50
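For reference, the loop above probably slows down for two reasons: `subset()` scans every row on each iteration, and `means` is re-allocated every time it grows. Below is a minimal sketch of a vectorized alternative in base R. It assumes the rows are already sorted by Id and that Id runs 1, 2, 3, ... without gaps, so a block can be derived from the row position. The name `downsample_blocks` is made up for illustration and is not part of the code above.

```r
# Hypothetical helper (not from the original post): averages Value over
# consecutive blocks of `by` rows and keeps the first Date of each block.
# Assumes `data` is sorted by Id and has at least one row.
downsample_blocks <- function(data, by = 20) {
  n <- nrow(data)
  stopifnot(n > 0)
  # Block index per row: 1,1,...,1,2,2,... (the last block may be shorter).
  block <- (seq_len(n) - 1L) %/% by + 1L
  # Row position of the first row of each block.
  first <- seq(1L, n, by = by)
  data.frame(
    Id    = seq_along(first),
    Date  = data$Date[first],
    # Per-block sum divided by per-block row count gives the block mean.
    Value = as.numeric(rowsum(data$Value, block)) / tabulate(block)
  )
}

# Usage with the sample data, using blocks of 3 rows instead of 20.
input <- data.frame(
  Value = c(125, 127, 126, 123, 130, 131, 128),
  Id    = 1:7,
  Date  = as.POSIXct("2011-06-30 22:41:50", tz = "UTC")
)
result <- downsample_blocks(input, by = 3)
print(result)
```

With the seven sample rows and `by = 3`, the blocks are rows 1-3, 4-6 and 7, so the Value column comes out as 126, 128 and 128. A trailing partial block is averaged over the rows it actually contains; drop it beforehand if that is not wanted.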