Daily Content Aggregation

I am trying to aggregate some (somewhat irregular) daily data. I actually work with CSV data, but if I recreate it, it looks something like this:

    library(zoo)
    dates <- c("20100505", "20100505", "20100506", "20100507")
    val1 <- c("10", "11", "1", "6")
    val2 <- c("5", "31", "2", "7")
    x <- data.frame(dates = dates, val1 = val1, val2 = val2)
    z <- read.zoo(x, format = "%Y%m%d")

Now I would like to aggregate this on a daily basis (note that some days have more than one data point and others do not).

I have tried many variations, but I cannot get the aggregation to work. For example, this fails:

    aggregate(z, as.Date(time(z)), sum)
    # Error in Summary.factor(2:3, na.rm = FALSE) : sum not meaningful for factors

There seems to be a lot of material about aggregate, and I have tried several versions, but I cannot get it to sum the data at the daily level. I would also like to compute cummax and cumulative averages in addition to the daily totals.

Any help would be appreciated.

Update

The code I use is as follows:

    z <- read.zoo(file = "data.csv", sep = ",", header = TRUE,
                  stringsAsFactors = FALSE, blank.lines.skip = TRUE,
                  na.strings = "NA", format = "%Y%m%d")

It seems that my (inadvertent) quoting of the numbers above mirrors what happens in practice, because when I do this:

    aggregate(z, index(z), sum)
    # Error in Summary.factor(25L, na.rm = FALSE) : sum not meaningful for factors

There are quite a few columns (100 or so). How can I specify that they should all be converted with as.numeric automatically? (stringsAsFactors = FALSE doesn't seem to work.)

4 answers

Alternatively, aggregate before converting to zoo (val1 and val2 need to be numeric, though).

    x <- data.frame(dates = dates, val1 = as.numeric(val1), val2 = as.numeric(val2))
    y <- aggregate(x[, 2:3], by = list(x[, 1]), FUN = sum)

and then pass y to zoo.

That way you avoid the warning :)
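The aggregate-first route can be sketched end to end as follows. This is a minimal illustration rather than the answerer's exact code: it assumes the example vectors from the question, names the grouping column dates (a choice made here so it can be referenced later), and builds the zoo object with zoo() and order.by.

```r
library(zoo)

dates <- c("20100505", "20100505", "20100506", "20100507")
val1 <- c("10", "11", "1", "6")
val2 <- c("5", "31", "2", "7")

x <- data.frame(dates = dates,
                val1 = as.numeric(val1),
                val2 = as.numeric(val2))

# Sum every value column within each date; "dates" as the group name is an assumption
y <- aggregate(x[, 2:3], by = list(dates = x[, 1]), FUN = sum)

# y has one row per date, so the index passed to zoo() is unique
z <- zoo(as.matrix(y[, c("val1", "val2")]),
         order.by = as.Date(as.character(y$dates), format = "%Y%m%d"))
```

Because the duplicates are collapsed before zoo() sees them, the non-unique index warning should not appear.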


You started on the right track, but made a couple of mistakes.

First, zoo only consumes matrices, not data.frames. Second, those need numeric input:

    > z <- zoo(as.matrix(data.frame(val1=c(10,11,1,6), val2=c(5,31,2,7))),
    +          order.by=as.Date(c("20100505","20100505","20100506","20100507"),
    +                           "%Y%m%d"))
    Warning message:
    In zoo(as.matrix(data.frame(val1 = c(10, 11, 1, 6), val2 = c(5,  :
      some methods for "zoo" objects do not work if the index entries in 'order.by' are not unique

This gives us a warning, which is standard for zoo: it does not like identical time indices.

It is always a good idea to display the data structure, for example with str(), and perhaps to run summary() on it:

    > z
               val1 val2
    2010-05-05   10    5
    2010-05-05   11   31
    2010-05-06    1    2
    2010-05-07    6    7

Once we have that, aggregation is easy:

    > aggregate(z, index(z), sum)
               val1 val2
    2010-05-05   21   36
    2010-05-06    1    2
    2010-05-07    6    7
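The question also asks for cummax and cumulative averages on top of the daily totals. One possible sketch is shown here, assuming the daily totals printed above; by_column is a hypothetical helper introduced only for this illustration, and it works on the underlying matrix via coredata() before restoring the date index.

```r
library(zoo)

# Daily totals from the example, rebuilt here so the sketch is self-contained
daily <- zoo(cbind(val1 = c(21, 1, 6), val2 = c(36, 2, 7)),
             order.by = as.Date(c("2010-05-05", "2010-05-06", "2010-05-07")))

# Hypothetical helper: apply a cumulative function to each column,
# then put the result back on the original date index
by_column <- function(z, f) {
  zoo(apply(coredata(z), 2, f), order.by = index(z))
}

running_max  <- by_column(daily, cummax)
# Cumulative mean = running sum divided by the number of observations so far
running_mean <- by_column(daily, function(v) cumsum(v) / seq_along(v))
```

Going through coredata() keeps the sketch to functions that behave the same on any numeric matrix; it assumes at least two rows and no missing values.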

val1 and val2 are character strings. data.frame() converts them to factors. Summing factors does not make sense. You probably intended:

    x <- data.frame(dates = dates, val1 = as.numeric(val1), val2 = as.numeric(val2))
    z <- read.zoo(x, format = "%Y%m%d")
    aggregate(z, as.Date(time(z)), sum)

which gives:

               val1 val2
    2010-05-05   21   36
    2010-05-06    1    2
    2010-05-07    6    7
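To see why the factor version fails, and why the conversion has to go through character, here is a small base-R sketch. The factor is built explicitly with factor() so the behaviour is the same on any R version; since R 4.0.0, data.frame() no longer turns strings into factors by default, so on a recent R the original columns would stay character (which sum() also rejects).

```r
f <- factor(c("10", "11", "1", "6"))

# sum() is not defined for factors; capture the error message instead of stopping
res <- tryCatch(sum(f), error = function(e) conditionMessage(e))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(f)

# Converting to character first recovers the actual numbers
vals <- as.numeric(as.character(f))
total <- sum(vals)
```

Here codes holds the positions of each value among the sorted levels, which is why calling as.numeric() directly on a factor silently gives wrong totals.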

Convert the character columns to numeric, then call read.zoo with its aggregate argument:

    > x[-1] <- lapply(x[-1], function(x) as.numeric(as.character(x)))
    > read.zoo(x, format = "%Y%m%d", aggregate = sum)
               val1 val2
    2010-05-05   21   36
    2010-05-06    1    2
    2010-05-07    6    7
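For the CSV case in the question's update, with around 100 value columns, the same lapply conversion can be applied to every column except the first. The sketch below is an assumption-laden stand-in: csv_text is a hypothetical inline file replacing data.csv, and colClasses = "character" is used only to force the worst case where every column arrives as text.

```r
# Hypothetical inline CSV standing in for data.csv
csv_text <- "dates,val1,val2
20100505,10,5
20100505,11,31
20100506,1,2
20100507,6,NA"

# Read everything as character so the date strings keep their exact form
df <- read.csv(text = csv_text, colClasses = "character", na.strings = "NA")

# Convert every column except the first (the dates) to numeric in one step
df[-1] <- lapply(df[-1], function(col) as.numeric(as.character(col)))

# Check that all value columns are now numeric
all_numeric <- all(sapply(df[-1], is.numeric))

# df can now be handed to read.zoo(df, format = "%Y%m%d", aggregate = sum)
```

If a column contains entries that are not numbers, as.numeric() turns them into NA with a warning, which is a useful hint about which columns caused the factor problem in the first place.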