I need to calculate and add several new columns to a data frame, based on the values in each of a subset of its columns. These columns hold time series data (there is a common date column). For example, I need to calculate the change over the same month of the previous year for a dozen columns. I could write out each calculation individually, but that becomes cumbersome with many columns to convert, so I am trying to automate the process with a for loop.
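A minimal base-R sketch of such a loop, assuming the rows are sorted by date with exactly one row per month and no gaps. It uses plain vector indexing instead of xts. `pct_change()` is a hypothetical helper, not part of any package, and the fixed start date is an assumption chosen so the example is reproducible. Each new column name is built with `paste0()` and assigned with `[[`:

```r
# Hypothetical helper (not from any package): relative change versus the
# value k rows earlier, x[t] / x[t - k] - 1, with NA for the first k rows.
# Assumes one row per month, sorted by date, with no missing months.
pct_change <- function(x, k) {
  n <- length(x)
  if (k >= n) return(rep(NA_real_, n))
  c(rep(NA_real_, k), x[(k + 1):n] / x[1:(n - k)] - 1)
}

# Dummy data shaped like the question's (fixed start date assumed).
set.seed(12345)
monthsback <- 24
mydf <- data.frame(
  mydate   = seq(as.Date("2012-01-01"), by = "month", length.out = monthsback),
  myvalue1 = runif(monthsback, min = 600, max = 800),
  myvalue2 = runif(monthsback, min = 200, max = 300)
)

newcolnames <- c("myvalue1", "myvalue2")
for (col in newcolnames) {
  # Change over the same month of the previous year (12 rows back)
  mydf[[paste0(col, "_yoy")]] <- pct_change(mydf[[col]], 12)
  # Change over the previous month (1 row back)
  mydf[[paste0(col, "_mom")]] <- pct_change(mydf[[col]], 1)
}
```

Since `x[t] / x[t - 12] - 1` equals `(x[t] - x[t - 12]) / x[t - 12]`, this should agree with a `diff()`/`lag()` ratio on regularly spaced monthly data.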
It worked fine until I tried using ddply to create a column holding the running (cumulative) sum of the value within each year. It turns out that ddply adds new rows during each iteration of the loop and includes those new rows in the cumsum calculation. I have two questions:
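For comparison, a per-year running total can be sketched in base R with `ave()`, a swapped-in technique rather than ddply. `ave()` applies `cumsum()` within each group and returns a vector as long as its input, so no rows are added or reordered. The data and start date here are illustrative assumptions:

```r
# Illustrative data: 24 months starting mid-year, so the first and last
# calendar years are only partially covered.
set.seed(12345)
mydf <- data.frame(
  mydate   = seq(as.Date("2012-07-01"), by = "month", length.out = 24),
  myvalue1 = runif(24, min = 600, max = 800)
)
mydf$year <- as.numeric(format(mydf$mydate, "%Y"))

# Running total that restarts whenever the year changes.
mydf$myvalue1_cuml <- ave(mydf$myvalue1, mydf$year, FUN = cumsum)
```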
Q. How can I get ddply to calculate the correct cumsum?
Q. How can I specify the new column's name in the ddply call itself, instead of using a dummy name and renaming the column afterwards?
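One way to set a column name at run time, sketched in base R with illustrative data: write the per-group step as a small function that assigns with `[[` using a name built by `paste0()`, then apply it with `split()`, `lapply()` and `rbind()`, the split-apply-combine pattern that ddply automates. `add_year_cumsum()` is a hypothetical helper, not part of plyr or base R:

```r
# Illustrative data (assumed): 24 months starting mid-year.
set.seed(12345)
mydf <- data.frame(
  mydate   = seq(as.Date("2012-07-01"), by = "month", length.out = 24),
  myvalue1 = runif(24, min = 600, max = 800),
  myvalue2 = runif(24, min = 200, max = 300)
)
mydf$year <- as.numeric(format(mydf$mydate, "%Y"))

# Hypothetical helper: adds "<col>_cuml", the running total of `col`
# within one year's rows. The new name is built at run time.
add_year_cumsum <- function(d, col) {
  d[[paste0(col, "_cuml")]] <- cumsum(d[[col]])
  d
}

for (col in c("myvalue1", "myvalue2")) {
  # Split by year, apply the helper to each piece, stack the pieces back.
  pieces <- lapply(split(mydf, mydf$year), add_year_cumsum, col = col)
  mydf <- do.call(rbind, pieces)
  rownames(mydf) <- NULL  # drop the "2012.1"-style row names from rbind
}
```

Passing the same kind of function to ddply in place of `transform` (for example `ddply(mydf, .(year), add_year_cumsum, col = col)`) should also work, since ddply forwards extra arguments to its function, but treat that as untested here.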
[Edit: I spoke too soon; the updated code below does not work yet, just FYI.]
    require(lubridate)
    require(plyr)
    require(xts)
    set.seed(12345)

    # create dummy time series data
    monthsback <- 24
    startdate <- as.Date(paste(year(now()),month(now()),"1",sep = "-")) - months(monthsback)
    mydf <- data.frame(mydate = seq(as.Date(startdate), by = "month", length.out = monthsback),
                       myvalue1 = runif(monthsback, min = 600, max = 800),
                       myvalue2 = runif(monthsback, min = 200, max = 300))
    mydf$year <- as.numeric(format(as.Date(mydf$mydate), format="%Y"))
    mydf$month <- as.numeric(format(as.Date(mydf$mydate), format="%m"))

    newcolnames <- c('myvalue1','myvalue2')
    for (i in seq_along(newcolnames)) {
      print(newcolnames[i])
      mydf$myxts <- xts(mydf[, newcolnames[i]], order.by = mydf$mydate)

      ## Calculate change over same month in previous year
      mylag <- 12
      mydf[, paste(newcolnames[i], "_yoy", sep = "", collapse = "")] <- as.numeric(diff(mydf$myxts, lag = mylag)/ lag(mydf$myxts, mylag))

      ## Calculate change over previous month
      mylag <- 1
      mydf[, paste(newcolnames[i], "_mom", sep = "", collapse = "")] <- as.numeric(diff(mydf$myxts, lag = mylag)/ lag(mydf$myxts, mylag))

      ## Calculate cumulative figure
      #mydf$newcol <- as.numeric(mydf$myxts)
      mydf$newcol <- 1
      mydf <- ddply(mydf, .(year), transform, newcol = cumsum(as.numeric(mydf$myxts)))
      colnames(mydf)[colnames(mydf)=="newcol"] <- paste(newcolnames[i], "_cuml", sep = "", collapse = "")
    }
    mydf
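A small base-R illustration of a scoping detail relevant to the cumsum line: `transform()` evaluates its arguments the same way `with()` does, looking names up first among the columns of the data frame it is given (here, one year's piece) and only then outside it. A bare column name therefore refers to the piece, while `mydf$...` refers to the whole `mydf`. The toy data is made up for illustration:

```r
# Toy data (made up): three rows spanning two years.
mydf <- data.frame(year = c(2012, 2012, 2013), x = c(1, 2, 3))

# One year's piece, roughly as ddply would hand it to transform().
piece <- mydf[mydf$year == 2013, ]

# Bare column name: resolved inside `piece`, so only its 1 row is summed.
inside <- with(piece, cumsum(x))

# mydf$x: `mydf` is not a column of `piece`, so it is found in the
# calling (here global) environment and all 3 rows are used.
outside <- with(piece, cumsum(mydf$x))
```

Referring to the column by its bare name inside `transform` is one thing to try in the loop above; whether that alone fixes it, given the xts column, is not verified here.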