R: using ddply in a loop over columns of a data frame

I need to calculate several new columns and add them to a data frame, based on the values in each of a subset of its columns. These columns hold time series data (there is a common date column). For example, I need to calculate the change from the same month of the previous year for a dozen columns. I could specify and calculate each one individually, but that becomes cumbersome when there are many columns to convert, so I am trying to automate the process with a for loop.
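
Here is a minimal base-R sketch of that per-column loop. It is an illustration, not the code used in this question. It assumes the rows are sorted by date with exactly one row per month and no gaps. It also replaces the xts `diff()`/`lag()` calls with a hypothetical helper, `pct_change()`, which compares each value with the one k rows earlier. The new column names are built as strings and assigned with `[[`, so no renaming step is needed. The start date 2020-01-01 is arbitrary.

```r
# Hypothetical helper: relative change versus the value k rows earlier.
# Assumes rows are sorted by date with exactly one row per month (no gaps).
pct_change <- function(x, k) {
  prev <- c(rep(NA, k), head(x, -k))  # value k months earlier, NA where unavailable
  x / prev - 1                        # same as (x - prev) / prev
}

set.seed(12345)
n <- 24
mydf <- data.frame(
  mydate   = seq(as.Date("2020-01-01"), by = "month", length.out = n),
  myvalue1 = runif(n, min = 600, max = 800),
  myvalue2 = runif(n, min = 200, max = 300)
)

# Loop over the value columns, building each new column name as a string
for (col in c("myvalue1", "myvalue2")) {
  mydf[[paste0(col, "_yoy")]] <- pct_change(mydf[[col]], 12)  # vs same month last year
  mydf[[paste0(col, "_mom")]] <- pct_change(mydf[[col]], 1)   # vs previous month
}
```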

It worked fine until I tried using ddply to create a column holding the running (cumulative) sum of the value within each year. It turns out that ddply adds new rows on each iteration of the loop and includes those new rows in the cumsum calculation. I have two questions:
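
One plausible explanation for the extra rows assumes that the cumsum expression runs on a vector taken from the whole data frame (all years) instead of the current year's piece. `transform()` hands the new column to `data.frame()`. When one length divides the other, `data.frame()` silently recycles the shorter piece to the longer length. The toy data below is made up for illustration. When the lengths do not divide evenly, you get an error instead.

```r
# One year's slice of the data: 12 rows
piece <- data.frame(month = 1:12, value = 1:12)

# A 24-element vector, like a column taken from the WHOLE data frame (2 years)
whole_column <- 1:24

# transform() passes newcol to data.frame(), which recycles the 12-row piece
# to 24 rows instead of reporting a length mismatch
out <- transform(piece, newcol = cumsum(whole_column))
nrow(out)  # 24 rows, not 12: the piece has been duplicated
```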

Q1. How can I get ddply to calculate the correct cumsum?
Q2. How can I specify the new column's name inside the ddply call, instead of creating a placeholder column and renaming it afterwards?

[Edit: I spoke too soon; the updated code below does not work yet, just FYI.]

```r
require(lubridate)
require(plyr)
require(xts)

set.seed(12345)
# create dummy time series data
monthsback <- 24
startdate <- as.Date(paste(year(now()),month(now()),"1",sep = "-")) - months(monthsback)
mydf <- data.frame(mydate = seq(as.Date(startdate), by = "month", length.out = monthsback),
                   myvalue1 = runif(monthsback, min = 600, max = 800),
                   myvalue2 = runif(monthsback, min = 200, max = 300))
mydf$year <- as.numeric(format(as.Date(mydf$mydate), format="%Y"))
mydf$month <- as.numeric(format(as.Date(mydf$mydate), format="%m"))

newcolnames <- c('myvalue1','myvalue2')

for (i in seq_along(newcolnames)) {
  print(newcolnames[i])

  mydf$myxts <- xts(mydf[, newcolnames[i]], order.by = mydf$mydate)

  ## Calculate change over same month in previous year
  mylag <- 12
  mydf[, paste(newcolnames[i], "_yoy", sep = "", collapse = "")] <- as.numeric(diff(mydf$myxts, lag = mylag)/ lag(mydf$myxts, mylag))

  ## Calculate change over previous month
  mylag <- 1
  mydf[, paste(newcolnames[i], "_mom", sep = "", collapse = "")] <- as.numeric(diff(mydf$myxts, lag = mylag)/ lag(mydf$myxts, mylag))

  ## Calculate cumulative figure
  #mydf$newcol <- as.numeric(mydf$myxts)
  mydf$newcol <- 1
  mydf <- ddply(mydf, .(year), transform, newcol = cumsum(as.numeric(mydf$myxts)))
  colnames(mydf)[colnames(mydf)=="newcol"] <- paste(newcolnames[i], "_cuml", sep = "", collapse = "")
}
mydf
```
1 answer

In your loop, `myxts` is not part of the data frame, so ddply does not split it by year along with the rest of the data. Change it to:

```r
mydf$myxts <- xts(mydf[, newcolnames[i]], order.by = mydf$mydate)
```
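
The next step, for the cumulative column, is to refer to the values by their bare column name inside `transform()` once they are in the data frame. That way `cumsum()` sees only the current year's rows. A `mydf$...` reference reaches outside the group and returns all rows. This minimal sketch assumes a plain numeric column instead of an xts object and uses an arbitrary fixed start date.

```r
library(plyr)

set.seed(12345)
mydf <- data.frame(
  mydate   = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  myvalue1 = runif(24, min = 600, max = 800)
)
mydf$year <- as.numeric(format(mydf$mydate, "%Y"))

# Inside transform(), the bare name myvalue1 refers to the current year's
# piece, so cumsum() restarts at the first month of each year.
# Writing mydf$myvalue1 here would pull in all 24 rows instead.
mydf <- ddply(mydf, .(year), transform, myvalue1_cuml = cumsum(myvalue1))
```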

I do not know how to use dynamically generated names with `transform`.
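
One workaround for dynamic names is to skip `transform()` and give `ddply()` an anonymous function that assigns the new column with `[[`, which accepts a name stored in a string. This is a sketch under stated assumptions, not the answerer's method. The column names, dates, and the `newname` variable are illustrative.

```r
library(plyr)

set.seed(12345)
mydf <- data.frame(
  mydate   = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  myvalue1 = runif(24, min = 600, max = 800),
  myvalue2 = runif(24, min = 200, max = 300)
)
mydf$year <- as.numeric(format(mydf$mydate, "%Y"))

for (col in c("myvalue1", "myvalue2")) {
  newname <- paste0(col, "_cuml")  # column name built at run time
  # d is one year's piece; [[ ]] takes the column names as strings,
  # so no placeholder column or renaming step is needed
  mydf <- ddply(mydf, .(year), function(d) {
    d[[newname]] <- cumsum(d[[col]])
    d
  })
}
```

Outside plyr, base R's `ave(mydf[[col]], mydf$year, FUN = cumsum)` computes the same within-year running total and keeps the original row order, so `mydf[[newname]] <- ave(mydf[[col]], mydf$year, FUN = cumsum)` is an alternative.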


Source: https://habr.com/ru/post/1411803/

