R: ddply repeats annual cumulative data

This is related to this question here, but I decided to ask a separate question for clarity, since the "new" problem is not directly related to the original one. In short, I use ddply to cumulatively sum a value within each of three years. My code takes the data from the first year and repeats it in the rows for the second and third years. I assume the first-year chunk is being copied across the entire column, but I don't understand why.

Q. How can I get a cumulatively summed value for each year in the correct rows of a specified column?

[Edit: a for loop, or something similar, is important, because in the end I want to calculate the new columns automatically from a list of column names rather than computing each new column by hand. The loop iterates over the list of column names.]


I often use a combination of ddply and cumsum, so it's pretty annoying to suddenly have problems with it.

[Edit: this code has been updated to the solution I settled on, based on @Chase's answer below]

    require(lubridate)
    require(plyr)
    require(xts)
    require(reshape)
    require(reshape2)

    set.seed(12345)

    # create dummy time series data
    monthsback <- 24
    startdate <- as.Date(paste(year(now()), month(now()), "1", sep = "-")) - months(monthsback)
    mydf <- data.frame(mydate = seq(as.Date(startdate), by = "month", length.out = monthsback),
                       myvalue1 = runif(monthsback, min = 600, max = 800),
                       myvalue2 = runif(monthsback, min = 1900, max = 2400),
                       myvalue3 = runif(monthsback, min = 50, max = 80),
                       myvalue4 = runif(monthsback, min = 200, max = 300))
    mydf$year <- as.numeric(format(as.Date(mydf$mydate), format = "%Y"))
    mydf$month <- as.numeric(format(as.Date(mydf$mydate), format = "%m"))

    # Select columns to process
    newcolnames <- c('myvalue1', 'myvalue4', 'myvalue2')

    # melt n' cast
    mydf.m <- mydf[, c('mydate', 'year', newcolnames)]
    mydf.m <- melt(mydf.m, measure.vars = newcolnames)
    mydf.m <- ddply(mydf.m, c("year", "variable"), transform, newcol = cumsum(value))
    mydf.m <- dcast(mydate ~ variable, data = mydf.m, value.var = "newcol")
    colnames(mydf.m) <- c('mydate', paste(newcolnames, "_cum", sep = ""))
    mydf <- merge(mydf, mydf.m, by = 'mydate', all = FALSE)
    mydf
1 answer

I don't really follow your loop, but are you overthinking this? Can't you just use transform and ddply?

    # Make sure it's ordered properly
    mydf <- mydf[order(mydf$year, mydf$month), ]

    # Use ddply to calculate the cumsum by year:
    ddply(mydf, "year", transform, cumsum1 = cumsum(myvalue1), cumsum2 = cumsum(myvalue2))
    #----------
           mydate myvalue1 myvalue2 year month   cumsum1  cumsum2
    1  2010-05-01 744.1808 264.4543 2010     5  744.1808 264.4543
    2  2010-06-01 775.1546 238.9828 2010     6 1519.3354 503.4371
    3  2010-07-01 752.1965 269.8544 2010     7 2271.5319 773.2915
    ....
    9  2011-01-01 745.5411 218.7712 2011     1  745.5411 218.7712
    10 2011-02-01 797.9474 268.1834 2011     2 1543.4884 486.9546
    11 2011-03-01 606.9071 237.0104 2011     3 2150.3955 723.9650
    ...
    21 2012-01-01 690.7456 225.9681 2012     1  690.7456 225.9681
    22 2012-02-01 665.3505 232.1225 2012     2 1356.0961 458.0906
    23 2012-03-01 793.0831 206.0195 2012     3 2149.1792 664.1101

EDIT: this is untested, as I don't have R on this machine, but this is what I had in mind:

    require(reshape2)
    mydf.m <- melt(mydf, measure.vars = newcolnames)
    mydf.m <- ddply(mydf.m, c("year", "variable"), transform, newcol = cumsum(value))
    dcast(mydate + year + month ~ variable, data = mydf.m, value.var = "newcol")
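For comparison, here is a minimal sketch of the same per-year running total driven by a list of column names, using only base R. It is not part of the answer above: the toy data frame and the `cols` vector are made up for illustration, and `ave()` is swapped in for the ddply/melt/dcast round trip.

```r
# Toy data (hypothetical): two years, three months each, easy to check by hand
toy <- data.frame(
  year     = c(2010, 2010, 2010, 2011, 2011, 2011),
  month    = c(1, 2, 3, 1, 2, 3),
  myvalue1 = c(1, 2, 3, 10, 20, 30),
  myvalue2 = c(5, 5, 5, 7, 7, 7)
)

# Columns to accumulate (plays the role of newcolnames in the question)
cols <- c("myvalue1", "myvalue2")

# Sort first so the running total follows calendar order within each year
toy <- toy[order(toy$year, toy$month), ]

# ave() applies FUN within each year and returns the result in row order,
# so the running total restarts at the first row of every year
for (col in cols) {
  toy[[paste0(col, "_cum")]] <- ave(toy[[col]], toy$year, FUN = cumsum)
}

toy
```

Because `ave()` returns a vector aligned with the original rows, each new `_cum` column can be assigned straight back onto the data frame, with no merge step.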

Source: https://habr.com/ru/post/1411801/

