This is what my data frame looks like:
library(data.table)
df <- fread('
Name EventType Date SalesAmount RunningTotal Runningtotal(prior365Days)
John Email 1/1/2014 0 0 0
John Sale 2/1/2014 10 10 10
John Sale 7/1/2014 20 30 30
John Sale 4/1/2015 30 60 50
John Webinar 5/1/2015 0 60 50
Tom Email 1/1/2014 0 0 0
Tom Sale 2/1/2014 15 15 15
Tom Sale 7/1/2014 10 25 25
Tom Sale 4/1/2015 25 50 35
Tom Webinar 5/1/2015 0 50 35
')
df[, Date := as.Date(Date, format="%m/%d/%Y")]
The last column is my desired column: the sum of SalesAmount (for each Name) over the prior 365 days. I managed to compute it with help from @6pool. His solution was:
df$EventDate <- as.Date(df$EventDate, format="%d/%m/%Y")
df <- df %>%
  group_by(Name) %>%
  arrange(EventDate) %>%
  mutate(day = EventDate - EventDate[1])
f <- Vectorize(function(i)
  sum(df[df$Name[i] == df$Name & df$day[i] - df$day >= 0 & df$day[i] - df$day <= 365, "SalesAmount"]),
  vec="i")
df$RunningTotal365 <- f(1:nrow(df))
However, the line df$RunningTotal365 <- f(1:nrow(df)) takes a very long time (more than 1.5 days), because my data frame has more than 1.5 million rows. "rollapply" was suggested in my original question, but I have struggled to figure out how to use it in this case. Please help.
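For reference, here is a minimal sketch of one way the same 365-day window sum might be expressed without the row-by-row loop, using a data.table non-equi join instead of rollapply. The names `dt`, `win`, `start`, `end`, and `RunningTotal365` are illustrative assumptions, the dates are written in ISO format for simplicity, and I have not timed this on 1.5 million rows.

```r
library(data.table)

# Sample data from the question, with dates already parsed.
dt <- data.table(
  Name        = rep(c("John", "Tom"), each = 5),
  EventType   = rep(c("Email", "Sale", "Sale", "Sale", "Webinar"), 2),
  Date        = as.Date(rep(c("2014-01-01", "2014-02-01", "2014-07-01",
                              "2015-04-01", "2015-05-01"), 2)),
  SalesAmount = c(0, 10, 20, 30, 0, 0, 15, 10, 25, 0)
)

# One window per row: [Date - 365, Date], both ends inclusive,
# mirroring the 0 <= day difference <= 365 condition in the loop version.
win <- dt[, .(Name, start = Date - 365, end = Date)]

# Non-equi join: for each window, sum SalesAmount over rows of dt with the
# same Name whose Date falls inside the window. by = .EACHI yields one
# result row per window, in the same order as the rows of dt.
dt[, RunningTotal365 := dt[win,
                           on = .(Name, Date >= start, Date <= end),
                           .(total = sum(SalesAmount)),
                           by = .EACHI]$total]

print(dt)
```

On the sample data this should reproduce the Runningtotal(prior365Days) column shown above. The join avoids scanning the whole table once per row, which is where the Vectorize version spends its time.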
vectorization r zoo dplyr rollapply
gibbz00