I am trying to calculate a running sum over a given window, based on a condition. I have seen threads that solve a conditional cumulative sum ( Calculate the conditional current sum in R for each row in the data frame ) and a rolling sum ( Rolling Sum of another variable in R ), but I could not find one that combines the two. I also saw a thread saying that data.table does not have a rolling window function. So this problem is very challenging for me.
Also, the solution posted by Mike Grahan for the rolling sum is beyond my comprehension. I am looking for a data.table method, mainly for speed. However, I am open to other methods, as long as I can understand them.
Here is my input:
DFI <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011, 2012, 2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), Customer = c(13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13578, 13578, 13578, 13578, 13578, 13578), Product = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "A", "A", "B", "C", "D", "E"), Rev = c(4, 3, 3, 1, 2, 1, 2, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2)), .Names = c("FY", "Customer", "Product", "Rev"), row.names = c(NA, 17L), class = "data.frame")
Here is my expected result: (Manually created; I apologize if there is a manual error)
DFO <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2012, 2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), Customer = c(13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13578, 13578, 13578, 13578, 13578, 13578), Product = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "C", "D", "E"), Rev = c(4, 3, 3, 1, 2, 3, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2), cumsum = c(4, 7, 10, 11, 9, 3, 6, 10, 15, 21, 3, 2, 2, 4, 2, 2)), .Names = c("FY", "Customer", "Product", "Rev", "cumsum" ), row.names = c(NA, 16L), class = "data.frame")
Some comments on the logic:
1) I want to find a running sum over a 5-year period. Ideally, the length of this period would be a variable that I can specify elsewhere in the code, so that I can change the window later in my analysis.
2) The end of the window is based on the maximum year (i.e. FY in the example above). In the above example, the maximum FY in DFI is 2016. So the first year of the window will be 2016 - 5 + 1 = 2012 for all entries in 2016.
3) The window sum (or running sum) is calculated per Customer and per Product.
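The logic in points 1) to 3) can be sketched with a data.table non-equi self-join. This is only a minimal sketch, not a definitive solution: the names `window`, `yearly`, `win_start`, `sums`, and `roll` are illustrative assumptions, and the sketch assumes that duplicate (Customer, Product, FY) rows should first be added together, as the expected result suggests for Product B in 2011.

```r
library(data.table)

# Assumed, adjustable window length in fiscal years (point 1).
window <- 5

# Same values as DFI in the question, rebuilt here so the sketch is self-contained.
DFI <- data.table(
  FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011, 2012, 2013, 2014, 2015,
         2010, 2016, 2013, 2014, 2015, 2010),
  Customer = c(rep(13575, 11), rep(13578, 6)),
  Product = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B",
              "A", "A", "B", "C", "D", "E"),
  Rev = c(4, 3, 3, 1, 2, 1, 2, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2)
)

# Step 1: collapse duplicate (Customer, Product, FY) rows into one yearly total.
# keyby also sorts the result by Customer, Product, FY.
yearly <- DFI[, .(Rev = sum(Rev)), keyby = .(Customer, Product, FY)]

# Step 2: first year of each row's window, e.g. 2016 - 5 + 1 = 2012 (point 2).
yearly[, win_start := FY - window + 1]

# Step 3: for every row, sum Rev over the rows of the same Customer and Product
# whose FY lies in [win_start, FY] (point 3). by = .EACHI returns one result
# per row of the inner table, in the same order as `yearly`.
sums <- yearly[yearly,
               on = .(Customer, Product, FY >= win_start, FY <= FY),
               .(roll = sum(Rev)),
               by = .EACHI]

# Attach the windowed sum and drop the helper column.
yearly[, cumsum := sums$roll]
yearly[, win_start := NULL]

print(yearly)
```

Changing `window` to another value changes the period without touching the rest of the code. The join only compares FY values, so gaps in the years (such as the missing 2014 for Product A) do not need any special handling.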
What I tried:
I wanted to try something before posting. Here is my code:
DFI <- data.table::as.data.table(DFI)
# Sort it first
DFI <- DFI[order(Customer, FY), ]
# Find cumulative sum; remove Rev column; order rows
DFOTest <- DFI[, cumsum := cumsum(Rev), by = .(Customer, Product)][, .SD[which.max(cumsum)], by = .(FY, Customer, Product)][, ("Rev") := NULL][order(Customer, Product, FY)]
This code calculates the cumulative sum, but I cannot work out how to define the 5-year window and then calculate the running sum within it. I have two questions:
Question 1) How do I calculate the 5-year running sum?
Question 2) Can someone explain Mike's method in this thread? It seems to be fast. However, I am not quite sure what is going on there. I saw that someone requested comments on it, but I am not sure whether it is meant to be self-explanatory.
Thanks in advance. I have been struggling with this problem for two days.