Forward cumulative sum in dplyr

When analyzing datasets from longitudinal studies, I usually get results like these from a dplyr pipeline run on the source data:

 df = data.frame(n_sessions=c(1,2,3,4,5), n_people=c(59,89,30,23,4)) 

i.e., the number of participants who had completed a given number of sessions at this point in time.

Although it's useful to know how many people completed exactly n sessions, we most often need to know how many completed at least n sessions. As the table below shows, a standard cumulative sum doesn't fit. We need the values in the n_total column, which is a kind of "forward cumulative sum" of the values in the n_people column. That is, each row's value should be the sum of its own value and all the values after it, rather than the standard cumulative sum, which is the sum of all values up to and including the current row:

 n_sessions n_people n_total cumsum
          1       59     205     59
          2       89     146    148
          3       30      57    178
          4       23      27    201
          5        4       4    205
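To make the definition concrete, here is a minimal base-R sketch that computes n_total literally as "the sum of n_people in this row and every later row", next to the standard cumsum. It assumes the rows are already sorted by n_sessions in ascending order, as in the table; the loop over rows is quadratic and is only meant to illustrate the definition, not for use on large data.

```r
# Data from the question
df <- data.frame(n_sessions = c(1, 2, 3, 4, 5),
                 n_people   = c(59, 89, 30, 23, 4))

n <- nrow(df)

# Forward sum by definition: row i gets sum(n_people[i..n]).
# Assumes rows are sorted by n_sessions ascending.
df$n_total <- sapply(seq_len(n), function(i) sum(df$n_people[i:n]))

# Standard cumulative sum for comparison: row i gets sum(n_people[1..i])
df$cumsum <- cumsum(df$n_people)

print(df)
```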

Generating the standard cumulative sum is simple:

 mutate(df, cumsum = cumsum(n_people)) 

What expression would create a "forward sum" that can be used in a dplyr pipeline? I suppose cumsum needs to be applied to n_people after sorting by n_sessions in descending order, but I can't figure out how to get the result while keeping the data frame in its original row order.
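The sort-then-cumsum idea from the question can keep the original row order if the cumulative sums are written back by index instead of re-sorting the frame. Below is a minimal sketch; `forward_cumsum` is a hypothetical helper, not a dplyr function, and it assumes the `by` values are unique (with ties, tied rows would get different totals). To show that row order is preserved, the question's data is deliberately shuffled.

```r
# Hypothetical helper (not part of dplyr): each element of x gets the sum of
# itself and every element with a larger `by` value. Assumes `by` is unique.
# The result is returned in the original row order.
forward_cumsum <- function(x, by) {
  ord <- order(by, decreasing = TRUE)  # row indices, largest `by` first
  out <- numeric(length(x))
  out[ord] <- cumsum(x[ord])           # cumulate in descending order, write back by index
  out
}

# The question's data, shuffled to show that row order is kept
df <- data.frame(n_sessions = c(3, 1, 5, 2, 4),
                 n_people   = c(30, 59, 4, 89, 23))

df$n_total <- forward_cumsum(df$n_people, df$n_sessions)
print(df)

# Inside a dplyr pipeline the same helper would be called as:
#   mutate(df, n_total = forward_cumsum(n_people, n_sessions))
```

Writing back through `out[ord] <- ...` is what avoids a second sort: each cumulative total lands in the row it belongs to, regardless of how the frame was ordered.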

1 answer

You can take the cumulative sum of the reversed vector and then reverse the result. The built-in rev function is useful here:

 mutate(df, rev_cumsum = rev(cumsum(rev(n_people)))) 

For example, in your data, this returns:

  n_sessions n_people rev_cumsum
1          1       59        205
2          2       89        146
3          3       30         57
4          4       23         27
5          5        4          4
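Note that rev() works on row positions, so this gives the forward sum only when the rows are already sorted by n_sessions in ascending order, as in the question. As a quick base-R sanity check, the sketch below compares the answer's expression with the equivalent identity sum(x) - cumsum(x) + x (the grand total minus everything strictly before each row).

```r
# n_people from the question, rows sorted by n_sessions ascending
x <- c(59, 89, 30, 23, 4)

# The answer's approach: reverse, cumulate, reverse back
rev_cumsum <- rev(cumsum(rev(x)))

# Equivalent identity: total minus the sum of all earlier rows
alt <- sum(x) - cumsum(x) + x

print(rev_cumsum)
stopifnot(isTRUE(all.equal(rev_cumsum, alt)))
```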
