Forward cumulative sum in dplyr

Question

When analyzing data sets from longitudinal studies, I often get results like this from a dplyr analysis of the raw data:

 df = data.frame(n_sessions=c(1,2,3,4,5), n_people=c(59,89,30,23,4))

i.e. a count of how many participants have completed a given number of sessions at this point in time.

Although it is useful to know how many people completed exactly n sessions, we more often need to know how many completed at least n sessions. As the table below shows, a standard cumulative sum is not appropriate here. We want the values in the n_total column, which is a kind of "forward cumulative sum" of the values in the n_people column. That is, the value in each row should be the sum of itself and all values below it, rather than the standard cumulative sum, which is the sum of all values above it up to and including itself:

 n_sessions n_people n_total cumsum
          1       59     205     59
          2       89     146    148
          3       30      57    178
          4       23      27    201
          5        4       4    205

Generating the standard cumulative sum is simple:

 mutate(df, cumsum = cumsum(n_people))

What expression would create a "forward cumulative sum" that could be used within a dplyr analysis? I suppose cumsum would need to be applied to n_people after sorting by n_sessions in descending order, but I cannot work out how to get the answer while preserving the original row order of the data frame.

r dplyr

Michael MacAskill Aug 28 '16 at 10:09

1 answer

David Robinson · Accepted Answer · 2016-08-28T22:12:15+0000

You can take the cumulative sum of the reversed vector, then reverse that result. The built-in rev function is useful here:

 mutate(df, rev_cumsum = rev(cumsum(rev(n_people))))

For example, on your data this returns:

   n_sessions n_people rev_cumsum
 1          1       59        205
 2          2       89        146
 3          3       30         57
 4          4       23         27
 5          5        4          4
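
For anyone who wants to check this end to end, here is a minimal self-contained sketch. It assumes dplyr is installed and that the rows are already sorted by n_sessions in ascending order, as in the question. The helper name rev_cumsum and the column n_total_alt are made up for illustration; the arithmetic alternative in n_total_alt is not part of the approach above, just a cross-check that should give the same numbers.

```r
library(dplyr)

# Hypothetical helper: wraps the rev / cumsum / rev idiom in one function.
rev_cumsum <- function(x) rev(cumsum(rev(x)))

# Data from the question (assumed sorted by n_sessions, ascending).
df <- data.frame(n_sessions = c(1, 2, 3, 4, 5),
                 n_people   = c(59, 89, 30, 23, 4))

out <- mutate(
  df,
  # "At least n sessions": each row plus every row below it.
  n_total = rev_cumsum(n_people),
  # Cross-check: grand total minus the rows strictly above the current row.
  n_total_alt = sum(n_people) - cumsum(n_people) + n_people
)

print(out)
```

Neither expression reorders the rows, so the original order of the data frame is kept. If the rows might not be sorted by n_sessions, sorting them first (for example with arrange) would be needed for the result to mean "at least n sessions".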
