When analyzing data sets from longitudinal studies, I often end up with a summary like this after a dplyr analysis of the source data:
df = data.frame(n_sessions=c(1,2,3,4,5), n_people=c(59,89,30,23,4))
i.e. a count of the participants who have completed a given number of assessments at this point in time.
Although it's useful to know how many people have completed exactly n sessions, we more often need to know how many have completed at least n sessions. As the table below shows, a standard cumulative sum is not appropriate. What we want are the values in the n_total column, which is a sort of "forward" sum of the values in the n_people column. That is, the value in each row should be the sum of itself and all values after it, as opposed to the standard cumulative sum, which is the sum of all values up to and including itself:
n_sessions n_people n_total cumsum
         1       59     205     59
         2       89     146    148
         3       30      57    178
         4       23      27    201
         5        4       4    205
Generating the standard cumulative sum is simple:
mutate(df, cumsum = cumsum(n_people))
What expression would create the "forward sum" so that it could be incorporated into a dplyr analysis? I imagine that cumsum would need to be applied to n_people after sorting by n_sessions in descending order, but I can't figure out how to get the answer while preserving the original row order of the data frame.
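For reference, here is a minimal sketch of the idea described above, not a definitive answer. It assumes the rows are already sorted by n_sessions in ascending order, so reversing the vector, taking cumsum, and reversing back stands in for the descending sort without reordering the data frame. The name result is only illustrative.

```r
library(dplyr)

df <- data.frame(n_sessions = c(1, 2, 3, 4, 5), n_people = c(59, 89, 30, 23, 4))

# Assumption: rows are sorted by n_sessions in ascending order.
# rev() flips n_people, cumsum() accumulates from the last row upward,
# and the outer rev() restores the original row order.
result <- df %>%
  mutate(n_total = rev(cumsum(rev(n_people))))

print(result)
```

If the rows might not be sorted, one option would be to arrange(n_sessions) first, since this approach depends entirely on row order.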