R - calculating the average value of the dataframe column from the top row to the bottom row

The name may not be so clear, because in a few words it was difficult to summarize the problem, although I do not think that the problem is that it is difficult to solve. To explain the problem, let me provide the data for reference:

head(df, n = 10) team score 1 A 10 2 A 4 3 A 10 4 A 16 5 A 20 6 B 5 7 B 11 8 B 8 9 B 16 10 B 5 

I would like to add a third column, which calculates the average score for each team with the update of the average score, when I go through the rows for each team, and then reboot in a new team. For example, the output column I hope for would look like this:

 head(df, n = 10) team score avg_score 1 A 10 10 2 A 4 7 3 A 10 8 4 A 16 10 5 A 20 12 6 B 5 5 7 B 11 8 8 B 8 8 9 B 16 10 10 B 5 9 # row1: 10 = 10 # row2: 7 = (10 + 4)/2 # row3: 8 = (10 + 4 + 10)/3 # ... 

followed by a template and restarting the calculations for the new team.

Thanks,

+8
r
source share
3 answers
 library("data.table") setDT(df)[, `:=` (avg_score = cumsum(score)/1:.N), by = team] 

or more readable as per @snoram's comment

 setDT(dt)[, avg_score := cumsum(score)/(1:.N), by = team] # team score avg_score # 1: A 10 10 # 2: A 4 7 # 3: A 10 8 # 4: A 16 10 # 5: A 20 12 # 6: B 5 5 # 7: B 11 8 # 8: B 8 8 # 9: B 16 10 # 10: B 5 9 
+7
source share

Here's the R-base solution

 df$avg_score <- unlist(tapply(df$score, df$team, function(x) cumsum(x)/seq_along(x))) > df team score avg_score 1 A 10 10 2 A 4 7 3 A 10 8 4 A 16 10 5 A 20 12 6 B 5 5 7 B 11 8 8 B 8 8 9 B 16 10 10 B 5 9 
+6
source share

We can use cummean from dplyr (also noted that @aosmith commented on this - assuming he is not sending it as a solution)

 library(dplyr) df %>% group_by(team) %>% mutate(avg_score = cummean(score)) # team score avg_score # <chr> <int> <dbl> #1 A 10 10 #2 A 4 7 #3 A 10 8 #4 A 16 10 #5 A 20 12 #6 B 5 5 #7 B 11 8 #8 B 8 8 #9 B 16 10 #10 B 5 9 
0
source share

All Articles