R: cummean() over a subset

I am new to R as well as Stack Overflow, so please bear with me if I do something wrong here.

So, I'm working on soccer data that looks like this:

  Div     Date      HomeTeam   AwayTeam FTHG FTAG avgHG_league avgHG_team
1  D1 14/08/15 Bayern Munich    Hamburg    5    0           NA          0
2  D1 15/08/15      Augsburg     Hertha    0    1     5.000000          0
3  D1 15/08/15     Darmstadt   Hannover    2    2     2.500000          0
4  D1 15/08/15      Dortmund M'gladbach    4    0     2.333333          0
5  D1 15/08/15    Leverkusen Hoffenheim    2    1     2.750000          0
6  D1 15/08/15         Mainz Ingolstadt    0    1     2.600000          0

I created the avgHG_league column, which gives the average number of home goals scored so far this season across all teams, with the following code:

BLfiltered <- BLfiltered %>%
  mutate(avgHG_league = lag(cummean(FTHG),1))
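For reference, lag(cummean(FTHG), 1) is a running mean shifted down by one row. A minimal base R sketch of the same idea, assuming the rows are already in date order; fthg is an illustrative name holding the FTHG values of the six sample rows above:

```r
# FTHG (home goals) of the six sample matches, in date order
fthg <- c(5, 0, 2, 4, 2, 0)

# Running mean up to and including each match (same values as cummean())
running_mean <- cumsum(fthg) / seq_along(fthg)

# Shift by one position, like lag(..., 1): each match only sees earlier
# matches, and the first match has no history, so it gets NA
avg_hg_league <- c(NA, head(running_mean, -1))
print(avg_hg_league)
```

This gives NA, 5, 2.5, 2.333333, 2.75, 2.6, which matches the avgHG_league column in the table above.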

Now, in the avgHG_team column, I want to do almost the same thing: instead of averaging the goals of all home teams together, I want the average number of goals that this particular home team has scored at home so far this season (not including the current game).

Do you have any ideas?

Thanks!

Edit: the "FTHG" column gives the home goals in each match.


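You can do this with dplyr: compute the league average first, then group by HomeTeam and apply the same lag(cummean()) within each group. Here sd is the name of your data frame.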

library(dplyr)

sd <- sd %>%
  mutate(avgHG_league = lag(cummean(FTHG), 1, default = 0)) %>%
  group_by(HomeTeam) %>%
  mutate(avgHG_Team = lag(cummean(FTHG), 1, default = 0)) %>%
  ungroup()

Note: I passed default = 0 to lag(), so each first value is 0 rather than NA. Test data and result:

    Div     Date      HomeTeam   AwayTeam FTHG FTAG
1   D1 14/08/15 Bayern Munich    Hamburg    5    0
2   D1 15/08/15      Augsburg     Hertha    0    1
3   D1 15/08/15     Darmstadt   Hannover    2    2
4   D1 15/08/15      Dortmund M'gladbach    4    0
5   D1 15/08/15    Leverkusen Hoffenheim    2    1
6   D1 15/08/15         Mainz Ingolstadt    0    1
7   D1 15/09/15 Bayern Munich    Hamburg    0    0
8   D1 15/10/15      Augsburg     Hertha    0    0
9   D1 15/10/15     Darmstadt   Hannover    0    0
10  D1 15/10/15      Dortmund M'gladbach    0    0
11  D1 15/10/15    Leverkusen Hoffenheim    0    0
12  D1 15/10/15         Mainz Ingolstadt    0    0
13  D1 15/11/15 Bayern Munich    Hamburg    0    0
14  D1 15/10/16      Augsburg     Hertha    0    0
15  D1 15/11/16     Darmstadt   Hannover    0    0
16  D1 15/10/17      Dortmund M'gladbach    0    0
17  D1 15/11/17    Leverkusen Hoffenheim    0    0
18  D1 15/10/18         Mainz Ingolstadt    0    0

    Div   Date        HomeTeam   AwayTeam FTHG FTAG avgHG_league avgHG_Team
1   D1 14/08/15 Bayern Munich    Hamburg    5    0    0.0000000        0.0
2   D1 15/08/15      Augsburg     Hertha    0    1    5.0000000        0.0
3   D1 15/08/15     Darmstadt   Hannover    2    2    2.5000000        0.0
4   D1 15/08/15      Dortmund M'gladbach    4    0    2.3333333        0.0
5   D1 15/08/15    Leverkusen Hoffenheim    2    1    2.7500000        0.0
6   D1 15/08/15         Mainz Ingolstadt    0    1    2.6000000        0.0
7   D1 15/09/15 Bayern Munich    Hamburg    0    0    2.1666667        5.0
8   D1 15/10/15      Augsburg     Hertha    0    0    1.8571429        0.0
9   D1 15/10/15     Darmstadt   Hannover    0    0    1.6250000        2.0
10  D1 15/10/15      Dortmund M'gladbach    0    0    1.4444444        4.0
11  D1 15/10/15    Leverkusen Hoffenheim    0    0    1.3000000        2.0
12  D1 15/10/15         Mainz Ingolstadt    0    0    1.1818182        0.0
13  D1 15/11/15 Bayern Munich    Hamburg    0    0    1.0833333        2.5
14  D1 15/10/16      Augsburg     Hertha    0    0    1.0000000        0.0
15  D1 15/11/16     Darmstadt   Hannover    0    0    0.9285714        1.0
16  D1 15/10/17      Dortmund M'gladbach    0    0    0.8666667        2.0
17  D1 15/11/17    Leverkusen Hoffenheim    0    0    0.8125000        1.0
18  D1 15/10/18         Mainz Ingolstadt    0    0    0.7647059        0.0
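To make explicit what the grouped lag(cummean()) computes, here is a plain base R loop (a sketch, not part of the original answer) that keeps running totals per home team and records each team's average before adding the current match. The vectors home_team and fthg are typed in from the 18-row test data above; all names are illustrative. It should reproduce the avgHG_Team column of the output:

```r
# HomeTeam and FTHG from the 18-row test data (rows already in date order)
home_team <- rep(c("Bayern Munich", "Augsburg", "Darmstadt",
                   "Dortmund", "Leverkusen", "Mainz"), times = 3)
fthg <- c(5, 0, 2, 4, 2, 0, rep(0, 12))

# Running totals per team, keyed by team name
teams <- unique(home_team)
goals_so_far <- setNames(numeric(length(teams)), teams)
games_so_far <- setNames(integer(length(teams)), teams)

avgHG_Team <- numeric(length(fthg))
for (i in seq_along(fthg)) {
  team <- home_team[i]
  n <- games_so_far[[team]]
  # Mean of this team's earlier home goals; 0 when there is no history,
  # mirroring lag(..., default = 0) in the dplyr code
  avgHG_Team[i] <- if (n == 0L) 0 else goals_so_far[[team]] / n
  # Update the totals only after recording the pre-match average
  goals_so_far[[team]] <- goals_so_far[[team]] + fthg[i]
  games_so_far[[team]] <- n + 1L
}
print(avgHG_Team)
```

The loop is slower than the vectorised version but shows why the current match is excluded: the average is stored before that match's goals are added to the totals.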

Introduction to dplyr: https://blog.rstudio.org/2014/01/17/introducing-dplyr/

Data wrangling cheat sheet for tidyr and dplyr in R: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf


Here is a base R approach, with the data sorted by HomeTeam and Date:

my.data <- read.csv(text = '
  Div,     Date,      HomeTeam,   AwayTeam, FTHG, FTAG
   D1, 14/08/15, Bayern Munich,    Hamburg,    5,    0
   D1, 15/08/15, Bayern Munich,     Hertha,    0,    1
   D1, 16/08/15,     Darmstadt,   Hannover,    2,    2
   D1, 17/08/15,     Darmstadt, Ingolstadt,    4,    0
   D1, 18/08/15,     Darmstadt, Hoffenheim,    2,    1
   D1, 19/08/15,         Mainz, Ingolstadt,    0,    1
', header = TRUE, stringsAsFactors = FALSE, strip.white = TRUE)

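# Sort by team, then by match date (Date is dd/mm/yy text, so convert it first)
my.data <- my.data[with(my.data, order(HomeTeam, as.Date(Date, format = "%d/%m/%y"))), ]
my.data

# Cumulative mean of home goals within each team (includes the current match)
my.means <- aggregate(my.data$FTHG, by = list(my.data$HomeTeam),
                      FUN = function(x) cumsum(x) / seq_along(x))

# aggregate() returns the groups sorted by team, matching the row order above
my.data$my.cum.means <- c(unlist(my.means[2]))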
my.data

#
#     Div     Date      HomeTeam   AwayTeam FTHG FTAG my.cum.means
#x.11  D1 14/08/15 Bayern Munich    Hamburg    5    0     5.000000
#x.12  D1 15/08/15 Bayern Munich     Hertha    0    1     2.500000
#x.21  D1 16/08/15     Darmstadt   Hannover    2    2     2.000000
#x.22  D1 17/08/15     Darmstadt Ingolstadt    4    0     3.000000
#x.23  D1 18/08/15     Darmstadt Hoffenheim    2    1     2.666667
#x.3   D1 19/08/15         Mainz Ingolstadt    0    1     0.000000
#
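Note that my.cum.means includes the current match, whereas the question asks for the average of earlier home games only. A minimal sketch of one way to shift it, using base R's ave(), which applies a function within each group and returns the results in the original row positions, so it only needs the rows to be in date order within each team. The helper mean_before and the column my.prior.means are hypothetical names, and only the HomeTeam and FTHG columns of my.data are recreated here:

```r
# HomeTeam and FTHG from the sample data in this answer (already sorted)
my.data <- data.frame(
  HomeTeam = c("Bayern Munich", "Bayern Munich", "Darmstadt",
               "Darmstadt", "Darmstadt", "Mainz"),
  FTHG = c(5, 0, 2, 4, 2, 0),
  stringsAsFactors = FALSE
)

# Mean of earlier home games only: drop the last value of the running mean
# and put NA in front (a team's first home game has no history)
mean_before <- function(x) c(NA, head(cumsum(x) / seq_along(x), -1))

# Apply mean_before separately within each HomeTeam group
my.data$my.prior.means <- ave(my.data$FTHG, my.data$HomeTeam, FUN = mean_before)
my.data
```

For this data my.prior.means is NA, 5, NA, 2, 3, NA. Replace NA in mean_before with 0 if you prefer the behaviour of default = 0 in the dplyr answer.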
