Group mean excluding the current row

I want to calculate the group means of a variable, but excluding the focal responder:

library(data.table)
set.seed(1)
dat <- data.table(id = 1:30, y = runif(30), grp = rep(1:3, each=10))

The first record (responder) should have an average value ... second ... and so on:

mean(dat[grp==1, y][-1])
mean(dat[grp==1, y][-2])
mean(dat[grp==1, y][-3])

For the second group the same:

mean(dat[grp==2, y][-1])
mean(dat[grp==2, y][-2])
mean(dat[grp==2, y][-3])

I tried this, but this did not work:

dat[, avg := mean(dat[, y][-.I]), by=grp]

Any ideas?

3 answers

It looks like you are mostly there; you just need to account for NA values:

dat[, avg := (sum(y, na.rm=TRUE) - ifelse(is.na(y), 0, y)) / (sum(!is.na(y)) + is.na(y) - 1)
    , by = grp]

No double loops or additional memory is required.
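As a sanity check, the closed form (group sum minus the row's own value, divided by the number of other non-missing values) can be compared against a brute-force computation that drops one element at a time. This is only a sketch: the small data set with a leading NA and the column name `check` are assumptions made for illustration.

```r
library(data.table)

set.seed(1)
# Illustrative data: 3 groups of 3 rows, with one missing value
dat <- data.table(id = 1:9, y = c(NA, runif(8)), grp = rep(1:3, each = 3))

# Closed form: a missing own value subtracts nothing from the sum
# and is not counted among the non-missing values
dat[, avg := (sum(y, na.rm = TRUE) - ifelse(is.na(y), 0, y)) /
             (sum(!is.na(y)) + is.na(y) - 1),
    by = grp]

# Brute force: drop element i and average what is left
dat[, check := sapply(seq_along(y), function(i) mean(y[-i], na.rm = TRUE)),
    by = grp]

# Stops with an error if the two columns disagree
stopifnot(isTRUE(all.equal(dat$avg, dat$check)))
```

Note that a group with only one non-missing value gives a zero denominator for that row, so the result there is NaN in both versions.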


You can try this solution:

set.seed(1)
dat <- data.table(id = 1:9, y = c(NA,runif(8)), grp = rep(1:3, each=3))

dat[, avg2 := sapply(seq_along(y), function(i) mean(y[-i], na.rm=TRUE)), by=grp]

dat
#    id         y grp      avg2
# 1:  1        NA   1 0.3188163
# 2:  2 0.2655087   1 0.3721239
# 3:  3 0.3721239   1 0.2655087
# 4:  4 0.5728534   2 0.5549449
# 5:  5 0.9082078   2 0.3872676
# 6:  6 0.2016819   2 0.7405306
# 7:  7 0.8983897   3 0.8027365
# 8:  8 0.9446753   3 0.7795937
# 9:  9 0.6607978   3 0.9215325

Another option, using a cross join within each group:

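dat[,
  # pair every row (id2, y2) with every focal id (id3) within the group
  .(y2 = rep(y, .N), id2 = rep(id, .N), id3 = rep(id, each = .N)), by = grp
][
  id2 != id3,          # drop the focal row itself
  .(avg = mean(y2)),   # mean of the remaining rows
  by = .(id3, grp)
]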

The first step replicates each group's data once per id and records in id3 which row is to be excluded. The second step drops the rows where id2 equals id3 and then averages by id3 and grp. This is not memory-efficient, because the intermediate table holds one row for every pair of ids within a group, but it works as long as memory is not a constraint.
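To make the memory cost concrete, here is a rough sketch on the question's data (3 groups of 10 rows, no missing values). The names `expanded`, `loo` and `direct` are made up for this illustration.

```r
library(data.table)

set.seed(1)
dat <- data.table(id = 1:30, y = runif(30), grp = rep(1:3, each = 10))

# Step 1: every row paired with every focal id inside its group
expanded <- dat[, .(y2 = rep(y, .N), id2 = rep(id, .N), id3 = rep(id, each = .N)),
                by = grp]
nrow(expanded)  # 300: 3 groups * 10^2 pairs, versus 30 rows in dat

# Step 2: drop the focal row and average the rest
loo <- expanded[id2 != id3, .(avg = mean(y2)), by = .(id3, grp)]

# Same result without the expansion (valid here because y has no NA)
direct <- dat[, .(id, avg = (sum(y) - y) / (.N - 1)), by = grp]
stopifnot(isTRUE(all.equal(loo$avg, direct$avg)))
```

The intermediate table grows with the square of the group size, so for large groups the sum-based version is likely the safer choice.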
