Here is what I managed to achieve using the data.table package:
DT <- data.table(dat, key = "key")
DT[, list(v1 = sum(rate * v1)/sum(rate),
          v2 = sum(rate * v2)/sum(rate)),
   by = "key"]
OK, so it's easy to write this out for only two variables, but what about when we have a lot more columns? Use lapply(.SD, ...) in combination with your function:
Firstly, some data:
set.seed(1)
dat <- data.frame(key = rep(c("a", "b"), times = 10),
                  rate = runif(20, min = 0, max = 1),
                  v1 = sample(10, 20, replace = TRUE),
                  v2 = sample(20, 20, replace = TRUE),
                  v3 = sample(30, 20, replace = TRUE),
                  x1 = sample(5, 20, replace = TRUE),
                  x2 = sample(6:10, 20, replace = TRUE),
                  x3 = sample(11:15, 20, replace = TRUE))
library(data.table)
datDT <- data.table(dat, key = "key")
datDT
Secondly, the aggregation:
datDT[, lapply(.SD, function(x, y = rate) sum(y * x)/sum(y)), by = "key"]
#    key      rate       v1        v2       v3       x1       x2       x3
# 1:   a 0.6501303 6.335976  8.634691 15.75915 3.363832 7.658762 13.19152
# 2:   b 0.7375793 3.595585 10.749705 16.26582 2.792390 7.741787 12.57301
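Note that the result above also contains a rate column, which is just rate weighted by itself. If that is not wanted, one possible variation (a sketch, not part of the original approach) is to restrict .SD with .SDcols and let base R's weighted.mean() do the arithmetic. The table toy and the vector value_cols below are made-up names used only for illustration:

```r
library(data.table)

# Small hypothetical table, chosen so the results are easy to check by hand.
toy <- data.table(
  key  = c("a", "a", "b", "b"),
  rate = c(1, 3, 2, 2),
  v1   = c(10, 20, 30, 50),
  v2   = c(4, 8, 1, 3)
)

# Columns to average: everything except the grouping and weight columns.
value_cols <- setdiff(names(toy), c("key", "rate"))

# .SDcols limits .SD to value_cols; rate is still visible inside j,
# so it can be passed as the weights argument w.
res <- toy[, lapply(.SD, weighted.mean, w = rate),
           by = "key", .SDcols = value_cols]
res
```

For group a this gives v1 = (1*10 + 3*20)/4 = 17.5, which matches sum(rate * v1)/sum(rate) computed by hand.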
If you have a really large dataset, it is worth exploring data.table more generally.
For what it's worth, I also had success in base R, but I'm not sure how efficient it would be, particularly because of the transposing, etc.
t(sapply(split(dat, dat[1]), function(x, y = 3:ncol(dat)) {
  V1 <- vector()
  for (i in 1:length(y)) {
    V1[i] <- sum(x[2] * x[y[i]])/sum(x[2])
  }
  V1
}))
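If the transposing is the main concern, a possible base R alternative (a sketch only, and not benchmarked) is to build one named row per group and bind the rows together, which also keeps the column names. The data frame toy_df and the vector value_cols are hypothetical names used only for this example:

```r
# Small hypothetical data frame, chosen so the results are easy to check by hand.
toy_df <- data.frame(
  key  = c("a", "a", "b", "b"),
  rate = c(1, 3, 2, 2),
  v1   = c(10, 20, 30, 50),
  v2   = c(4, 8, 1, 3)
)

# Columns to average: everything except the grouping and weight columns.
value_cols <- setdiff(names(toy_df), c("key", "rate"))

# One named numeric vector of weighted means per group ...
per_group <- lapply(split(toy_df, toy_df$key), function(g) {
  vapply(g[value_cols], weighted.mean, numeric(1), w = g$rate)
})

# ... stacked row-wise into a matrix (rows = groups, columns = variables).
res_base <- do.call(rbind, per_group)
res_base
```

The result is a matrix with the group labels as row names, so no t() call is needed.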