Non-equi join, then summarising by group

Here is the MWE.

    library(data.table)

    dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
    dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))
    dtc <- data.table(id = c(1, 1, 2, 2), mval = rep(0, 4))

    # For each row of dtb, average the matching values in dta
    for (ind in 1:4)
      dtc$mval[ind] <- mean(dta$val[dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind]])

    dtc
    #    id mval
    # 1:  1  1.0
    # 2:  1  1.5
    # 3:  2  6.5
    # 4:  2  7.0

dtc should have the same number of rows as dtb. For each row ind in dtc:

  • dtc$id[ind] = dtb$id[ind].
  • dtc$mval[ind] = mean(dta$val[x]), where x is dta$id == dtb$id[ind] & dta$seq < dtb$fil[ind].

My data.tables are extremely large, so I am looking for a way to achieve the above with as little memory as possible. I was thinking of a non-equi join followed by summarising, but I can't get it to work. Hence the title of the question.

Thanks so much for any help!

1 answer

Maybe this helps:

    dtc[, mval := dta[dtb, mean(val), on = .(id, seq < fil), by = .EACHI]$V1]
    dtc
    #    id mval
    # 1:  1  1.0
    # 2:  1  1.5
    # 3:  2  6.5
    # 4:  2  7.0
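If you don't need dtc to exist beforehand, a variant of the same idea might be to build the result straight from the join. This is only a sketch on the MWE data; the name res is made up here, and I'm assuming a plain mean per row of dtb is all you want. With by = .EACHI, j is evaluated once per row of dtb, so the full set of matched rows should not need to be materialised first.

```r
library(data.table)

# Same toy data as in the question
dta <- data.table(id = rep(1:2, each = 5), seq = rep(1:5, 2), val = 1:10)
dtb <- data.table(id = c(1, 1, 2, 2), fil = c(2, 3, 3, 4))

# Non-equi join: for each row of dtb, take the rows of dta with the same id
# and seq < fil, then reduce them to a single mean (by = .EACHI).
# `res` is a hypothetical name, not part of the original question.
res <- dta[dtb, .(mval = mean(val)), on = .(id, seq < fil), by = .EACHI]

res$mval
```

The result has one row per row of dtb, in the same order, which is why assigning the aggregated column back into dtc by position works in the one-liner above. Rows of dtb that match nothing in dta should come out as NA under the default nomatch, but that is worth checking on your real data.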