Background
I am new to the data.table package and am currently learning how to use it effectively. I have two tables. I want to aggregate the second one, join the result to the first, and then change a column in the joined table. Ideally (and for my own understanding) I would do all of this in one step.
Package version
sessionInfo()
The code
What I tried can be seen in this minimal example:
library(data.table)
set.seed(1)
DT1 <- data.table(id = LETTERS[1:4], x = rnorm(4), key = "id")
DT2 <- data.table(id = rep(LETTERS[1:4], each = 3), y = 1:12, z = rep(1, 12), key = "id")
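DT1[DT2[, lapply(.SD, mean), by = "id"]]            # aggregate DT2 by id and join onto DT1: columns id, x, y, z
DT1[DT2[, lapply(.SD, mean), by = "id"], x := -x]   # intended: the same joined table, with x negated
DT1                                                 # instead, DT1 itself now has x negated, and there is no y or z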
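A minimal check of what the second call actually does, as far as I understand data.table: a join in `i` combined with `:=` in `j` is an update join. It changes the columns of DT1 itself for the matched rows and returns DT1 invisibly, so the aggregated columns `y` and `z` never become part of any result. The names `agg` and `x_before` are just illustrative helpers.

```r
library(data.table)

set.seed(1)
DT1 <- data.table(id = LETTERS[1:4], x = rnorm(4), key = "id")
DT2 <- data.table(id = rep(LETTERS[1:4], each = 3), y = 1:12, z = rep(1, 12), key = "id")

# Illustrative helper: one row per id with the means of y and z
agg <- DT2[, lapply(.SD, mean), by = "id"]

x_before <- copy(DT1$x)   # remember the original x values
DT1[agg, x := -x]         # update join: modifies DT1 in place, adds no columns
print(DT1)                # still only id and x, but x is now negated
```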
I suppose this has something to do with the clever way data.table processes data (it avoids making copies where it can and modifies objects by reference). The following code, however, works:
DT3 <- copy(DT1[DT2[, lapply(.SD, mean), by = "id"]])[, x := -x]
(DT4 <- DT1[DT2[, lapply(.SD, mean), by = "id"]][, x := -x])
identical(DT3, DT4)
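A sketch of why both versions behave the same, assuming my reading of data.table's semantics is right: the join `DT1[...]` already allocates a brand-new table, so the chained `[, x := -x]` modifies only that new table by reference and leaves DT1 alone. The explicit `copy()` in DT3 therefore seems to add nothing except one more full allocation. `x_orig` is an illustrative helper name.

```r
library(data.table)

set.seed(1)
DT1 <- data.table(id = LETTERS[1:4], x = rnorm(4), key = "id")
DT2 <- data.table(id = rep(LETTERS[1:4], each = 3), y = 1:12, z = rep(1, 12), key = "id")
x_orig <- copy(DT1$x)   # original x, to confirm DT1 stays unchanged

# The join returns a new table; := in the chained call modifies that table only.
DT4 <- DT1[DT2[, lapply(.SD, mean), by = "id"]][, x := -x]

# Same result with an explicit copy(), which only duplicates the joined table again.
DT3 <- copy(DT1[DT2[, lapply(.SD, mean), by = "id"]])[, x := -x]
```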
Questions
- What is the best way to do this, "best" in terms of speed and memory use?
- ? , , , ( ) ?
(1) , (2) , ?
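A sketch of three variants I would compare, assuming the current data.table semantics for joins and `:=`; I have not benchmarked them. (a) is the chained version from above. (b) builds the output in `j` of the join in a single call, where `x` refers to DT1's column and `y`, `z` come from the aggregated table. (c) is an update join that adds `y` and `z` to DT1 and negates `x`, all by reference; it probably allocates the least because no second full table is built, but it only fits if changing DT1 itself is acceptable. The names `agg`, `res_a` and `res_b` are illustrative.

```r
library(data.table)

set.seed(1)
DT1 <- data.table(id = LETTERS[1:4], x = rnorm(4), key = "id")
DT2 <- data.table(id = rep(LETTERS[1:4], each = 3), y = 1:12, z = rep(1, 12), key = "id")
x_orig <- copy(DT1$x)

# Aggregate once; every variant below reuses this small table.
agg <- DT2[, lapply(.SD, mean), by = "id"]

# (a) Join, then modify the new joined table by reference (DT1 untouched).
res_a <- DT1[agg][, x := -x]

# (b) Join and compute the output columns in j in one call (DT1 untouched).
res_b <- DT1[agg, .(id, x = -x, y, z)]

# (c) Update join: negate x and add y, z directly in DT1, by reference.
#     Rows of DT1 without a match in agg would keep their x and get NA in y, z.
DT1[agg, `:=`(x = -x, y = i.y, z = i.z)]
```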