I would like to update a subset of rows in a data.table based on a join and a fixed value.
d1 <- data.table(A = 5:1, B = letters[5:1])
d2 <- data.table(C = letters[5:1], Z = 6:10)
current.val <- 5
what I want to do is update d1 based on the connection to d2, but only where A == 5 in d1. something like that:
d1[d2, D := i.Z ,on=.(B==C, A==current.val)]
my current approach is to add a new column in d2 and set it to a fixed value and use it in the join:
d2[, current.val := 5]
d1[d2, D := i.Z ,on=.(B==C, A==current.val)]
This works, but there seems to be a lot of overhead. Is there an easier way to use a constant value in a join?
(8/14) New example of a scale for comparison:
d1 <- data.table(A = 100:1, B = 100000000:1, D = as.numeric(NA), key = c("A", "B"))
d2 <- data.table(C = 100000000:1, Z = c(10:1) / 10, key = "C")
current.val <- 5
system.time(d1[cbind(d2, A = current.val), on = .(B = C, A), D := i.Z])
system.time({setkey(d1, B, A); d1[d1[d2][A == current.val], D := Z]; setkey(d1, A, B)})
system.time(d1[d1[d2][A == current.val], D := Z]) # fastest, if inverse key order is acceptable