Cannot use i.field and by = in the same data.table expression

When joining two data.tables and using by= in the same expression, I get an error when trying to use a column from the inner data.table (the one passed as i) in j. I can break things down into two separate expressions, but that is extra typing, and possibly a performance hit with large datasets.

As an example:

    require(data.table)
    DT1 <- data.table(k1 = 1:2, k2 = c('a', 'a', 'a', 'b', 'b', 'c'), v1 = 1:6, key = 'k2')
    DT2 <- data.table(k1 = c('a', 'b', 'c'), w1 = 3^(1:3), key = 'k1')
    DT1[DT2, sum(v1*w1), by=k1]    # fails complaining about being unable to find w1
    DT1[DT2, sum(v1*i.w1), by=k1]  # also fails with the same error
    DT1[DT2][, sum(v1*w1), by=k1]  # works

With small datasets, the join-then-group approach is fine. However, for datasets with many columns, creating an intermediate result that contains every column of both data.tables is a significant burden (my actual data.tables are about 1-2 GB in size).

I could reduce the number of columns involved in the join:

    DT1[DT2[, .(k1, w1)]][, sum(v1*w1), by=k1]

but that eliminates one of the big benefits of data.table: not having to constantly restate the relationship between datasets. It also requires me to remember to name a particular column in two different places every time I do a join.

Is there something obvious that I'm missing?

1 answer

This is possibly a duplicate question; see data.table feature request #733.
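Until then, one way to avoid the wide intermediate table might be an update join: copy only w1 into DT1 by reference, then group. This is a minimal sketch rather than a definitive fix. It assumes that temporarily adding a column to DT1 is acceptable and that the keys are set as in the question. The column name total and the result names res and res_key are made up for illustration.

```r
library(data.table)

DT1 <- data.table(k1 = 1:2, k2 = c('a', 'a', 'a', 'b', 'b', 'c'), v1 = 1:6, key = 'k2')
DT2 <- data.table(k1 = c('a', 'b', 'c'), w1 = 3^(1:3), key = 'k1')

# Update join: bring only w1 across, matching DT1's key (k2) to DT2's key (k1).
# Assumption: modifying DT1 by reference is acceptable here.
DT1[DT2, w1 := i.w1]

# Group by DT1's own k1 column, as the two-step DT1[DT2][, ...] version does.
res <- DT1[, .(total = sum(v1 * w1)), by = k1]

# Drop the helper column again (also by reference, so no copy of DT1 is made).
DT1[, w1 := NULL]

# If the intended grouping is the join key rather than DT1's k1,
# by = .EACHI evaluates j once per row of DT2 (data.table 1.9.4 or later).
res_key <- DT1[DT2, .(total = sum(v1 * i.w1)), by = .EACHI]
```

Note that k1 means different things in the two tables: in DT1 it is an ordinary integer column, while in DT2 it is the join key. The first result groups by the former, the second by the latter, so the two are not interchangeable.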
