Cannot use i.field and by = in the same data.table expression

Question


When joining two data.tables and using by= in the same expression, I get an error when I try to use a column from the inner data.table (the one passed as i) in j. I can break this into two separate expressions, but that means extra typing and possibly a performance hit on large datasets.

As an example:

 require(data.table)
 DT1 <- data.table(k1 = 1:2, k2 = c('a', 'a', 'a', 'b', 'b', 'c'), v1 = 1:6, key = 'k2')
 DT2 <- data.table(k1 = c('a', 'b', 'c'), w1 = 3^(1:3), key = 'k1')
 DT1[DT2, sum(v1*w1), by=k1]    # fails complaining about being unable to find w1
 DT1[DT2, sum(v1*i.w1), by=k1]  # also fails with the same error
 DT1[DT2][, sum(v1*w1), by=k1]  # works

With small datasets, the join-then-group approach is fine. However, for datasets with many columns, creating an intermediate result that contains every column of both data.tables is a significant burden (my actual data.tables are about 1-2 GB in size).

While I could reduce the number of columns involved in the join,

 DT1[DT2[, .(k1, w1)]][, sum(v1*w1), by=k1]

that undermines one of the great strengths of data.table: not having to constantly spell out the relationship between datasets. It also requires me to remember to list a particular column in two different places every time I do a join.

Is there something obvious that I'm missing?

+6

r data.table

Stephen zander Feb 04 '16 at 6:03


1 answer

Bram visser · Answer 1 · 2016-08-31T09:50:19+0000

Possibly a duplicate question; see data.table feature request #733.
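In the meantime, one way to avoid materialising the wide intermediate table might be an update join that copies only the needed column into DT1 and then groups. This is only a sketch using the example data from the question. It assumes that adding a temporary w1 column to DT1 by reference is acceptable, and it names the join columns explicitly with on= rather than relying on the keys.

```r
library(data.table)

DT1 <- data.table(k1 = 1:2, k2 = c('a', 'a', 'a', 'b', 'b', 'c'), v1 = 1:6, key = 'k2')
DT2 <- data.table(k1 = c('a', 'b', 'c'), w1 = 3^(1:3), key = 'k1')

# Update join: bring only w1 across from DT2, matching DT1$k2 to DT2$k1.
# Assumption: modifying DT1 in place (one extra column) is acceptable.
DT1[DT2, w1 := i.w1, on = .(k2 = k1)]

# Group on DT1's own k1 without building a joined copy of every column.
res <- DT1[, sum(v1 * w1), by = k1]

# Optionally remove the helper column again.
DT1[, w1 := NULL]
```

Only one column is added to DT1, so the memory overhead should be much smaller than DT1[DT2], at the cost of a second expression.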
