R data.table: grouping by an attribute of table i in a join

In a data.table join, I want to use the columns of table i both in calculations and for grouping. The syntax seems to have a limitation here. Can you suggest a cleaner way to do this?

    require(data.table)
    set.seed(1)

Table 1

    DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))

Table 2

    DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))

A direct join of the two tables works fine. (Not a problem here, but having to quote the column names in the on clause seems inconsistent with the rest of data.table.)

    DT1[DT2, on = "product"]
    #     loc product       qty family price
    #  1:  L1      P1 0.1297134      A     5
    #  2:  L2      P1 0.2423550      A     5
    #  3:  L1      P1 0.3421633      A     5
    #  4:  L2      P1 0.6537663      A     5
    #  5:  L2      P2 0.9822407      A     7
    #  6:  L1      P2 0.8568853      A     7
    #  7:  L2      P2 0.7062672      A     7
    #  8:  L1      P2 0.9224086      A     7
    #  9:  L1      P3 0.8267184      B    10
    # 10:  L2      P3 0.8408788      B    10
    # 11:  L1      P3 0.6212432      B    10
    # 12:  L2      P3 0.5363538      B    10

A calculation that uses columns from both tables also works:

    DT1[DT2, .(family, product, val = qty*price), on = "product"]
    #     family product       val
    #  1:      A      P1 0.6485671
    #  2:      A      P1 1.2117750
    #  3:      A      P1 1.7108164
    #  4:      A      P1 3.2688313
    #  5:      A      P2 6.8756851
    #  6:      A      P2 5.9981971
    #  7:      A      P2 4.9438704
    #  8:      A      P2 6.4568599
    #  9:      B      P3 8.2671841
    # 10:      B      P3 8.4087878
    # 11:      B      P3 6.2124323
    # 12:      B      P3 5.3635379

I can also aggregate by grouping with by = .EACHI:

    DT1[DT2, .(family, product, val = sum(qty*price)), on = "product", by = .EACHI]
    #    product family product      val
    # 1:      P1      A      P1  6.83999
    # 2:      P2      A      P1 24.27461
    # 3:      P3      B      P1 28.25194

But grouping by product directly does not work:

    DT1[DT2, .(family, product, val = sum(qty*price)), on = "product", by = product]
    # Error in `[.data.table`(DT1, DT2, .(family, product, val = sum(qty * price)), :
    #   object 'price' not found

In this case, the price column of table i is no longer found.

.EACHI can be used in this case because the join column, product, is a unique key for DT2, so each row of i forms exactly one group.
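The uniqueness claim can be checked directly. This is a minimal sketch that rebuilds DT2 as defined above; the names `product_is_unique`, `family_is_unique`, and `rows_per_family` are only illustrative.

```r
library(data.table)

DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# product identifies each row of DT2 uniquely, so by = .EACHI
# (one group per row of i) coincides with one group per product.
product_is_unique <- anyDuplicated(DT2$product) == 0L

# family does not: two rows of DT2 share family "A", so one row of i
# is not the same thing as one family group.
family_is_unique <- anyDuplicated(DT2$family) == 0L

# Number of DT2 rows that fall into each family.
rows_per_family <- DT2[, .N, by = family]
```

So by = .EACHI happens to give per-product totals here, but it cannot give per-family totals on its own.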

However, if I want to group by an attribute of DT2 that is not the join column, I do not seem to be able to use .EACHI. I achieved what I want with the following:

    DT1[DT2, .(family, product, val = qty*price), on = "product"][, .(sum(val)), by = family]
    #    family       V1
    # 1:      A 31.11460
    # 2:      B 28.25194
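For comparison, the same per-family totals can also be reached with an update join followed by a plain aggregation. This is only a sketch under some assumptions: it works on a copy (`DT1x`, a name introduced here) so that DT1 is not modified by reference, and it spells out the recycling with `rep()`. It is still two steps, not a single-call solution.

```r
library(data.table)
set.seed(1)

DT1 <- data.table(loc     = rep(c("L1", "L2"), times = 6),
                  product = rep(c("P1", "P2", "P3"), times = 4),
                  qty     = runif(12))
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# Work on a copy so the original DT1 keeps its three columns.
DT1x <- copy(DT1)

# Update join: bring family and price from DT2 into the matching rows of DT1x.
DT1x[DT2, on = "product", `:=`(family = i.family, price = i.price)]

# Every column now lives in one table, so grouping by family is an ordinary aggregation.
res_update <- DT1x[, .(val = sum(qty * price)), by = family]

# Reference: the chained join-then-aggregate form from the question.
res_chain <- DT1[DT2, .(family, val = qty * price), on = "product"][
  , .(val = sum(val)), by = family]
```

Both results should agree. The update join avoids materialising the intermediate joined table, at the cost of adding columns to the table being updated.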

Is this two-step processing required, or is there other syntax I can use in this situation?
