I am new to R and data.table , which I find useful and fast. I am trying to join two data tables:
> TotFreq Legacy_Store_Number WeekDay Date Item_Key Distr NoSellingDays meanUnits ItemType 1: 113802 1 2013-03-24 000000000120 2.428985e-04 0 8.00 FM 2: 113802 1 2013-03-24 000000000126 1.104030e-03 0 47.50 FM 3: 113802 1 2013-03-24 000000000170 1.126004e-03 0 48.75 FM 4: 113802 1 2013-03-24 000000000180 5.143034e-04 0 19.00 FM 5: 113802 1 2013-03-24 000000000260 3.854306e-04 0 12.25 FM 160167: 113802 7 2013-03-23 978125002327 5.902655e-07 27 1.00 SM 160168: 113802 7 2013-03-23 978141970584 1.770796e-06 25 1.00 SM 160169: 113802 7 2013-03-23 978145300697 1.180531e-06 26 1.00 SM 160170: 113802 7 2013-03-23 978145552558 5.902655e-07 27 1.00 SM 160171: 113802 7 2013-03-23 978160139536 5.902655e-07 27 1.00 SM > Count_SM_FM Legacy_Store_Number WeekDay ItemType ObjItems 1: 113802 1 SM 12305 2: 113802 1 FM 1942 3: 113802 2 SM 11014 4: 113802 2 FM 1398 5: 113802 3 SM 10154 6: 113802 3 FM 1117 7: 113802 4 SM 10414 8: 113802 4 FM 1167 9: 113802 5 SM 10258 10: 113802 5 FM 1200 11: 113802 6 SM 11116 12: 113802 6 FM 1575 13: 113802 7 SM 13098 14: 113802 7 FM 2326 > setkey(TotFreq,Legacy_Store_Number,WeekDay,ItemType) > > ResultJoin <- TotFreq[Count_SM_FM] Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), : Join results in 320342 rows; more than 160171 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Qaru and datatable-help for advice.
But I do not have duplicate keys in i !
Using:
> ResultJoin <- TotFreq[Count_SM_FM,allow.cartesian=T] > > ResultJoin Legacy_Store_Number WeekDay Date Item_Key Distr NoSellingDays meanUnits ItemType ItemType.1 ObjItems 1: 113802 1 2013-03-24 000000000120 2.428985e-04 0 8.00 FM SM 12305 2: 113802 1 2013-03-24 000000000126 1.104030e-03 0 47.50 FM SM 12305 3: 113802 1 2013-03-24 000000000170 1.126004e-03 0 48.75 FM SM 12305 4: 113802 1 2013-03-24 000000000180 5.143034e-04 0 19.00 FM SM 12305 5: 113802 1 2013-03-24 000000000260 3.854306e-04 0 12.25 FM SM 12305 --- 320338: 113802 7 2013-03-23 978125002327 5.902655e-07 27 1.00 SM FM 2326 320339: 113802 7 2013-03-23 978141970584 1.770796e-06 25 1.00 SM FM 2326 320340: 113802 7 2013-03-23 978145300697 1.180531e-06 26 1.00 SM FM 2326 320341: 113802 7 2013-03-23 978145552558 5.902655e-07 27 1.00 SM FM 2326 320342: 113802 7 2013-03-23 978160139536 5.902655e-07 27 1.00 SM FM 2326
I actually get duplicate entries that I had in my original TotFreq table. If I add the key also on Count_SM_FM , the connection will work:
> setkey(TotFreq,Legacy_Store_Number,WeekDay,ItemType) > setkey(Count_SM_FM,Legacy_Store_Number,WeekDay,ItemType) > ResultJoin <- TotFreq[Count_SM_FM] > > ResultJoin Legacy_Store_Number WeekDay ItemType Date Item_Key Distr NoSellingDays meanUnits ObjItems 1: 113802 1 FM 2013-03-24 000000000120 2.428985e-04 0 8.00 1942 2: 113802 1 FM 2013-03-24 000000000126 1.104030e-03 0 47.50 1942 3: 113802 1 FM 2013-03-24 000000000170 1.126004e-03 0 48.75 1942 4: 113802 1 FM 2013-03-24 000000000180 5.143034e-04 0 19.00 1942 5: 113802 1 FM 2013-03-24 000000000260 3.854306e-04 0 12.25 1942 --- 160167: 113802 7 SM 2013-03-23 978125002327 5.902655e-07 27 1.00 13098 160168: 113802 7 SM 2013-03-23 978141970584 1.770796e-06 25 1.00 13098 160169: 113802 7 SM 2013-03-23 978145300697 1.180531e-06 26 1.00 13098 160170: 113802 7 SM 2013-03-23 978145552558 5.902655e-07 27 1.00 13098 160171: 113802 7 SM 2013-03-23 978160139536 5.902655e-07 27 1.00 13098
I tried to check the example, maybe the problem was the lack of key variables as the first columns in TotFreq or in the absence of Count_SM_FM not sorted, but I could not reproduce the error
> daysType <- data.table( + key1=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1), + key2=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7), + key3=c("b","a","a","b","a","b","a","b","a","b","a","b","a","b"), + var1=c(2,4,6,8,4,5,7,3,7,9,6,3,5,6) + ) > > > detailData <- data.table( + key1=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), + key2=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7,7,7), + var2=c(10,11,12,13,15,16,17,10,11,12,13,14,15,16,10,11,12,15,16,17,10,11,12,13,14,15,16,17,10,11,13,14,15,16,17,10,11,12,13,14,15,10,11,12,13,14,15,16,17), + var3=c(1,2,4,6,6,7,3,6,8,9,3,5,7,8,6,7,8,6,7,2,4,6,7,8,2,3,5,7,4,7,8,3,6,4,2,5,7,3,6,7,3,4,2,4,6,4,7,2,9), + key3=c("a","a","a","a","b","b","b","a","a","a","a","b","b","b","a","a","a","b","b","b","a","a","a","a","b","b","b","b","a","a","a","b","b","b","b","a","a","a","a","b","b","a","a","a","a","b","b","b","b") + ) > > setkey(detailData,key1,key2,key3) > JoinResult <- detailData[daysType]
The problem is different from the set
Error connecting two .tables data
because there allow.cartesian solves the problem.
What is the problem? Why Count_SM_FM adding a key to Count_SM_FM solve it?
Thanks!