R data.table: subsetting on multiple conditions

With the data set below, how can I write a data.table call that subsets this table and returns every customer ID, with all of its associated orders, if that customer ever bought SKU 1?

The expected result should exclude cid 3 and 5, and return every row for the customers who have at least one row with sku == 1.

I'm stuck because I don't know how to write a "contains" condition; a literal == only returns the rows where sku itself matches ... I am sure there is a better way.

library("data.table")
df <- data.frame(cid=c(1,1,1,1,1,2,2,2,2,2,3,4,5,5,6,6),
                 order=c(1,1,1,2,3,4,4,4,5,5,6,7,8,8,9,9),
                 sku=c(1,2,3,2,3,1,2,3,1,3,2,1,2,3,1,2))

dt <- as.data.table(df)
2 answers

This is similar to the other answer, but here the subsetting is done in a more data.table-like way.

First, get the cids that meet the condition:

matching_cids = dt[sku==1, cid]

The %in% operator then filters the table down to the rows whose cid is in that list. So, using the list above:

dt[cid %in% matching_cids]

Or, in a single line:

> dt[cid %in% dt[sku==1, cid]]
     cid order sku
  1:   1     1   1
  2:   1     1   2
  3:   1     1   3
  4:   1     2   2
  5:   1     3   3
  6:   2     4   1
  7:   2     4   2
  8:   2     4   3
  9:   2     5   1
 10:   2     5   3
 11:   4     7   1
 12:   6     9   1
 13:   6     9   2
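
The "contains" condition asked about in the question can also be written per group. The following is only a sketch of that alternative, not part of the original answer: it rebuilds the sample data from the question, applies the %in% filter shown above, and then keeps a customer's whole group only when any of its rows has sku == 1. The names by_in and by_group are illustrative.

```r
library(data.table)

# Same sample data as in the question
dt <- data.table(
  cid   = c(1,1,1,1,1,2,2,2,2,2,3,4,5,5,6,6),
  order = c(1,1,1,2,3,4,4,4,5,5,6,7,8,8,9,9),
  sku   = c(1,2,3,2,3,1,2,3,1,3,2,1,2,3,1,2)
)

# Approach from the answer: collect the matching cids, then filter with %in%
by_in <- dt[cid %in% dt[sku == 1, cid]]

# Alternative sketch: return the group's rows (.SD) only if
# at least one row in that group has sku == 1
by_group <- dt[, if (any(sku == 1)) .SD, by = cid]
```

Both forms should drop cid 3 and 5 here. The %in% version is usually the simpler choice; the grouped version may read more naturally when the per-customer condition is more involved than a single equality.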

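I think this is one way that uses keys (?!) in data.table. The idea is to key the table by sku to pick out the customers who bought SKU 1, then re-key by cid and join back to the full table: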

#  Set initial key
setkey(dt,sku)

#  Select only rows with sku == 1 and keep the first such row per customer, keying the result by cid
dts <- dt[ J(1) , .SD[1] , keyby = cid ]

#  change key of dt to cid to match customer id
setkey(dt,cid)

#  join based on common key
dt[dts,.SD]
#    cid order sku
# 1:   1     1   1
# 2:   1     1   2
# 3:   1     2   2
# 4:   1     1   3
# 5:   1     3   3
# 6:   2     4   1
# 7:   2     5   1
# 8:   2     4   2
# 9:   2     4   3
#10:   2     5   3
#11:   4     7   1
#12:   6     9   1
#13:   6     9   2
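
The key-based join above can also be sketched without changing the table's key, assuming a data.table version that supports the on argument in joins. This is an illustrative variant rather than the answer's own code, and the names buyers and res are hypothetical.

```r
library(data.table)

# Same sample data as in the question
dt <- data.table(
  cid   = c(1,1,1,1,1,2,2,2,2,2,3,4,5,5,6,6),
  order = c(1,1,1,2,3,4,4,4,5,5,6,7,8,8,9,9),
  sku   = c(1,2,3,2,3,1,2,3,1,3,2,1,2,3,1,2)
)

# Distinct customers who have at least one row with sku == 1
buyers <- unique(dt[sku == 1, cid])

# Join the full table to those customers on cid; dt itself is left unkeyed
res <- dt[.(cid = buyers), on = "cid"]
```

Because the join is ad hoc, dt keeps its original row order, whereas setkey re-sorts the table in place.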

Alternatively, a similar join can be written with the data.table merge method ...

setkey(dt,sku)
merge( dt[ J(1) , .SD[1] , keyby = cid ] , dt , by = "cid" )
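
Since both tables passed to merge above contain order and sku, the merged result will carry suffixed copies of those columns. A possible way to get back just the original three columns, sketched here under the assumption that only the customer IDs are needed from the lookup side, is to merge against a one-column table of distinct cids. The names buyers and res are illustrative.

```r
library(data.table)

# Same sample data as in the question
dt <- data.table(
  cid   = c(1,1,1,1,1,2,2,2,2,2,3,4,5,5,6,6),
  order = c(1,1,1,2,3,4,4,4,5,5,6,7,8,8,9,9),
  sku   = c(1,2,3,2,3,1,2,3,1,3,2,1,2,3,1,2)
)

# One-column table of the distinct customers who bought sku 1
buyers <- unique(dt[sku == 1, .(cid)])

# Inner merge on cid keeps only rows belonging to those customers
res <- merge(dt, buyers, by = "cid")
```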