Is there a more efficient query than the following?
DT[, list(length(unique(OrderNo))), customerID]
To clarify: this is a LONG format table with customer IDs, order numbers and product line items, which means there will be duplicate rows with the same order ID if the customer bought more than one item in that transaction.
I'm trying to work out the number of unique purchases per customer. length() on its own gives the count of all order IDs by customer ID, including duplicates; I'm looking for just the unique count.
Edit here:
Here is some dummy code. Ideally, what I'm looking for is the result of the first query, the one using unique().
library(data.table)

df <- data.frame(
  customerID = as.factor(c(rep("A", 3), rep("B", 4))),
  product    = as.factor(c(rep("widget", 2), rep("otherstuff", 5))),
  orderID    = as.factor(c("xyz", "xyz", "abd", "qwe", "rty", "yui", "poi")),
  OrderDate  = as.Date(c("2013-07-01", "2013-07-01", "2013-07-03",
                         "2013-06-01", "2013-06-02", "2013-06-03", "2013-07-01"))
)
DT.eg <- as.data.table(df)

# Gives unique order counts
DT.eg[, list(orderlength = length(unique(orderID))), customerID]

# Gives counts of all orders by customer
DT.eg[, .SD, keyby = list(orderID, customerID)][, .N, by = customerID]
#       ^
#       |
#  This should be .N, not .SD ~ RS
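For comparison, here is a minimal sketch of two other ways to get the same per-customer unique order count. It assumes data.table 1.9.6 or later (where uniqueN() is available) and rebuilds a trimmed copy of the dummy data with only the two relevant columns; the names base, alt1 and alt2 are just illustrative. Whether either one is actually faster than length(unique()) would need benchmarking on the real table.

```r
library(data.table)

# Trimmed copy of the dummy data: customer A has order "xyz" on two rows.
DT.eg <- data.table(
  customerID = c(rep("A", 3), rep("B", 4)),
  orderID    = c("xyz", "xyz", "abd", "qwe", "rty", "yui", "poi")
)

# Baseline from the question: distinct orders per customer.
base <- DT.eg[, list(orders = length(unique(orderID))), by = customerID]

# Alternative 1: uniqueN() counts distinct values directly
# (assumes data.table >= 1.9.6).
alt1 <- DT.eg[, list(orders = uniqueN(orderID)), by = customerID]

# Alternative 2: drop duplicate (customerID, orderID) pairs first,
# then count the remaining rows per customer.
alt2 <- unique(DT.eg, by = c("customerID", "orderID"))[
  , list(orders = .N), by = customerID]

print(base)
```

On this data all three give 2 orders for customer A and 4 for customer B.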