The data.table package also has its own unique and duplicated methods with some additional features.
Both the unique.data.table and duplicated.data.table methods take an additional by argument, which accepts either a character vector of column names or an integer vector of column positions and restricts the comparison to those columns:
    library(data.table)
    DT <- data.table(id = c(1,1,1,2,2,2), val = c(10,20,30,10,20,30))
    unique(DT, by = "id")
    #    id val
    # 1:  1  10
    # 2:  2  10
    duplicated(DT, by = "id")
    # [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE
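As a further sketch of the by argument, the example below passes a column position instead of a name and then deduplicates on a combination of two columns. The grp column and the object names (by_name, by_pos, dedup) are made up for illustration and are not part of the original example.

```r
library(data.table)

# Illustrative data: `grp` is an extra, hypothetical column
DT <- data.table(id  = c(1, 1, 1, 2, 2, 2),
                 grp = c("a", "a", "b", "a", "b", "b"),
                 val = c(10, 20, 30, 10, 20, 30))

# A column position works the same way as a column name (`id` is column 1)
by_name <- unique(DT, by = "id")
by_pos  <- unique(DT, by = 1L)

# Deduplicate on the combination of `id` and `grp`;
# the first row of each combination is kept
dedup <- unique(DT, by = c("id", "grp"))

# duplicated() returns a logical vector, so it can be used to subset directly
dedup2 <- DT[!duplicated(DT, by = c("id", "grp"))]
```

Here dedup and dedup2 should hold the same four rows, one per distinct (id, grp) pair.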
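Another important feature of these methods is that they are typically much faster than their data.frame counterparts on large datasets. You can measure the difference on your own machine with a benchmark such as: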
    library(microbenchmark)
    library(data.table)
    set.seed(123)
    DF <- as.data.frame(matrix(sample(1e8, 1e5, replace = TRUE), ncol = 10))
    DT <- copy(DF)
    setDT(DT)
    microbenchmark(unique(DF), unique(DT))
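Before trusting a timing comparison, it can be worth checking that both calls actually return the same rows. The following is a minimal sketch on a smaller, made-up dataset (values drawn from 1 to 100 so that duplicate rows really occur); the object names are illustrative.

```r
library(data.table)

set.seed(123)
# Two columns of small integers, so repeated rows are guaranteed to appear
DF <- as.data.frame(matrix(sample(100, 1e4, replace = TRUE), ncol = 2))
DT <- as.data.table(DF)

u_df <- unique(DF)
u_dt <- unique(DT)

# unique.data.frame keeps the original row names; reset them before comparing
rownames(u_df) <- NULL

same_rows <- nrow(u_df) == nrow(u_dt)
same_data <- isTRUE(all.equal(u_df, as.data.frame(u_dt)))
```

If both same_rows and same_data are TRUE, the two methods agree on this data and any timing difference reflects speed alone.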
David Arenburg, Mar 17 '16 at 11:01