I have a data.table with approximately 2.5 million rows and two columns. I want to remove any rows that are duplicated across both columns. Previously, with a data.frame, I would do the following: df <- unique(df[, c('V1', 'V2')]), but this does not work with data.table. I tried unique(df[, c('V1', 'V2'), with=FALSE]), but it still seems to consider only the data.table key, not the whole row.
Any suggestions?
Cheers, Davy
Example
    > dt
          V1 V2
     [1,]  A  B
     [2,]  A  C
     [3,]  A  D
     [4,]  A  B
     [5,]  B  A
     [6,]  C  D
     [7,]  C  D
     [8,]  E  F
     [9,]  G  G
    [10,]  A  B
In the data table above, where V2 is the key of the table, only rows 4, 7, and 10 should be removed.
    > dput(dt)
    structure(list(V1 = c("B", "A", "A", "A", "A", "A", "C", "C", "E", "G"),
        V2 = c("A", "B", "B", "B", "C", "D", "D", "D", "F", "G")),
        .Names = c("V1", "V2"), row.names = c(NA, -10L),
        class = c("data.table", "data.frame"),
        .internal.selfref = <pointer: 0x7fb4c4804578>, sorted = "V2")
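To make this easier to reproduce, here is a minimal sketch that rebuilds the table from the dput values above and shows the result I am after by going through the data.frame route described at the top. It assumes the data.table package is installed. The .internal.selfref pointer is left out because it cannot be parsed back in, and the name `wanted` is only for illustration.

```r
library(data.table)

# Rebuild the example from the dput() values (pointer attribute omitted).
dt <- data.table(
  V1 = c("B", "A", "A", "A", "A", "A", "C", "C", "E", "G"),
  V2 = c("A", "B", "B", "B", "C", "D", "D", "D", "F", "G")
)
setkey(dt, V2)  # V2 is the key, as in the question

# Desired result, obtained the old way via a plain data.frame:
# rows are compared on both V1 and V2, not just on the key.
wanted <- unique(as.data.frame(dt)[, c("V1", "V2")])
print(wanted)
nrow(wanted)  # 7 rows remain out of 10
```

What I am looking for is a way to get the same seven rows directly from the data.table, without converting 2.5 million rows to a data.frame first.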