How to identify only non-duplicated rows

I have several data.tables with the same columns, which I combine into one table with rbind:

    library(data.table)
    x <- data.table(id=c(1,2,3,4),dsp=c(5,6,7,8),status=c(FALSE,TRUE,FALSE,TRUE))
    y <- data.table(id=c(1,2,3,4),dsp=c(6,6,7,8),status=c(FALSE,FALSE,FALSE,TRUE))
    z <- data.table(id=c(1,2,3,4),dsp=c(5,6,9,8),status=c(FALSE,TRUE,FALSE,FALSE))
    w <- data.table(id=c(1,2,3,4),dsp=c(5,6,7,NA),status=c(FALSE,TRUE,FALSE,TRUE))
    setkey(x,id)
    setkey(y,id)
    setkey(z,id)
    setkey(w,id)
    Bigdt<-rbind(x,y,z,w)

I want to keep ONLY the rows that occur exactly once in the combined table, like:

       id dsp status
        1   6  FALSE
        2   6  FALSE
        3   9  FALSE
        4   8  FALSE
        4  NA   TRUE

So I tried

 Resultdt<-Bigdt[!duplicated(Bigdt)] 

but the result:

       id dsp status
        1   5  FALSE
        2   6   TRUE
        3   7  FALSE
        4   8   TRUE

is not what I expected. The rbind step is optional, so I also tried other approaches such as merging and joining. The data.table package seems likely to contain a solution. Any ideas?

r data.table
2 answers

You can do

    Bigdt[, .N, by=names(Bigdt)][N == 1L][, N := NULL][]

       id dsp status
    1:  1   6  FALSE
    2:  2   6  FALSE
    3:  3   9  FALSE
    4:  4   8  FALSE
    5:  4  NA   TRUE

To see how this works, run the DT[][][][] chain one piece at a time:

  • Bigdt[, .N, by=names(Bigdt)]
  • Bigdt[, .N, by=names(Bigdt)][N == 1L]
  • Bigdt[, .N, by=names(Bigdt)][N == 1L][, N := NULL]
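
The same chain can also be split into named steps so that each intermediate result can be inspected. This is only a sketch, assuming Bigdt is built exactly as in the question; the variable names counts and singles are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

# Rebuild the combined table from the question
x <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, 8),  status = c(FALSE, TRUE, FALSE, TRUE))
y <- data.table(id = c(1, 2, 3, 4), dsp = c(6, 6, 7, 8),  status = c(FALSE, FALSE, FALSE, TRUE))
z <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 9, 8),  status = c(FALSE, TRUE, FALSE, FALSE))
w <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, NA), status = c(FALSE, TRUE, FALSE, TRUE))
Bigdt <- rbind(x, y, z, w)

# Step 1: count how often each distinct (id, dsp, status) combination occurs
counts <- Bigdt[, .N, by = names(Bigdt)]

# Step 2: keep only the combinations that occur exactly once
singles <- counts[N == 1L]

# Step 3: drop the helper count column by reference
singles[, N := NULL]

print(counts)
print(singles)
```

Printing counts shows why the original attempt fell short: rows that occur several times still appear there, but with N greater than 1, and it is the N == 1L filter that removes them entirely.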

You can also try

    Bigdt[!(duplicated(Bigdt)|duplicated(Bigdt, fromLast=TRUE))]
    #   id dsp status
    #1:  1   6  FALSE
    #2:  2   6  FALSE
    #3:  3   9  FALSE
    #4:  4   8  FALSE
    #5:  4  NA   TRUE
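
The reason for combining the two duplicated() calls is easier to see on a plain vector. The sketch below uses a made-up character vector v (not from the question): duplicated() alone flags only the second and later occurrences, so the first copy of a repeated value survives a `!duplicated()` filter, which is what happened in the question. Scanning from the end as well flags the first copy too.

```r
# Hypothetical toy vector: "a" occurs three times, "b" and "c" once each
v <- c("a", "b", "a", "c", "a")

# Forward scan: the first "a" is NOT flagged, later ones are
fwd <- duplicated(v)

# Backward scan: the last "a" is NOT flagged, earlier ones are
bwd <- duplicated(v, fromLast = TRUE)

# A value is repeated anywhere if either scan flags it
repeated <- fwd | bwd

# Keep only values that occur exactly once
only_once <- v[!repeated]
print(only_once)
```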

Or, using .SD:

 Bigdt[Bigdt[,!(duplicated(.SD)|duplicated(.SD, fromLast=TRUE))]] 

Another option is to group by all column names, use .I to get the row indices of the groups that contain a single row, and subset the dataset with those indices:

 Bigdt[Bigdt[, .I[.N==1], by = names(Bigdt)]$V1] 
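
To make the .I approach more concrete, here is a small sketch on a made-up table dt (the table and the name idx are illustrative, not from the question). Within each group, .I holds the row numbers of that group in the original table and .N is the group size, so `.I[.N == 1L]` returns the row number for single-row groups and nothing for larger groups.

```r
library(data.table)

# Hypothetical toy table: the row (a, 1) occurs twice, the others once
dt <- data.table(k = c("a", "b", "a", "c"), v = c(1, 2, 1, 3))

# For each distinct row, return its row index only if the group has one row;
# the unnamed result column is called V1 by default
idx <- dt[, .I[.N == 1L], by = names(dt)]

# Subset the original table with the collected row indices
result <- dt[idx$V1]
print(result)
```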