Disjoint values ​​in R

I have two data sets of at least 420,500 observations each, for example.

dataset1 <- data.frame(col1=c("microsoft","apple","vmware","delta","microsoft"), col2=paste0(c("a","b","c",4,"asd"),".exe"), col3=rnorm(5)) dataset2 <- data.frame(col1=c("apple","cisco","proactive","dtex","microsoft"), col2=paste0(c("a","b","c",4,"asd"),".exe"), col3=rnorm(5)) > dataset1 col1 col2 col3 1 microsoft a.exe 2 2 apple b.exe 1 3 vmware c.exe 3 4 delta 4.exe 4 5 microsoft asd.exe 5 > dataset2 col1 col2 col3 1 apple a.exe 3 2 cisco b.exe 4 3 vmware d.exe 1 4 delta 5.exe 5 5 microsoft asd.exe 2 

I would like to print all the observations in dataset1 so as not to overlap with one in dataset2 (comparing both col1 and col2 in each), that in this case print everything except the last observation - observations 1 and 2 match on col2 , but not col1 , and observations 3 and 4 coincide on col1 , but not col2 , that is:

  col1 col2 col3 1: apple b.exe 1 2: delta 4.exe 4 3: microsoft a.exe 2 4: vmware c.exe 3 
+5
source share
2 answers

You can use anti_join from dplyr

  library(dplyr) anti_join(df1, df2, by = c('col1', 'col2')) # col1 col2 col3 #1 delta 4.exe -0.5836272 #2 vmware c.exe 0.4196231 #3 apple b.exe 0.5365853 #4 microsoft a.exe -0.5458808 

data

  set.seed(24) df1 <- data.frame(col1 = c('microsoft', 'apple', 'vmware', 'delta', 'microsoft'), col2= c('a.exe', 'b.exe', 'c.exe', '4.exe', 'asd.exe'), col3=rnorm(5), stringsAsFactors=FALSE) set.seed(22) df2 <- data.frame(col1 = c( 'apple', 'cisco', 'proactive', 'dtex', 'microsoft'), col2= c('a.exe', 'b.exe', 'c.exe', '4.exe', 'asd.exe'), col3=rnorm(5), stringsAsFactors=FALSE) 
+5
source

data.table solution inspired by this :

 library(data.table) #1.9.5+ setDT(dataset1,key=c("col1","col2")) setDT(dataset2,key=key(dataset1)) dataset1[!dataset2] col1 col2 col3 1: apple b.exe 1 2: delta 4.exe 4 3: microsoft a.exe 2 4: vmware c.exe 3 

You can also try without a key:

 library(data.table) #1.9.5+ setDT(dataset1); setDT(dataset2) dataset1[!dataset2,on=c("col1","col2")] 
+4
source

All Articles