I have two data sets of at least 420,500 observations each. A small example:
dataset1 <- data.frame(col1 = c("microsoft", "apple", "vmware", "delta", "microsoft"),
                       col2 = paste0(c("a", "b", "c", 4, "asd"), ".exe"),
                       col3 = c(2, 1, 3, 4, 5))
dataset2 <- data.frame(col1 = c("apple", "cisco", "vmware", "delta", "microsoft"),
                       col2 = paste0(c("a", "b", "d", 5, "asd"), ".exe"),
                       col3 = c(3, 4, 1, 5, 2))

> dataset1
       col1    col2 col3
1 microsoft   a.exe    2
2     apple   b.exe    1
3    vmware   c.exe    3
4     delta   4.exe    4
5 microsoft asd.exe    5

> dataset2
       col1    col2 col3
1     apple   a.exe    3
2     cisco   b.exe    4
3    vmware   d.exe    1
4     delta   5.exe    5
5 microsoft asd.exe    2
I would like to print all the observations in dataset1 that do not match an observation in dataset2, comparing both col1 and col2. In this case that means printing everything except the last observation: observations 1 and 2 match on col2 but not on col1, and observations 3 and 4 match on col1 but not on col2. The expected result is:
        col1  col2 col3
1:     apple b.exe    1
2:     delta 4.exe    4
3: microsoft a.exe    2
4:    vmware c.exe    3
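What is being asked for is usually called an anti-join on the pair (col1, col2). Below is a minimal base R sketch of one way to do it, using the values from the printed data frames above. The helper `make_key` and the `"\r"` separator are my own illustrative choices, not anything from the question, and the sketch assumes that separator never occurs in col1 or col2.

```r
# Example data, copied from the printed output in the question.
dataset1 <- data.frame(
  col1 = c("microsoft", "apple", "vmware", "delta", "microsoft"),
  col2 = c("a.exe", "b.exe", "c.exe", "4.exe", "asd.exe"),
  col3 = c(2, 1, 3, 4, 5),
  stringsAsFactors = FALSE
)
dataset2 <- data.frame(
  col1 = c("apple", "cisco", "vmware", "delta", "microsoft"),
  col2 = c("a.exe", "b.exe", "d.exe", "5.exe", "asd.exe"),
  col3 = c(3, 4, 1, 5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one composite key per row, built from col1 and col2.
# Assumption: the separator "\r" does not appear in either column.
make_key <- function(df) paste(df$col1, df$col2, sep = "\r")

# Keep the rows of dataset1 whose (col1, col2) pair is absent from dataset2.
unmatched <- dataset1[!(make_key(dataset1) %in% make_key(dataset2)), ]

# Sort by col1 to mirror the ordering of the expected result.
unmatched <- unmatched[order(unmatched$col1), ]
print(unmatched)
```

If the dplyr package is available, `anti_join(dataset1, dataset2, by = c("col1", "col2"))` should express the same filter without building a key by hand, though I have not timed either approach on data of the size mentioned.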