How to find which elements of one set are in another set?

I have two sets: A with columns x and y, and B, also with columns x and y. I need to find the indices of the rows of A that also appear in B (both x and y must match). I came up with a simple solution (see below), but this comparison runs inside a loop, and paste adds a lot of extra time.

    B <- data.frame(x = sample(1:1000, 1000), y = sample(1:1000, 1000))
    A <- B[sample(1:1000, 10), ]
    # change some elements
    A$x[c(1, 3, 7, 10)] <- A$x[c(1, 3, 7, 10)] + 0.5
    A$xy <- paste(A$x, A$y, sep = 'ZZZ')
    B$xy <- paste(B$x, B$y, sep = 'ZZZ')
    indx <- which(A$xy %in% B$xy)
    indx

For example, for a single observation, the alternative to paste (comparing x and y directly) is almost 3 times faster:

    ind <- sample(1:1000, 1)
    xx <- B$x[ind]
    yy <- B$y[ind]
    ind <- which(with(B, x == xx & y == yy))
    # [1] 0.0160000324249268 seconds
    xy <- paste(xx, 'ZZZ', yy, sep = '')
    ind <- which(B$xy == xy)
    # [1] 0.0469999313354492 seconds
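The timings above come from a single run, so they are noisy. A rough sketch of how the two single-row lookups could be timed over many repetitions with system.time() follows. The seed, the repetition count n_rep, and the variable names ind0, ind_direct and ind_paste are assumptions made for illustration, not part of the original code, and the numbers will differ between machines.

```r
# Sketch only: compare the two single-row lookups over many repetitions.
set.seed(1)  # assumed seed, used only to make the run reproducible
B <- data.frame(x = sample(1:1000, 1000), y = sample(1:1000, 1000))
B$xy <- paste(B$x, B$y, sep = "ZZZ")

ind0 <- sample(1:1000, 1)  # the row we will look up
xx <- B$x[ind0]
yy <- B$y[ind0]

n_rep <- 1000  # hypothetical repetition count

# Direct comparison on both columns
t_direct <- system.time(
  for (i in seq_len(n_rep)) ind_direct <- which(B$x == xx & B$y == yy)
)

# Build the key with paste(), then compare strings
t_paste <- system.time(
  for (i in seq_len(n_rep)) {
    xy <- paste(xx, yy, sep = "ZZZ")
    ind_paste <- which(B$xy == xy)
  }
)

# Elapsed seconds for each approach
print(c(direct = t_direct[["elapsed"]], paste = t_paste[["elapsed"]]))
```

Both lookups should return the same row index; only the elapsed times are expected to differ.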
1 answer

How about using merge() to do the matching for you?

    A$id <- seq_len(nrow(A))
    sort(merge(A, B)$id)
    # [1] 2 4 5 6 8 9

Edit:

Or, to get rid of two unnecessary sorts, use the sort= argument of merge():

    merge(A, B, sort=FALSE)$id
    # [1] 2 4 5 6 8 9
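As a self-contained check, the sketch below rebuilds the data from the question and confirms that the merge() approach returns the same indices as the paste-based one. The seed, the explicit by = c("x", "y") argument, and the names indx_paste and indx_merge are assumptions added for illustration. The final sort() is kept because the documentation for merge() describes the row order under sort = FALSE as unspecified.

```r
# Sketch only: verify that merge() and the paste-based match agree.
set.seed(42)  # assumed seed, used only to make the run reproducible
B <- data.frame(x = sample(1:1000, 1000), y = sample(1:1000, 1000))
A <- B[sample(1:1000, 10), ]
# change some elements so that rows 1, 3, 7 and 10 no longer match
A$x[c(1, 3, 7, 10)] <- A$x[c(1, 3, 7, 10)] + 0.5

# paste-based indices from the question, for comparison
indx_paste <- which(paste(A$x, A$y, sep = "ZZZ") %in%
                    paste(B$x, B$y, sep = "ZZZ"))

# merge-based indices: tag the rows of A, then join on both columns
A$id <- seq_len(nrow(A))
indx_merge <- sort(merge(A, B, by = c("x", "y"), sort = FALSE)$id)

stopifnot(identical(indx_merge, indx_paste))
indx_merge
# [1] 2 4 5 6 8 9
```

Because B$x is a permutation of 1:1000, every unmodified row of A matches exactly one row of B, so the result does not depend on the chosen seed.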