R: How to effectively find out if data.frame A is contained in data.frame B?

Question

To find out whether the data frame df.a is a subset of the data frame df.b, I did the following:

df.a <- data.frame( x=1:5, y=6:10 )
df.b <- data.frame( x=1:7, y=6:12 )
inds.x <- as.integer( lapply( df.a$x, function(x) which(df.b$x == x) ))
inds.y <- as.integer( lapply( df.a$y, function(y) which(df.b$y == y) ))
identical( inds.x, inds.y )

The last line gives TRUE, so df.a is contained in df.b.

Now I ask myself, is there a more elegant and possibly more efficient way to answer this question?

The task extends naturally to finding the intersection of two data frames, possibly based only on a subset of columns.

Help would be greatly appreciated.

r dataframe set-intersection subset

user3139868 Mar 30 '15 at 21:18

1 answer

Alex · Accepted Answer · 2015-03-30T23:48:37+0000

I am going to hazard a guess at an answer.

I think semi_join from dplyr will do what you want, even taking duplicated rows into account.

First, note the description in ?semi_join:

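Return all rows from x where there are matching values in y, keeping just columns from x.
A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, where a semi join will never duplicate rows of x.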
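
As a small illustration of that difference, here is a sketch with made-up data frames (x, y and the key column k are hypothetical and not taken from the question). It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: the key 1 appears twice in y.
x <- data.frame(k = c(1, 2))
y <- data.frame(k = c(1, 1, 3))

# inner_join returns one row of x per matching row of y,
# so the row with k = 1 shows up twice.
inner <- inner_join(x, y, by = "k")

# semi_join only filters x: each row of x is kept at most once.
semi <- semi_join(x, y, by = "k")

nrow(inner)  # 2
nrow(semi)   # 1
```

This is why semi_join is the better fit for a containment test: the result can never have more rows than its first argument.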

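This suggests that the following should correctly fail, because df.a now contains a duplicated row that occurs only once in df.b: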

library(dplyr)  # provides semi_join
df.a <- data.frame( x=c(1:5,1), y=c(6:10,6) )
df.b <- data.frame( x=1:7, y=6:12 )
identical(semi_join(df.b, df.a), semi_join(df.a, df.a))

This gives FALSE, as expected, since semi_join(df.b, df.a) returns only five rows while df.a has six:

> semi_join(df.b, df.a)
Joining by: c("x", "y")
  x  y
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10

However, the following should pass, because df.c contains the duplicated row as well:

df.c <- data.frame( x=c(1:7, 1), y= c(6:12, 6) )
identical(semi_join(df.c, df.a), semi_join(df.a, df.a))

And it does, giving TRUE.

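Also note that the comparison uses semi_join(df.a, df.a) rather than df.a on its own, so that both sides of identical() are produced by semi_join in the same form.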
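
For the extension mentioned in the question (checking containment on only some of the columns), one possible base R sketch follows. It is not the semi_join comparison above: it swaps in merge() on de-duplicated key columns, so it ignores how often a row occurs and in which order. The helper name rows_contained, its cols argument, and the data frame df.d are invented for this illustration.

```r
# Hypothetical helper: TRUE if every distinct row of `a`, restricted to
# the columns in `cols`, also occurs in `b`. Multiplicity is ignored.
rows_contained <- function(a, b, cols = intersect(names(a), names(b))) {
  stopifnot(length(cols) > 0)
  # Keep only the key columns and drop duplicated rows on both sides.
  a_keys <- unique(a[, cols, drop = FALSE])
  b_keys <- unique(b[, cols, drop = FALSE])
  # After de-duplication each key of `a` matches at most one key of `b`,
  # so the merged row count equals the number of keys of `a` found in `b`.
  nrow(merge(a_keys, b_keys, by = cols)) == nrow(a_keys)
}

df.a <- data.frame(x = 1:5, y = 6:10)
df.b <- data.frame(x = 1:7, y = 6:12)
df.d <- data.frame(x = 1:3, y = c(6L, 99L, 8L))  # made-up: row (2, 99) is not in df.b

rows_contained(df.a, df.b)              # TRUE
rows_contained(df.b, df.a)              # FALSE
rows_contained(df.d, df.b)              # FALSE: (2, 99) has no match
rows_contained(df.d, df.b, cols = "x")  # TRUE: compared on x only
```

If duplicated rows must be respected, as in the examples above, the identical() comparison of the two semi_join results is the one to use; with dplyr, the by argument of semi_join restricts the match to a subset of columns.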
