Given df1 and df2 as follows:
df1 <- read.table(text = "
x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2", header = TRUE)

df2 <- read.table(text = "
a b c
1 1 1
1 2 8
1 1 2
2 6 2", header = TRUE)
I can subset within a data.frame, like this:
df2[ df2$b == 6 | df2$c == 8 ,]
I can also subset between data.frames:
df1[ df1$z %in% df2$c ,]
This gives me all the rows:
df1[ (df1$x %in% df2$a) & (df1$y %in% df2$b) & (df1$z %in% df2$c) ,]
But shouldn't this give me all the rows of df1 too?
df1[ df1$z %in% df2$c | df1$b == 9,]
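To see what each piece of that last condition evaluates to, the two sides can be inspected separately. A minimal sketch, re-creating the example data so it runs on its own (the names lhs, rhs and cond are only for illustration):

```r
# Re-create the example data so the block is self-contained
df1 <- read.table(text = "
x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2", header = TRUE)

df2 <- read.table(text = "
a b c
1 1 1
1 2 8
1 1 2
2 6 2", header = TRUE)

# Left side: one logical value per row of df1
lhs <- df1$z %in% df2$c

# Right side: df1 has no column named b, so df1$b is NULL
rhs <- df1$b == 9

# Combining a length-5 vector with a zero-length one gives a zero-length result
cond <- lhs | rhs

print(lhs)
print(is.null(df1$b))
print(length(cond))
print(nrow(df1[cond, ]))
```

Because df1 has no column b, df1$b is NULL, the comparison has length zero, and so does the combined condition. That is at least one reason this attempt does not behave like the previous ones.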
What I really want is to subset df1 on df2 using three columns, so that I only get the rows of df1 where x, y, z are equal to a, b, c of df2 within the same row, all at once. In my real data I will have more than three columns, but I will still want to subset on three column conditions together.
So, subsetting my example df1 on df2, the result I expect is:
x y z
1 1 1
1 1 2
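One way to express that row-wise condition with [ might be to paste the three columns into a single key per row and match on that key. This is only a sketch under assumptions: key1, key2 and res are names made up for illustration, the example data is re-created so the block runs standalone, and the "_" separator assumes the values themselves never contain an underscore.

```r
# Re-create the example data so the block is self-contained
df1 <- read.table(text = "
x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2", header = TRUE)

df2 <- read.table(text = "
a b c
1 1 1
1 2 8
1 1 2
2 6 2", header = TRUE)

# Build one key per row, so the three columns are compared together
# rather than each column being tested independently
key1 <- paste(df1$x, df1$y, df1$z, sep = "_")
key2 <- paste(df2$a, df2$b, df2$c, sep = "_")

# Keep only the rows of df1 whose (x, y, z) combination occurs as a row of df2
res <- df1[key1 %in% key2, ]
print(res)
```

The difference from chaining three %in% tests with & is that each %in% checks one column against all values of the other column, regardless of which row they come from, whereas the pasted key only matches when all three values come from the same row of df2.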
Playing with the syntax only confused me more, and the SO posts I found were all variations on what I want, which actually led to more confusion for me.
I figured out that I can do this:
merge(df1,df2, by.x=c("x","y","z"),by.y=c("a","b","c"))
which gives me what I want, but I would like to understand why my attempts with [ are wrong.