Explanation of the behavior of the operator "=="

In the following very simple example, I cannot understand the behavior of the "==" operator.

    A <- c(10, 20, 10, 10, 20, 30)
    B <- c(40, 50, 60, 70, 80, 90)
    df <- data.frame(A, B)
    df[df$A == c(10, 20), ]    # returns 3 rows instead of 5
    df[df$A %in% c(10, 20), ]  # works as expected and returns 5 rows

Thank you all in advance.

2 answers

To understand what is going on, you need to understand the structure of data frames and R's recycling rules. A data frame is just a list of equal-length vectors (its columns):

    > unclass(df)
    $A
    [1] 10 20 10 10 20 30

    $B
    [1] 40 50 60 70 80 90

    attr(,"row.names")
    [1] 1 2 3 4 5 6
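A quick way to confirm this list-of-columns structure in the console, using the data from the question (a small sketch; lengths() is available in base R 3.2.0 and later):

```r
A <- c(10, 20, 10, 10, 20, 30)
B <- c(40, 50, 60, 70, 80, 90)
df <- data.frame(A, B)

is.list(df)   # TRUE: a data frame is a list underneath
names(df)     # "A" "B": one list element per column
lengths(df)   # A = 6, B = 6: every column has the same length
```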

When you compare two vectors of different lengths in R, the shorter one is recycled (repeated until it matches the length of the longer one). In your case, df$A == c(10, 20) is equivalent to:

    > c(10, 20, 10, 10, 20, 30) == c(10, 20, 10, 20, 10, 20)
    [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
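The recycling can be made explicit with rep_len(), which repeats a vector out to a given length. This sketch (the names rhs and recycled are just for illustration) checks that comparing against the short vector gives the same result as comparing against its recycled version:

```r
A <- c(10, 20, 10, 10, 20, 30)
rhs <- c(10, 20)

# Repeat rhs until it is as long as A: 10 20 10 20 10 20
recycled <- rep_len(rhs, length(A))

# Comparing against the short vector is the same as comparing
# element by element against the recycled one
A == rhs                            # TRUE TRUE TRUE FALSE FALSE FALSE
identical(A == rhs, A == recycled)  # TRUE
```

Here 6 is a multiple of 2, so R recycles silently. When the longer length is not a multiple of the shorter one (for example c(1, 2, 3) == c(1, 2)), R still recycles but also emits the warning "longer object length is not a multiple of shorter object length".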

and

    > df[c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), ]
       A  B
    1 10 40
    2 20 50
    3 10 60

From the %in% documentation:

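%in% is a more intuitive interface as a binary operator, which returns a logical vector indicating if there is a match or not for its left operand.

In other words, each element of df$A is tested against every value in c(10, 20), so no recycling takes place: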

    > df$A %in% c(10, 20)
    [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

and

    > df[c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE), ]
       A  B
    1 10 40
    2 20 50
    3 10 60
    4 10 70
    5 20 80
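R's help page for match() defines %in% as match(x, table, nomatch = 0) > 0, and for a two-value set the same membership test can also be spelled out with |. A small sketch comparing the three forms on the question's data (the variable names are illustrative only):

```r
A <- c(10, 20, 10, 10, 20, 30)
B <- c(40, 50, 60, 70, 80, 90)
df <- data.frame(A, B)

in_set    <- df$A %in% c(10, 20)                      # TRUE TRUE TRUE TRUE TRUE FALSE
via_match <- match(df$A, c(10, 20), nomatch = 0) > 0  # definition given in ?match
via_or    <- df$A == 10 | df$A == 20                  # explicit form for this two-value set

identical(in_set, via_match)  # TRUE
identical(in_set, via_or)     # TRUE
nrow(df[in_set, ])            # 5
```

The | form only scales to a handful of values and behaves differently from %in% when NA values are present, which is why %in% is the usual choice for filtering by a set.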

Here is my answer, which I hope adds some ideas to the other (very good) answers. As Norman Matloff puts it in "The Art of R Programming":

When applying an operation to two vectors that requires them to be the same length, R automatically recycles, or repeats, the shorter one, until it is long enough to match the longer one.
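The quote speaks of any operation, and indeed recycling is not specific to ==: arithmetic operators recycle the same way. A short illustration (the vectors x and y are made up for the example):

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)  # recycled to 10 20 10 20

x + y  # 11 22 13 24
x * y  # 10 40 30 80
```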

If the concept is still not clear, look at this example and try to guess the result:

 c(10, 10, 10, 10, 10, 10) == c(10, 20) 

which will give:

 [1] TRUE FALSE TRUE FALSE TRUE FALSE 

This happens because R recycles the shorter vector: it compares the first 10 on the left with the first element on the right, 10 (TRUE), then the second 10 on the left with 20, the second element on the right (FALSE). At that point the shorter vector (the one on the right) is exhausted, so R recycles it and the cycle starts again.
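The cycle described above is just modular indexing into the shorter vector. The loop below is a hypothetical illustration (recycled_equal is not a base R function) that spells out that index arithmetic and agrees with the built-in ==:

```r
# Hypothetical helper: element-wise equality with explicit recycling.
# Assumes length(x) >= length(y) and length(y) > 0.
recycled_equal <- function(x, y) {
  out <- logical(length(x))
  for (i in seq_along(x)) {
    # (i - 1) %% length(y) + 1 cycles through 1, 2, ..., length(y), 1, 2, ...
    j <- (i - 1) %% length(y) + 1
    out[i] <- x[i] == y[j]
  }
  out
}

lhs <- c(10, 10, 10, 10, 10, 10)
rhs <- c(10, 20)
recycled_equal(lhs, rhs)                         # TRUE FALSE TRUE FALSE TRUE FALSE
identical(recycled_equal(lhs, rhs), lhs == rhs)  # TRUE
```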

