Reason for unexpected output when subsetting a data frame in R

I have a data frame "a" with a variable called "VAL". I want to count the rows where VAL is 23 or 24.

I used two versions of the code that worked fine:

 nrow(subset(a,VAL==23|VAL==24))
 nrow(subset(a,VAL %in% c(23,24)))

But I tried another version that gives unexpected output, and I don't know why.

 nrow(subset(a,VAL ==c(23,24))) 

If I change the order of 23 and 24, it gives a different unexpected result.

 nrow(subset(a,VAL ==c(24,23))) 

Why are these versions incorrect? What are they actually doing?

1 answer

Working through an example shows where this goes wrong:

 a <- data.frame(VAL=c(1,1,1,23,24))
 a
 #  VAL
 #1   1
 #2   1
 #3   1
 #4  23
 #5  24

These work:

 a$VAL %in% c(23,24)
 #[1] FALSE FALSE FALSE  TRUE  TRUE
 a$VAL==23 | a$VAL==24
 #[1] FALSE FALSE FALSE  TRUE  TRUE
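Since the goal is a count, the logical vectors from the two working comparisons can be summed directly, because TRUE counts as 1. This is a minimal sketch on the same example data; the names n_in, n_or and n_subset are made up for illustration.

```r
# Same example data as above.
a <- data.frame(VAL = c(1, 1, 1, 23, 24))

# sum() of a logical vector counts the TRUE values.
n_in <- sum(a$VAL %in% c(23, 24))
n_or <- sum(a$VAL == 23 | a$VAL == 24)

# The subset()-based count from the question agrees.
n_subset <- nrow(subset(a, VAL %in% c(23, 24)))

n_in
#[1] 2
n_or
#[1] 2
n_subset
#[1] 2
```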

When comparing with ==, it fails due to vector recycling. Pay attention to the warning message below:

 a$VAL ==c(23,24)
 #[1] FALSE FALSE FALSE FALSE FALSE
 #Warning message:
 #In a$VAL == c(23, 24) :
 #  longer object length is not a multiple of shorter object length

This last bit of code recycles the vector you are testing against and basically compares:

 c( 1,  1,  1, 23, 24)
 #to
 c(23, 24, 23, 24, 23)

... so you will not get any rows. Reordering will give you

 c( 1,  1,  1, 23, 24)
 #to
 c(24, 23, 24, 23, 24)

... and you will get two rows returned (which gives the expected result by pure luck, so it is not something to rely on).
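To make the recycling explicit, the shorter vector can be expanded by hand with rep_len() and compared element by element. The sketch below also uses a second data frame, b, which is a made-up reordering of the same values and not part of the question, to show that the reordered comparison only matches when the values happen to line up by position.

```r
# Same example data as above.
a <- data.frame(VAL = c(1, 1, 1, 23, 24))

# What == effectively compares against: c(23, 24) recycled to length 5.
recycled <- rep_len(c(23, 24), nrow(a))
recycled
#[1] 23 24 23 24 23
sum(a$VAL == recycled)
#[1] 0

# Hypothetical data frame with the same values in a different row order.
b <- data.frame(VAL = c(23, 24, 1, 1, 1))

# The reordered comparison that "worked" on a now matches nothing on b.
sum(b$VAL == rep_len(c(24, 23), nrow(b)))
#[1] 0

# %in% tests membership, so the result does not depend on row position.
sum(a$VAL %in% c(23, 24))
#[1] 2
sum(b$VAL %in% c(23, 24))
#[1] 2
```

Writing the comparison with rep_len() avoids the length warning only because the recycling is done explicitly; the positional mismatch is the same one that == c(23,24) runs into.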
