Reason for unexpected output when subsetting a data frame in R

Question

I have a data frame "a" with a variable called "VAL". I want to count the rows where the VAL value is 23 or 24.

I used two pieces of code that worked fine:

 nrow(subset(a, VAL == 23 | VAL == 24))
 nrow(subset(a, VAL %in% c(23, 24)))

But I tried another version that gives unexpected output, and I don't know why.

 nrow(subset(a,VAL ==c(23,24)))

If I change the order of 23 and 24, it gives a different unexpected result.

 nrow(subset(a,VAL ==c(24,23)))

Why is this code incorrect? What is it actually doing?

+2

r subset

Creamstat Apr 18 '14 at 0:47

1 answer

thelatemail · Accepted Answer · 2014-04-18T01:00:24+0000

Working through an example shows where this goes wrong:

 a <- data.frame(VAL=c(1,1,1,23,24))
 a
 #  VAL
 #1   1
 #2   1
 #3   1
 #4  23
 #5  24

These work:

 a$VAL %in% c(23,24)
 #[1] FALSE FALSE FALSE  TRUE  TRUE
 a$VAL==23 | a$VAL==24
 #[1] FALSE FALSE FALSE  TRUE  TRUE

The following fails because of vector recycling in the comparison. Pay attention to the warning message below:

 a$VAL ==c(23,24)
 #[1] FALSE FALSE FALSE FALSE FALSE
 #Warning message:
 #In a$VAL == c(23, 24) :
 #  longer object length is not a multiple of shorter object length

This last bit of code recycles the vector you are testing against, so it is basically comparing:

 c( 1,  1,  1, 23, 24)
 #to
 c(23, 24, 23, 24, 23)

... so you will not get any rows. Reordering will give you

 c( 1,  1,  1, 23, 24)
 #to
 c(24, 23, 24, 23, 24)

... and you will get two rows returned (which gives the expected result by pure luck, so it is not safe to rely on).
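To make the recycling visible, here is a minimal sketch that builds the recycled vector by hand with base R's rep_len() and counts the matches. The second data frame "b" is a hypothetical example added only for illustration; it is not from the question.

```r
a <- data.frame(VAL = c(1, 1, 1, 23, 24))

# The short vector stretched to the length of the long one,
# which is what == effectively compares against.
recycled <- rep_len(c(23, 24), nrow(a))
recycled
#[1] 23 24 23 24 23

# Element-by-element comparison: no position happens to line up.
n_first <- sum(a$VAL == recycled)

# Reversed order: 23 and 24 happen to line up in positions 4 and 5.
n_second <- sum(a$VAL == rep_len(c(24, 23), nrow(a)))

# %in% tests membership, so order and position do not matter.
n_in <- sum(a$VAL %in% c(23, 24))

# Hypothetical data frame "b": every value is 23 or 24, yet the
# reversed comparison matches nothing because the positions are off.
# Its length is a multiple of 2, so R does not even warn here.
b <- data.frame(VAL = c(23, 24, 23, 24))
n_b_eq <- sum(b$VAL == c(24, 23))
n_b_in <- sum(b$VAL %in% c(23, 24))

c(n_first, n_second, n_in, n_b_eq, n_b_in)
#[1] 0 2 2 0 4
```

The "b" case is the one to watch for: the result is silently wrong, which is why %in% (or the explicit | form) is the safer choice for this kind of test.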
