which will return a numeric index and drop all these NA rows. To avoid dropping them, use a logical index without wrapping it in which. The index is then NA for those rows, and each such row comes back entirely NA, even where its other columns hold values that are not NA.
res1 <- train[train$YOB >= 1900 & train$YOB <= 2003, ]
res1[is.na(res1$YOB), ]
The correct way is to add another condition with is.na, so that the NA rows are selected explicitly:
res2 <- train[is.na(train$YOB) | (train$YOB >= 1900 & train$YOB <= 2003), ]
res2[is.na(res2$YOB), ]
Using a simple example:
set.seed(25)
d1 <- data.frame(v1 = c(NA, 1, 5), v2 = rnorm(3))
d1$v1 > 1
#[1]    NA FALSE  TRUE
Here the NA stays NA in the result of the comparison. If we use which
which(d1$v1 >1)
we get only the indices of the TRUE values. According to the OP, both the NA rows and the rows that satisfy the logical condition must be returned. In this case
d1[is.na(d1$v1) | d1$v1 > 1, ]
#  v1         v2
#1 NA -0.2118336
#3  5 -1.1533076
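To see the three behaviours side by side, here is a minimal sketch on a hand-built data frame. The name toy and its values are made up for illustration and are not from the OP's data; fixed numbers are used instead of rnorm so the result does not depend on a seed.

```r
# Hypothetical toy data: v1 has one NA, v2 is fully populated
toy <- data.frame(v1 = c(NA, 1, 5), v2 = c(10, 20, 30))
cond <- toy$v1 > 1                      # NA FALSE TRUE

# 1. Plain logical index: the NA position yields a row that is NA in every column
by_logical <- toy[cond, ]

# 2. which(): NA positions are dropped, only the TRUE row is returned
by_which <- toy[which(cond), ]

# 3. is.na() | condition: NA rows are kept together with their real v2 values
by_isna <- toy[is.na(toy$v1) | cond, ]

by_logical$v2   # first element is NA although toy$v2[1] is 10
by_which$v2
by_isna$v2
```

The third form works because TRUE | NA evaluates to TRUE in R, so the index never contains NA and no all-NA rows are produced.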
data
set.seed(29)
train <- data.frame(YOB = sample(c(NA, 1850:2015), 100, replace = TRUE), col2 = rnorm(100))
akrun