I have a data frame with 251 observations and 45 variables. There are 6 observations in the middle of the data frame that I would like to exclude from my analyses. All 6 belong to the same factor level. It is easy to create a new data frame which, when printed, appears to exclude the 6 observations. However, when I use the new data frame to plot variables by the factor in question, the supposedly excluded level still appears in the graph, with no observations. Using str() confirms that the level is still present in the factor. In addition, the row index of the new data frame skips the 6 values that previously belonged to the excluded observations.
How can I create a new data frame that excludes the 6 cases and no longer recognizes the excluded factor level when plotting? Is it also possible to reindex the new data frame so that its row index has no gaps where the excluded observations used to be?
Here is an example with made-up data:
    # ---------------------------------------------
    # data
    char <- c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3))
    a <- 1:15 / pi
    b <- seq(1, 8, .5)
    d <- rep(c(3, 8, 5), 5)
    # stringsAsFactors = TRUE keeps char a factor (the default is FALSE since R 4.0)
    dat <- data.frame(char, a, b, d, stringsAsFactors = TRUE)
    dat

    # two ways to remove rows that contain a string
    datNew1 <- dat[-which(dat$char == "nam"), ]
    datNew1
    datNew2 <- dat[grep("nam", dat[, "char"], invert = TRUE), ]
    datNew2

    # plots still contain the factor level that was excluded
    boxplot(datNew1$a ~ datNew1$char)
    boxplot(datNew2$a ~ datNew2$char)

    # str confirms that it is still there
    str(datNew1)
    str(datNew2)
    # ---------------------------------------------
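For reference, here is a minimal sketch of one way the desired result could be obtained. It is not necessarily the only or best approach. It assumes base R 2.12 or later, where droplevels() is available, and the name datNew is purely illustrative. The subset uses logical indexing rather than -which(), because -which() returns zero rows when nothing matches.

```r
# Rebuild the example data; stringsAsFactors = TRUE makes char a factor
dat <- data.frame(
  char = c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3)),
  a = 1:15 / pi,
  b = seq(1, 8, .5),
  d = rep(c(3, 8, 5), 5),
  stringsAsFactors = TRUE
)

# Keep every row whose level is not "nam" (datNew is an illustrative name)
datNew <- dat[dat$char != "nam", ]

# Drop factor levels that no longer occur in the data
datNew <- droplevels(datNew)

# Reset the row names so they run 1..n without gaps
rownames(datNew) <- NULL

levels(datNew$char)               # "nam" should no longer be listed
str(datNew)
boxplot(a ~ char, data = datNew)  # only the remaining levels are drawn
```

If only a single column needs cleaning, calling factor() on it again (for example datNew$char <- factor(datNew$char)) should have the same effect on that column.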