Remove rows of data that match a factor level (and then plot the data excluding that factor level)

I have a data frame with 251 observations and 45 variables. There are 6 observations in the middle of the data frame that I would like to exclude from my analyses. All 6 belong to the same factor level. It is easy to create a new data frame which, when printed, appears to exclude the 6 observations. However, when I use the new data frame to plot the variables by the factor in question, the supposedly excluded level still appears in the plot (with no observations). Using str() confirms that the level is still present in some form. In addition, the row index of the new data frame skips the 6 values that previously held those observations.

How can I create a new data frame that excludes the 6 cases and no longer recognizes the excluded factor level when plotting? Can the new data frame also be reindexed, so that the new index does not skip the values previously assigned to the excluded factor level?

Here is an example with made-up data:

    # ---------------------------------------------
    # data
    char <- c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3))
    a <- 1:15 / pi
    b <- seq(1, 8, .5)
    d <- rep(c(3, 8, 5), 5)
    # stringsAsFactors = TRUE keeps char a factor on R >= 4.0, where the default changed
    dat <- data.frame(char, a, b, d, stringsAsFactors = TRUE)
    dat

    # two ways to remove rows that contain a string
    datNew1 <- dat[-which(dat$char == "nam"), ]
    datNew1
    datNew2 <- dat[grep("nam", dat[, "char"], invert = TRUE), ]
    datNew2

    # plots still contain the factor level that was excluded
    boxplot(datNew1$a ~ datNew1$char)
    boxplot(datNew2$a ~ datNew2$char)

    # str confirms that it is still there
    str(datNew1)
    str(datNew2)
    # ---------------------------------------------
3 answers

You can use the drop.levels() function from the gdata package to reduce a factor's levels to those actually in use. Apply it to your column after creating the new data frame.

Also try searching this site for r and drop.levels (you need to use the search term [r] drop.levels, which I can't write as a link here because it interferes with the formatting logic).
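As a rough sketch of that suggestion on a cut-down version of the question's data: the block below uses gdata::drop.levels() if the gdata package happens to be installed, and otherwise falls back to re-creating the factor with base R's factor(). The fallback is an assumption added for illustration, not part of the answer, and the name datNew is made up here.

```r
# Cut-down version of the question's example data.
# stringsAsFactors = TRUE makes `char` a factor on R >= 4.0 as well.
dat <- data.frame(
  char = c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3)),
  a    = 1:15 / pi,
  stringsAsFactors = TRUE
)

# Subsetting alone keeps "nam" as an (empty) level.
datNew <- dat[dat$char != "nam", ]
levels(datNew$char)

if (requireNamespace("gdata", quietly = TRUE)) {
  # The approach suggested in the answer.
  datNew$char <- gdata::drop.levels(datNew$char)
} else {
  # Base-R fallback (an assumption, not from the answer):
  # re-creating the factor keeps only the levels that occur.
  datNew$char <- factor(datNew$char)
}

levels(datNew$char)
```

Either branch should leave only the three levels that still have observations.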


Starting with R version 2.12.0, there is a droplevels() function that can be applied either to individual factor columns or to an entire data frame. When applied to a data frame, it removes zero-count levels from all factor columns. Your example then becomes simply:

    # two ways to remove rows that contain a string
    datNew1 <- droplevels(dat[-which(dat$char == "nam"), ])
    datNew2 <- droplevels(dat[grep("nam", dat[, "char"], invert = TRUE), ])
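The question also asks about reindexing, which droplevels() does not do by itself. One possible way, sketched here on the question's example data, is to reset the row names with rownames(x) <- NULL after dropping the unused level. The name datNew and the logical filter dat$char != "nam" are choices made for this sketch; the logical filter is used because -which() returns no rows at all when nothing matches.

```r
# The question's example data; stringsAsFactors = TRUE makes `char`
# a factor on R >= 4.0 as well.
dat <- data.frame(
  char = c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3)),
  a    = 1:15 / pi,
  b    = seq(1, 8, 0.5),
  d    = rep(c(3, 8, 5), 5),
  stringsAsFactors = TRUE
)

# Drop the rows, then drop the now-empty factor level.
datNew <- droplevels(dat[dat$char != "nam", ])

# Reset the row names so they run 1..n without gaps.
rownames(datNew) <- NULL

levels(datNew$char)   # only levels that still have observations
rownames(datNew)      # consecutive row names

# The plot no longer shows an empty "nam" group.
if (interactive()) boxplot(a ~ char, data = datNew)
```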

Here is something from my own code. I have an experiment with an application in a lake; there are measurements from the buildings and from the lake, but mostly I don't want to deal with the lake. My variable is called "t.level", and its levels are control, low, medium, high and lake. This code lets you use nolk$ or data = nolk to get the data without "lake".

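    # keep every treatment level except "lake"
    nolk <- subset(mylakedata, t.level == "control" | t.level == "low" |
                               t.level == "medium" | t.level == "high")
    # drop unused levels from every factor column; other columns pass through unchanged
    nolk[] <- lapply(nolk, function(col) if (is.factor(col)) col[drop = TRUE] else col)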
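To try this idiom without the original data, here is a small sketch with a made-up stand-in for mylakedata; the value column and all numbers are invented for illustration. It also writes the filter as t.level != "lake", which is assumed to be equivalent here because "lake" is the only level being excluded.

```r
# Hypothetical stand-in for mylakedata; the values are invented.
mylakedata <- data.frame(
  t.level = factor(c("control", "low", "medium", "high", "lake", "lake")),
  value   = c(1.2, 2.3, 3.1, 4.8, 0.4, 0.6)
)

# Keep everything except the "lake" rows.
nolk <- subset(mylakedata, t.level != "lake")

# x[drop = TRUE] on a factor removes levels that no longer occur;
# non-factor columns are returned unchanged.
nolk[] <- lapply(nolk, function(col) if (is.factor(col)) col[drop = TRUE] else col)

levels(nolk$t.level)
```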