Delete data frame rows that match a factor level (and then plot the data excluding that factor level)

Question

I have a data frame with 251 observations and 45 variables. There are 6 observations in the middle of the data frame that I would like to exclude from my analyses. All 6 belong to the same factor level. It is easy to create a new data frame that, when printed, appears to exclude the 6 observations. However, when I use the new data frame to plot variables by the factor in question, the supposedly excluded level still appears in the plot (with no observations). Using str() confirms that the level is still present in some form. In addition, the index of the new data frame skips the 6 values that previously held those observations.

How can I create a new data frame that excludes the 6 cases and no longer recognizes the excluded factor level when plotting? Can the new data frame be “reindexed” so that its index does not skip the values previously assigned to the excluded factor level?

Here is an example with made-up data:

    # ---------------------------------------------
    # data
    char <- c( rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3) )
    a <- 1:15 / pi
    b <- seq(1, 8, .5)
    d <- rep(c(3, 8, 5), 5)
    # stringsAsFactors = TRUE keeps char a factor on R >= 4.0
    # (it was the default when this was asked)
    dat <- data.frame(char, a, b, d, stringsAsFactors = TRUE)
    dat

    # two ways to remove rows that contain a string
    datNew1 <- dat[-which(dat$char == "nam"), ]
    datNew1
    datNew2 <- dat[grep("nam", dat[ ,"char"], invert=TRUE), ]
    datNew2

    # plots still contain the factor level that was excluded
    boxplot(datNew1$a ~ datNew1$char)
    boxplot(datNew2$a ~ datNew2$char)

    # str confirms that it is still there
    str(datNew1)
    str(datNew2)
    # ---------------------------------------------

+4

r

Steve Aug 18 '10 at 1:29

3 answers

Starting with R version 2.12.0, there is a droplevels function that can be applied either to a factor column or to an entire data frame. When applied to a data frame, it removes zero-count levels from all factor columns. So your example becomes simply:

    # two ways to remove rows that contain a string
    datNew1 <- droplevels( dat[-which(dat$char == "nam"), ] )
    datNew2 <- droplevels( dat[grep("nam", dat[ ,"char"], invert=TRUE), ] )
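A self-contained sketch of the droplevels() approach on a cut-down version of the question's made-up data, assuming R 2.12.0 or later. Two things here are assumptions added for illustration: stringsAsFactors = TRUE (recent R versions no longer convert strings to factors by default) and the row-name reset, which is one common way to handle the "reindexing" part of the question.

```r
# Made-up data from the question; stringsAsFactors = TRUE is an assumption
# added so that `char` is a factor on R >= 4.0 as well.
dat <- data.frame(
  char = c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3)),
  a = 1:15 / pi,
  stringsAsFactors = TRUE
)

# Drop the "nam" rows, then drop the now-empty factor level.
datNew <- droplevels(dat[dat$char != "nam", ])

# Renumber the rows so the index has no gaps (assumed to be the
# "reindexing" the question asks about).
rownames(datNew) <- NULL

levels(datNew$char)   # "nam" is no longer listed
nrow(datNew)
```

With the level gone, boxplot(a ~ char, data = datNew) should draw only the remaining groups.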

+1

chronos Feb 12 '14 at 15:18

Here is something from my own code. I have an experiment with an application in a lake; there are measurements from the buildings and from the lake, but mostly I don't want to deal with the lake. My variable is called "t.level", and its levels are control, low, medium, high and lake. This code lets you use nolk$ or data = nolk to get the data without "lake".

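    # keep every level except "lake"
    nolk <- subset(mylakedata, t.level == "control" | t.level == "low" |
                               t.level == "medium" | t.level == "high")
    # drop unused levels from every factor column; leave other columns as they are
    nolk[] <- lapply(nolk, function(col) if (is.factor(col)) col[drop = TRUE] else col)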
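The same pattern on a small made-up data frame, as a sketch. The data frame `lake`, its columns, and the use of %in% in place of the chained | comparisons are illustrative assumptions, not the answerer's actual data.

```r
# Hypothetical stand-in for the answerer's data.
lake <- data.frame(
  t.level = factor(c("control", "low", "medium", "high", "lake", "lake")),
  site    = factor(c("s1", "s2", "s1", "s2", "s3", "s3")),
  value   = c(1.2, 3.4, 2.2, 5.1, 0.7, 0.9)
)

# Keep every treatment level except "lake".
nolk <- subset(lake, t.level %in% c("control", "low", "medium", "high"))

# Re-subset each factor column with drop = TRUE to discard unused levels;
# non-factor columns are returned unchanged.
nolk[] <- lapply(nolk, function(col) if (is.factor(col)) col[drop = TRUE] else col)

levels(nolk$t.level)  # no "lake"
levels(nolk$site)     # no "s3"
```

Assigning into nolk[] rather than nolk keeps the result a data frame instead of a plain list.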

0

paul Jun 15 '11 at 10:57

Dirk Eddelbuettel · Accepted Answer · 2010-08-18T01:46:19+0000

You can use the drop.levels() function from the gdata package to reduce a factor's levels to those actually in use. Apply it to your column after creating the new data.frame.

Also try searching for r and drop.levels here (you need to make the search term [r] drop.levels, which I can't write as a link here because it interferes with the formatting logic).
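If installing gdata is not an option, a base-R sketch of the same per-column idea is to rebuild the factor with factor(), which keeps only the levels that actually occur. The data below is made up for illustration, and this is a stand-in for the suggestion above, not a reproduction of what drop.levels() does internally.

```r
# Made-up factor with four levels.
f <- factor(c("anc", "nam", "oom", "apt"))

# Subsetting removes the value but keeps the level.
kept <- f[f != "nam"]
levels(kept)            # still includes "nam"

# Rebuilding the factor discards levels with no observations.
kept <- factor(kept)
levels(kept)            # only levels that occur
```

For a data frame column the same step would be written as datNew$char <- factor(datNew$char).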
