Grouping low-frequency levels in a data frame in R

Suppose I have a data frame with a column named C. C has many levels that occur only once. How can I replace all levels that occur only once with a single new level (called z)?

    A B C
    a a a
    a b b
    a a c
    a b d
    a b a

The above should turn into:

    A B C
    a a a
    a b z
    a a z
    a b z
    a b a
3 answers

How about this (assuming your data frame is df and column C is a factor)?

    levels(df[,3])[table(df[,3])==1] <- "z"
    df
      A B C
    1 a a a
    2 a b z
    3 a a z
    4 a b z
    5 a b a
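For anyone who wants to run this from scratch, here is a minimal sketch of the same idea, spelled out step by step. The way df is built is an assumption (the question does not show how it was created), and stringsAsFactors = TRUE is set explicitly because levels() only applies to factors and recent R versions no longer convert strings to factors by default. The trick relies on table() returning counts in the same order as levels(), so the logical vector lines up with the levels, and on the fact that giving several levels the same name merges them.

```r
# Example data from the question, built by hand (an assumption).
df <- data.frame(
  A = c("a", "a", "a", "a", "a"),
  B = c("a", "b", "a", "b", "b"),
  C = c("a", "b", "c", "d", "a"),
  stringsAsFactors = TRUE
)

# Counts per level, in the same order as levels(df$C).
counts <- table(df$C)

# TRUE for every level that occurs exactly once.
rare <- as.vector(counts == 1)

# Renaming several levels to the same name merges them into one level.
levels(df$C)[rare] <- "z"

df$C
# [1] a z z z a
# Levels: a z
```

The names counts and rare are only there for readability; the one-liner above does the same thing in a single expression.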

I'm sure there is a more elegant way to do this, but here is one solution:

    df <- read.table(text = "A B C
    a a a
    a b b
    a a c
    a b d
    a b a", header = TRUE)

    # Get the number of times each factor occurs:
    counts <- table(df$C)

    # Replace each one that only occurs once with "z"
    df$C <- ifelse(df$C %in% names(counts[counts == 1]), "z", as.character(df$C))

    # Since the levels changed, encode as a factor again:
    df$C <- factor(df$C)

This gives:

    R> df$C
    [1] a z z z a
    Levels: a z

Using dplyr:

    library(dplyr)

    df %>%
      group_by(C) %>%
      mutate(D = as.character(ifelse(n() == 1, "z", as.character(C))))

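The as.character() calls are an ugly workaround for ifelse(): when C is a factor, ifelse() would otherwise return its integer codes rather than its labels. Note that this puts the result in a new column D and leaves C unchanged.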
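One possible way to tidy this up, sketched under the assumption that a reasonably recent dplyr is installed: add_count() attaches the per-level count as a column, and if_else() is type-strict, so the branches are compared as plain strings and then turned back into a factor. The helper column name n_C and the hand-built df are made up for this illustration.

```r
library(dplyr)

# Example data from the question, built by hand (an assumption).
df <- data.frame(
  A = c("a", "a", "a", "a", "a"),
  B = c("a", "b", "a", "b", "b"),
  C = c("a", "b", "c", "d", "a"),
  stringsAsFactors = TRUE
)

result <- df %>%
  # n_C (hypothetical name) holds how often each row's level of C occurs.
  add_count(C, name = "n_C") %>%
  # Replace levels seen once with "z", then re-encode as a factor.
  mutate(C = factor(if_else(n_C == 1, "z", as.character(C)))) %>%
  # Drop the helper column again.
  select(-n_C)

result$C
```

Unlike the grouped version, this overwrites C in place instead of adding a column D; keep the original column by assigning to a different name in mutate() if that is preferred.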

