Suppose I have a data frame with a column named C, and C has many levels that occur only once. How would I replace all levels that occur only once with a single new level (called z)?
    A B C
    a a a
    a b b
    a a c
    a b d
    a b a

The above should become:

    A B C
    a a a
    a b z
    a a z
    a b z
    a b a
How about this (if your data is df)?

    levels(df[,3])[table(df[,3])==1] <- "z"
    df
      A B C
    1 a a a
    2 a b z
    3 a a z
    4 a b z
    5 a b a
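For reference, here is a self-contained sketch of the same idea that can be pasted into a fresh session. It assumes C is a factor; since R 4.0 strings are no longer converted to factors by default, so the example sets `stringsAsFactors = TRUE` explicitly. The variable name `once` is only for illustration.

```r
# Rebuild the example data with factor columns.
df <- data.frame(
  A = c("a", "a", "a", "a", "a"),
  B = c("a", "b", "a", "b", "b"),
  C = c("a", "b", "c", "d", "a"),
  stringsAsFactors = TRUE
)

# table() on a factor returns counts in level order, so this logical
# vector lines up element by element with levels(df$C).
once <- table(df$C) == 1

# Assigning the same name to several levels merges them into one level.
levels(df$C)[once] <- "z"

df$C
```

Because the levels are renamed in place, the column stays a factor and no re-encoding step is needed.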
I'm sure there is a more elegant way to do this, but here is one solution:
    df <- read.table(text = "A B C
    a a a
    a b b
    a a c
    a b d
    a b a", header = TRUE)

    # Get the number of times each factor occurs:
    counts <- table(df$C)

    # Replace each one that only occurs once with "z"
    df$C <- ifelse(df$C %in% names(counts[counts == 1]), "z", as.character(df$C))

    # Since the levels changed, encode as a factor again:
    df$C <- factor(df$C)
This gives:
    R> df$C
    [1] a z z z a
    Levels: a z
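If this comes up repeatedly, the counting logic can be wrapped in a small helper. This is only a sketch: `collapse_rare` and its arguments `min_count` and `new_level` are hypothetical names, not part of base R or any package.

```r
# Replace every value that occurs fewer than `min_count` times with
# `new_level`, and return the result as a factor.
collapse_rare <- function(x, min_count = 2, new_level = "z") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_count]
  x[x %in% rare] <- new_level
  factor(x)
}

result <- collapse_rare(c("a", "b", "c", "d", "a"))
result
```

With the defaults this collapses values that occur exactly once; raising `min_count` would also collapse other infrequent values.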
Using dplyr:
    library(dplyr)

    df %>%
      group_by(C) %>%
      mutate(D = as.character(ifelse(n() == 1, "z", as.character(C))))
The as.character() calls are an ugly workaround for ifelse, which would otherwise return the factor's integer codes instead of its labels.
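One possible way to tidy this up is to count with `add_count()` and use dplyr's stricter `if_else()`, which avoids the grouping step and needs only a single `as.character()`. A sketch, assuming the example data is rebuilt as below with C as a factor; the temporary column name `n_C` and the result name `res` are made up for illustration.

```r
library(dplyr)

# Example data with C stored as a factor.
df <- data.frame(
  A = c("a", "a", "a", "a", "a"),
  B = c("a", "b", "a", "b", "b"),
  C = c("a", "b", "c", "d", "a"),
  stringsAsFactors = TRUE
)

res <- df %>%
  add_count(C, name = "n_C") %>%                        # occurrences of each C value
  mutate(D = if_else(n_C == 1, "z", as.character(C))) %>%
  select(-n_C)                                          # drop the helper column

res
```

The new column D is a character vector; wrap it in `factor()` if a factor is needed.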