Grouping low levels in a data frame in R

Question

Suppose I have a data frame with a factor column named C. C has many levels that occur only once. How can I replace every level that occurs only once with a single new level (called z)?

 A B C
 a a a
 a b b
 a a c
 a b d
 a b a

The above should turn into:

 A B C
 a a a
 a b z
 a a z
 a b z
 a b a

+5

r

kevin ko Jul 23 '15 at 18:36


3 answers

I'm sure there is a more elegant way to do this, but here is one solution:

 df <- read.table(text = "A B C
 a a a
 a b b
 a a c
 a b d
 a b a", header = TRUE)

 # Get the number of times each factor occurs:
 counts <- table(df$C)

 # Replace each one that only occurs once with "z"
 df$C <- ifelse(df$C %in% names(counts[counts == 1]), "z", as.character(df$C))

 # Since the levels changed, encode as a factor again:
 df$C <- factor(df$C)

This gives:

 R> df$C
 [1] a z z z a
 Levels: a z

+2

christoph Jul 23 '15 at 18:48


Using dplyr:

 library(dplyr)
 df %>%
   group_by(C) %>%
   mutate(D = as.character(ifelse(n() == 1, "z", as.character(C))))

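The as.character() around C is ugly but needed: ifelse() does not preserve factors, so without it you would get the underlying integer codes rather than the level labels.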
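If the ifelse() handling bothers you, one way around it is to count the occurrences per row and then overwrite by index. This is only a minimal base R sketch, assuming C is stored as character; the column name D follows the answer above, and `n_per_value` is a name made up for illustration.

```r
# Example column from the question, kept as character (assumption)
df <- data.frame(C = c("a", "b", "c", "d", "a"), stringsAsFactors = FALSE)

# For every row, count how many rows share the same value of C
n_per_value <- ave(seq_along(df$C), df$C, FUN = length)

# Copy C, then overwrite the values that occur only once
df$D <- df$C
df$D[n_per_value == 1] <- "z"

# Encode as a factor if a factor column is wanted
df$D <- factor(df$D)
```

Plain indexed assignment never touches factor codes, so no as.character() juggling is needed inside a conditional.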

+1

jeremycg Jul 23 '15 at 19:14


Datamine r · Accepted Answer · 2015-07-23T19:01:21+0000

How about this (if your data is df)?

 levels(df[,3])[table(df[,3])==1] <- "z"
 df
   A B C
 1 a a a
 2 a b z
 3 a a z
 4 a b z
 5 a b a
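The one-liner works because table() reports counts in the same order as levels(), and assigning the same label to several levels merges them into one. Note that it needs C to be a factor; since R 4.0.0, read.table() and data.frame() no longer create factors by default, so you may have to convert first. Here is a rough sketch of the same idea wrapped in a function. The name `lump_rare_levels` and its arguments are hypothetical, not from any package.

```r
# Example data reconstructed from the question; C is made a factor explicitly
df <- data.frame(
  A = c("a", "a", "a", "a", "a"),
  B = c("a", "b", "a", "b", "b"),
  C = c("a", "b", "c", "d", "a"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: merge all levels seen fewer than `min_count` times
lump_rare_levels <- function(f, min_count = 2, other = "z") {
  f <- droplevels(as.factor(f))          # ensure a factor, drop unused levels
  counts <- as.vector(table(f))          # counts, in the same order as levels(f)
  levels(f)[counts < min_count] <- other # duplicate labels are merged
  f
}

df$C <- lump_rare_levels(df$C)
```

With the example data, df$C ends up with the two levels a and z, matching the output above. Raising `min_count` lumps levels that occur a few times rather than only once.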
