How to count instances of repeated characters inside a string?

I have a dataframe:

    levels      counts
    1, 2, 2     24
    1, 2        20
    1, 3, 3, 3  15
    1, 3        10
    1, 2, 3     25

I want to treat, for example, "1, 2, 2" and "1, 2" as one and the same level. That is, as long as a string contains "1" and "2" and no other symbol, it should count as the level "1, 2". Here is the desired data frame:

    levels   counts
    1, 2     44
    1, 3     25
    1, 2, 3  25

Here is the code to reproduce the original data frame:

 df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"),
                  counts = c(24, 20, 15, 10, 25))
 df$levels <- as.character(df$levels)
2 answers

Split df$levels on ", ", keep the unique items of each row, and sort them. Then use the resulting key to aggregate counts.

 df$levels2 = sapply(strsplit(df$levels, ", "), function(x)
     paste(sort(unique(x)), collapse = ", "))
 # Or: toString(sort(unique(x)))
 aggregate(counts ~ levels2, df, sum)
 #   levels2 counts
 # 1    1, 2     44
 # 2 1, 2, 3     25
 # 3    1, 3     25
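One thing to keep in mind with the approach above: sort() on character vectors orders items as text, so "10" would come before "2". That does not matter for the sample data, but if the levels can have more than one digit, a numeric sort may be preferable. Below is a minimal sketch of that variant; normalize_levels is a hypothetical helper name (not part of the answer), and it assumes every item parses as a number.

```r
# Hypothetical helper: reduce each level string to a canonical form
# (unique items, sorted numerically, joined with ", ").
# Assumes every item in the string can be converted with as.numeric().
normalize_levels <- function(x) {
  parts <- strsplit(x, ", ", fixed = TRUE)
  vapply(parts, function(p) {
    p <- unique(p)                      # drop repeated items
    toString(p[order(as.numeric(p))])   # numeric order, then "a, b, c"
  }, character(1))
}

df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"),
                 counts = c(24, 20, 15, 10, 25),
                 stringsAsFactors = FALSE)

df$levels2 <- normalize_levels(df$levels)
res <- aggregate(counts ~ levels2, df, sum)
```

On the sample data this should give the same three groups as the original code; the difference only shows up for strings such as "10, 2, 2", which the helper maps to "2, 10".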

A solution using the tidyverse. df2 is the final result.

 library(tidyverse)
 df2 <- df %>%
   mutate(ID = 1:n()) %>%                             # remember the original row
   mutate(levels = strsplit(levels, split = ", ")) %>%
   unnest(levels) %>%                                 # one row per item
   distinct() %>%                                     # drop repeated items within a row
   arrange(ID, levels) %>%
   group_by(ID, counts) %>%
   summarise(levels = paste(levels, collapse = ", ")) %>%
   ungroup() %>%
   group_by(levels) %>%
   summarise(counts = sum(counts))

Update

Based on the comments below, here is a solution that uses an idea similar to db's:

 df2 <- df %>%
   mutate(l2 = map_chr(strsplit(levels, ", "),
                       .f = ~ .x %>% unique %>% sort %>% toString)) %>%
   group_by(l2) %>%
   summarise(counts = sum(counts))