How to count instances of repeated characters inside a string?

Question

I have a dataframe:

          levels counts
         1, 2, 2     24
            1, 2     20
      1, 3, 3, 3     15
            1, 3     10
         1, 2, 3     25

I want to treat, for example, "1, 2, 2" and "1, 2" as the same level. That is, as long as a string contains "1" and "2" and no other symbol, it should count as the level "1, 2", no matter how often each symbol is repeated. Here is the desired data frame:

      levels counts
        1, 2     44
        1, 3     25
     1, 2, 3     25

Here is the code to reproduce the original data frame:

    df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"),
                     counts = c(24, 20, 15, 10, 25))
    df$levels <- as.character(df$levels)

r duplicates character

Jrp Jul 31 '17 at 15:54

2 answers

db · Answer 1 · 2017-07-31T16:01:33+0000

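Split df$levels on ", ", take the unique items in each row, and sort them. Then paste them back together and use the result as the key for aggregating counts.

    # Build a normalized key per row: sorted, de-duplicated items
    df$levels2 = sapply(strsplit(df$levels, ", "), function(x)
        paste(sort(unique(x)), collapse = ", "))  # or toString(sort(unique(x)))
    # Sum counts within each normalized key
    aggregate(counts ~ levels2, df, sum)
    #   levels2 counts
    # 1    1, 2     44
    # 2 1, 2, 3     25
    # 3    1, 3     25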
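To make the intermediate steps easier to inspect, the same idea can be sketched with a small helper. This is only an illustration under stated assumptions: the function name normalize_levels and the variables keys and result are made up here and are not part of the original answer.

```r
# Reproduce the data frame from the question
df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"),
                 counts = c(24, 20, 15, 10, 25),
                 stringsAsFactors = FALSE)

# Hypothetical helper: reduce one comma-separated string
# to its sorted, de-duplicated items
normalize_levels <- function(s) {
  parts <- strsplit(s, ", ", fixed = TRUE)[[1]]
  toString(sort(unique(parts)))
}

# One normalized key per row
keys <- vapply(df$levels, normalize_levels, character(1), USE.NAMES = FALSE)

# Sum counts within each key
result <- aggregate(counts ~ key,
                    data = data.frame(key = keys, counts = df$counts),
                    FUN = sum)
print(result)
```

Note that the items are sorted as character strings, so "10" would sort before "2". If the levels can have more than one digit, converting with as.numeric() before sorting is probably what you want.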

www · Answer 2 · 2017-07-31T16:14:32+0000

This solution uses the tidyverse. df2 is the final result.

    library(tidyverse)

    df2 <- df %>%
      mutate(ID = 1:n()) %>%                             # remember the original row
      mutate(levels = strsplit(levels, split = ", ")) %>%
      unnest(levels) %>%                                 # one row per item
      distinct() %>%                                     # drop repeated items within a row
      arrange(ID, levels) %>%
      group_by(ID, counts) %>%
      summarise(levels = paste(levels, collapse = ", ")) %>%
      ungroup() %>%
      group_by(levels) %>%
      summarise(counts = sum(counts))

Update

Based on the comments below, here is a solution that uses an idea similar to db's answer:

    df2 <- df %>%
      mutate(l2 = map_chr(strsplit(levels, ", "),
                          .f = ~ .x %>% unique %>% sort %>% toString)) %>%
      group_by(l2) %>%
      summarise(counts = sum(counts))
