Grouping strings with dplyr

I have a dataframe that looks like this:

    > data <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'b', 'a', 'b', 'c', 'd'))
    > data
      foo bar
    1   1   a
    2   1   b
    3   2   a
    4   3   b
    5   3   c
    6   3   d

I would like to create a new column, bars_by_foo, that concatenates the bar values within each foo group. The new data should look like this:

      foo bar bars_by_foo
    1   1   a          ab
    2   1   b          ab
    3   2   a           a
    4   3   b         bcd
    5   3   c         bcd
    6   3   d         bcd

I was hoping the following would work:

    p <- function(v) {
      Reduce(f=paste, x = v)
    }

    data %>%
      group_by(foo) %>%
      mutate(bars_by_foo=p(bar))

But this code gives me an error:

    Error: incompatible types, expecting a character vector

What am I doing wrong?

+7
r dplyr
4 answers

You can simply do this, without any helper functions:

    data %>%
      group_by(foo) %>%
      mutate(bars_by_foo = paste0(bar, collapse = ""))
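As a self-contained sketch of this approach, the snippet below rebuilds the sample data from the question and runs the pipeline end to end. It assumes dplyr is installed. Two things are additions, not part of the answer: stringsAsFactors = FALSE (so that bar is a character column on any R version) and the final ungroup().

```r
library(dplyr)

# Sample data from the question. stringsAsFactors = FALSE is an addition
# so that bar is a character column regardless of the R version.
data <- data.frame(foo = c(1, 1, 2, 3, 3, 3),
                   bar = c("a", "b", "a", "b", "c", "d"),
                   stringsAsFactors = FALSE)

# paste0(..., collapse = "") joins all bar values of a group into one string;
# mutate() then recycles that single string to every row of the group.
result <- data %>%
  group_by(foo) %>%
  mutate(bars_by_foo = paste0(bar, collapse = "")) %>%
  ungroup()  # addition: drop the grouping once the column is built

result$bars_by_foo
# "ab" "ab" "a" "bcd" "bcd" "bcd"
```

Because collapse always yields a single character string, every group returns the same type, including groups with only one row.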

+21

There seems to be a problem with the mutate function here. I have found that it works better to use summarise when grouping data in dplyr (though this is by no means a hard and fast rule).

The paste function also introduces spaces into the result, so either set sep = "" or just use paste0.
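The difference between the two functions can be checked with a few base R calls. This is a small illustration only; the variable names are made up for the example.

```r
v <- c("b", "c", "d")

# paste() separates its arguments with a single space by default
spaced    <- paste("a", "b")             # "a b"
# an empty separator removes the space
unspaced  <- paste("a", "b", sep = "")   # "ab"
# paste0() is paste() with sep = ""
shorthand <- paste0("a", "b")            # "ab"

# Reduce() applies the function pairwise across the vector,
# so the choice of paste vs paste0 decides whether spaces appear
reduced_paste  <- Reduce(paste, v)       # "b c d"
reduced_paste0 <- Reduce(paste0, v)      # "bcd"

# collapse joins the elements of one vector without needing Reduce()
collapsed <- paste0(v, collapse = "")    # "bcd"
```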

Here is my code:

    p <- function(v) {
      Reduce(f=paste0, x = v)
    }

    data %>%
      group_by(foo) %>%
      summarise(bars_by_foo = p(as.character(bar))) %>%
      merge(., data, by = 'foo') %>%
      select(foo, bar, bars_by_foo)

Result:

      foo bar bars_by_foo
    1   1   a          ab
    2   1   b          ab
    3   2   a           a
    4   3   b         bcd
    5   3   c         bcd
    6   3   d         bcd
+2

You can try the following:

    agg <- aggregate(bar~foo, data = data, paste0, collapse="")
    df <- merge(data, agg, by = "foo", all = TRUE)
    colnames(df) <- c(colnames(data), "bars_by_foo") # optional
    #   foo bar bars_by_foo
    # 1   1   a          ab
    # 2   1   b          ab
    # 3   2   a           a
    # 4   3   b         bcd
    # 5   3   c         bcd
    # 6   3   d         bcd
+1

Your function works if you make sure the strings are stored as characters rather than as factor levels.

    data <- data.frame(foo=c(1, 1, 2, 3, 3, 3),
                       bar=c('a', 'b', 'a', 'b', 'c', 'd'),
                       stringsAsFactors = FALSE)
    library("dplyr")

    p <- function(v) {
      Reduce(f=paste, x = v)
    }

    data %>%
      group_by(foo) %>%
      mutate(bars_by_foo=p(bar))

    Source: local data frame [6 x 3]
    Groups: foo [3]

        foo   bar bars_by_foo
      <dbl> <chr>       <chr>
    1     1     a         a b
    2     1     b         a b
    3     2     a           a
    4     3     b       b c d
    5     3     c       b c d
    6     3     d       b c d
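A likely explanation for the original error, sketched in base R below: when bar is a factor, Reduce returns a one-element input unchanged, so a group with a single row (foo = 2 in the question) yields a factor while the other groups yield character strings. The data frames with_factors and with_chars are hypothetical names used only for this illustration, and the link to the dplyr error message is an inference, not something stated in the answers.

```r
# Same helper as in the question
p <- function(v) { Reduce(f = paste, x = v) }

# Hypothetical two-row data frames that differ only in how bar is stored
with_factors <- data.frame(bar = c("a", "b"), stringsAsFactors = TRUE)
with_chars   <- data.frame(bar = c("a", "b"), stringsAsFactors = FALSE)

class(with_factors$bar)  # "factor"
class(with_chars$bar)    # "character"

# Two or more elements: paste() converts the values, so the result is character
multi_class <- class(p(with_factors$bar))       # "character"

# One element: Reduce() never calls paste() and hands the factor back as is
single_class <- class(p(with_factors$bar[1]))   # "factor"

# Converting to character first gives a consistent type for every group size
fixed_class <- class(p(as.character(with_factors$bar[1])))  # "character"
```

This is also why wrapping the column in as.character(), or building the data frame with stringsAsFactors = FALSE, makes the helper behave consistently.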
0
