Group progressive column concatenation

Suppose I have this input:

       ID     date_1     date_2 str
    1   1 2010-07-04 2008-01-20   A
    2   2 2015-07-01 2011-08-31   C
    3   3 2015-03-06 2013-01-18   D
    4   4 2013-01-10 2011-08-30   D
    5   5 2014-06-04 2011-09-18   B
    6   5 2014-06-04 2011-09-18   B
    7   6 2012-11-22 2011-09-28   C
    8   7 2014-06-17 2013-08-04   A
    10  7 2014-06-17 2013-08-04   B
    11  7 2014-06-17 2013-08-04   B

I would like to progressively concatenate the values of the str column within each ID group, so that each row holds the running concatenation up to that row, as shown in the following output:

       ID     date_1     date_2   str
    1   1 2010-07-04 2008-01-20     A
    2   2 2015-07-01 2011-08-31     C
    3   3 2015-03-06 2013-01-18     D
    4   4 2013-01-10 2011-08-30     D
    5   5 2014-06-04 2011-09-18     B
    6   5 2014-06-04 2011-09-18   B,B
    7   6 2012-11-22 2011-09-28     C
    8   7 2014-06-17 2013-08-04     A
    10  7 2014-06-17 2013-08-04   A,B
    11  7 2014-06-17 2013-08-04 A,B,B

I tried using the ave() function with this code:

    within(table, {
      Emp_list <- ave(str, ID, FUN = function(x) paste(x, collapse = ","))
    })

but it gives the following result, which is not quite what I want, because every row of a group receives the full concatenation:

       ID     date_1     date_2   str
    1   1 2010-07-04 2008-01-20     A
    2   2 2015-07-01 2011-08-31     C
    3   3 2015-03-06 2013-01-18     D
    4   4 2013-01-10 2011-08-30     D
    5   5 2014-06-04 2011-09-18   B,B
    6   5 2014-06-04 2011-09-18   B,B
    7   6 2012-11-22 2011-09-28     C
    8   7 2014-06-17 2013-08-04 A,B,B
    10  7 2014-06-17 2013-08-04 A,B,B
    11  7 2014-06-17 2013-08-04 A,B,B

Of course, I would like to avoid loops as I am working on a large database.

string r aggregation
2 answers

How about ave() with Reduce()? With accumulate = TRUE, Reduce() returns every intermediate result as it is calculated. So if we run it with paste(), we get the running concatenation of the strings within each group.

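    # Running paste over a vector: returns one element per input element
    f <- function(x) {
      Reduce(function(...) paste(..., sep = ", "), x, accumulate = TRUE)
    }
    # Apply f within each ID group; as.character() guards against factors
    df$str <- with(df, ave(as.character(str), ID, FUN = f))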

which gives an updated df data frame:

       ID     date_1     date_2     str
    1   1 2010-07-04 2008-01-20       A
    2   2 2015-07-01 2011-08-31       C
    3   3 2015-03-06 2013-01-18       D
    4   4 2013-01-10 2011-08-30       D
    5   5 2014-06-04 2011-09-18       B
    6   5 2014-06-04 2011-09-18    B, B
    7   6 2012-11-22 2011-09-28       C
    8   7 2014-06-17 2013-08-04       A
    10  7 2014-06-17 2013-08-04    A, B
    11  7 2014-06-17 2013-08-04 A, B, B

Note: function(...) paste(..., sep = ", ") can also be written as function(x, y) paste(x, y, sep = ", "). (Thanks to Pierre Lafortun)
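To see the accumulation in isolation, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are illustrative assumptions, not the asker's data; only base R is used.

```r
# Hypothetical toy data: three ID groups of sizes 2, 1 and 3
toy <- data.frame(
  ID  = c(5, 5, 6, 7, 7, 7),
  str = c("B", "B", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Running paste: with accumulate = TRUE, Reduce() keeps every partial result
running_paste <- function(x) {
  Reduce(function(a, b) paste(a, b, sep = ", "), x, accumulate = TRUE)
}

# On a single vector
single <- running_paste(c("A", "B", "B"))

# Within each ID group, via ave()
toy$cum_str <- ave(toy$str, toy$ID, FUN = running_paste)
print(toy)
```

Because ave() writes each group's result back into the original row positions, the approach does not require the rows to be sorted by ID, although the running order within a group follows the row order.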


Here's a possible solution combining data.table with an inner tapply that seems to deliver what you need (you can use paste instead of toString if you prefer; toString just looks cleaner to me).

    library(data.table)
    setDT(df)[, Str := tapply(str[sequence(1:.N)], rep(1:.N, 1:.N), toString), by = ID]
    df
    #     ID     date_1     date_2 str     Str
    #  1:  1 2010-07-04 2008-01-20   A       A
    #  2:  2 2015-07-01 2011-08-31   C       C
    #  3:  3 2015-03-06 2013-01-18   D       D
    #  4:  4 2013-01-10 2011-08-30   D       D
    #  5:  5 2014-06-04 2011-09-18   B       B
    #  6:  5 2014-06-04 2011-09-18   B    B, B
    #  7:  6 2012-11-22 2011-09-28   C       C
    #  8:  7 2014-06-17 2013-08-04   A       A
    #  9:  7 2014-06-17 2013-08-04   B    A, B
    # 10:  7 2014-06-17 2013-08-04   B A, B, B

You may be able to improve it a bit by computing 1:.N only once per group:

    setDT(df)[, Str := {
      Len <- 1:.N
      tapply(str[sequence(Len)], rep(Len, Len), toString)
    }, by = ID]
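The indexing trick inside the tapply call can be hard to read at first, so here is a small base R sketch of what happens for a single group. The vector `x` is a made-up stand-in for one ID group's str values, and the names `idx`, `grp` and `res` are introduced only for illustration.

```r
x <- c("A", "B", "B")                # hypothetical values of one ID group
n <- length(x)

# sequence(1:n) concatenates 1:1, 1:2, ..., 1:n, so x[idx] is every prefix of x
idx <- sequence(seq_len(n))          # 1 1 2 1 2 3

# rep(1:n, 1:n) labels each element with the prefix it belongs to
grp <- rep(seq_len(n), seq_len(n))   # 1 2 2 3 3 3

# Collapse each prefix into one comma-separated string
res <- tapply(x[idx], grp, toString)
print(unname(as.vector(res)))
```

Note that the expanded vector has n * (n + 1) / 2 elements for a group of n rows, so memory use grows quadratically with group size; for data with very large groups the Reduce() approach may be worth comparing.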