Group combinations in R

I have a question about group combinations.

My mini sample looks like this:

    sample <- data.frame(
      group  = c("a", "a", "a", "a", "b", "b", "b"),
      number = c(1, 2, 3, 2, 4, 5, 3)
    )

If I apply the combn function to the number column, I get the following result: all combinations of the values in the number column, regardless of which group each value belongs to:

          [,1] [,2]
     [1,]    1    2
     [2,]    1    3
     [3,]    1    2
     [4,]    1    4
     [5,]    1    5
     [6,]    1    3
     [7,]    2    3
     [8,]    2    2
     [9,]    2    4
    [10,]    2    5
    [11,]    2    3
    [12,]    3    2
    [13,]    3    4
    [14,]    3    5
    [15,]    3    3
    [16,]    2    4
    [17,]    2    5
    [18,]    2    3
    [19,]    4    5
    [20,]    4    3
    [21,]    5    3

The code I used for the above results is as follows:

    t(combn(sample$number, 2))

However, I would like to get the combinations within each group (i.e. "a" and "b"). The result I want should look like this:

         [,1] [,2] [,3]
    [1,] a    1    2
    [2,] a    1    3
    [3,] a    1    2
    [4,] a    2    3
    [5,] a    2    2
    [6,] a    3    2
    [7,] b    4    5
    [8,] b    4    3
    [9,] b    5    3

In addition to the combinations, I would like to get a column indicating the group.
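As a quick sanity check on the size of the expected result, the number of pairs within a group of n values is choose(n, 2). The sketch below only counts rows for the sample data above; the variable names (sizes, pairs_per_group, total_within, total_all) are made up for illustration.

```r
# Sample data from the question
sample <- data.frame(
  group  = c("a", "a", "a", "a", "b", "b", "b"),
  number = c(1, 2, 3, 2, 4, 5, 3)
)

# Number of rows in each group
sizes <- vapply(split(sample$number, sample$group), length, integer(1))

# Pairs within each group: choose(n, 2)
pairs_per_group <- choose(sizes, 2)

# Total pairs within groups vs. pairs over the whole column
total_within <- sum(pairs_per_group)
total_all    <- choose(nrow(sample), 2)
```

Group "a" has 4 values and group "b" has 3, so the grouped result should have 6 + 3 = 9 rows, compared with the 21 rows produced when the groups are ignored.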

2 answers

We can do the grouped operation with data.table:

    library(data.table)
    setDT(sample)[, {
        i1 <- combn(number, 2)
        list(i1[1, ], i1[2, ])
    }, by = group]
    #    group V1 V2
    # 1:     a  1  2
    # 2:     a  1  3
    # 3:     a  1  2
    # 4:     a  2  3
    # 5:     a  2  2
    # 6:     a  3  2
    # 7:     b  4  5
    # 8:     b  4  3
    # 9:     b  5  3

Or a compact option would be

    setDT(sample)[, transpose(combn(number, 2, FUN = list)), by = group]

Or using base R

    lst <- by(sample$number, sample$group, FUN = combn, m = 2)
    data.frame(
      group = rep(unique(as.character(sample$group)), sapply(lst, ncol)),
      t(do.call(cbind, lst))
    )
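One caveat that may be worth checking, and which the sample data does not trigger: combn(x, m) treats a single positive integer x as seq_len(x), so a group with only one row would yield pairs that are not in the data. The following is a minimal base R sketch of a guard; pairs_within() and sample2 are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: all pairs of the values in x, returned as a
# two-column matrix; returns zero rows when x has fewer than 2 values
pairs_within <- function(x) {
  if (length(x) < 2) {
    return(matrix(x[0], nrow = 0, ncol = 2))
  }
  t(combn(x, 2))
}

# Illustrative data: group "c" has a single row
sample2 <- data.frame(
  group  = c("a", "a", "a", "c"),
  number = c(1, 2, 3, 4)
)

# Without the guard, a lone value 4 is expanded to combinations of 1:4
unguarded <- t(combn(4, 2))

# With the guard, the single-row group contributes no pairs
res <- do.call(rbind, lapply(split(sample2, sample2$group), function(d) {
  m <- pairs_within(d$number)
  data.frame(group = rep(d$group[1], nrow(m)), X1 = m[, 1], X2 = m[, 2])
}))
```

Here unguarded has 6 rows even though only one value was supplied, while res contains only the 3 pairs from group "a".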

Here is a base R option that uses (1) split to create a list of data.frames, one for each unique value of group, (2) lapply to loop over each element of the list and compute the combinations with combn, and (3) do.call(rbind, ...) to combine the list elements back into a single data.frame.

    do.call(rbind, lapply(split(sample, sample$group), function(x) {
      data.frame(group = x$group[1], t(combn(x$number, 2)))
    }))
    #     group X1 X2
    # a.1     a  1  2
    # a.2     a  1  3
    # a.3     a  1  2
    # a.4     a  2  3
    # a.5     a  2  2
    # a.6     a  3  2
    # b.1     b  4  5
    # b.2     b  4  3
    # b.3     b  5  3
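The three steps can also be run one at a time, which makes the intermediate objects easier to inspect. This is only a sketch of the same split / lapply / do.call(rbind, ...) idea; the names pieces, pairs and result are illustrative.

```r
# Sample data from the question
sample <- data.frame(
  group  = c("a", "a", "a", "a", "b", "b", "b"),
  number = c(1, 2, 3, 2, 4, 5, 3)
)

# (1) split: one data.frame per group, stored in a named list
pieces <- split(sample, sample$group)

# (2) lapply: for each group, build the pairs and attach the group label
#     (the single group value is recycled across all pair rows)
pairs <- lapply(pieces, function(x) {
  data.frame(group = x$group[1], t(combn(x$number, 2)))
})

# (3) do.call(rbind, ...): stack the per-group results into one data.frame
result <- do.call(rbind, pairs)
```

Printing pieces and pairs shows what each step contributes. The pair columns are named X1 and X2 because the transposed combn matrix has no column names of its own.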

And the dplyr option:

    library(dplyr)
    sample %>%
      group_by(group) %>%
      do(data.frame(t(combn(.$number, 2))))
    # Source: local data frame [9 x 3]
    # Groups: group [2]
    #
    #    group    X1    X2
    #   (fctr) (dbl) (dbl)
    # 1      a     1     2
    # 2      a     1     3
    # 3      a     1     2
    # 4      a     2     3
    # 5      a     2     2
    # 6      a     3     2
    # 7      b     4     5
    # 8      b     4     3
    # 9      b     5     3