Create group_indices based on multiple columns

Question

I would like to generate group indices for observations based on two columns. Two observations should end up in the same group if they share a value in at least one of the two columns, either directly or through a chain of other observations. I can see how to group observations that share the same values in both columns, but not how to group those that share a value in just one of them.

For example, with a data frame:

dt <- data.frame(id=1:10, G1 = c("A","A","B","B","C","C","C","D","E","F"), G2 = c("Z","X","X","Y","W","V","U","s","T","T"))

I would like to get the following group column:

 1,1,1,1,2,2,2,3,4,4

I tried using group_indices from dplyr but could not get it to work.

r dplyr

Malta Jul 13 '17 at 11:40

2 answers

Thanks for the code for the two-column indices. Could you share the code for more than two columns?

Anandan Jul 10 '19 at 9:04

zx8754 · Accepted Answer · 2017-07-13T11:50:00+0000

Using igraph, build a graph from the two columns, get the component membership of each value, then map it back onto the rows by name:

    library(igraph)

    # df1 is the data frame from the question (named dt there)
    # convert to graph, and get clusters membership ids
    g <- graph_from_data_frame(df1[, c(2, 3, 1)])
    myGroups <- components(g)$membership

    myGroups
    # A B C D E F Z X Y W V U s T
    # 1 1 2 3 4 4 1 1 1 2 2 2 3 4

    # then map on names
    df1$group <- myGroups[df1$G1]

    df1
    #    id G1 G2 group
    # 1   1  A  Z     1
    # 2   2  A  X     1
    # 3   3  B  X     1
    # 4   4  B  Y     1
    # 5   5  C  W     2
    # 6   6  C  V     2
    # 7   7  C  U     2
    # 8   8  D  s     3
    # 9   9  E  T     4
    # 10 10  F  T     4
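For more than two grouping columns, one possible extension of the same idea is sketched below. It is an illustration, not taken from the answer above: the third column G3, the `group_cols` vector, and the `row_` / `column:value` vertex labels are all assumptions made for the example. Each row is linked to every value it holds, so rows sharing a value in any column fall into the same connected component.

```r
library(igraph)

# Hypothetical data: the question's data frame plus an assumed third column G3
dt <- data.frame(
  id = 1:10,
  G1 = c("A", "A", "B", "B", "C", "C", "C", "D", "E", "F"),
  G2 = c("Z", "X", "X", "Y", "W", "V", "U", "s", "T", "T"),
  G3 = c("p", "q", "r", "t", "u", "v", "w", "w", "x", "y")
)
group_cols <- c("G1", "G2", "G3")  # assumed names of the grouping columns

# One vertex per row, labelled by row position
row_names <- paste0("row_", seq_len(nrow(dt)))

# One edge per (row, column value). Values are prefixed with the column
# name so that identical strings in different columns are not merged.
edges <- do.call(rbind, lapply(group_cols, function(col) {
  data.frame(from = row_names, to = paste(col, dt[[col]], sep = ":"))
}))

g <- graph_from_data_frame(edges, directed = FALSE)
membership <- components(g)$membership

# Look up each row's component, then relabel as 1, 2, 3, ...
# in order of first appearance
row_groups <- membership[row_names]
dt$group <- match(row_groups, unique(row_groups))
```

With this example data, rows 7 and 8 share the G3 value "w", so the groups that were separate with two columns (C and D) are merged. Dropping the column-name prefix would instead treat equal strings in different columns as the same value, which may or may not be what is wanted.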
