R: Create non-duplicate pairs in a data frame while skipping pairs from the same group

The goal is to compare every ID with every other ID using a distance measure. However, some IDs belong to the same group, and IDs in the same group do not need to be compared.

Consider the following data frame df:

ID AN     AW      Group
a  white  green   1
b  black  yellow  1
c  purple gray    2
d  white  gray    2

The following code (from the question "R: Generate non-repeating pairs in a data frame") produces every pair:

ids <- combn(unique(df$ID), 2)
data.frame(df[match(ids[1,], df$ID), ], df[match(ids[2,], df$ID), ])

#ID   AN     AW    ID2   AN2    AW2
a   white  green   b   black  yellow
a   white  green   c   purple gray
a   white  green   d   white  gray
b   black  yellow  c   purple gray 
b   black  yellow  d   white  gray
c   purple gray    d   white  gray

I would like to know whether some of these calculations can be skipped, so that only this result is computed:

#ID   AN     AW    Group   ID2   AN2    AW2   Group2
a   white  green     1      c   purple gray    2
a   white  green     1      d   white  gray    2
b   black  yellow    1      c   purple gray    2
b   black  yellow    1      d   white  gray    2

That is, I want to avoid computing these pairs:

#ID   AN     AW    Group   ID2   AN2    AW2    Group2
a   white  green     1      b   black  yellow    1
c   purple gray      2      d   white  gray      2

I could subset the full result by comparing the groups, but that costs extra computation time: the data frame is large, and the number of combinations grows as n*(n-1)/2.
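To put numbers on the savings: with n IDs there are n*(n-1)/2 unordered pairs, and the pairs that can be skipped are the within-group ones, i.e. the sum over groups of n_g*(n_g-1)/2. A minimal sketch, assuming the example df above (redefined with only the columns needed, so the snippet runs on its own):

```r
# Example data from the question (only ID and Group are needed here)
df <- data.frame(ID = letters[1:4],
                 Group = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)

n <- nrow(df)
groupSizes <- table(df$Group)                          # number of IDs per group

allPairs    <- n * (n - 1) / 2                         # every unordered pair
withinPairs <- sum(groupSizes * (groupSizes - 1) / 2)  # same-group pairs to skip
crossPairs  <- allPairs - withinPairs                  # pairs actually needed

# Same count computed directly as the sum of n_g * n_h over group pairs g < h
# (requires at least two groups)
crossDirect <- sum(combn(as.vector(groupSizes), 2, FUN = prod))

c(all = allPairs, within = withinPairs, cross = crossPairs)
```

For the example this gives 6 pairs in total, 2 within groups and 4 across groups, which matches the four rows of the desired output.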



In base R, this can be done as follows:

# create test data.frame
df <- data.frame(ID=letters[1:4], AN=c("white", "black", "purple", "white"),
                 AW=c("green", "yellow", "gray", "gray"),
                 Group=rep(c(1,2),each=2), stringsAsFactors=FALSE)

# split data.frame by group, subset df to needed variables
dfList <- split(df[, c("ID", "Group")], df$Group)
# use combn to get all group-pair combinations
groupPairs <- combn(unique(df$Group), 2)

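Then loop over the group pairs. For each pair, expand.grid builds a data.frame of all ID combinations between the two groups; the [[ ]] indexing extracts the elements of dfList that correspond to groupPairs[1,i] and groupPairs[2,i].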
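As a quick illustration of what expand.grid contributes for a single group pair: it returns every combination of the two ID vectors, with the first argument varying fastest. The names g1 and g2 below are only illustrative:

```r
# IDs in group 1 and group 2 of the example data
g1 <- c("a", "b")
g2 <- c("c", "d")

# All cross-group combinations; the first column varies fastest
pairs <- expand.grid(g1, g2, stringsAsFactors = FALSE)
pairs
#   Var1 Var2
# 1    a    c
# 2    b    c
# 3    a    d
# 4    b    d
```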

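# get a list of all ID combinations, one data.frame per group combination
# (lapply keeps this a plain list for any number of groups; group values are
#  converted to character because split() names dfList by group value)
myComparisonList <- lapply(seq_len(ncol(groupPairs)), function(i) {
                           expand.grid(dfList[[as.character(groupPairs[1,i])]]$ID,
                                       dfList[[as.character(groupPairs[2,i])]]$ID,
                                       stringsAsFactors=FALSE)
                           })
# stack the combinations into one two-column data.frame
idsMat <- do.call(rbind, myComparisonList)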

# bind comparison pairs together by column
dfDone <- cbind(df[match(idsMat[,1], df$ID), ], df[match(idsMat[,2], df$ID), ])
# differentiate names
names(dfDone) <- paste0(names(dfDone), rep(c(".1", ".2"),
                        each=length(names(df))))
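The same idea can be packaged as a reusable function. This is a minimal sketch, assuming the ID and group columns are passed by name; crossGroupPairs and df3 are hypothetical names used only for illustration. Building one expand.grid block per group pair with lapply and stacking them with do.call(rbind, ...) keeps the result a single data frame however many groups there are, and within-group pairs are never generated.

```r
# Hypothetical helper: all cross-group pairs of IDs, for any number of groups
crossGroupPairs <- function(df, idCol = "ID", groupCol = "Group") {
  # IDs of each group, named by the group value
  idsByGroup <- split(df[[idCol]], df[[groupCol]])
  groups <- names(idsByGroup)
  if (length(groups) < 2) {
    return(data.frame(ID1 = character(0), ID2 = character(0)))
  }
  groupPairs <- combn(groups, 2)

  # One block of ID combinations per group pair, stacked into one data frame
  blocks <- lapply(seq_len(ncol(groupPairs)), function(i) {
    expand.grid(ID1 = idsByGroup[[groupPairs[1, i]]],
                ID2 = idsByGroup[[groupPairs[2, i]]],
                stringsAsFactors = FALSE)
  })
  do.call(rbind, blocks)
}

# Hypothetical test data with three groups of different sizes
df3 <- data.frame(ID = letters[1:5], Group = c(1, 1, 2, 3, 3),
                  stringsAsFactors = FALSE)

pairIds <- crossGroupPairs(df3)

# Attach the full rows for each side of every pair, as in the answer above
result <- cbind(df3[match(pairIds$ID1, df3$ID), ],
                df3[match(pairIds$ID2, df3$ID), ])
names(result) <- paste0(names(result), rep(c(".1", ".2"), each = ncol(df3)))
```

With five IDs there are 10 pairs in total and 2 within-group pairs, so pairIds has 8 rows.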

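With SQL via the sqldf package: join the data frame (called f here, with group column g) to itself on the condition that the groups differ.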

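library(sqldf)
# t1.g < t2.g keeps each cross-group pair once; t1.g != t2.g would return both (a, c) and (c, a)
sqldf("select * from f t1 inner join f t2 on t1.g < t2.g")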
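If sqldf is not available, the same self-join can be sketched in base R with merge(by = NULL), which forms the Cartesian product, followed by a filter on the group columns. This is a swapped-in base R technique, not part of the answer above, and it assumes the example df from the question. Note that the cross join materialises n^2 rows before filtering, so it does not save memory compared with enumerating only cross-group pairs; comparing with < rather than != keeps each unordered pair once.

```r
# Example data from the question
df <- data.frame(ID = letters[1:4],
                 AN = c("white", "black", "purple", "white"),
                 AW = c("green", "yellow", "gray", "gray"),
                 Group = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)

# Two copies of df with suffixed column names, playing the roles of t1 and t2
left  <- setNames(df, paste0(names(df), ".1"))
right <- setNames(df, paste0(names(df), ".2"))

# by = NULL gives the Cartesian product: every row of left with every row of right
crossed <- merge(left, right, by = NULL)

# Keep pairs from different groups; "<" keeps each unordered pair once
result <- crossed[crossed$Group.1 < crossed$Group.2, ]
```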