Given an R data frame with column A, how do I create two new columns containing all ordered combinations of A

Question

I have a data.frame with one id column (x below) and a series of variables (y1, y2 below).

      x y1 y2
    1 1 43 55
    2 2 51 53
    [...]

What I would like to generate from this is a data frame in which the first two columns span every ordered combination of x (except where the two values are equal), followed by the columns for each variable in the same order. The header of the data frame and its first two rows would look like this (typed by hand, apologies):

    xi xj y1i y1j y2i y2j
     1  2  43  51  55  53
     2  1  51  43  53  55
    [...]

Thus, each row will contain the source and destination (i and j), and then the values for y1 in each source and destination.

I'm slowly learning R data manipulation, but this one is stumping me. Kudos for a one-liner, but a more readable, didactic answer is just as welcome.

+4

r dataframe data-manipulation

mindless.panda Jun 29 '11 at 20:11

4 answers

Two lines is the best I can do while still keeping it reasonable. (Edit: see the bottom of the answer for a one-liner.)

Create some data:

    n <- 4
    a <- cbind(x=LETTERS[1:n], y=letters[1:n])
    a
         x   y
    [1,] "A" "a"
    [2,] "B" "b"
    [3,] "C" "c"
    [4,] "D" "d"

Code:

    f <- function(x, i){cbind(i, x[i[,1],], x[i[,2],])}
    f(a, t(combn(seq_len(nrow(a)), 2)))

Results:

                 x   y   x   y
    [1,] "1" "2" "A" "a" "B" "b"
    [2,] "1" "3" "A" "a" "C" "c"
    [3,] "1" "4" "A" "a" "D" "d"
    [4,] "2" "3" "B" "b" "C" "c"
    [5,] "2" "4" "B" "b" "D" "d"
    [6,] "3" "4" "C" "c" "D" "d"

EDIT

This can be turned into a single line using an anonymous function (the default for i is computed from the argument x, so the function does not depend on a global a):

    (function(x, i=t(combn(seq_len(nrow(x)), 2))){cbind(i, x[i[,1],], x[i[,2],])})(a)
                 x   y   x   y
    [1,] "1" "2" "A" "a" "B" "b"
    [2,] "1" "3" "A" "a" "C" "c"
    [3,] "1" "4" "A" "a" "D" "d"
    [4,] "2" "3" "B" "b" "C" "c"
    [5,] "2" "4" "B" "b" "D" "d"
    [6,] "3" "4" "C" "c" "D" "d"
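One thing to keep in mind: combn() returns each pair once (i < j), whereas the question asks for ordered pairs. A minimal sketch of one way to extend the same index-matrix idea, assuming "ordered" means that both (i, j) and (j, i) are wanted; the names idx_ordered and res and the column names are only illustrative:

```r
# Same toy matrix as in the answer above
n <- 4
a <- cbind(x = LETTERS[1:n], y = letters[1:n])

# Unordered index pairs, one row per pair (i < j)
idx <- t(combn(seq_len(nrow(a)), 2))

# Assumption: both (i, j) and (j, i) are wanted, so append a copy
# of the index matrix with its two columns swapped
idx_ordered <- rbind(idx, idx[, 2:1])

# Bind the indices to the rows they point at
res <- cbind(idx_ordered, a[idx_ordered[, 1], ], a[idx_ordered[, 2], ])
colnames(res) <- c("i", "j", "xi", "yi", "xj", "yj")  # illustrative names
res
```

For n rows this gives n * (n - 1) rows, and no row pairs an element with itself.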

+4

Andrie Jun 29 '11 at 23:35

I'm not sure exactly what you want, but as far as I understand, this may be close:

    > library(combinat) # for permn
    > library(plyr) # for llply
    >
    > # sample data
    > d <- data.frame(x = 1:3, y1 = rnorm(3), y2 = rnorm(3))
    > d
      x          y1         y2
    1 1 -0.17525893 -1.1660321
    2 2 -0.05585689 -0.2059244
    3 3  0.90500983 -1.3067601
    >
    > # permutation of rows
    > idx <- permn(nrow(d))
    > idx
    [[1]]
    [1] 1 2 3

    ... snip ...

    [[6]]
    [1] 2 1 3

    >
    > # a list of perm-ed data.frame
    > d2 <- llply(idx, function(i)data.frame(idx = 1:nrow(d), d[i,]))
    > d2
    [[1]]
      idx x          y1         y2
    1   1 1 -0.17525893 -1.1660321
    2   2 2 -0.05585689 -0.2059244
    3   3 3  0.90500983 -1.3067601

    ... snip ...

    [[6]]
      idx x          y1         y2
    2   1 2 -0.05585689 -0.2059244
    1   2 1 -0.17525893 -1.1660321
    3   3 3  0.90500983 -1.3067601

    >
    > # merge them
    > d3 <- subset(Reduce(function(df1, df2) merge(df1, df2, by="idx"), d2), select = -c(idx))
    > d3
      x.x        y1.x       y2.x x.y        y1.y       y2.y x.x.1      y1.x.1     y2.x.1 x.y.1      y1.y.1     y2.y.1 x.x.2      y1.x.2     y2.x.2 x.y.2
    1   1 -0.17525893 -1.1660321   1 -0.17525893 -1.1660321     3  0.90500983 -1.3067601     3  0.90500983 -1.3067601     2 -0.05585689 -0.2059244     2
    2   2 -0.05585689 -0.2059244   3  0.90500983 -1.3067601     1 -0.17525893 -1.1660321     2 -0.05585689 -0.2059244     3  0.90500983 -1.3067601     1
    3   3  0.90500983 -1.3067601   2 -0.05585689 -0.2059244     2 -0.05585689 -0.2059244     1 -0.17525893 -1.1660321     1 -0.17525893 -1.1660321     3
           y1.y.2     y2.y.2
    1 -0.05585689 -0.2059244
    2 -0.17525893 -1.1660321
    3  0.90500983 -1.3067601
    >
    > # and here is the one-liner version
    > subset(Reduce(function(df1, df2) merge(df1, df2, by="idx"), llply(permn(nrow(d)), function(i)data.frame(idx=1:nrow(d), d[i,]))), select=-c(idx))
      x.x        y1.x       y2.x x.y        y1.y       y2.y x.x.1      y1.x.1     y2.x.1 x.y.1      y1.y.1     y2.y.1 x.x.2      y1.x.2     y2.x.2 x.y.2
    1   1 -0.17525893 -1.1660321   1 -0.17525893 -1.1660321     3  0.90500983 -1.3067601     3  0.90500983 -1.3067601     2 -0.05585689 -0.2059244     2
    2   2 -0.05585689 -0.2059244   3  0.90500983 -1.3067601     1 -0.17525893 -1.1660321     2 -0.05585689 -0.2059244     3  0.90500983 -1.3067601     1
    3   3  0.90500983 -1.3067601   2 -0.05585689 -0.2059244     2 -0.05585689 -0.2059244     1 -0.17525893 -1.1660321     1 -0.17525893 -1.1660321     3
           y1.y.2     y2.y.2
    1 -0.05585689 -0.2059244
    2 -0.17525893 -1.1660321
    3  0.90500983 -1.3067601

If you provide more information, you may be able to get better answers.

+2

kohske Jun 29 '11 at 10:38

Well, this is nowhere near a one-liner (which I kind of doubt is possible), but here's a "naive" approach:

    dat <- data.frame(x=1:5,y1=6:10,y2=11:15)

    #Collect all ordered pairs of elements of x
    tmp <- expand.grid(dat$x,dat$x)
    tmp <- tmp[tmp[,1] != tmp[,2],]

    #Init a matrix to hold the results
    rs <- as.matrix(cbind(tmp,matrix(NA,nrow(tmp),4)))

    #Loop through each ordered pair
    for (i in 1:nrow(rs)){
        rs[i,3:6] <- c(dat$y1[rs[i,1:2]],dat$y2[rs[i,1:2]])
    }

I did not name the columns, but this is easy to do after the fact.

Not very elegant, but maybe something to get you started ...
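For what it's worth, here is a rough sketch of the same idea without the loop and with the columns named as in the question. It is not the code from this answer: the names pairs, ri and rj are made up for illustration, and match() is used on the assumption that x may not always equal the row number (the loop above indexes y1 and y2 directly by the value of x, which works here only because x is 1:5).

```r
dat <- data.frame(x = 1:5, y1 = 6:10, y2 = 11:15)

# All ordered pairs of ids, dropping the pairs where both ids are equal
pairs <- expand.grid(xi = dat$x, xj = dat$x)
pairs <- pairs[pairs$xi != pairs$xj, ]

# Row positions of each id in dat; match() avoids assuming
# that x is the same as the row number
ri <- match(pairs$xi, dat$x)
rj <- match(pairs$xj, dat$x)

# Look up the variables for source (i) and destination (j)
rs <- data.frame(xi  = pairs$xi,   xj  = pairs$xj,
                 y1i = dat$y1[ri], y1j = dat$y1[rj],
                 y2i = dat$y2[ri], y2j = dat$y2[rj])
head(rs)
```

Because the result is built as a data frame rather than a matrix, this should also carry over to variables that are not numeric.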

+1

joran Jun 29 '11 at 10:32

Henry · Accepted Answer · 2011-06-29T23:09:04+0000

This works (though perhaps not in the order you want):

    firstdf <- data.frame(x  = c( 1, 2, 4, 5),
                          y1 = c(43,51,57,49),
                          y2 = c(55,53,47,44))
    co <- combn(firstdf$x,2)
    seconddf <- data.frame(xi = c(co[1,], co[2,]), xj = c(co[2,], co[1,]))
    thirddf <- merge(merge(seconddf, firstdf, by.x = "xj", by.y = "x" ),
                     firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

which produces

    > thirddf
       xi xj y1j y2j y1i y2i
    1   1  2  51  53  43  55
    2   1  5  49  44  43  55
    3   1  4  57  47  43  55
    4   2  4  57  47  51  53
    5   2  1  43  55  51  53
    6   2  5  49  44  51  53
    7   4  5  49  44  57  47
    8   4  1  43  55  57  47
    9   4  2  51  53  57  47
    10  5  1  43  55  49  44
    11  5  2  51  53  49  44
    12  5  4  57  47  49  44

where the first and fifth lines correspond to your example.

If you take firstdf as given and insist on a single line, you can collapse it into

    merge(merge(data.frame(xi = c(combn(firstdf$x,2)[1,], combn(firstdf$x,2)[2,]), xj = c(combn(firstdf$x,2)[2,], combn(firstdf$x,2)[1,])), firstdf, by.x = "xj", by.y = "x" ), firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

but I really don't see the point.
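If the exact layout from the question matters, the merged result can be rearranged afterwards. A small sketch under that assumption; it repeats the multi-line code from this answer so that it runs on its own, and fourthdf is just an illustrative name:

```r
firstdf <- data.frame(x  = c( 1,  2,  4,  5),
                      y1 = c(43, 51, 57, 49),
                      y2 = c(55, 53, 47, 44))
co <- combn(firstdf$x, 2)
seconddf <- data.frame(xi = c(co[1, ], co[2, ]), xj = c(co[2, ], co[1, ]))
thirddf <- merge(merge(seconddf, firstdf, by.x = "xj", by.y = "x"),
                 firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i"))

# Put the columns in the order asked for in the question
fourthdf <- thirddf[, c("xi", "xj", "y1i", "y1j", "y2i", "y2j")]

# Sort by source id, then by destination id, and renumber the rows
fourthdf <- fourthdf[order(fourthdf$xi, fourthdf$xj), ]
rownames(fourthdf) <- NULL
fourthdf
```

Selecting columns by name rather than by position keeps this step independent of the order in which merge() happens to return them.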
