How to extract values between adjacent variables in a correlation matrix in R?

I have a huge correlation matrix, but here is just an example:

    set.seed(1234)
    corrmat <- matrix(round(rnorm(36, 0, 0.3), 2), ncol = 6)
    rownames(corrmat) <- colnames(corrmat) <- c("A", "b1", "b2", "C", "L", "ctt")
    diag(corrmat) <- NA
    corrmat[upper.tri(corrmat)] <- NA

            A    b1    b2     C     L ctt
    A      NA    NA    NA    NA    NA  NA
    b1   0.08    NA    NA    NA    NA  NA
    b2   0.33 -0.17    NA    NA    NA  NA
    C   -0.70 -0.27 -0.03    NA    NA  NA
    L    0.13 -0.14 -0.15 -0.13    NA  NA
    ctt  0.15 -0.30 -0.27  0.14 -0.28  NA

    > melt(corrmat)
        X1  X2 value
    1    A   A    NA
    2   b1   A  0.08
    3   b2   A  0.33
    4    C   A -0.70
    5    L   A  0.13
    6  ctt   A  0.15
    7    A  b1    NA
    8   b1  b1    NA
    9   b2  b1 -0.17
    10   C  b1 -0.27
    11   L  b1 -0.14
    12 ctt  b1 -0.30
    13   A  b2    NA
    14  b1  b2    NA
    15  b2  b2    NA
    16   C  b2 -0.03
    17   L  b2 -0.15
    18 ctt  b2 -0.27
    19   A   C    NA
    20  b1   C    NA
    21  b2   C    NA
    22   C   C    NA
    23   L   C -0.13
    24 ctt   C  0.14
    25   A   L    NA
    26  b1   L    NA
    27  b2   L    NA
    28   C   L    NA
    29   L   L    NA
    30 ctt   L -0.28
    31   A ctt    NA
    32  b1 ctt    NA
    33  b2 ctt    NA
    34   C ctt    NA
    35   L ctt    NA
    36 ctt ctt    NA

What I'm looking for is the correlation between neighboring variables, that is, between A-b1, b1-b2, b2-C, C-L and L-ctt (in column order). All other values and the NAs should be dropped. The expected result is:

        X1 X2 value
    2   b1  A  0.08
    9   b2 b1 -0.17
    16   C b2 -0.03
    23   L  C -0.13
    30 ctt  L -0.28

So they are in the order A-b1-b2-C-L-ctt.

Is there an easy way to filter it?

4 answers

Here is one way, using the often-overlooked row() and col() functions:

    > corrmat   ## my version as there was no set.seed
             A    b1    b2    C     L ctt
    A       NA    NA    NA   NA    NA  NA
    b1    0.03    NA    NA   NA    NA  NA
    b2   -0.41 -0.02    NA   NA    NA  NA
    C     0.11  0.61 -0.18   NA    NA  NA
    L    -0.28 -0.28  0.39 0.01    NA  NA
    ctt  -0.21 -0.41 -0.55 0.34 -0.13  NA
    > corrmat[row(corrmat) == col(corrmat) + 1]
    [1]  0.03 -0.02 -0.18  0.01 -0.13

Note that we are indexing the corrmat matrix as a vector here. The expression in the brackets selects the elements to return: those whose row index equals their column index plus 1, i.e. the sub-diagonal. Using -1 instead will give you the super-diagonal (i.e. the one just above the diagonal).
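
To see why the comparison picks out the sub-diagonal, it can help to look at row() and col() on a small matrix. The sketch below uses a made-up 3x3 matrix `m` (not part of the original example); the variable names are illustrative only.

```r
# Hypothetical 3x3 matrix, filled column by column with 1..9
m <- matrix(1:9, nrow = 3)

row(m)  # matrix of the same shape; each cell holds its own row index
col(m)  # matrix of the same shape; each cell holds its own column index

# Logical mask: TRUE where the row index is one more than the column index
mask <- row(m) == col(m) + 1

sub_diag   <- m[mask]                  # elements just below the diagonal
super_diag <- m[row(m) == col(m) - 1]  # elements just above the diagonal
```

Because the mask has the same shape as the matrix, the values come back in column-major order, which is the column order the question asks for.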

Putting it all together:

    out <- data.frame(X1 = rownames(corrmat)[-1],
                      X2 = head(colnames(corrmat), -1),
                      Value = corrmat[row(corrmat) == col(corrmat) + 1])
    > out
       X1 X2 Value
    1  b1  A  0.03
    2  b2 b1 -0.02
    3   C b2 -0.18
    4   L  C  0.01
    5 ctt  L -0.13

Here is one way:

    n <- rownames(corrmat)
    pair.table <- data.frame(X1 = head(n, -1),
                             X2 = tail(n, -1),
                             value = diag(tail(corrmat, -1)))

Result:

    > pair.table
      X1  X2 value
    1  A  b1  0.08
    2 b1  b2 -0.17
    3 b2   C -0.03
    4  C   L -0.13
    5  L ctt -0.28

These values sit just one step off the diagonal of the correlation matrix. So all you have to do is shift the matrix so that they land on the diagonal, and you are set: remove the first row and the last column, and then just take diag().

    corrmat <- corrmat[-1, -ncol(corrmat)]
    data.frame(X1 = rownames(corrmat), X2 = colnames(corrmat), r = diag(corrmat))
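
As a rough check of the idea, the sketch below rebuilds the question's example matrix and compares the shifted-diagonal values with the row()/col() mask from the first answer. The name `shifted` is introduced here only so that corrmat itself is not overwritten; it is not part of the original answer.

```r
# Rebuild the question's example matrix
set.seed(1234)
corrmat <- matrix(round(rnorm(36, 0, 0.3), 2), ncol = 6)
rownames(corrmat) <- colnames(corrmat) <- c("A", "b1", "b2", "C", "L", "ctt")
diag(corrmat) <- NA
corrmat[upper.tri(corrmat)] <- NA

# Drop the first row and the last column; `shifted` is a made-up name
# so that the original corrmat stays intact
shifted <- corrmat[-1, -ncol(corrmat)]

# The old sub-diagonal is now the main diagonal of the 5x5 matrix
by_shift <- diag(shifted)

# The same values via the row()/col() mask on the full matrix
by_mask <- corrmat[row(corrmat) == col(corrmat) + 1]

# Compare the two results, ignoring any names
all.equal(unname(by_shift), by_mask)
```

Keeping the shifted copy under a separate name may be preferable when corrmat is still needed afterwards.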

My solution generates all pairwise combinations of the row/column names (with the combn function) and looks up each pair in the square distance matrix. SIF stands for simple interaction file.

    makeSIF <- function(x) {
      # args -
      #   x - m*m distance or correlation matrix
      # @returns data frame in SIF format
      #
      sif <- as.data.frame(t(combn(as.character(rownames(x)), 2)))
      # print(sif)
      weight <- apply(sif, 1, indexDMatFromLookup, x)
      sif2 <- data.frame(sif, weight)
      return(sif2)
    }

    indexDMatFromLookup <- function(lookup, x) {
      return(indexDMat(x, lookup[1], lookup[2]))
    }

    indexDMat <- function(x, i1, i2) {
      return(x[i1, i2])
    }
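
For comparison, here is a more compact sketch of the same name-lookup idea, restricted to adjacent pairs only. It assumes the question's corrmat, where only the lower triangle is filled, so each pair is looked up as [later name, earlier name]; combn() lists the earlier name first, which on such a matrix would point into the NA upper triangle. The names `vars`, `pairs` and `adjacent` are introduced here for illustration.

```r
# Rebuild the question's example matrix
set.seed(1234)
corrmat <- matrix(round(rnorm(36, 0, 0.3), 2), ncol = 6)
rownames(corrmat) <- colnames(corrmat) <- c("A", "b1", "b2", "C", "L", "ctt")
diag(corrmat) <- NA
corrmat[upper.tri(corrmat)] <- NA

vars <- rownames(corrmat)

# Two-column character matrix of adjacent pairs: (b1, A), (b2, b1), ...
pairs <- cbind(X1 = vars[-1], X2 = head(vars, -1))

# Indexing with a two-column character matrix matches each row against the
# dimnames, i.e. corrmat["b1", "A"], corrmat["b2", "b1"], and so on
adjacent <- data.frame(pairs, value = corrmat[pairs])
```

This avoids the per-row apply() call, at the cost of handling only the adjacent pairs rather than all combinations.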

Looking at the other answers, this is probably much slower.

Edit: it is actually not that bad.

    > system.time(replicate(1000, makeSIF(corrmat)))
       user  system elapsed
      0.976   0.900   0.975

    > system.time(replicate(1000, data.frame(X1 = head(n, -1), X2 = tail(n, -1), value = diag(tail(corrmat, -1)))))
       user  system elapsed
      0.656   0.000   0.658

Only a fraction of a second slower than John's method.
