The most efficient way to create a symmetric matrix

I have the following matrix / data frame:

    > e
      V1 V2 V3 V4 V5
    1  0  2  3  4  5
    2  0  0  6  8 10
    3  0  0  0 12 15
    4  0  0  0  0 20
    5  0  0  0  0  0

In this case, N = 5 (number of rows = number of columns). I would like to fill in the missing values in this symmetric matrix (e[1,2] = e[2,1], etc.). What is the most efficient way to fill them in (in my case N, the size of the matrix, is quite large)? Is there a better way than nested loops?

4 answers

For completeness, I would also like to show this technique. Adding the transpose (e + t(e)) will not work if the lower part of the matrix (below the diagonal) already holds values, since they will be added to the upper part of the matrix.
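A small base R sketch of the point above. It rebuilds e from the question's data (e[i, j] = i * j above the diagonal); the stray 99 below the diagonal is made up purely for illustration.

```r
# The question's matrix: e[i, j] = i * j above the diagonal, 0 elsewhere
e <- outer(1:5, 1:5)
e[!upper.tri(e)] <- 0

# Adding the transpose works here because the lower triangle and diagonal are 0
sym_add <- e + t(e)

# Copying the upper triangle into the lower one gives the same result
sym_copy <- e
sym_copy[lower.tri(sym_copy)] <- t(sym_copy)[lower.tri(sym_copy)]
identical(sym_add, sym_copy)  # TRUE

# Hypothetical stray value below the diagonal
e_bad <- e
e_bad[3, 1] <- 99

bad_add <- e_bad + t(e_bad)
bad_add[1, 3]   # 102: the stray 99 leaks into the upper triangle

bad_copy <- e_bad
bad_copy[lower.tri(bad_copy)] <- t(bad_copy)[lower.tri(bad_copy)]
bad_copy[1, 3]  # 3: the lower value is simply overwritten
```

Adding the transpose would also double any non-zero diagonal entry, which the copy approach leaves untouched.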

Using the Matrix package, we can create a sparse matrix, which for a large symmetric matrix can need much less memory (especially when most entries are zero) and may even speed things up.

To create a symmetric sparse matrix from the matrix e, we would do:

    library(Matrix)
    rowscols <- which(upper.tri(e), arr.ind=TRUE)
    sparseMatrix(i=rowscols[,1],          #rows to fill in
                 j=rowscols[,2],          #cols to fill in
                 x=e[upper.tri(e)],       #values to use (i.e. the upper values of e)
                 symmetric=TRUE,          #make it symmetric
                 dims=c(nrow(e),nrow(e))) #dimensions

Output:

    5 x 5 sparse Matrix of class "dsCMatrix"
    [1,] .  2  3  4  5
    [2,] 2  .  6  8 10
    [3,] 3  6  . 12 15
    [4,] 4  8 12  . 20
    [5,] 5 10 15 20  .
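If a later step needs an ordinary dense matrix, the sparse result can be turned back into one with as.matrix(). A minimal sketch, assuming the Matrix package is installed and rebuilding e from the question's data:

```r
library(Matrix)

# The question's matrix: e[i, j] = i * j above the diagonal, 0 elsewhere
e <- outer(1:5, 1:5)
e[!upper.tri(e)] <- 0

rowscols <- which(upper.tri(e), arr.ind = TRUE)
S <- sparseMatrix(i = rowscols[, 1], j = rowscols[, 2],
                  x = e[upper.tri(e)], symmetric = TRUE,
                  dims = c(nrow(e), ncol(e)))

# Convert the dsCMatrix back into an ordinary base R matrix
dense <- as.matrix(S)
isSymmetric(dense)                                    # TRUE
all.equal(dense, e + t(e), check.attributes = FALSE)  # TRUE
```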

Microbenchmark:

Let me wrap this in a function that builds a symmetric matrix from a matrix (it copies the upper triangle of the matrix into the lower one):

    symmetrise <- function(mat){
      rowscols <- which(upper.tri(mat), arr.ind=TRUE)
      sparseMatrix(i=rowscols[,1],
                   j=rowscols[,2],
                   x=mat[upper.tri(mat)],
                   symmetric=TRUE,
                   dims=c(nrow(mat),ncol(mat)))
    }
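A quick usage sketch of symmetrise(). The input m is a hypothetical variant of the question's matrix with a stray value below the diagonal and a non-zero diagonal entry, to show which parts of the input are actually used:

```r
library(Matrix)

symmetrise <- function(mat){
  rowscols <- which(upper.tri(mat), arr.ind = TRUE)
  sparseMatrix(i = rowscols[, 1], j = rowscols[, 2],
               x = mat[upper.tri(mat)], symmetric = TRUE,
               dims = c(nrow(mat), ncol(mat)))
}

# Hypothetical input: the question's matrix plus two values symmetrise() ignores
m <- outer(1:5, 1:5)
m[!upper.tri(m)] <- 0
m[4, 2] <- -1  # stray value below the diagonal
m[1, 1] <- 7   # non-zero diagonal entry

s <- symmetrise(m)
s[4, 2]  # 8, mirrored from m[2, 4]; the -1 is dropped
s[1, 1]  # 0, because upper.tri() excludes the diagonal
```

Only the strict upper triangle is read, so both the stray lower value and the diagonal are discarded. Passing diag = TRUE to both upper.tri() calls should keep the diagonal as well.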

And the test:

    > library(microbenchmark)
    > microbenchmark(
        e + t(e),
        symmetrise(e),
        e[lower.tri(e)] <- t(e)[lower.tri(e)],
        times=1000
      )
    Unit: microseconds
                                     expr      min       lq      mean   median        uq       max neval cld
                                 e + t(e)   75.946   99.038  117.1984  110.841  134.9590   246.825  1000 a  
                            symmetrise(e) 5530.212 6246.569 6950.7681 6921.873 7034.2525 48662.989  1000   c
    e[lower.tri(e)] <- t(e)[lower.tri(e)]  261.193  322.771  430.4479  349.968  395.3815 36873.894  1000  b 

As you can see, on this 5 x 5 example symmetrise is actually much slower than e + t(e) or e[lower.tri(e)] <- t(e)[lower.tri(e)], but at least you have a function that symmetrises the matrix automatically (it takes the upper triangle and mirrors it into the lower one), and if you have a large matrix where memory is a problem, this can come in handy.

PS: Wherever you see a . in the matrix, it means zero. A sparse matrix is a kind of “compressed” object that stores only the non-zero entries (and, for a symmetric one, only one triangle), which makes it more efficient from a memory point of view.
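A rough sketch of the memory point. The size n = 1000 and the roughly 1% fill of the upper triangle are arbitrary assumptions; with a fully populated upper triangle, as in the question, the saving is far smaller, because every stored entry also carries a row index.

```r
library(Matrix)

set.seed(1)
n <- 1000

# Hypothetical data: only about 1% of the upper triangle is non-zero
m <- matrix(0, n, n)
up <- which(upper.tri(m))
k <- length(up) %/% 100
m[sample(up, k)] <- runif(k)

# Dense symmetric copy: stores all n * n doubles
dense <- m
dense[lower.tri(dense)] <- t(dense)[lower.tri(dense)]

# Sparse symmetric version: stores only the non-zero upper-triangle entries
# (m is zero on and below the diagonal, so m != 0 selects upper cells only)
rc <- which(m != 0, arr.ind = TRUE)
S <- sparseMatrix(i = rc[, 1], j = rc[, 2], x = m[rc],
                  symmetric = TRUE, dims = c(n, n))

print(object.size(dense), units = "MB")
print(object.size(S), units = "MB")
```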


Another option, for speed:

    library(Matrix)  # symmpart() is provided by the Matrix package
    2*symmpart(as.matrix(e))

Here's the benchmark:

    Unit: microseconds
                                         expr      min       lq        mean    median        uq       max neval
                                     e + t(e)  572.505  597.194  655.132028  611.5420  628.4860  8424.902  1000
                                symmetrise(e) 1128.220 1154.562 1215.740071 1167.0020 1185.6585 10656.059  1000
    e[lower.tri(e)] <- e[upper.tri(e, FALSE)]  285.013  311.191  350.846885  327.1335  339.5910  8106.006  1000
                   2 * symmpart(as.matrix(e))   78.392   93.953  101.330522  102.1860  107.9215   153.628  1000

It gets this speed because it creates the symmetric matrix directly.
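Since symmpart(x) is the symmetric part (x + t(x)) / 2, doubling it gives exactly x + t(x). A small check, assuming the Matrix package is installed and rebuilding e from the question's data (as.matrix() is only needed when e is a data frame):

```r
library(Matrix)  # provides symmpart()

# The question's matrix: e[i, j] = i * j above the diagonal, 0 elsewhere
e <- outer(1:5, 1:5)
e[!upper.tri(e)] <- 0

# symmpart(x) = (x + t(x)) / 2, so twice it is x + t(x)
s <- 2 * symmpart(as.matrix(e))

# as.matrix() again in case symmpart() returns a Matrix object rather than a base matrix
all.equal(as.matrix(s), e + t(e), check.attributes = FALSE)  # TRUE
```

This also means it shares the caveat of e + t(e): any values below the diagonal are added into the upper triangle.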

 df[lower.tri(df)] <- t(df)[lower.tri(df)] 

Output:

      V1 V2 V3 V4 V5
    1  0  2  3  4  5
    2  2  0  6  8 10
    3  3  6  0 12 15
    4  4  8 12  0 20
    5  5 10 15 20  0

Data:

 df <- structure(list(V1 = c(0L, 0L, 0L, 0L, 0L), V2 = c(2L, 0L, 0L, 0L, 0L), V3 = c(3L, 6L, 0L, 0L, 0L), V4 = c(4L, 8L, 12L, 0L, 0L), V5 = c(5L, 10L, 15L, 20L, 0L)), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c("1", "2", "3", "4", "5")) 
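The transpose in t(df)[lower.tri(df)] matters. Both upper.tri() and lower.tri() select their cells in column-major order, and those two orders do not line up, so the e[lower.tri(e)] <- e[upper.tri(e, FALSE)] variant from the benchmark above does not give a symmetric matrix for N = 5 (in general, for N >= 4). A sketch on a numeric copy of the data; the same holds for the data frame:

```r
# The question's matrix: e[i, j] = i * j above the diagonal, 0 elsewhere
e <- outer(1:5, 1:5)
e[!upper.tri(e)] <- 0

# Wrong: cell k of the lower triangle receives cell k of the upper triangle
wrong <- e
wrong[lower.tri(wrong)] <- wrong[upper.tri(wrong)]
isSymmetric(wrong)  # FALSE
wrong[4, 1]         # 6 (taken from e[2, 3]), although e[1, 4] is 4

# Right: t() mirrors the matrix, so cell (i, j) of the lower triangle gets e[j, i]
right <- e
right[lower.tri(right)] <- t(right)[lower.tri(right)]
isSymmetric(right)  # TRUE
```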
 e + t(e) 

Add the matrix to its transpose. Is that what you want?
