R dichotomize a sparse matrix

I have a large sparse matrix of size 500x53380 and am trying to dichotomize it (set every positive entry to 1). I tried using "event2dichot" from the sna package, but did not succeed because it requires an adjacency matrix or a network object.

I also tried writing a simple loop:

    for (i in 1:500)
      for (j in 1:53380)
        if (matrix[i, j] > 0) matrix[i, j] = 1

It seems to work, but because the matrix is very large it has already been running for several hours and is still computing as I write this question.

Do you know a better way or a trick for this task?

Thank you all.

4 answers

Although your question is about sparse matrices, it seems to me that your code actually describes a standard matrix.

If so, you can process the 500x53380 matrix in seconds. The following code uses the fact that R stores a matrix internally as a vector with a dim attribute, so a single vectorized operation can be applied to the whole matrix at once. The code also saves and restores the dimensions as a precaution, although subassignment such as mat[mat > 0] <- 1 already keeps the dim attribute.
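
A minimal check of the storage claim, using a made-up 3x4 matrix `m` (the values are arbitrary): because R stores the matrix column by column in one vector, the single index `m[5]` reaches the same cell as `m[2, 2]`, and the logical-index assignments leave `dim(m)` unchanged.

```r
# Hypothetical 3 x 4 example matrix, filled column by column
m <- matrix(c(-2, 0, 3, 1.5, -0.5, 4, 0, -1, 2, 7, -3, 0.1), nrow = 3)

# Column-major vector storage: linear index 5 is row 2 of column 2
identical(m[5], m[2, 2])   # TRUE
attributes(m)              # only $dim

# One vectorized comparison per rule covers every cell, no loops
m[m > 0] <- 1
m[m < 0] <- 0

# Subassignment with a logical index keeps the dim attribute
dim(m)                     # 3 4
```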

Here is an illustration with a much smaller matrix:

    mr <- 5
    mc <- 8
    mat <- matrix(round(rnorm(mr*mc), 3), nrow=mr)
    mat
           [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
    [1,] -1.477  1.773  1.630 -0.152  1.054  0.057 -1.260  0.999
    [2,] -1.863 -0.312 -0.221 -0.102  0.892 -1.255  0.996 -0.193
    [3,] -0.364 -0.059  2.317  1.156  0.893  0.225  0.392 -1.986
    [4,] -1.123 -0.661  0.070  0.032  0.019 -1.763 -0.205  0.951
    [5,] -0.111 -3.112 -0.970 -0.794 -1.372 -0.119  1.291 -0.680
    mydim <- dim(mat)
    mat[mat>0] <- 1
    mat[mat<0] <- 0
    dim(mat) <- mydim
    mat
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
    [1,]    0    1    1    0    1    1    0    1
    [2,]    0    0    0    0    1    0    1    0
    [3,]    0    0    1    1    1    1    1    0
    [4,]    0    0    1    1    1    0    0    1
    [5,]    0    0    0    0    0    0    1    0

Repeating this whole process for the 500x53380 matrix takes about 12 seconds on my machine:

    mr <- 500
    mc <- 53380
    system.time({
      mat <- matrix(round(rnorm(mr*mc), 3), nrow=mr)
      mydim <- dim(mat)
      mat[mat>0] <- 1
      mat[mat<0] <- 0
      dim(mat) <- mydim
    })
       user  system elapsed
      12.25    0.42   12.88
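
When the rule is simply "positive to 1, everything else to 0", the two assignments can be collapsed into one comparison. This is an alternative technique, not the answer's own code; the seed and the names `mat01` and `mat2` are illustrative choices.

```r
set.seed(1)   # arbitrary seed, only so the example is reproducible
mr <- 5
mc <- 8
mat <- matrix(round(rnorm(mr * mc), 3), nrow = mr)

# `mat > 0` is a logical matrix with the same dim as `mat`;
# multiplying by 1 turns TRUE/FALSE into 1/0 in a single step
mat01 <- (mat > 0) * 1

# The two-step version from the answer, for comparison
mat2 <- mat
mat2[mat2 > 0] <- 1
mat2[mat2 < 0] <- 0

identical(mat01, mat2)   # TRUE
```

Because the comparison keeps the dim attribute, there is nothing to restore. The trade-off is memory: the result is a new matrix, which for 500 x 53380 doubles is about 214 MB on top of the original.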

Think vectorized and use indexing only. For instance, take this test matrix:

    mat <- matrix(0, nrow = 500, ncol = 53380)
    set.seed(7)
    fill <- sample(500*53380, 10000)                         # 10,000 random cell positions
    mat[fill] <- sample(1:10, length(fill), replace = TRUE)  # values from 1 to 10

It can be dichotomized with:

    mat[mat > 0] <- 1

This is pretty fast on my workstation:

    > system.time(mat[mat > 0] <- 1)
       user  system elapsed
      1.680   0.166   1.875
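
A small-scale sketch comparing the question's element-by-element loop with the index-based assignment. The 50x80 size and the 200 filled cells are hypothetical stand-ins for the real matrix, filled the same way as above; both versions produce the same matrix, but the vectorized one does the work in a single call.

```r
# Small stand-in for the 500 x 53380 matrix (sizes are arbitrary)
set.seed(7)
nr <- 50
nc <- 80
m <- matrix(0, nrow = nr, ncol = nc)
fill <- sample(nr * nc, 200)                          # 200 distinct cells
m[fill] <- sample(1:10, length(fill), replace = TRUE)

# The question's loop: one R-level comparison per cell
m_loop <- m
for (i in seq_len(nr)) for (j in seq_len(nc)) if (m_loop[i, j] > 0) m_loop[i, j] <- 1

# The vectorized, index-based version: one comparison over the whole matrix
m_vec <- m
m_vec[m_vec > 0] <- 1

identical(m_loop, m_vec)   # TRUE
```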

If you use the Matrix package and the matrix is, say, Mat, then you can work directly with Mat@x, the vector of stored values. For instance:

    ix_low <- (Mat@x < threshold)
    Mat@x[ix_low] <- 0
    Mat@x[!ix_low] <- 1
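
A minimal sketch of the @x approach, assuming a dgCMatrix built with sparseMatrix() from the Matrix package; the matrix `Mat`, its values and `threshold = 1` are hypothetical. Only the stored values are compared and overwritten. drop0() then removes the entries that were set to 0, since they would otherwise stay in the structure as explicit zeros.

```r
library(Matrix)

# Hypothetical 4 x 6 sparse matrix with five stored values
Mat <- sparseMatrix(i = c(1, 2, 2, 3, 4),
                    j = c(1, 3, 5, 2, 6),
                    x = c(0.2, 5, 1.5, 0.7, 3),
                    dims = c(4, 6))
threshold <- 1   # hypothetical cut-off

# Touch only the vector of stored values; implicit zeros are never visited
ix_low <- Mat@x < threshold
Mat@x[ix_low] <- 0
Mat@x[!ix_low] <- 1

# Remove the explicit zeros created above so the matrix stays sparse
Mat <- drop0(Mat)
```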

The key is to think differently when you work with sparse matrices. A typical representation is a list of (i, j, value) triplets.

You only need to touch the vector of values; do not iterate over anything else.
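
To see the (i, j, value) representation directly, summary() on a Matrix sparse matrix lists one row per stored entry. The sketch below uses a hypothetical matrix `Mat` and hypothetical names `trip` and `Mat01`; it dichotomizes by rewriting only the value column and rebuilding the matrix from the same i and j vectors.

```r
library(Matrix)

# Hypothetical 4 x 5 sparse matrix built from (i, j, value) triplets
Mat <- sparseMatrix(i = c(1, 3, 2, 4),
                    j = c(2, 2, 4, 5),
                    x = c(-0.4, 2.5, 7, 0.3),
                    dims = c(4, 5))

# One row per stored entry, with columns i, j and x
trip <- summary(Mat)
trip

# Change only the value column (positive to 1, otherwise 0), keep i and j
Mat01 <- sparseMatrix(i = trip$i, j = trip$j,
                      x = as.numeric(trip$x > 0),
                      dims = dim(Mat))

# The former -0.4 entry is now an explicit zero; drop it
Mat01 <- drop0(Mat01)
```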


An easy way to do this with a formal sparse matrix (i.e., one created with the Matrix package, capital M, rather than a base R matrix) is to coerce the matrix to logical with as(), then convert it back to a numeric matrix.
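
A sketch of the coercion route, assuming a version of the Matrix package where as(x, "lMatrix") and as(x, "dMatrix") are available; `Mat`, `L`, `Mat01` and the values are hypothetical. The logical coercion marks every stored nonzero as TRUE, so negative values would also become 1; this route matches "positive to 1" only for nonnegative data.

```r
library(Matrix)

# Hypothetical 3 x 3 sparse matrix with positive stored values
Mat <- sparseMatrix(i = c(1, 2, 3), j = c(1, 3, 2),
                    x = c(4.2, 0.5, 9), dims = c(3, 3))

# Coerce to logical: every stored nonzero becomes TRUE
L <- as(Mat, "lMatrix")

# Back to numeric: every TRUE becomes 1, and the result is still sparse
Mat01 <- as(L, "dMatrix")
```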
