Randomized SVD singular values

Randomized SVD decomposes a matrix by extracting the first k singular values/vectors using k + p random projections. This works surprisingly well for large matrices.

My question is about the singular values returned by the algorithm: why are they not equal to the first k singular values obtained from a full SVD?

Below I have a simple implementation in R. Any suggestions for improving performance will be appreciated.

rsvd <- function(A, k = 10, p = 5) {
  n <- nrow(A)
  # project A onto k + p random directions and orthonormalize the result
  y <- A %*% matrix(rnorm(n * (k + p)), nrow = n)
  q <- qr.Q(qr(y))
  # SVD of the small matrix b = t(q) %*% A, then map u back to the original space
  b <- t(q) %*% A
  s <- svd(b)
  list(u = q %*% s$u, d = s$d, v = s$v)
}

> set.seed(10)
> A <- matrix(rnorm(500*500), 500, 500)
> svd(A)$d[1:15]
 [1] 44.94307 44.48235 43.78984 43.44626 43.27146 43.15066 42.79720 42.54440 42.27439 42.21873 41.79763 41.51349 41.48338 41.35024 41.18068
> rsvd(A, 10, 5)$d
 [1] 34.83741 33.83411 33.09522 32.65761 32.34326 31.80868 31.38253 30.96395 30.79063 30.34387 30.04538 29.56061 29.24128 29.12612 27.61804
2 answers


I believe your algorithm is a variation of the algorithm of Martinsson et al. If I understand it correctly, it is intended primarily for computing low-rank approximations of matrices, but I may be wrong.

The difference is easily explained by the large gap between the actual rank of A (500) and the values of k (10) and p (5). In addition, Martinsson et al. note that the value of p should actually be chosen larger than the value of k.
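To see this concretely, here is a small check of my own (not part of the original answer), using the rsvd() function defined in the question; the variable names and the choice p = 100 are just illustrative:

 # Hypothetical check: compare the exact leading singular values of the
 # full-rank matrix A with the randomized estimates for two choices of p.
 set.seed(10)
 A <- matrix(rnorm(500 * 500), 500, 500)        # rank 500
 exact   <- svd(A)$d[1:10]
 small_p <- rsvd(A, k = 10, p = 5)$d[1:10]      # little oversampling
 large_p <- rsvd(A, k = 10, p = 100)$d[1:10]    # heavy oversampling
 # Ratios below 1 indicate underestimation; they should move towards 1 as p
 # grows, but stay clearly below 1 because rank(A) = 500 is far larger than k + p.
 round(small_p / exact, 2)
 round(large_p / exact, 2)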

So, if we apply your implementation following their recommendations, using:

 set.seed(10)
 A <- matrix(rnorm(500*500), 500, 500)  # rank 500
 B <- matrix(rnorm(500*50), 500, 500)   # rank 50 (the 25,000 values are recycled column-wise)

Looking at the timings, we find that even with a larger value of p the randomized algorithm still gives a tremendous speed-up compared to the full svd.

 > system.time(t1 <- svd(A)$d[1:5])
    user  system elapsed
     0.8     0.0     0.8
 > system.time(t2 <- rsvd(A,10,5)$d[1:5])
    user  system elapsed
    0.01    0.00    0.02
 > system.time(t3 <- rsvd(A,10,30)$d[1:5])
    user  system elapsed
    0.04    0.00    0.03
 > system.time(t4 <- svd(B)$d[1:5])
    user  system elapsed
    0.55    0.00    0.55
 > system.time(t5 <- rsvd(B,10,5)$d[1:5])
    user  system elapsed
    0.02    0.00    0.02
 > system.time(t6 <- rsvd(B,10,30)$d[1:5])
    user  system elapsed
    0.05    0.00    0.05
 > system.time(t7 <- rsvd(B,25,30)$d[1:5])
    user  system elapsed
    0.06    0.00    0.06

We also see that using a larger p for a matrix of lower rank gives a better approximation. If k is also chosen closer to the actual rank, the ratio between the approximation and the exact solution approaches 1, while the speed gain remains substantial.

 > round(mean(t2/t1),2)
 [1] 0.77
 > round(mean(t3/t1),2)
 [1] 0.82
 > round(mean(t5/t4),2)
 [1] 0.92
 > round(mean(t6/t4),2)
 [1] 0.97
 > round(mean(t7/t4),2)
 [1] 1

Therefore, I believe we can conclude that:

  • p should be chosen so that p > k (Martinsson calls it l, if I remember correctly)
  • k should not differ too much from rank(A)
  • for low-rank matrices the approximation is usually better (see the sketch below)
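To illustrate that last point, here is a rough sketch of my own (the choice k = 50, p = 30 is just an example), again using the rsvd() from the question:

 # Hypothetical sketch: for the rank-50 matrix B, choosing k equal to the rank
 # (with some oversampling) should recover the leading singular values almost
 # exactly, while still being much cheaper than a full svd().
 set.seed(10)
 B <- matrix(rnorm(500 * 50), 500, 500)   # rank 50
 exact  <- svd(B)$d[1:50]
 approx <- rsvd(B, k = 50, p = 30)$d[1:50]
 round(mean(approx / exact), 2)           # should be very close to 1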

Optimization:

As far as I can tell, yours is already a neat way of doing it; in fact, I could not find a more optimal approach. The only thing I would point out is that the construct t(q) %*% A is not recommended. Use crossprod(q, A) instead, which should be slightly faster. In your example, though, the difference was not significant.
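For reference, a version with that change might look as follows (a sketch; the name rsvd2 is mine, and it keeps the same square-matrix assumption as the question's function):

 # Same algorithm, using crossprod(); behaviour is unchanged.
 # crossprod(q, A) computes t(q) %*% A without explicitly forming t(q).
 rsvd2 <- function(A, k = 10, p = 5) {
   n <- nrow(A)
   y <- A %*% matrix(rnorm(n * (k + p)), nrow = n)
   q <- qr.Q(qr(y))
   b <- crossprod(q, A)
   s <- svd(b)
   list(u = q %*% s$u, d = s$d, v = s$v)
 }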


The Halko, Martinsson, and Tropp paper also recommends doing a couple of power iterations before computing the QR decomposition. We do 3 power iterations by default in the scikit-learn implementation, and we find that it works very well in practice.
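A rough sketch of the idea in R, adapted from the function in the question (the name rsvd_pi and the q_iter parameter are mine; this is not the scikit-learn code, which also uses additional stabilization):

 # Randomized SVD with power iterations. Each iteration multiplies by A t(A),
 # which sharpens the decay of the spectrum so the range of y better captures
 # the dominant subspace of A. Assumes a square A, like the question's version.
 rsvd_pi <- function(A, k = 10, p = 5, q_iter = 3) {
   n <- nrow(A)
   y <- A %*% matrix(rnorm(n * (k + p)), nrow = n)
   for (i in seq_len(q_iter)) {
     y <- A %*% crossprod(A, y)   # one power iteration: y <- A t(A) y
     y <- qr.Q(qr(y))             # re-orthonormalize for numerical stability
   }
   q <- qr.Q(qr(y))
   b <- crossprod(q, A)
   s <- svd(b)
   list(u = q %*% s$u, d = s$d, v = s$v)
 }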

