Fastest way to create a sparse matrix of the form A.T * diag(b) * A + C?

I am trying to optimize a piece of code that solves a large sparse nonlinear system using an interior point method. During the update phase, this involves computing the Hessian matrix H and the gradient g, then solving for d in H * d = -g to obtain a new search direction.
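For concreteness, a minimal sketch of that update step, with scipy's direct solver standing in for my_solver (which is not shown here and additionally caches a factorization fac):

    import numpy as np
    from scipy import sparse
    from scipy.sparse import linalg as spla

    def newton_step(H, g):
        # Solve H * d = -g for the search direction d; the sparse
        # direct solver wants H in CSC (or CSR) format.
        return spla.spsolve(H.tocsc(), -g)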

The Hessian matrix has a symmetric tridiagonal structure of the form:

    A.T * diag(b) * A + C
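In the code below, the pieces map onto this form as A = M, b = -z / n**2, and C = -Hsig. Written directly in scipy terms, the form is just (a sketch, with all operands assumed sparse already):

    from scipy import sparse

    def build_hessian(A, b, C):
        # '*' is matrix multiplication for scipy sparse matrices
        return A.T * sparse.diags(b, 0) * A + C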

I ran line_profiler for a specific function:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
       386                                           def _direction(n, res, M, Hsig, scale_var, grad_lnprior, z, fac):
       387
       388                                               # gradient
       389        44      1241715  28220.8      3.7      g = 2 * scale_var * res - grad_lnprior + z * np.dot(M.T, 1. / n)
       390
       391                                               # hessian
       392        44      3103117  70525.4      9.3      N = sparse.diags(1. / n ** 2, 0, format=FMT, dtype=DTYPE)
       393        44     18814307 427597.9     56.2      H = - Hsig - z * np.dot(M.T, np.dot(N, M))  # slow!
       394
       395                                               # update direction
       396        44     10329556 234762.6     30.8      d, fac = my_solver(H, -g, fac)
       397
       398        44          111      2.5      0.0      return d, fac

Looking at the output, it is clear that building H is by far the most expensive step - it takes significantly longer than actually solving for the new direction.

Hsig and M are both CSC sparse matrices, n is a dense vector, and z is a scalar. The solver I use requires H to be either a sparse CSC or a sparse CSR matrix.
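(If an intermediate operation hands back some other sparse format, converting at the end is cheap compared to the products themselves; a sketch:)

    # Sketch: make sure H is in a format the solver accepts.
    if not (sparse.isspmatrix_csc(H) or sparse.isspmatrix_csr(H)):
        H = H.tocsc()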

Here is a function that produces toy data with the same formats, sizes, and sparsity as my real matrices:

    import numpy as np
    from scipy import sparse

    def make_toy_data(nt=200000, nc=10):
        d0 = np.random.randn(nc * (nt - 1))
        d1 = np.random.randn(nc * (nt - 1))
        M = sparse.diags((d0, d1), (0, nc), shape=(nc * (nt - 1), nc * nt),
                         format='csc', dtype=np.float64)
        d0 = np.random.randn(nc * nt)
        Hsig = sparse.diags(d0, 0, shape=(nc * nt, nc * nt), format='csc',
                            dtype=np.float64)
        n = np.random.randn(nc * (nt - 1))
        z = np.random.randn()
        return Hsig, M, n, z
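(At the default sizes this produces the 2,000,000-column matrices seen in the profile; a quick usage check, with the shapes worked out from the code above:)

    Hsig, M, n, z = make_toy_data()
    print(M.shape)     # (1999990, 2000000)
    print(Hsig.shape)  # (2000000, 2000000)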

And here is my original approach to building H:

    def original(Hsig, M, n, z):
        N = sparse.diags(1. / n ** 2, 0, format='csc')
        H = -Hsig - z * np.dot(M.T, np.dot(N, M))  # slow!
        return H

Timing:

    %timeit original(Hsig, M, n, z)
    # 1 loops, best of 3: 483 ms per loop

Is there a faster way to build this matrix?

2 answers

I am getting close to a 4x speedup when computing the M.T * D * M product from the three diagonal arrays. If d0 and d1 are the main and upper diagonals of M, and d is the main diagonal of D, then the following code builds M.T * D * M directly:

    def make_tridi_bis(d0, d1, d, nc=10):
        d00 = d0*d0*d
        d11 = d1*d1*d
        d01 = d0*d1*d
        len_ = d0.size
        data = np.empty((3*len_ + nc,))
        indices = np.empty((3*len_ + nc,), dtype=np.intp)
        # Fill main diagonal
        data[:2*nc:2] = d00[:nc]
        indices[:2*nc:2] = np.arange(nc)
        data[2*nc+1:-2*nc:3] = d00[nc:] + d11[:-nc]
        indices[2*nc+1:-2*nc:3] = np.arange(nc, len_)
        data[-2*nc+1::2] = d11[-nc:]
        indices[-2*nc+1::2] = np.arange(len_, len_ + nc)
        # Fill top diagonal
        data[1:2*nc:2] = d01[:nc]
        indices[1:2*nc:2] = np.arange(nc, 2*nc)
        data[2*nc+2:-2*nc:3] = d01[nc:]
        indices[2*nc+2:-2*nc:3] = np.arange(2*nc, len_ + nc)
        # Fill bottom diagonal
        data[2*nc:-2*nc:3] = d01[:-nc]
        indices[2*nc:-2*nc:3] = np.arange(len_ - nc)
        data[-2*nc::2] = d01[-nc:]
        indices[-2*nc::2] = np.arange(len_ - nc, len_)
        # Row pointers: 2 entries in the first/last nc rows, 3 elsewhere
        indptr = np.empty((len_ + nc + 1,), dtype=np.intp)
        indptr[0] = 0
        indptr[1:nc+1] = 2
        indptr[nc+1:len_+1] = 3
        indptr[-nc:] = 2
        np.cumsum(indptr, out=indptr)
        return sparse.csr_matrix((data, indices, indptr),
                                 shape=(len_ + nc, len_ + nc))

If your matrix M is in CSR format, you can extract d0 and d1 as d0 = M.data[::2] and d1 = M.data[1::2]. I modified your toy data routine to return these arrays, and here is what I get:

    In [90]: np.allclose((M.T * sparse.diags(d, 0) * M).A, make_tridi_bis(d0, d1, d).A)
    Out[90]: True

    In [92]: %timeit make_tridi_bis(d0, d1, d)
    10 loops, best of 3: 124 ms per loop

    In [93]: %timeit M.T * sparse.diags(d, 0) * M
    1 loops, best of 3: 501 ms per loop

The whole point of the code above is to take advantage of the structure of the non-zero entries. If you draw a diagram of the matrices being multiplied together, it is relatively easy to convince yourself that the main (d_0) and the upper and lower (d_1) diagonals of the resulting tridiagonal matrix are simply:

    d_0 = np.zeros((len_ + nc,))
    d_0[:len_] = d00
    d_0[-len_:] += d11
    d_1 = d01

The rest of the code in this function simply builds the tridiagonal matrix directly, since calling sparse.diags with the above data is several times slower.
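To connect this back to the original problem: none of the glue below appears in the answer itself, but assuming M is stored in CSR format with its two diagonals interleaved in M.data as described above, H could be assembled like this:

    # Hypothetical glue: build H = -Hsig - z * M.T * N * M via make_tridi_bis.
    d0 = M.data[::2]    # main diagonal of M
    d1 = M.data[1::2]   # upper diagonal of M (offset nc)
    d = 1. / n ** 2     # main diagonal of N = diag(1 / n**2)
    H = -Hsig - z * make_tridi_bis(d0, d1, d)
    H = H.tocsc()       # if the solver requires CSC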


I tried running the test code and ran into problems with np.dot(N, M). I did not dig into it, but I think my numpy/scipy combination (both fairly new) has trouble using np.dot on sparse matrices.

But H = -Hsig - z * M.T.dot(N.dot(M)) works just fine. This uses the sparse matrices' own dot method.
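In other words, the fix is just to stay inside the sparse API (a sketch; the second spelling is equivalent, since * means matrix multiplication for scipy sparse matrices):

    # Prefer the sparse matrices' own methods over np.dot, which may not
    # dispatch to sparse matrix multiplication on some numpy/scipy combos.
    H = -Hsig - z * M.T.dot(N.dot(M))  # sparse .dot
    H = -Hsig - z * (M.T * N * M)      # equivalent operator form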

I did not run the profiler, but here are IPython timings for the individual pieces. Generating the data takes longer than the double dot product itself.

    In [37]: timeit Hsig, M, n, z = make_toy_data()
    1 loops, best of 3: 2 s per loop

    In [38]: timeit N = sparse.diags(1. / n ** 2, 0, format='csc')
    1 loops, best of 3: 377 ms per loop

    In [39]: timeit H = -Hsig - z * M.T.dot(N.dot(M))
    1 loops, best of 3: 1.55 s per loop

H is

 <2000000x2000000 sparse matrix of type '<type 'numpy.float64'>' with 5999980 stored elements in Compressed Sparse Column format> 
