Fastest way to create a sparse matrix of the form A.T * diag(b) * A + C?

I am trying to optimize a piece of code that solves a large sparse nonlinear system using an interior point method. During the update phase, this involves computing the Hessian matrix H and the gradient g, then solving for d in H * d = -g to obtain a new search direction.
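For concreteness, a minimal sketch of that update step, with scipy's direct solver standing in for my_solver (which is not shown here and additionally caches a factorization fac):

    import numpy as np
    from scipy import sparse
    from scipy.sparse import linalg as spla

    def newton_step(H, g):
        # Solve H * d = -g for the search direction d; the sparse
        # direct solver wants H in CSC (or CSR) format.
        return spla.spsolve(H.tocsc(), -g)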

The Hessian matrix has a symmetric tridiagonal structure of the form:

    A.T * diag(b) * A + C
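In the code below, the pieces map onto this form as A = M, b = -z / n**2, and C = -Hsig. Written directly in scipy terms, the form is just (a sketch, with all operands assumed sparse already):

    from scipy import sparse

    def build_hessian(A, b, C):
        # '*' is matrix multiplication for scipy sparse matrices
        return A.T * sparse.diags(b, 0) * A + C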

I ran line_profiler for a specific function:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
       386                                           def _direction(n, res, M, Hsig, scale_var, grad_lnprior, z, fac):
       387
       388                                               # gradient
       389        44      1241715  28220.8      3.7      g = 2 * scale_var * res - grad_lnprior + z * np.dot(M.T, 1. / n)
       390
       391                                               # hessian
       392        44      3103117  70525.4      9.3      N = sparse.diags(1. / n ** 2, 0, format=FMT, dtype=DTYPE)
       393        44     18814307 427597.9     56.2      H = - Hsig - z * np.dot(M.T, np.dot(N, M))  # slow!
       394
       395                                               # update direction
       396        44     10329556 234762.6     30.8      d, fac = my_solver(H, -g, fac)
       397
       398        44          111      2.5      0.0      return d, fac

Looking at the output, it is clear that building H is by far the most expensive step - it takes significantly longer than actually solving for the new direction.

Hsig and M are both CSC sparse matrices, n is a dense vector, and z is a scalar. The solver I use requires H to be either a sparse CSC or a sparse CSR matrix.
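(If an intermediate operation hands back some other sparse format, converting at the end is cheap compared to the products themselves; a sketch:)

    # Sketch: make sure H is in a format the solver accepts.
    if not (sparse.isspmatrix_csc(H) or sparse.isspmatrix_csr(H)):
        H = H.tocsc()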

Here is a function that produces toy data with the same formats, sizes, and sparsity as my real matrices:

    import numpy as np
    from scipy import sparse

    def make_toy_data(nt=200000, nc=10):
        d0 = np.random.randn(nc * (nt - 1))
        d1 = np.random.randn(nc * (nt - 1))
        M = sparse.diags((d0, d1), (0, nc), shape=(nc * (nt - 1), nc * nt),
                         format='csc', dtype=np.float64)
        d0 = np.random.randn(nc * nt)
        Hsig = sparse.diags(d0, 0, shape=(nc * nt, nc * nt), format='csc',
                            dtype=np.float64)
        n = np.random.randn(nc * (nt - 1))
        z = np.random.randn()
        return Hsig, M, n, z
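(At the default sizes this produces the 2,000,000-column matrices seen in the profile; a quick usage check, with the shapes worked out from the code above:)

    Hsig, M, n, z = make_toy_data()
    print(M.shape)     # (1999990, 2000000)
    print(Hsig.shape)  # (2000000, 2000000)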

And here is my original approach to building H:

    def original(Hsig, M, n, z):
        N = sparse.diags(1. / n ** 2, 0, format='csc')
        H = -Hsig - z * np.dot(M.T, np.dot(N, M))  # slow!
        return H

Timing:

    %timeit original(Hsig, M, n, z)
    # 1 loops, best of 3: 483 ms per loop

Is there a faster way to build this matrix?

2 answers

I am getting close to a 4x speedup when computing the M.T * D * M product from the three diagonal arrays. If d0 and d1 are the main and upper diagonals of M, and d is the main diagonal of D, then the following code builds M.T * D * M directly:

    def make_tridi_bis(d0, d1, d, nc=10):
        d00 = d0*d0*d
        d11 = d1*d1*d
        d01 = d0*d1*d
        len_ = d0.size
        data = np.empty((3*len_ + nc,))
        indices = np.empty((3*len_ + nc,), dtype=np.intp)
        # Fill main diagonal
        data[:2*nc:2] = d00[:nc]
        indices[:2*nc:2] = np.arange(nc)
        data[2*nc+1:-2*nc:3] = d00[nc:] + d11[:-nc]
        indices[2*nc+1:-2*nc:3] = np.arange(nc, len_)
        data[-2*nc+1::2] = d11[-nc:]
        indices[-2*nc+1::2] = np.arange(len_, len_ + nc)
        # Fill top diagonal
        data[1:2*nc:2] = d01[:nc]
        indices[1:2*nc:2] = np.arange(nc, 2*nc)
        data[2*nc+2:-2*nc:3] = d01[nc:]
        indices[2*nc+2:-2*nc:3] = np.arange(2*nc, len_ + nc)
        # Fill bottom diagonal
        data[2*nc:-2*nc:3] = d01[:-nc]
        indices[2*nc:-2*nc:3] = np.arange(len_ - nc)
        data[-2*nc::2] = d01[-nc:]
        indices[-2*nc::2] = np.arange(len_ - nc, len_)
        # Row pointers: 2 entries in the first/last nc rows, 3 elsewhere
        indptr = np.empty((len_ + nc + 1,), dtype=np.intp)
        indptr[0] = 0
        indptr[1:nc+1] = 2
        indptr[nc+1:len_+1] = 3
        indptr[-nc:] = 2
        np.cumsum(indptr, out=indptr)
        return sparse.csr_matrix((data, indices, indptr),
                                 shape=(len_ + nc, len_ + nc))

If your matrix M is in CSR format, you can extract d0 and d1 as d0 = M.data[::2] and d1 = M.data[1::2]. I modified your toy data routine to return these arrays, and here is what I get:

    In [90]: np.allclose((M.T * sparse.diags(d, 0) * M).A, make_tridi_bis(d0, d1, d).A)
    Out[90]: True

    In [92]: %timeit make_tridi_bis(d0, d1, d)
    10 loops, best of 3: 124 ms per loop

    In [93]: %timeit M.T * sparse.diags(d, 0) * M
    1 loops, best of 3: 501 ms per loop

The whole point of the code above is to take advantage of the structure of the non-zero entries. If you draw a diagram of the matrices being multiplied together, it is relatively easy to convince yourself that the main (d_0) and the upper and lower (d_1) diagonals of the resulting tridiagonal matrix are simply:

    d_0 = np.zeros((len_ + nc,))
    d_0[:len_] = d00
    d_0[-len_:] += d11
    d_1 = d01

The rest of the code in this function simply builds the tridiagonal matrix directly, since calling sparse.diags with the above data is several times slower.
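To connect this back to the original problem: none of the glue below appears in the answer itself, but assuming M is stored in CSR format with its two diagonals interleaved in M.data as described above, H could be assembled like this:

    # Hypothetical glue: build H = -Hsig - z * M.T * N * M via make_tridi_bis.
    d0 = M.data[::2]    # main diagonal of M
    d1 = M.data[1::2]   # upper diagonal of M (offset nc)
    d = 1. / n ** 2     # main diagonal of N = diag(1 / n**2)
    H = -Hsig - z * make_tridi_bis(d0, d1, d)
    H = H.tocsc()       # if the solver requires CSC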


I tried running the test code and ran into problems with np.dot(N, M). I did not dig into it, but I think my numpy/scipy combination (both fairly new) has trouble using np.dot on sparse matrices.

But H = -Hsig - z * M.T.dot(N.dot(M)) works just fine. This uses the sparse matrices' own dot method.
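In other words, the fix is just to stay inside the sparse API (a sketch; the second spelling is equivalent, since * means matrix multiplication for scipy sparse matrices):

    # Prefer the sparse matrices' own methods over np.dot, which may not
    # dispatch to sparse matrix multiplication on some numpy/scipy combos.
    H = -Hsig - z * M.T.dot(N.dot(M))  # sparse .dot
    H = -Hsig - z * (M.T * N * M)      # equivalent operator form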

I did not run the profiler, but here are IPython timings for the individual pieces. Generating the data takes longer than the double dot product itself.

    In [37]: timeit Hsig, M, n, z = make_toy_data()
    1 loops, best of 3: 2 s per loop

    In [38]: timeit N = sparse.diags(1. / n ** 2, 0, format='csc')
    1 loops, best of 3: 377 ms per loop

    In [39]: timeit H = -Hsig - z * M.T.dot(N.dot(M))
    1 loops, best of 3: 1.55 s per loop

H is

 <2000000x2000000 sparse matrix of type '<type 'numpy.float64'>' with 5999980 stored elements in Compressed Sparse Column format> 
