Dot product of two sparse matrices, affecting only zero values

I am trying to compute a simple dot product while leaving the nonzero values of the original matrix unchanged. Toy example:

    import numpy as np

    A = np.array([[2, 1, 1, 2],
                  [0, 2, 1, 0],
                  [1, 0, 1, 1],
                  [2, 2, 1, 0]])

    B = np.array([[ 0.54331039,  0.41018682,  0.1582158 ,  0.3486124 ],
                  [ 0.68804647,  0.29520239,  0.40654206,  0.20473451],
                  [ 0.69857579,  0.38958572,  0.30361365,  0.32256483],
                  [ 0.46195299,  0.79863505,  0.22431876,  0.59054473]])

Desired Result:

    C = np.array([[ 2.        ,  1.        ,  1.        ,  2.        ],
                  [ 2.07466874,  2.        ,  1.        ,  0.73203386],
                  [ 1.        ,  1.5984076 ,  1.        ,  1.        ],
                  [ 2.        ,  2.        ,  1.        ,  1.42925865]])

The actual matrices in question, however, are sparse and more like this:

    A = sparse.rand(250000, 1700, density=0.001, format='csr')
    B = sparse.rand(1700, 1700, density=0.02, format='csr')

One easy way is to simply set the values using boolean mask indexing, for example:

    mask = A != 0
    C = A.dot(B)
    C[mask] = A[mask]

However, my original arrays are sparse and quite large, so modifying them via index assignment is very slow. Converting to a lil matrix helps a bit, but then the conversion itself takes a lot of time.

Another obvious approach, I suppose, would be to resort to iteration and skip the masked values, but I would rather not throw away the benefits of numpy/scipy-optimized matrix multiplication.

Some clarifications: I am really only interested in the special case where B is always square, so A and C have the same shape. Therefore, a solution that does not work on arbitrary arrays but fits my case is fine.

UPDATE: Some attempts:

    from scipy import sparse
    import numpy as np

    def naive(A, B):
        mask = A != 0
        out = A.dot(B).tolil()
        out[mask] = A[mask]
        return out.tocsr()

    def proposed(A, B):
        Az = A == 0
        R, C = np.where(Az)
        out = A.copy()
        out[Az] = np.einsum('ij,ji->i', A[R], B[:, C])
        return out

    %timeit naive(A, B)
    1 loops, best of 3: 4.04 s per loop

    %timeit proposed(A, B)
    /usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py:215: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.
    /usr/local/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in __init__(self, arg1, shape, dtype, copy)
        173         self.shape = M.shape
        174
    --> 175         self.row, self.col = M.nonzero()
        176         self.data = M[self.row, self.col]
        177         self.has_canonical_format = True

    MemoryError:

ANOTHER UPDATE:

I couldn't get anything more or less useful out of Cython, at least without straying too far from Python. The idea was to leave the dot product to scipy and just try to set those original values back as fast as possible, something like this:

    cimport cython

    @cython.cdivision(True)
    @cython.boundscheck(False)
    @cython.wraparound(False)
    cpdef coo_replace(int [:] row1, int [:] col1, float [:] data1,
                      int [:] row2, int [:] col2, float [:] data2):
        cdef int N = row1.shape[0]
        cdef int M = row2.shape[0]
        cdef int i, j
        cdef dict d = {}
        for i in range(M):
            d[(row2[i], col2[i])] = data2[i]
        for j in range(N):
            if (row1[j], col1[j]) in d:
                data1[j] = d[(row1[j], col1[j])]

It was a little better than my preliminary "naive" implementation (the one using .tolil()), but following hpaulj's approach, lil can be dropped. Perhaps replacing the Python dict with something like std::map would help.
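For reference, an untested sketch of what that std::map variant might look like (the distutils C++ directive, the pair-keyed map, and the function name are my guesses at the wiring):

    # distutils: language = c++
    cimport cython
    from libcpp.map cimport map as cpp_map
    from libcpp.pair cimport pair

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cpdef coo_replace_map(int [:] row1, int [:] col1, float [:] data1,
                          int [:] row2, int [:] col2, float [:] data2):
        cdef int N = row1.shape[0]
        cdef int M = row2.shape[0]
        cdef int i, j
        cdef pair[int, int] key
        cdef cpp_map[pair[int, int], float] d
        # build a (row, col) -> value lookup from the second matrix's triplets
        for i in range(M):
            key = pair[int, int](row2[i], col2[i])
            d[key] = data2[i]
        # overwrite the first matrix's data wherever the lookup has an entry
        for j in range(N):
            key = pair[int, int](row1[j], col1[j])
            if d.count(key):
                data1[j] = d[key]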

3 answers

Perhaps a cleaner and faster version of your naive code:

    In [57]: r,c = A.nonzero()    # this uses A.tocoo()
    In [58]: C = A*B
    In [59]: Cl = C.tolil()
    In [60]: Cl[r,c] = A.tolil()[r,c]
    In [61]: Cl.tocsr()

C[r,c] = A[r,c] gives an efficiency warning, but I think that is aimed more at people who do such assignments in a loop.

    In [63]: %%timeit C=A*B
        ...: C[r,c] = A[r,c]
        ...:
    The slowest run took 7.32 times longer than the fastest. ...
    1000 loops, best of 3: 334 µs per loop

    In [64]: %%timeit C=A*B
        ...: Cl = C.tolil()
        ...: Cl[r,c] = A.tolil()[r,c]
        ...: Cl.tocsr()
        ...:
    100 loops, best of 3: 2.83 ms per loop

My A is small, only (250, 100), but it appears that the round trip through lil is not a time saver, despite the warning.
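If the direct assignment wins anyway, the warning itself can be silenced. A minimal sketch, assuming you keep the C[r,c] = A[r,c] route:

    import warnings
    from scipy.sparse import SparseEfficiencyWarning

    # suppress just the sparse efficiency warning around the assignment
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', SparseEfficiencyWarning)
        C = A*B
        C[r,c] = A[r,c]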

Masking with A==0 runs into problems when A is sparse:

    In [66]: Az = A==0
    .... SparseEfficiencyWarning ...
    In [67]: r1,c1 = Az.nonzero()

Compared to the nonzero r for A, this r1 is much larger: it holds the row indices of all the zeros in the sparse matrix, i.e. everything except the 25 nonzeros.

    In [70]: r.shape
    Out[70]: (25,)
    In [71]: r1.shape
    Out[71]: (24975,)

If I index A with this r1, I get a much larger array. In effect, each row is repeated by the number of zeros it contains:

    In [72]: A[r1,:]
    Out[72]:
    <24975x100 sparse matrix of type '<class 'numpy.float64'>'
        with 2473 stored elements in Compressed Sparse Row format>
    In [73]: A
    Out[73]:
    <250x100 sparse matrix of type '<class 'numpy.float64'>'
        with 25 stored elements in Compressed Sparse Row format>

The shape and the number of nonzero elements have both grown by roughly a factor of 100 (the number of columns).

Defining foo and copying Divakar's tests:

    def foo(A,B):
        r,c = A.nonzero()
        C = A*B
        C[r,c] = A[r,c]
        return C

    In [83]: timeit naive(A,B)
    100 loops, best of 3: 2.53 ms per loop
    In [84]: timeit proposed(A,B)
    /... SparseEfficiencyWarning)
    100 loops, best of 3: 4.48 ms per loop
    In [85]: timeit foo(A,B)
    ... SparseEfficiencyWarning)
    100 loops, best of 3: 2.13 ms per loop

So my version has a modest speed advantage. As Divakar found, changing the sparsity changes the relative advantages. I expect size will change them as well.

The fact that A.nonzero uses the coo format suggests that it could be feasible to construct the new array with that format. A lot of sparse code builds a new matrix via coo values.

    In [97]: Co = C.tocoo()
    In [98]: Ao = A.tocoo()
    In [99]: r = np.concatenate((Co.row, Ao.row))
    In [100]: c = np.concatenate((Co.col, Ao.col))
    In [101]: d = np.concatenate((Co.data, Ao.data))
    In [102]: r.shape
    Out[102]: (79,)
    In [103]: C1 = sparse.csr_matrix((d, (r, c)), shape=A.shape)
    In [104]: C1
    Out[104]:
    <250x100 sparse matrix of type '<class 'numpy.float64'>'
        with 78 stored elements in Compressed Sparse Row format>

This C1 has, I think, the same nonzero elements as the C constructed by the other means. But one value differs, because this r is longer: in this particular test case, C and A share one nonzero element, and the coo style of input sums duplicate entries, whereas we would rather have A's values overwrite everything.

If you can tolerate this mismatch, this is a faster way (at least for this test case):

    def bar(A,B):
        C = A*B
        Co = C.tocoo()
        Ao = A.tocoo()
        r = np.concatenate((Co.row, Ao.row))
        c = np.concatenate((Co.col, Ao.col))
        d = np.concatenate((Co.data, Ao.data))
        return sparse.csr_matrix((d, (r, c)), shape=A.shape)

    In [107]: timeit bar(A,B)
    1000 loops, best of 3: 1.03 ms per loop
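If that mismatch is not acceptable, one possible fix (my own sketch, untested and not timed) is to drop C's stored entries at A's nonzero positions before concatenating, so A's values truly overwrite:

    def bar_overwrite(A, B):
        # like bar(), but discard C's stored entries at A's nonzero
        # positions first, so A's values overwrite instead of summing
        C = (A*B).tocoo()
        Ao = A.tocoo()
        ncols = A.shape[1]
        # flatten (row, col) pairs to linear indices for a fast membership test
        c_lin = C.row.astype(np.int64) * ncols + C.col
        a_lin = Ao.row.astype(np.int64) * ncols + Ao.col
        keep = ~np.in1d(c_lin, a_lin)
        r = np.concatenate((C.row[keep], Ao.row))
        c = np.concatenate((C.col[keep], Ao.col))
        d = np.concatenate((C.data[keep], Ao.data))
        return sparse.csr_matrix((d, (r, c)), shape=A.shape)

The in1d membership test adds some overhead, so part of bar's speed advantage may be lost.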

Hacked my way through it! Well, there is a lot of scipy machinery specific to sparse matrices that I picked up along the way. Here's an implementation I could put together -

    # Find the indices in output array that are to be updated
    R,C = ((A!=0).dot(B!=0)).nonzero()
    mask = np.asarray(A[R,C]==0).ravel()
    R,C = R[mask], C[mask]

    # Make a copy of A and get the dot product through sliced rows and columns
    # off A and B using the definition of matrix-multiplication
    out = A.copy()
    out[R,C] = (A[R].multiply(B[:,C].T).sum(1)).ravel()

The trick is that an entry of the product can be nonzero only where the boolean product (A!=0).dot(B!=0) is nonzero, so only those positions (further restricted to where A itself is zero) ever need computing. The most expensive part is the element-wise multiplication and the summation. On some quick timing tests, it looks like it would beat the original dot-and-mask solution on matrices with a high degree of sparsity, which, I think, comes down to its memory efficiency.

Runtime test

Function Definitions -

    def naive(A, B):
        mask = A != 0
        out = A.dot(B).tolil()
        out[mask] = A[mask]
        return out.tocsr()

    def proposed(A, B):
        R,C = ((A!=0).dot(B!=0)).nonzero()
        mask = np.asarray(A[R,C]==0).ravel()
        R,C = R[mask], C[mask]
        out = A.copy()
        out[R,C] = (A[R].multiply(B[:,C].T).sum(1)).ravel()
        return out
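Before timing, a quick equivalence check one could run (my own addition; the sizes here are arbitrary):

    # sanity check: proposed should match naive on small random inputs
    M, N = 200, 50
    A = sparse.rand(M, N, density=0.01, format='csr')
    B = sparse.rand(N, N, density=0.05, format='csr')
    assert np.allclose(naive(A, B).toarray(), proposed(A, B).toarray())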

Timings -

    In [57]: # Input matrices
        ...: M,N = 25000, 170
        ...: A = sparse.rand(M, N, density=0.001, format='csr')
        ...: B = sparse.rand(N, N, density=0.02, format='csr')
        ...:

    In [58]: %timeit naive(A, B)
    10 loops, best of 3: 92.2 ms per loop

    In [59]: %timeit proposed(A, B)
    10 loops, best of 3: 132 ms per loop

    In [60]: # Input matrices with increased sparseness
        ...: M,N = 25000, 170
        ...: A = sparse.rand(M, N, density=0.0001, format='csr')
        ...: B = sparse.rand(N, N, density=0.002, format='csr')
        ...:

    In [61]: %timeit naive(A, B)
    10 loops, best of 3: 78.1 ms per loop

    In [62]: %timeit proposed(A, B)
    100 loops, best of 3: 8.03 ms per loop

Python is not my main language, but I found this an interesting problem and wanted to take a crack at it :)

Setup:

    import numpy
    import scipy.sparse

    # example matrices and sparse versions
    A = numpy.array([[1, 2, 0, 1], [1, 0, 1, 2], [0, 1, 2, 1], [1, 2, 1, 0]])
    B = numpy.array([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]])
    A_s = scipy.sparse.lil_matrix(A)
    B_s = scipy.sparse.lil_matrix(B)

So you want to convert the original problem:

    C = A.dot(B)
    C[A.nonzero()] = A[A.nonzero()]

to something sparse. To get this out of the way, the direct "sparse" translation of the above is:

    C_s = A_s.dot(B_s)
    C_s[A_s.nonzero()] = A_s[A_s.nonzero()]

But it sounds like you are not happy with this, since it first calculates all the dot products, which you worry may be inefficient.

So your question is: would it be faster to first find the zeros and evaluate the dot products only at those elements? For a dense matrix, that could look something like this:

    Xs, Ys = numpy.nonzero(A==0)
    D = A[:]
    D[Xs, Ys] = map(lambda x, y: A[x,:].dot(B[:,y]), Xs, Ys)

Let's translate this to sparse matrices. The main stumbling block here was finding the zero indices; since A_s==0 does not make sense for sparse matrices, I found them this way:

    Xmax, Ymax = A_s.shape
    DenseSize = Xmax * Ymax
    Xgrid, Ygrid = numpy.mgrid[0:Xmax, 0:Ymax]
    Ygrid = Ygrid.reshape([DenseSize,1])[:,0]
    Xgrid = Xgrid.reshape([DenseSize,1])[:,0]
    AllIndices = numpy.array([Xgrid, Ygrid])
    NonzeroIndices = numpy.array(A_s.nonzero())
    ZeroIndices = numpy.array([x for x in AllIndices.T.tolist()
                               if x not in NonzeroIndices.T.tolist()]).T
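(One possibly faster alternative, my own untested sketch: linearise the indices as row * ncols + col and take a set difference, avoiding the quadratic list membership scan.)

    # set difference on linearised indices instead of "x not in list"
    lin_all = numpy.arange(DenseSize)
    nz_rows, nz_cols = A_s.nonzero()
    lin_nonzero = nz_rows * Ymax + nz_cols
    lin_zero = numpy.setdiff1d(lin_all, lin_nonzero)
    ZeroIndices = numpy.array([lin_zero // Ymax, lin_zero % Ymax])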

If you know an even better/faster way, by all means use it. Once we have the zero indices, we can do the same mapping as before:

    D_s = A_s[:]
    D_s[ZeroIndices[0], ZeroIndices[1]] = map(
        lambda x, y: A_s[x,:].dot(B[:,y])[0],
        ZeroIndices[0], ZeroIndices[1])

which gives you your sparse matrix result.

Now, I do not know whether this is faster or not. I mostly took a stab at it because it was an interesting problem and I wanted to see whether I could do it in Python. In fact, I suspect it may not be faster than a direct whole-matrix dot product, because it uses list comprehensions and mapping over a large data set (as you say, you expect a lot of zeros). But it does answer your question of "how can I calculate the dot products only for the zero values, without multiplying the whole matrix". I would be interested to see how it compares speed-wise on your data sets.

EDIT: Below I have put a "block processing" version based on the above, which I believe should let you handle your large data set without problems. Let me know if it works.

    import numpy
    import scipy.sparse

    # example matrices and sparse versions
    A = numpy.array([[1, 2, 0, 1], [1, 0, 1, 2], [0, 1, 2, 1], [1, 2, 1, 0]])
    B = numpy.array([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]])
    A_s = scipy.sparse.lil_matrix(A)
    B_s = scipy.sparse.lil_matrix(B)

    # Choose a grid division (i.e. how many processing blocks you want to create)
    BlockGrid = numpy.array([2,2])

    D_s = A_s[:]   # initialise from A
    Xmax, Ymax = A_s.shape
    BaseBSiz = numpy.array([Xmax, Ymax]) / BlockGrid
    # iterate over block origins, stepping by the block size
    for BIndX in range(0, Xmax, BaseBSiz[0]):
        for BIndY in range(0, Ymax, BaseBSiz[1]):
            BSizX, BSizY = D_s[BIndX : BIndX + BaseBSiz[0],
                               BIndY : BIndY + BaseBSiz[1]].shape
            Xgrid, Ygrid = numpy.mgrid[BIndX : BIndX + BSizX,
                                       BIndY : BIndY + BSizY]
            Xgrid = Xgrid.reshape([BSizX*BSizY,1])[:,0]
            Ygrid = Ygrid.reshape([BSizX*BSizY,1])[:,0]
            AllInd = numpy.array([Xgrid, Ygrid]).T
            NZeroInd = numpy.array(
                A_s[Xgrid, Ygrid].reshape((BSizX,BSizY)).nonzero()).T \
                + numpy.array([[BIndX],[BIndY]]).T
            ZeroInd = numpy.array([x for x in AllInd.tolist()
                                   if x not in NZeroInd.tolist()]).T
            #
            # Replace zero-values in current block
            D_s[ZeroInd[0], ZeroInd[1]] = map(
                lambda x, y: A_s[x,:].dot(B[:,y])[0],
                ZeroInd[0], ZeroInd[1])
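A quick way to check the block-processed result against the direct sparse translation from earlier (my own addition; assumes the toy matrices above):

    # the block-processed D_s should match the direct translation
    C_ref = A_s.dot(B_s)
    C_ref[A_s.nonzero()] = A_s[A_s.nonzero()]
    print((D_s.todense() == C_ref.todense()).all())   # expect: True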
