I have a collection of O(N) NxN scipy.sparse.csr_matrix objects, and each sparse matrix has on the order of N nonzero elements. I want to add all these matrices together to get a regular NxN numpy array (N is on the order of 1000). The arrangement of the nonzero elements within the matrices is such that the resulting sum certainly isn't sparse (in fact, there are virtually no zero elements left).
Right now I'm just doing
reduce(lambda x,y: x+y,[m.toarray() for m in my_sparse_matrices])
which works, but is a little slow: the sheer amount of pointless processing of zeros going on there is absolutely horrifying.
Is there a better way? Nothing obvious jumps out at me in the docs.
Update: following user545424's suggestion, I tried the alternative scheme of summing the sparse matrices, and also summing the sparse matrices onto a dense matrix. The code below shows all approaches running in comparable time (Python 2.6.6 on amd64 Debian/Squeeze on a quad-core i7):
import numpy as np
import numpy.random
import scipy
import scipy.sparse
import time

N=768
S=768
D=3

def mkrandomsparse():
    m=np.zeros((S,S),dtype=np.float32)
    r=np.random.random_integers(0,S-1,D*S)
    c=np.random.random_integers(0,S-1,D*S)
    for e in zip(r,c):
        m[e[0],e[1]]=1.0
    return scipy.sparse.csr_matrix(m)

M=[mkrandomsparse() for i in xrange(N)]

def plus_dense():
    return reduce(lambda x,y: x+y,[m.toarray() for m in M])

def plus_sparse():
    return reduce(lambda x,y: x+y,M).toarray()

def sum_dense():
    return sum([m.toarray() for m in M])

def sum_sparse():
    return sum(M[1:],M[0]).toarray()

def sum_combo():
and prints
plus_dense  : 1.368s
plus_sparse : 1.405s
sum_dense   : 1.368s
sum_sparse  : 1.406s
sum_combo   : 1.039s
although you can get one approach or the other to come out ahead by a factor of 2 or so by messing with the parameters N, S, D... but nothing like the order-of-magnitude improvement you'd hope to see, given the number of zero additions it should be able to skip.
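For reference, here is a minimal sketch of two of the ideas discussed above, written for Python 3 and a current NumPy/SciPy rather than the Python 2.6 setup used for the timings. The body of sum_combo is not shown in the listing, so sum_combo_guess is only an assumption based on the description "summing the sparse matrices onto a dense matrix". sum_nonzeros_only is a hypothetical illustration of what "skipping the zero adds" could look like: it adds only the stored entries of each matrix into a dense accumulator. The names make_random_sparse, sum_combo_guess and sum_nonzeros_only are made up for this sketch, and nothing here has been benchmarked, so it makes no claim about being faster.

```python
import numpy as np
import scipy.sparse as sp


def make_random_sparse(size, per_row, rng):
    # Hypothetical helper, loosely modelled on mkrandomsparse:
    # scatter about per_row * size ones into a dense array, then convert to CSR.
    dense = np.zeros((size, size), dtype=np.float32)
    rows = rng.integers(0, size, per_row * size)
    cols = rng.integers(0, size, per_row * size)
    dense[rows, cols] = 1.0
    return sp.csr_matrix(dense)


def sum_combo_guess(matrices, size):
    # Assumption: accumulate each sparse matrix onto one dense array.
    acc = np.zeros((size, size), dtype=np.float32)
    for m in matrices:
        acc = acc + m  # dense + sparse yields a dense result
    return np.asarray(acc)  # normalise to a plain ndarray


def sum_nonzeros_only(matrices, size):
    # Touch only the stored (nonzero) entries of each matrix.
    total = np.zeros((size, size), dtype=np.float32)
    for m in matrices:
        coo = m.tocoo()
        # np.add.at accumulates correctly even if an index pair repeats.
        np.add.at(total, (coo.row, coo.col), coo.data)
    return total


rng = np.random.default_rng(0)
size = 64
mats = [make_random_sparse(size, 3, rng) for _ in range(size)]

# Reference result: densify everything and add, as in the original approach.
reference = sum(m.toarray() for m in mats)

print(np.allclose(sum_combo_guess(mats, size), reference))
print(np.allclose(sum_nonzeros_only(mats, size), reference))
```

Both functions are checked against the plain densify-and-add result, so the sketch only demonstrates that the approaches agree; how they compare in speed for N around 1000 would still have to be measured.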