Multiplication of large sparse matrices in Python

I would like to multiply two large sparse matrices. The first is 150,000 × 300,000, and the second is 300,000 × 300,000. The first matrix contains about 1,000,000 non-zero elements, and the second matrix contains about 20,000,000 non-zero elements. Is there an easy way to get the product of these matrices?

I currently store the matrices in CSR or CSC format and compute matrix_a * matrix_b. This raises ValueError: array is too big.

I suppose I could store the individual matrices on disk using PyTables, split them into smaller blocks, and assemble the final product from the products of many blocks. But I am hoping for something relatively simple to implement.
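For reference, the block idea can be sketched in memory with scipy.sparse alone. This is a minimal sketch, not a tested out-of-core solution: the function name blockwise_product and the block_rows parameter are hypothetical, the demonstration sizes are made up, and every piece is kept in a list rather than written to disk.

```python
import numpy as np
import scipy.sparse as sp


def blockwise_product(a, b, block_rows=10000):
    """Multiply sparse a @ b one horizontal slab of `a` at a time.

    Hypothetical helper for illustration; block_rows is an assumed tuning knob.
    """
    a = sp.csr_matrix(a)  # CSR makes row slicing cheap
    b = sp.csr_matrix(b)
    pieces = []
    for start in range(0, a.shape[0], block_rows):
        # Each slab product yields an independent set of rows of the result
        pieces.append(a[start:start + block_rows] @ b)
    # Stack the row slabs back into a single sparse matrix
    return sp.vstack(pieces, format="csr")


# Small demonstration with made-up sizes
a = sp.random(200, 300, density=0.01, format="csr", random_state=0)
b = sp.random(300, 300, density=0.02, format="csr", random_state=1)
blocked = blockwise_product(a, b, block_rows=64)
```

Because each slab of rows is independent, a piece could in principle be saved to disk as soon as it is computed instead of being appended to the list, which bounds the memory needed for the result at any one time.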

EDIT: I am hoping for a solution that works for arbitrarily large sparse matrices while hiding (or avoiding) the bookkeeping involved in moving individual blocks back and forth between memory and disk.

1 answer

Strange, because the following worked for me:

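    import scipy.sparse

    # Random sparse matrices with the shapes from the question (integer dimensions)
    mat1 = scipy.sparse.rand(150000, 300000, density=1e6/150e3/300e3)
    # Note: this density yields about 40 million non-zeros, twice the question's 20 million
    mat2 = scipy.sparse.rand(300000, 300000, density=20e6/150e3/300e3)
    cmat1 = scipy.sparse.csc_matrix(mat1)
    cmat2 = scipy.sparse.csc_matrix(mat2)
    res = cmat1 * cmat2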

I am using the latest SciPy, and Python used about 3 GB of RAM.

So maybe your matrices are such that their product is not very sparse?
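One way to check this before multiplying is to bound the number of non-zeros in the product from the structure of the two factors. The sketch below is only an illustration: product_nnz_upper_bound is a hypothetical helper, not a SciPy function. It relies on the fact that each non-zero in column k of the first matrix can combine with every non-zero in row k of the second, so the sum of those pairings is an upper bound on the product's non-zero count.

```python
import numpy as np
import scipy.sparse as sp


def product_nnz_upper_bound(a, b):
    """Upper bound on the number of non-zeros in a @ b (hypothetical helper)."""
    a = sp.csc_matrix(a)
    b = sp.csr_matrix(b)
    # Non-zeros stored in each column of a and in each row of b
    col_counts = np.diff(a.indptr).astype(np.int64)
    row_counts = np.diff(b.indptr).astype(np.int64)
    # Column k of a pairs with row k of b; sum all possible pairings
    return int(col_counts @ row_counts)


# Small demonstration with made-up sizes
a = sp.random(200, 300, density=0.01, format="csr", random_state=0)
b = sp.random(300, 300, density=0.02, format="csr", random_state=1)
bound = product_nnz_upper_bound(a, b)
```

For uniformly random matrices with the sizes in the question, each of the 1,000,000 non-zeros in the first matrix meets on average about 67 non-zeros in the matching row of the second, giving a bound of roughly 67 million entries. Matrices with a few very dense rows or columns can push this number far higher.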
