I am trying to create a very large sparse matrix with shape (447957347, 5027974) that contains 3,289,288,566 nonzero elements.
But when I create a csr_matrix with scipy.sparse, the resulting matrix reports a negative number of stored elements:
<447957346x5027974 sparse matrix of type '<type 'numpy.uint32'>' with -1005678730 stored elements in Compressed Sparse Row format>
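The negative count is consistent with a 32-bit integer overflow: 3,289,288,566 is larger than the signed 32-bit maximum of 2,147,483,647, and wrapping it modulo 2**32 gives exactly -1005678730. The short check below reproduces that arithmetic. It assumes the count passes through a signed 32-bit integer somewhere. The helper name as_int32 is illustrative and not part of NumPy or SciPy.

```python
import ctypes

NNZ = 3289288566  # number of stored elements from the question
INT32_MAX = 2**31 - 1

def as_int32(n):
    """Reinterpret the low 32 bits of a Python int as a signed 32-bit value."""
    return (n + 2**31) % 2**32 - 2**31

print(NNZ > INT32_MAX)            # True: the count does not fit in int32
print(as_int32(NNZ))              # -1005678730, the value shown in the repr
print(ctypes.c_int32(NNZ).value)  # -1005678730, ctypes wraps the same way
```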
Here is the code that creates the matrix:

    import numpy as np
    from scipy.sparse import csr_matrix

    indptr = np.array(a, dtype=np.uint32)   # a is a Python array('L') containing the row pointer (indptr) information
    indices = np.array(b, dtype=np.uint32)  # b is a Python array('L') containing the column indices
    data = np.ones((len(indices),), dtype=np.uint32)  # every stored value is 1
    test = csr_matrix((data, indices, indptr), shape=(len(indptr) - 1, 5027974), dtype=np.uint32)
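In CSR format the last entry of indptr equals the number of stored elements. Here that means indptr has to hold values up to 3,289,288,566, which is above the signed 32-bit maximum. The minimal sketch below chooses int64 index arrays whenever a count or dimension would overflow int32. It assumes a SciPy release with 64-bit sparse index support (added around SciPy 0.14), and the helper names index_dtype_for and build_binary_csr are hypothetical. SciPy may still store small inputs, such as the toy example, with int32 indices. Only the choice for large inputs matters.

```python
import numpy as np
from scipy.sparse import csr_matrix

INT32_MAX = np.iinfo(np.int32).max  # 2147483647

def index_dtype_for(nnz, shape):
    """Hypothetical helper: pick an index dtype wide enough for CSR arrays.

    indptr[-1] == nnz, and row/column indices are bounded by the shape.
    """
    if max(nnz, shape[0], shape[1]) > INT32_MAX:
        return np.int64
    return np.int32

def build_binary_csr(indptr, indices, shape):
    """Build a CSR matrix whose stored values are all 1 (hypothetical helper)."""
    nnz = int(indptr[-1])
    idx_dtype = index_dtype_for(nnz, shape)
    indptr = np.asarray(indptr, dtype=idx_dtype)
    indices = np.asarray(indices, dtype=idx_dtype)
    data = np.ones(nnz, dtype=np.uint32)
    return csr_matrix((data, indices, indptr), shape=shape)

# Toy example: 3 rows, 5 columns, 3 stored ones.
m = build_binary_csr([0, 2, 3, 3], [0, 4, 1], (3, 5))
print(m.nnz)  # 3
```

For the sizes in the question, index_dtype_for(3289288566, (447957346, 5027974)) returns np.int64.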
I also found that converting a Python array with 3 billion elements to a NumPy array throws an error:
ValueError: setting an array element with a sequence
But if I split the data into three Python arrays of 1 billion elements each, convert each one to a NumPy array, and then combine them, it works fine.
I am confused by this.
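The chunked workaround described above can be generalized so that the data never has to be split by hand. This is a minimal sketch, not an explanation of why the single large conversion fails. The function name to_numpy_chunked and the default chunk_size are illustrative assumptions. Instead of converting separate pieces and concatenating them, it preallocates the final NumPy array once and copies one slice at a time. Only one chunk-sized temporary exists at any moment, and there is no extra copy for a final concatenation.

```python
from array import array
import numpy as np

def to_numpy_chunked(py_arr, dtype, chunk_size=10**8):
    """Copy a Python array.array into a preallocated NumPy array, one chunk at a time."""
    out = np.empty(len(py_arr), dtype=dtype)
    for start in range(0, len(py_arr), chunk_size):
        stop = min(start + chunk_size, len(py_arr))
        # Slicing an array.array gives a smaller array.array; NumPy reads it
        # through the buffer protocol and casts it to the target dtype.
        out[start:stop] = py_arr[start:stop]
    return out

b = array('L', range(10))
print(to_numpy_chunked(b, np.uint32, chunk_size=3))  # [0 1 2 3 4 5 6 7 8 9]
```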