Error converting large sparse matrix to COO

I ran into the following problem while trying to stack two large CSR matrices:

    /usr/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in _check(self)
        229             raise ValueError('negative row index found')
        230         if self.col.min() < 0:
    --> 231             raise ValueError('negative column index found')
        232
        233     def transpose(self, copy=False):

    ValueError: negative column index found

I can reproduce this error very simply by trying to convert a large lil matrix to a coo matrix. The following code works for N = 10 ** 9, but not for N = 10 ** 10.

    from scipy import sparse
    from numpy import random

    N = 10**10
    x = sparse.lil_matrix((1, N))
    for _ in xrange(1000):
        x[0, random.randint(0, N - 1)] = random.randint(1, 100)
    y = sparse.coo_matrix(x)

Is there a size limit for COO matrices that I'm hitting? Is there any way around this?

python numpy scipy
2 answers

Interestingly, your second example works well with my setup.

The "negative column index found" error message looks like an integer overflow. I checked the current source and found the following:

  • The actual index data type is computed in scipy.sparse.sputils.get_index_dtype
  • The error is raised in the scipy.sparse.coo module

The exception is raised by this code:

    idx_dtype = get_index_dtype(maxval=max(self.shape))
    self.row = np.asarray(self.row, dtype=idx_dtype)
    self.col = np.asarray(self.col, dtype=idx_dtype)
    self.data = to_native(self.data)

    if nnz > 0:
        if self.row.max() >= self.shape[0]:
            raise ValueError('row index exceeds matrix dimensions')
        if self.col.max() >= self.shape[1]:
            raise ValueError('column index exceeds matrix dimensions')
        if self.row.min() < 0:
            raise ValueError('negative row index found')
        if self.col.min() < 0:
            raise ValueError('negative column index found')

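This is clearly an overflow, most likely at 2**31: the largest value a signed 32-bit integer can hold is 2**31 - 1, so larger column indices wrap around and can turn negative.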
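To see how a column index can end up negative, here is a small sketch that casts some made-up column indices for a 1 x 10**10 matrix from int64 down to int32. This is only an illustration of the wraparound, assuming that the indices were silently narrowed to 32 bits somewhere along the way; it is not SciPy's actual code path.

```python
import numpy as np

# Hypothetical column indices for a 1 x 10**10 row vector.
# Anything above 2**31 - 1 does not fit in a signed 32-bit integer.
cols = np.array([5, 2**31 - 1, 3 * 10**9, 10**10 - 1], dtype=np.int64)

# astype() uses unsafe casting by default, so out-of-range values wrap
# modulo 2**32 instead of raising an error.
cols32 = cols.astype(np.int32)
print(cols32)

# 3 * 10**9 wraps to a negative number, so the same kind of check that
# coo._check performs (col.min() < 0) now fails.
print(cols32.min() < 0)
```

Note that not every too-large index turns negative: 10**10 - 1 wraps to a positive but wrong value, which is why the failure depends on which random columns happen to be chosen.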

If you want to debug it, try:

    import scipy.sparse.sputils
    import numpy as np
    scipy.sparse.sputils.get_index_dtype((np.array(10**10),))

It should return int64. If it does not, that is your problem.
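The idea behind that check can be sketched as a small function that picks int32 unless the largest index (or an existing index array) needs more bits. The name `choose_index_dtype` is hypothetical and this is a simplified stand-in, not SciPy's own `get_index_dtype`, whose exact signature and behavior differ between versions (newer releases also moved `sputils` to the private `scipy.sparse._sputils`).

```python
import numpy as np


def choose_index_dtype(maxval=None, arrays=()):
    """Hypothetical, simplified version of the index-dtype selection idea:
    use int32 unless the largest index or an input array needs 64 bits."""
    int32max = np.iinfo(np.int32).max
    # A dimension larger than 2**31 - 1 cannot be indexed with int32.
    if maxval is not None and maxval > int32max:
        return np.int64
    # Keep 64-bit indices if an existing index array cannot be safely cast down.
    for arr in arrays:
        if not np.can_cast(arr.dtype, np.int32):
            return np.int64
    return np.int32


print(choose_index_dtype(maxval=10**9))   # fits in int32
print(choose_index_dtype(maxval=10**10))  # needs int64
```

If a SciPy version never takes the 64-bit branch, every index above 2**31 - 1 is forced into int32, which produces exactly the wraparound described above.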

Which version of SciPy are you using?


It looks like you are running into the limits of 32-bit integers. Here is a quick test:

    In [14]: np.array([10**9, 10**10], dtype=np.int64)
    Out[14]: array([ 1000000000, 10000000000])

    In [15]: np.array([10**9, 10**10], dtype=np.int32)
    Out[15]: array([1000000000, 1410065408], dtype=int32)

At the time of writing, most of SciPy's sparse matrix formats assume 32-bit integer indices, so they simply cannot support matrices with dimensions this large.

EDIT: Starting with version 0.14, SciPy supports 64-bit indexing for sparse matrices. If you can upgrade, this problem should go away.
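After upgrading, one way to confirm that 64-bit indices are being used is to build the same 1 x 10**10 matrix and inspect the dtype of its column indices. This is a minimal sketch, assuming SciPy 0.14 or newer, NumPy 1.17 or newer (for `default_rng`), and a 64-bit platform. Building the matrix directly from `(data, (row, col))` arrays is a substitution for the question's element-by-element LIL filling; it avoids the Python-level loop but is otherwise the same 1000 random nonzeros.

```python
import numpy as np
from scipy import sparse

N = 10**10
rng = np.random.default_rng(0)

# 1000 random column positions in [0, N) and values in [1, 100].
cols = rng.integers(0, N, size=1000)
rows = np.zeros_like(cols)  # everything lives in row 0
vals = rng.integers(1, 101, size=1000)

# Construct the COO matrix directly from coordinate arrays.
y = sparse.coo_matrix((vals, (rows, cols)), shape=(1, N))

# With 64-bit index support, indices above 2**31 - 1 are kept as int64
# and none of them wrap around to negative values.
print(y.col.dtype)
print(y.col.min() >= 0)
```

If `y.col.dtype` still reports int32 here, the installed SciPy predates 64-bit index support.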

