Filter values from a SciPy sparse matrix

I am trying to filter out values less than 10 from a huge (1M x 1M) CSR matrix (SciPy). Since all my values are integers, dividing by 10 and multiplying by 10 does the job, but I was wondering if there is a better way to filter the elements.

EDIT: Below is the answer. Make sure you have the latest version of SciPy installed.

2 answers

You can also go with a less hacky, but probably slower, approach:

m = m.multiply(m >= 10) 

To understand what is going on:

    >>> import numpy as np
    >>> import scipy.sparse
    >>> m = scipy.sparse.csr_matrix((1000, 1000), dtype=int)
    >>> m[np.random.randint(0, 1000, 20), np.random.randint(0, 1000, 20)] = np.random.randint(0, 100, 20)
    >>> m.data
    array([92, 46, 99, 24, 75, 16, 49, 60, 87, 64, 91, 37, 30, 32, 25, 40, 99,
            9,  3, 84])
    >>> m >= 10
    <1000x1000 sparse matrix of type '<type 'numpy.bool_'>'
        with 18 stored elements in Compressed Sparse Row format>
    >>> m = m.multiply(m >= 10)
    >>> m
    <1000x1000 sparse matrix of type '<type 'numpy.int32'>'
        with 18 stored elements in Compressed Sparse Row format>
    >>> m.data
    array([92, 46, 99, 24, 75, 16, 49, 60, 87, 64, 91, 37, 30, 32, 25, 40, 99,
           84])
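For anyone who wants to check the result deterministically, here is a minimal sketch of the same mask-and-multiply idea on a tiny matrix. The 3x3 values are made up for illustration, and the conversion back to CSR plus the eliminate_zeros() call are defensive assumptions on my part, since the exact return format of multiply() may differ between SciPy versions.

```python
import numpy as np
import scipy.sparse as sp

# Small, made-up matrix standing in for the huge one.
dense = np.array([[0, 12, 3],
                  [9, 0, 10],
                  [25, 0, 0]])
m = sp.csr_matrix(dense)

# m >= 10 is a sparse boolean mask; multiply() is element-wise,
# so every stored value below 10 is multiplied by False (0).
filtered = m.multiply(m >= 10)

# Defensive step (assumption): make sure we are back in CSR format
# and that no explicit zeros remain in the stored data.
filtered = sp.csr_matrix(filtered)
filtered.eliminate_zeros()

result = filtered.toarray()
print(result)
```

Only the entries 12, 10 and 25 should survive; everything else is an implicit zero.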

I think the version issue is related to how the comparison operators (m >= 0, m.__gt__) are implemented for sparse matrices. (I do not have an earlier version of SciPy to test this, but I believe there are one or more SO threads on the topic.)

Something that might work in an earlier version:

    m.data *= m.data >= 10
    m.eliminate_zeros()

In other words, use a standard NumPy operation to set the selected values to 0. The test can be much more complicated than a single comparison. Then use the standard sparse method to clean up the matrix. When you say "filter", isn't that essentially what you want to do: set some values to zero and remove them from the sparse matrix?
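A small self-contained sketch of that two-step approach, in case it helps. The matrix values are made up, and I am assuming an integer CSR matrix as in the question.

```python
import numpy as np
import scipy.sparse as sp

# Made-up example data; the real matrix would be much larger.
m = sp.csr_matrix(np.array([[0, 12, 3],
                            [9, 0, 10],
                            [25, 0, 0]]))

# Step 1: ordinary NumPy arithmetic on the stored values only.
# Values below 10 are multiplied by False (0), the rest by True (1).
m.data *= (m.data >= 10)

# Step 2: drop the explicit zeros so they no longer occupy storage.
m.eliminate_zeros()

print(m.data)
print(m.nnz)
```

Because this only touches m.data, it never builds a second sparse matrix for the mask, which may matter at the 1M x 1M scale.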

