Filter values from scipy sparse matrix

Question

I am trying to filter out values less than 10 from a huge (1M x 1M) CSR matrix (SciPy). Since all my values are integers, dividing by 10 and then multiplying by 10 does the job, but I was wondering whether there is a better way to filter the elements.

EDIT: Below is the answer. Make sure you have the latest version of SciPy installed.

+6

python scipy sparse-matrix

Omer Feb 27 '14 at 16:15

2 answers

I think the version issue is related to the implementation of the comparison operators, for example m >= 0 or m.__gt__. (I do not have an earlier version of SciPy to test this, but I believe there are one or more SO threads on this topic.)

Something that might work in an earlier version:

 m.data *= m.data >= 10
 m.eliminate_zeros()

In other words, use a standard NumPy operation to set the selected values to 0; the test can be as complicated as you like. Then use the standard sparse method eliminate_zeros() to remove the explicit zeros. When you say "filter", isn't that essentially what you want to do: set some values to zero and remove them from the sparse matrix?
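For illustration, here is a minimal sketch of the in-place approach on a small matrix. The helper name drop_below and the toy data are my own assumptions, not part of the original answer; only m.data and eliminate_zeros() come from it.

```python
import numpy as np
from scipy import sparse


def drop_below(m, threshold):
    """Hypothetical helper: zero stored values below `threshold`, then prune them.

    Works in place on the CSR matrix `m` and returns it for convenience.
    """
    # The boolean mask multiplies each stored value by 0 (drop) or 1 (keep).
    m.data *= m.data >= threshold
    # Remove the explicit zeros so they no longer occupy storage.
    m.eliminate_zeros()
    return m


# Toy data (assumed): five stored values, two of them below 10.
dense = np.array([[0, 3, 12],
                  [25, 0, 9],
                  [0, 10, 0]])
m = sparse.csr_matrix(dense)
drop_below(m, 10)
print(m.nnz)
print(m.toarray())
```

Because only the 1-D m.data array is touched, the cost should scale with the number of stored elements rather than with the matrix shape, which matters for a 1M x 1M matrix.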

+1

hpaulj May 04 '14 at 17:44

Jaime · Accepted Answer · 2014-02-27T18:46:38+0000

You can also go with a less hacky, but probably slower, approach:

m = m.multiply(m >= 10)

To understand what is going on:

 >>> m = scipy.sparse.csr_matrix((1000, 1000), dtype=np.int)
 >>> m[np.random.randint(0, 1000, 20), np.random.randint(0, 1000, 20)] = np.random.randint(0, 100, 20)
 >>> m.data
 array([92, 46, 99, 24, 75, 16, 49, 60, 87, 64, 91, 37, 30, 32, 25, 40, 99, 9, 3, 84])
 >>> m >= 10
 <1000x1000 sparse matrix of type '<type 'numpy.bool_'>'
     with 18 stored elements in Compressed Sparse Row format>
 >>> m = m.multiply(m >= 10)
 >>> m
 <1000x1000 sparse matrix of type '<type 'numpy.int32'>'
     with 18 stored elements in Compressed Sparse Row format>
 >>> m.data
 array([92, 46, 99, 24, 75, 16, 49, 60, 87, 64, 91, 37, 30, 32, 25, 40, 99, 84])
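The session above dates from 2014 and uses np.int, which has since been removed from NumPy (as of 1.24). As a rough sketch of the same idea on a recent NumPy/SciPy, assuming a small hand-built matrix in place of the random one (the coordinates and values below are made up for illustration):

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: six stored values, two of them below the threshold.
rows = np.array([0, 1, 2, 3, 4, 5])
cols = np.array([5, 4, 3, 2, 1, 0])
vals = np.array([92, 9, 46, 3, 24, 10])
m = sparse.csr_matrix((vals, (rows, cols)), shape=(1000, 1000))

# Sparse boolean matrix: True only where a stored value is >= 10.
mask = m >= 10

# Element-wise product; values under 10 are multiplied by an implicit
# zero and drop out of the result. tocsr() just makes the format explicit.
filtered = m.multiply(mask).tocsr()

print(mask.nnz, filtered.nnz)
print(sorted(filtered.data.tolist()))
```

Unlike the in-place trick on m.data, this builds an intermediate boolean matrix and a new result matrix, which is one plausible reason for it being the slower of the two.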
