How to take a subsample from a scipy.sparse.csr.csr_matrix and a list

I have a scipy.sparse.csr.csr_matrix that represents words in a document, and a list of lists in which each entry holds the categories for the matrix row at the same index.

The problem I am facing is that I need to randomly select N rows from the data.

So, if my matrix looks like this:

 [1:3 2:3 4:4] [1:5 2:5 5:4] 

and my list of lists looks like this:

 ((20,40) (80,50)) 

and I needed to sample 1 row, I could end up with this:

 [1:3 2:3 4:4] ((20,40)) 

I have looked through the scipy documentation, but I can't find a way to build a new csr matrix from a list of indices.

1 answer

You can simply index the csr matrix with a list of indices. First we create a matrix and look at it:

 >>> from scipy.sparse import csr_matrix
 >>> m = csr_matrix([[0,0,1,0], [4,3,0,0], [3,0,0,8]])
 >>> m
 <3x4 sparse matrix of type '<type 'numpy.int64'>'
     with 5 stored elements in Compressed Sparse Row format>
 >>> print(m.toarray())
 [[0 0 1 0]
  [4 3 0 0]
  [3 0 0 8]]

Of course, we can take a look at just the first row:

 >>> m[0]
 <1x4 sparse matrix of type '<type 'numpy.int64'>'
     with 1 stored elements in Compressed Sparse Row format>
 >>> print(m[0].toarray())
 [[0 0 1 0]]

But we can also look at the first and third rows at once, using the list [0,2] as an index:

 >>> m[[0,2]]
 <2x4 sparse matrix of type '<type 'numpy.int64'>'
     with 3 stored elements in Compressed Sparse Row format>
 >>> print(m[[0,2]].toarray())
 [[0 0 1 0]
  [3 0 0 8]]

Now you can generate N random indices without repeats (that is, sampling without replacement) using numpy's choice:

 i = np.random.choice(np.arange(m.shape[0]), N, replace=False) 

Then you can select the rows at these indices from the original matrix m:

 sub_m = m[i] 
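
Put together, the two steps above can be sketched as a small self-contained script. This is only an illustration: the matrix is the 3x4 example from earlier, N = 2 is an arbitrary choice, and it uses np.random.default_rng (available in NumPy 1.17 and later) with a fixed seed instead of np.random.choice so that the run is repeatable. The seed and the name rng are assumptions, not part of the original answer.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = csr_matrix([[0, 0, 1, 0], [4, 3, 0, 0], [3, 0, 0, 8]])
N = 2  # number of rows to sample; must not exceed m.shape[0]

# Seeded generator (assumption: seed 0 is arbitrary, only for repeatability)
rng = np.random.default_rng(0)

# Draw N distinct row indices from 0 .. m.shape[0] - 1
i = rng.choice(m.shape[0], size=N, replace=False)

# Indexing with an integer array returns a new sparse matrix with N rows
sub_m = m[i]
print(sub_m.shape)
```

Because replace=False is used, asking for more rows than the matrix has raises a ValueError rather than silently repeating rows.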

To select the matching entries from the list of category lists, you must first convert it to an array, which you can then index with i:

 sub_c = np.asarray(categories)[i] 

If you want to have a list of lists, just use:

 sub_c.tolist() 

or, if you really have (or want) a tuple of tuples, I think you need to convert it manually:

 tuple(map(tuple, sub_c)) 
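
As a rough end-to-end sketch, the matrix and the categories can be subsampled together in one helper so that they stay aligned. The function name subsample and its seed parameter are hypothetical, and the dense rows below are one reading of the question's example that assumes zero-based column indices. The helper uses a list comprehension instead of np.asarray, which is a swapped-in alternative that also works when the inner category lists have different lengths.

```python
import numpy as np
from scipy.sparse import csr_matrix


def subsample(matrix, categories, n, seed=None):
    """Return n random rows of `matrix`, their categories, and the indices used.

    Hypothetical helper: the name and signature are for illustration only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(matrix.shape[0], size=n, replace=False)
    sub_matrix = matrix[idx]
    # Plain list indexing keeps each category entry unchanged,
    # even if the inner lists or tuples differ in length
    sub_categories = [categories[j] for j in idx]
    return sub_matrix, sub_categories, idx


# The question's data, assuming "col:value" pairs with zero-based columns
m = csr_matrix([[0, 3, 3, 0, 4, 0], [0, 5, 5, 0, 0, 4]])
categories = [(20, 40), (80, 50)]

sub_m, sub_c, idx = subsample(m, categories, n=1, seed=0)
print(idx, sub_m.toarray(), sub_c)
```

Returning idx alongside the two subsamples makes it easy to check afterwards which original rows were picked.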