Sparse matrix sorting

I have a sparse matrix. I need to sort this matrix row by row and create another [sparse] matrix. The code may explain this better:

    # The `rand` function requires a newer version of scipy.
    from scipy.sparse import rand

    m = rand(6, 6, density=0.6)
    d = m.getrow(0)
    print(d)

Output1

    (0, 5)    0.874881629788
    (0, 4)    0.352559852239
    (0, 2)    0.504791645463
    (0, 1)    0.885898140175

I have this matrix m. I want to create a new matrix, new_m, that is a sorted version of m. Its 0th row should look like this:

    new_d = new_m.getrow(0)
    print(new_d)

Output2

    (0, 1)    0.885898140175
    (0, 5)    0.874881629788
    (0, 2)    0.504791645463
    (0, 4)    0.352559852239

That way I can read off which columns hold the largest values:

    print(new_d.indices)

Output3

    array([1, 5, 2, 4])

Of course, each row should be sorted as described above, independently.

I have one solution for this problem, but it is not elegant.

2 answers

If you want to ignore the zero-valued elements of the matrix, the code below should work. It is also much faster than implementations that use the getrow method, which is rather slow.

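    def sort_coo(m):
        # m must be a coo_matrix: it exposes parallel row, col and data arrays.
        # The built-in zip replaces itertools.izip, which only exists in Python 2.
        tuples = zip(m.row, m.col, m.data)
        # Sort by row index first, then by value (ascending) within each row.
        return sorted(tuples, key=lambda x: (x[0], x[2]))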

For example:

    >>> from numpy.random import rand
    >>> from scipy.sparse import coo_matrix
    >>>
    >>> d = rand(10, 20)
    >>> d[d > .05] = 0
    >>> s = coo_matrix(d)
    >>> sort_coo(s)
    [(0, 2, 0.004775589084940246),
     (3, 12, 0.029941507166614145),
     (5, 19, 0.015030386789436245),
     (7, 0, 0.0075044957259399192),
     (8, 3, 0.047994403933129481),
     (8, 5, 0.049401058471327031),
     (9, 15, 0.040011608000125043),
     (9, 8, 0.048541825332137023)]

Depending on your needs, you can adjust the sort key in the lambda or post-process the output. If you want everything in a dictionary keyed by row index, you could do:

    from collections import defaultdict

    sorted_rows = defaultdict(list)
    for i in sort_coo(m):
        # i is a (row, col, value) tuple; group (col, value) pairs by row
        sorted_rows[i[0]].append((i[1], i[2]))
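The sort key above orders each row by ascending value, while the question asks for the largest value first. As a rough sketch of the same idea done with NumPy arrays instead of Python tuples, the following uses np.lexsort to order the COO entries by row and then by descending value. The function name sort_coo_desc and the small test matrix are made up for illustration, and negating the data assumes numeric values.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sort_coo_desc(m):
    """Return (row, col, data) arrays ordered by row, then by value, largest first.

    Hypothetical helper, not part of scipy.
    """
    m = coo_matrix(m)  # accepts other sparse formats or a dense array as well
    # np.lexsort treats the LAST key as the primary one:
    # primary key = row index, secondary key = negated value (descending).
    order = np.lexsort((-m.data, m.row))
    return m.row[order], m.col[order], m.data[order]


# Illustrative 2 x 4 matrix
dense = np.array([[0.0, 0.9, 0.5, 0.0],
                  [0.3, 0.0, 0.0, 0.7]])
rows, cols, vals = sort_coo_desc(dense)
print(rows)  # row index of each entry, grouped by row
print(cols)  # column indices, largest value first within each row
```

For this matrix the column indices come out as 1, 2 for row 0 and 3, 0 for row 1. Explicitly stored zeros, if any, are sorted like any other stored value.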

My inelegant solution is this:

    from scipy.sparse import coo_matrix
    import numpy as np

    a = []
    for i in range(m.shape[0]):
        d = m.getrow(i)
        n = len(d.indices)
        # (row, col, value) triples for the stored entries of row i
        s = zip([i]*n, d.indices, d.data)
        # largest value first
        sorted_s = sorted(s, key=lambda v: v[2], reverse=True)
        a.extend(sorted_s)
    a = np.array(a)
    # np.array(a) is a float array, so cast the index columns back to int
    new_m = coo_matrix((a[:,2], (a[:,0].astype(int), a[:,1].astype(int))), m.shape)

There may be some simple errors, because I have not tested it yet, but I think the idea is intuitive. Is there a better solution?

Edit

Creating this new matrix may be useless, because calling the getrow method on it breaks the order again. Only coo_matrix.col keeps the order.
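A small sketch of what goes wrong, using made-up values: a coo_matrix stores its row, col and data arrays in the order they were given, but a CSR matrix is free to reorder the column indices within a row. Here sort_indices() is called explicitly to make the effect visible; the assumption is that conversions and row extraction may end up doing the same thing.

```python
import numpy as np
from scipy.sparse import coo_matrix

# One row whose entries are supplied in descending-value order (illustrative data).
data = np.array([0.9, 0.8, 0.5])
row = np.array([0, 0, 0])
col = np.array([1, 5, 2])
new_m = coo_matrix((data, (row, col)), shape=(1, 6))

# The COO arrays are kept exactly as supplied.
print(new_m.col)

# After converting to CSR and sorting the indices, the columns are in
# ascending column order and the by-value ordering is lost.
csr = new_m.tocsr()
csr.sort_indices()
print(csr.indices)
```

The first print shows the columns in the supplied order (1, 5, 2); the second shows them in ascending column order (1, 2, 5).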

Another solution

This is not an exact solution, but may be useful:

    def sortSparseMatrix(m, rev=True, only_indices=True):
        """Sort a sparse matrix row by row and return a column index dictionary."""
        col_dict = dict()
        for i in range(m.shape[0]):
            d = m.getrow(i)
            # (col, value) pairs for the stored entries of row i
            s = zip(d.indices, d.data)
            # honor the rev argument: True means largest value first
            sorted_s = sorted(s, key=lambda v: v[1], reverse=rev)
            if only_indices:
                col_dict[i] = [element[0] for element in sorted_s]
            else:
                col_dict[i] = sorted_s
        return col_dict

    >>> print(sortSparseMatrix(m))
    {0: [5, 1, 0], 1: [1, 3, 5], 2: [1, 2, 3, 4], 3: [1, 5, 2, 4], 4: [0, 3, 5, 1], 5: [3, 4, 2]}
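Since getrow was described above as rather slow, here is a hedged sketch of the same dictionary built by slicing the CSR arrays directly instead of calling getrow for every row. The name sorted_columns_by_row and the test matrix are made up for illustration; whether this is actually faster depends on the matrix and was not measured here.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sorted_columns_by_row(m):
    """Map each row index to its column indices, largest value first.

    Hypothetical helper that reads the CSR arrays directly.
    """
    csr = csr_matrix(m)  # converts other sparse formats or dense input
    result = {}
    for i in range(csr.shape[0]):
        # Stored entries of row i live in data[start:end] / indices[start:end].
        start, end = csr.indptr[i], csr.indptr[i + 1]
        # Negate the values so that argsort yields descending order.
        order = np.argsort(-csr.data[start:end], kind="stable")
        result[i] = csr.indices[start:end][order].tolist()
    return result


# Illustrative 3 x 4 matrix; the last row has no stored entries.
dense = np.array([[0.0, 0.9, 0.5, 0.0],
                  [0.3, 0.0, 0.0, 0.7],
                  [0.0, 0.0, 0.0, 0.0]])
col_dict = sorted_columns_by_row(dense)
print(col_dict)
```

Rows without stored entries map to an empty list rather than being left out of the dictionary.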