Sparse matrix sorting

Question

Sparse matrix sorting

I have a sparse matrix. I need to sort this matrix row by row and create another [sparse] matrix. The code may explain this better:

# for `rand` function, you need newer version of scipy. from scipy.sparse import * m = rand(6,6, density=0.6) d = m.getrow(0) print d

Output1

 (0, 5) 0.874881629788 (0, 4) 0.352559852239 (0, 2) 0.504791645463 (0, 1) 0.885898140175

I have this matrix m . I want to create a new matrix with a sorted version of m. The new matrix contains the 0th row, similar to this one.

 new_d = new_m.getrow(0) print new_d

Output2

 (0, 1) 0.885898140175 (0, 5) 0.874881629788 (0, 2) 0.504791645463 (0, 4) 0.352559852239

So, I can get which column is larger, etc .:

 print new_d.indices

output3

 array([1, 5, 2, 4])

Of course, each row should be sorted as described above, independently.

I have one solution for this problem, but it is not elegant.

+8

python sorting scipy sparse-matrix

Thorn Apr 04 '12 at 8:26

source share

2 answers

Alexander Measure · Answer 1 · 2013-02-02T22:37:08+0000

If you want to ignore matrix zero-cost elements, the code below should work. It is also much faster than implementations that use the getrow method, which is rather slow.

 from itertools import izip def sort_coo(m): tuples = izip(m.row, m.col, m.data) return sorted(tuples, key=lambda x: (x[0], x[2]))

For example:

  >>> from numpy.random import rand >>> from scipy.sparse import coo_matrix >>> >>> d = rand(10, 20) >>> d[d > .05] = 0 >>> s = coo_matrix(d) >>> sort_coo(s) [(0, 2, 0.004775589084940246), (3, 12, 0.029941507166614145), (5, 19, 0.015030386789436245), (7, 0, 0.0075044957259399192), (8, 3, 0.047994403933129481), (8, 5, 0.049401058471327031), (9, 15, 0.040011608000125043), (9, 8, 0.048541825332137023)]

Depending on your needs, you can configure sort keys in lambda or continue processing the output. If you want everything in a dictionary with an indexed index you could:

 from collections import defaultdict sorted_rows = defaultdict(list) for i in sort_coo(m): sorted_rows[i[0]].append((i[1], i[2]))

Thorn · Answer 2 · 2012-04-04T09:19:53+0000

My bad solution is this:

 from scipy.sparse import coo_matrix import numpy as np a = [] for i in xrange(m.shape[0]): # assume m is square matrix. d = m.getrow(i) n = len(d.indices) s = zip([i]*n, d.indices, d.data) sorted_s = sorted(s, key=lambda v: v[2], reverse=True) a.extend(sorted_s) a = np.array(a) new_m = coo_matrix((a[:,2], (a[:,0], a[:,1])), m.shape)

There may be some simple errors, because I have not tested them yet. But the idea, in my opinion, is intuitive. Is there a good solution?

Edit

This new matrix creation can be useless because if you call the getrow method then the order is broken again. Only coo_matrix.col keeps order.

Another solution

This is not an exact solution, but may be useful:

 def sortSparseMatrix(m, rev=True, only_indices=True): """ Sort a sparse matrix and return column index dictionary """ col_dict = dict() for i in xrange(m.shape[0]): # assume m is square matrix. d = m.getrow(i) s = zip(d.indices, d.data) sorted_s = sorted(s, key=lambda v: v[1], reverse=True) if only_indices: col_dict[i] = [element[0] for element in sorted_s] else: col_dict[i] = sorted_s return col_dict

 >>> print sortSparseMatrix(m) {0: [5, 1, 0], 1: [1, 3, 5], 2: [1, 2, 3, 4], 3: [1, 5, 2, 4], 4: [0, 3, 5, 1], 5: [3, 4, 2]}

Sparse matrix sorting

Output1

Output2

output3

Edit

Another solution

More articles: