Efficient way to get the maximum of each row for a large sparse matrix

Question

I have a large sparse matrix and want the maximum value of each row. In NumPy I can call numpy.max(mat, axis=1), but I cannot find a similar function for SciPy sparse matrices. Is there an efficient way to get the maximum of each row of a large sparse matrix?

+6

python scipy sparse-matrix

hanqiang Apr 13 '13 at 20:58

2 answers

Jaime · Answer 1 · 2013-04-15T06:21:34+0000

If your matrix, call it a, is stored in CSR format, then a.data holds all the nonzero entries ordered by row, and a.indptr holds the index of the first element of each row. You can use this to compute the row maxima as follows:

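    import numpy as np

    def sparse_max_row(csr_mat):
        # Maximum over each row's slice of the stored entries
        ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
        # Rows with no stored entries get 0
        ret[np.diff(csr_mat.indptr) == 0] = 0
        return ret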
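To make the CSR layout concrete, here is a small sketch of what a.data and a.indptr look like and what reduceat does with them. The 3x4 matrix is a made-up example, chosen so that every row has at least one stored entry; it is not from the original question.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example matrix; every row has at least one stored entry.
dense = np.array([[0, 5, 0, 2],
                  [7, 0, 0, 0],
                  [0, 3, 9, 0]])
a = csr_matrix(dense)

data = a.data      # stored entries, row by row: 5, 2, 7, 3, 9
indptr = a.indptr  # row i occupies data[indptr[i]:indptr[i + 1]]: 0, 2, 3, 5

# reduceat takes the maximum of each slice starting at the given offsets,
# so passing the row starts yields one maximum per row.
row_max = np.maximum.reduceat(data, indptr[:-1])
print(row_max)  # [5 7 9]
```

The last entry of indptr is dropped because it marks the end of the final row rather than the start of a row.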

Jakem · Answer 2 · 2013-06-26T22:28:16+0000

I just ran into the same problem. Jaime's solution can break when the matrix has completely empty rows: a trailing empty row, for instance, leaves an out-of-range index in csr_mat.indptr[:-1]. Here is a workaround:

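    import numpy as np

    def sparse_max_row(csr_mat):
        # Start from zeros so that empty rows keep a maximum of 0
        ret = np.zeros(csr_mat.shape[0])
        # Reduce only over rows that have at least one stored entry
        nonempty = np.diff(csr_mat.indptr) > 0
        ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
        return ret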
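As a quick sanity check, the workaround can be run on a matrix with an empty middle row and an empty last row, then compared with the dense result. This is only a sketch: the test matrix is made up, and the function body repeats the workaround so the snippet runs on its own.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row(csr_mat):
    # Same logic as the workaround above: reduce only over non-empty rows.
    ret = np.zeros(csr_mat.shape[0])
    nonempty = np.diff(csr_mat.indptr) > 0
    ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
    return ret

# Hypothetical test matrix: row 1 and the last row have no stored entries.
dense = np.array([[0, 4, 1],
                  [0, 0, 0],
                  [6, 0, 2],
                  [0, 0, 0]])

result = sparse_max_row(csr_matrix(dense))
expected = dense.max(axis=1)
print(result)    # [4. 0. 6. 0.]
print(expected)  # [4 0 6 0]
```

Note that the reduction only looks at stored entries. The two results agree here because all values are non-negative; for a row whose stored entries are all negative but which also contains implicit zeros, this approach returns the largest stored value rather than 0.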
