Efficient way to get the maximum of each row of a large sparse matrix

I have a large sparse matrix, and I want to get the maximum value of each row. In NumPy I can call numpy.max(mat, axis=1), but I cannot find a similar function for a SciPy sparse matrix. Is there an efficient way to get the maximum of each row of a large sparse matrix?
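For reference, here is a minimal sketch contrasting the dense NumPy call with the max(axis=1) method that sparse matrices have in recent SciPy releases. Whether your installed SciPy has this method is an assumption worth checking. The 3x3 matrix is arbitrary example data. The sparse method returns an (n_rows, 1) sparse result, so it is converted to a flat array for comparison.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0, 3, 0],
                  [0, 0, 0],
                  [-2, 0, 5]])
mat = csr_matrix(dense)

# Dense baseline: the call mentioned in the question
dense_max = np.max(dense, axis=1)

# Recent SciPy versions provide max(axis=1) on sparse matrices; the result is
# an (n_rows, 1) sparse matrix, so densify and flatten it to a 1-D array
sparse_max = mat.max(axis=1).toarray().ravel()

print(dense_max)   # [3 0 5]
print(sparse_max)  # [3 0 5]
```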


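If your matrix, call it a, is stored in CSR format, then a.data holds all of its stored (nonzero) entries ordered row by row, and a.indptr holds, for each row, the position in a.data of that row's first entry, followed by one final element equal to the total number of stored entries. The entries of row i are therefore a.data[a.indptr[i]:a.indptr[i+1]], and you can use this to compute the row maxima as follows: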

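```python
import numpy as np

def sparse_max_row(csr_mat):
    # Max over each slice data[indptr[i]:indptr[i+1]], i.e. over each row's stored values
    ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
    # For an empty row, reduceat returns a single neighbouring value; overwrite it with 0
    ret[np.diff(csr_mat.indptr) == 0] = 0
    return ret
```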
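To make the CSR layout concrete, the sketch below prints data and indptr for a small example matrix (values chosen arbitrarily) and applies the function from this answer, repeated so the snippet runs on its own. It also shows what happens when the last row is empty. In that case indptr[:-1] contains len(data), which np.maximum.reduceat rejects as an out-of-range index.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row(csr_mat):
    ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
    ret[np.diff(csr_mat.indptr) == 0] = 0
    return ret

# Row 1 is empty, but it is not the last row, so the function works
a = csr_matrix(np.array([[1, 0, 4],
                         [0, 0, 0],
                         [0, 7, 2]]))
print(a.data)             # [1 4 7 2]
print(a.indptr)           # [0 2 2 4]
print(sparse_max_row(a))  # [4 0 7]

# Trailing empty row: indptr[:-1] is [0, 1] while data has only one element
b = csr_matrix(np.array([[1, 0],
                         [0, 0]]))
try:
    sparse_max_row(b)
except IndexError as exc:
    print("IndexError:", exc)
```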

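I just stumbled upon the same problem. Jaime's solution breaks when the matrix contains completely empty rows at the end: if the last row has no stored entries, csr_mat.indptr[:-1] contains an index equal to len(csr_mat.data), and np.maximum.reduceat raises an IndexError. Here is a workaround that passes reduceat only the start offsets of non-empty rows: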

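```python
import numpy as np

def sparse_max_row(csr_mat):
    row_nnz = np.diff(csr_mat.indptr)   # number of stored entries per row
    ret = np.zeros(csr_mat.shape[0])    # empty rows keep the value 0
    # Reduce only over the start offsets of non-empty rows, so every index is valid
    ret[row_nnz > 0] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][row_nnz > 0])
    return ret
```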
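One further caveat, not raised in either answer: both functions take the maximum over the stored entries only, while numpy.max on the dense matrix also sees the implicit zeros. A row whose stored values are all negative but which has at least one zero therefore gets a negative result instead of 0. The sketch below is a hypothetical variant (the name sparse_max_row_with_zeros is made up here) that clamps such rows at 0. It assumes the matrix has no duplicate entries; call sum_duplicates() first if unsure. The workaround above is repeated so the snippet runs on its own, and the example matrix is arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row(csr_mat):
    # The workaround from the answer above
    ret = np.zeros(csr_mat.shape[0])
    ret[np.diff(csr_mat.indptr) != 0] = np.maximum.reduceat(
        csr_mat.data, csr_mat.indptr[:-1][np.diff(csr_mat.indptr) > 0])
    return ret

def sparse_max_row_with_zeros(csr_mat):
    # Hypothetical variant; assumes no duplicate entries in csr_mat
    row_nnz = np.diff(csr_mat.indptr)
    nonempty = row_nnz > 0
    ret = np.zeros(csr_mat.shape[0], dtype=csr_mat.dtype)
    if nonempty.any():
        ret[nonempty] = np.maximum.reduceat(
            csr_mat.data, csr_mat.indptr[:-1][nonempty])
    # Rows with fewer stored entries than columns also contain zeros,
    # which np.max on the dense matrix takes into account
    has_zero = row_nnz < csr_mat.shape[1]
    ret[has_zero] = np.maximum(ret[has_zero], 0)
    return ret

# Last row is empty; row 0 has only negative stored values plus an implicit zero
m = csr_matrix(np.array([[-3, 0, -1],
                         [2, 5, 0],
                         [-4, -6, -2],
                         [0, 0, 0]]))

print(sparse_max_row(m))             # stored entries only
print(sparse_max_row_with_zeros(m))  # same as m.toarray().max(axis=1)
```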
