Scipy: sparse matrix giving invalid values

Below is my code for creating my sparse matrix:

    import numpy as np
    import scipy.sparse

    def sparsemaker(X, Y, Z):
        'X, Y, and Z are 2D arrays of the same size'
        x_, row = np.unique(X, return_inverse=True)
        y_, col = np.unique(Y, return_inverse=True)
        return scipy.sparse.csr_matrix((Z.flat, (row, col)),
                                       shape=(x_.size, y_.size))

    >>> print(sparsemaker(A, B, C))  # A, B, and C are (220, 256) sized arrays.
      (0, 0)    167064.269831
      (0, 2)    56.6146564629
      (0, 9)    53.8660340698
      (0, 23)   80.6529717039
      (0, 28)   0.0
      (0, 33)   53.2379218326
      (0, 40)   54.3868995375
      :         :

My input arrays are fairly large, so I don't know how to post them here (unless someone has an idea), but even from the first value I can already tell that something is wrong:

    >>> test = sparsemaker(A, B, C)
    >>> np.max(test.toarray())
    167064.26983076424
    >>> np.where(C==np.max(test.toarray()))
    (array([], dtype=int64), array([], dtype=int64))

Does anyone know why this happens? Where does this value come from?

+1
numpy scipy sparse-matrix
Jan 22 '13 at 22:20
1 answer

You have duplicate coordinates, and the constructor sums all the values that share the same (row, col) pair. Try the following:

    x_, row = np.unique(X, return_inverse=True)
    y_, col = np.unique(Y, return_inverse=True)
    print(Z.flat[(row == 0) & (col == 0)].sum())

and you should get that mysterious 167064.26983076424.
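As a minimal sketch of the same effect, here is a tiny example with made-up data (these arrays are illustrative only, not the asker's A, B, and C). Two of the three values share the coordinate (0, 0), and the constructor stores their sum, which is a number that appears nowhere in the input:

```python
import numpy as np
import scipy.sparse

# Illustrative data: the first and third entries both land on cell (0, 0).
row = np.array([0, 1, 0])
col = np.array([0, 1, 0])
data = np.array([1.5, 2.0, 4.0])

m = scipy.sparse.csr_matrix((data, (row, col)), shape=(2, 2))
dense = m.toarray()

# Cell (0, 0) holds 1.5 + 4.0, a value that does not occur in data.
print(dense)
print(np.where(data == dense.max()))
```

This mirrors the empty result of `np.where(C == np.max(test.toarray()))` in the question: the maximum is a sum of several entries of the input, not one of its entries.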

EDIT: The ugly code that follows is an attempt at averaging duplicate entries instead of summing them. It works fine on small examples and borrows some code from this other question:

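    def sparsemaker(X, Y, Z):
        'X, Y, and Z are 2D arrays of the same size'
        x_, row = np.unique(X.ravel(), return_inverse=True)
        y_, col = np.unique(Y.ravel(), return_inverse=True)
        # Pair up the coordinates (np.array(zip(row, col)) only works on Python 2)
        indices = np.column_stack((row, col))
        # View each (row, col) pair as one record so np.unique can spot repeats
        _, repeats = np.unique(indices.view([('', indices.dtype)]*2),
                               return_inverse=True)
        repeats = repeats.ravel()
        # Weight every value by 1 / (number of times its coordinate occurs)
        counts = 1. / np.bincount(repeats)
        factor = counts[repeats]
        return scipy.sparse.csr_matrix((Z.ravel() * factor, (row, col)),
                                       shape=(x_.size, y_.size))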
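For comparison, here is an alternative sketch of the same averaging idea. It is not the code from the answer: instead of a structured-array view, it encodes each (row, col) pair as a single integer key and lets `np.bincount` compute per-cell totals and counts. The helper name `sparse_mean` and its arguments are hypothetical, and it assumes `row` and `col` are already integer indices such as those returned by `np.unique(..., return_inverse=True)`.

```python
import numpy as np
import scipy.sparse

def sparse_mean(row, col, values, shape):
    """Hypothetical helper: CSR matrix holding the mean of duplicate cells."""
    row = np.asarray(row).ravel()
    col = np.asarray(col).ravel()
    values = np.asarray(values, dtype=float).ravel()
    n_cols = shape[1]
    # Encode each (row, col) pair as one integer so duplicates are easy to find.
    keys = row * n_cols + col
    uniq, inverse = np.unique(keys, return_inverse=True)
    # Per-cell total and per-cell count; their ratio is the per-cell mean.
    totals = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    means = totals / counts
    # Decode the keys back to coordinates; each coordinate now occurs once.
    return scipy.sparse.csr_matrix(
        (means, (uniq // n_cols, uniq % n_cols)), shape=shape)

# Illustrative data: two values share cell (0, 0) and get averaged.
m = sparse_mean([0, 1, 0], [0, 1, 0], [1.5, 2.0, 4.0], shape=(2, 2))
result = m.toarray()
print(result)
```

Because every coordinate reaches the constructor exactly once, nothing is summed implicitly. One difference from the weighting approach: a cell whose mean is exactly 0.0 is still stored as an explicit entry.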
+3
Jan 22 '13 at 22:53


