Creating an intermediate dok matrix works in your example:
```
In [410]: c=sparse.coo_matrix((data, (cols, rows)),shape=(3,3)).todok().tocsc()

In [411]: c.A
Out[411]: 
array([[0, 0, 0],
       [0, 4, 0],
       [0, 0, 0]], dtype=int32)
```
A `coo` matrix stores your input arrays in its `data`, `row` and `col` attributes unchanged, duplicates included. Summation of duplicates does not happen until it is converted to `csc` (or `csr`).
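A small sketch of that, using my own example data (not the data from the question) and assuming a reasonably recent SciPy: the `coo` matrix keeps both copies of the duplicated `(1, 1)` entry, and the `csc` conversion adds them together.

```python
import numpy as np
from scipy import sparse

# Example data (hypothetical): position (1, 1) appears twice
row = np.array([0, 1, 2, 1, 1])
col = np.array([0, 1, 1, 1, 2])
data = np.array([1, 4, 2, 4, 1])

coo = sparse.coo_matrix((data, (row, col)), shape=(3, 3))
print(coo.nnz)        # 5 stored entries: the duplicate is kept as given
print(coo.row, coo.col, coo.data)

csc = coo.tocsc()     # duplicates are summed during this conversion
print(csc.nnz)        # 4
print(csc.toarray())  # (1, 1) now holds 4 + 4 = 8
```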
`todok` loads the dictionary directly from the `coo` attributes. It creates an empty dok matrix and fills it:

```python
dok.update(izip(izip(self.row,self.col),self.data))
```
So if there are duplicate `(row, col)` pairs, the last one is the one that remains. This relies on standard Python dictionary hashing to find the unique keys.
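The same last-one-wins rule can be seen with a plain `dict`, which is what the `dok.update(...)` line relies on. This sketch uses my own example data and builds the `csc` matrix from the dict directly. I avoid calling `todok` here because, as far as I know, newer SciPy releases call `sum_duplicates()` before filling the dok, so its handling of duplicates may depend on your version.

```python
import numpy as np
from scipy import sparse

# Example data (hypothetical): (1, 1) appears twice, first with 4, then with 9
row = np.array([0, 1, 2, 1, 1])
col = np.array([0, 1, 1, 1, 2])
data = np.array([1, 4, 2, 9, 1])

# Key each value by its (row, col) pair; a repeated key overwrites the earlier value
d = dict(zip(zip(row.tolist(), col.tolist()), data.tolist()))
print(d[(1, 1)])  # 9: the last value for (1, 1) wins

# Rebuild a matrix from the now-unique keys
keys = np.array(list(d.keys()))    # shape (n_unique, 2)
vals = np.array(list(d.values()))
m = sparse.csc_matrix((vals, (keys[:, 0], keys[:, 1])), shape=(3, 3))
print(m.toarray())
```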
You can also use `np.unique`. I had to build a special array of objects, because `unique` works on 1d arrays and we are indexing in 2d.
```
In [479]: data, cols, rows = [np.array(j) for j in [[1,4,2,4,1],[0,1,1,1,2],[0,1,2,1,1]]]

In [480]: x=np.zeros(cols.shape,dtype=object)

In [481]: x[:]=list(zip(rows,cols))

In [482]: x
Out[482]: array([(0, 0), (1, 1), (2, 1), (1, 1), (1, 2)], dtype=object)

In [483]: i=np.unique(x,return_index=True)[1]

In [484]: i
Out[484]: array([0, 1, 4, 2], dtype=int32)

In [485]: c1=sparse.csc_matrix((data[i],(cols[i],rows[i])),shape=(3,3))

In [486]: c1.A
Out[486]: 
array([[1, 0, 0],
       [0, 4, 2],
       [0, 1, 0]], dtype=int32)
```
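On NumPy 1.13 or later, `np.unique` also accepts an `axis` argument, which should let you skip the object array by treating each `(row, col)` pair as one row of a 2d array. This is a substitute for the object-array trick, not what I ran in the session above, and the example data is my own. Note that `return_index` gives the *first* occurrence of each pair, while the dok route keeps the *last*; searching the reversed pairs is one way to get last-wins.

```python
import numpy as np
from scipy import sparse

# Example data (hypothetical): (1, 1) appears at positions 1 and 3
row = np.array([0, 1, 2, 1, 1])
col = np.array([0, 1, 1, 1, 2])
data = np.array([1, 4, 2, 9, 1])

pairs = np.stack([row, col], axis=1)  # one (row, col) pair per row, shape (5, 2)

# Index of the first occurrence of each unique pair
_, first = np.unique(pairs, axis=0, return_index=True)
m_first = sparse.csc_matrix((data[first], (row[first], col[first])), shape=(3, 3))

# Index of the last occurrence: search the reversed pairs, then map back
_, rev = np.unique(pairs[::-1], axis=0, return_index=True)
last = len(pairs) - 1 - rev
m_last = sparse.csc_matrix((data[last], (row[last], col[last])), shape=(3, 3))

print(first, last)
print(m_first[1, 1], m_last[1, 1])  # 4 9
```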
I have no idea which approach is faster.
An alternative way to get a unique index, from liuengo's link:
```python
rc = np.vstack([rows,cols]).T.copy()
dt = rc.dtype.descr * 2
i = np.unique(rc.view(dt), return_index=True)[1]
```
`rc` must have its own C-contiguous data for the `view` to change the dtype; the transpose alone is only a non-contiguous view of the `vstack` result, hence the `.T.copy()`.
```
In [554]: rc.view(dt)
Out[554]: 
array([[(0, 0)],
       [(1, 1)],
       [(2, 1)],
       [(1, 1)],
       [(1, 2)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
```
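Putting the structured-view idea together with the `csc` construction gives a self-contained sketch like the one below. The example data is my own, and it assumes a NumPy version whose `np.unique` can sort structured arrays. The `.copy()` yields a C-contiguous array, so each row's two integers can be reinterpreted as a single two-field record.

```python
import numpy as np
from scipy import sparse

# Example data (hypothetical)
row = np.array([0, 1, 2, 1, 1])
col = np.array([0, 1, 1, 1, 2])
data = np.array([1, 4, 2, 9, 1])

rc = np.vstack([row, col]).T.copy()  # shape (5, 2), C-contiguous copy of the transpose
dt = rc.dtype.descr * 2              # two fields of the same integer type (named f0, f1)
records = rc.view(dt)                # shape (5, 1): each row becomes one (row, col) record

_, i = np.unique(records, return_index=True)  # first occurrence of each record
m = sparse.csc_matrix((data[i], (row[i], col[i])), shape=(3, 3))
print(m.toarray())
```

Like the other `np.unique` routes, this keeps the first value for a duplicated pair (4 here), not the last one as the dok route does.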