I use SciPy to build a large, sparse (250k x 250k) match matrix using scipy.sparse.lil_matrix. Such matrices are symmetric; that is, M[i, j] == M[j, i]. Since it would be extremely inefficient (and in my case impossible) to store all the data twice, I currently store each value only once, at the coordinate (i, j) where i is never greater than j, so what I actually hold is the upper triangle. In other words, a value is stored at (2, 3) and nothing is stored at (3, 2), even though in my model (3, 2) should equal (2, 3). (See the matrix below.)
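A minimal sketch of that storage convention, assuming a hypothetical helper `set_symmetric` (not a SciPy function) that swaps the two indices when needed so every value lands in the upper triangle of a `lil_matrix`:

```python
import scipy.sparse as sp

def set_symmetric(m, i, j, value):
    """Hypothetical helper: store M[i, j] == M[j, i] once, at row <= column."""
    if i > j:
        i, j = j, i  # normalize so the value always lands in the upper triangle
    m[i, j] = value

n = 4  # tiny stand-in for the real 250k x 250k matrix
upper = sp.lil_matrix((n, n))
set_symmetric(upper, 3, 2, 9.0)  # written to (2, 3); (3, 2) stays empty
print(upper[2, 3], upper[3, 2])
print(upper.nnz)  # only one entry is actually stored
```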
My problem is that I need random access to all the data for a given index, but with my storage scheme half of that data is in the row and half is in the column, like this:
    M = [1 2 3 4
         0 5 6 7
         0 0 8 9
         0 0 0 10]
So, given the matrix above, I want to be able to query M[1] and get back [2, 5, 6, 7] (row 1 contributes 5, 6, 7 and column 1 contributes 2). I have two questions:
1) Is there a more efficient (preferably built-in) way to do this than querying the row first, then the column, and then combining the two? That approach is bad because whichever internal representation I use, CSC (column-based) or CSR (row-based), one of the two queries is extremely inefficient.
2) Am I even using the right part of SciPy? I saw several functions in the SciPy library that mention triangular matrices, but they seem to revolve around extracting a triangular matrix from a full matrix. In my case (I think) I already have a triangular matrix and want to manipulate it.
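For question 1, one workaround (a sketch, not a built-in SciPy feature; as far as I know `scipy.sparse` has no symmetric storage format) is to keep the upper triangle in both CSR and CSC form, so the row part and the column part are each a cheap slice of the `indptr`/`indices`/`data` arrays. This does keep two copies of the triangle, but CSR/CSC arrays are typically much more compact than the per-row Python lists inside a `lil_matrix`, so it may fit where the LIL version did not; that is an assumption to check against the real data. `symmetric_row` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp

def symmetric_row(u_csr, u_csc, i):
    """Hypothetical helper: (column indices, values) of row i of the full
    symmetric matrix, given its upper triangle in CSR and CSC form.
    Assumes canonical matrices (no duplicate entries)."""
    # Row i of the triangle holds entries (i, j) with j >= i.
    start, end = u_csr.indptr[i], u_csr.indptr[i + 1]
    row_idx, row_val = u_csr.indices[start:end], u_csr.data[start:end]
    # Column i of the triangle holds entries (k, i) with k <= i,
    # which mirror to (i, k) in the full matrix.
    start, end = u_csc.indptr[i], u_csc.indptr[i + 1]
    col_idx, col_val = u_csc.indices[start:end], u_csc.data[start:end]
    keep = col_idx != i  # the diagonal entry was already taken from the row
    idx = np.concatenate([col_idx[keep], row_idx])
    val = np.concatenate([col_val[keep], row_val])
    order = np.argsort(idx)  # return entries in column order
    return idx[order], val[order]

# The 4x4 example from the question, upper triangle only.
upper = sp.csr_matrix(np.array([
    [1, 2, 3, 4],
    [0, 5, 6, 7],
    [0, 0, 8, 9],
    [0, 0, 0, 10],
]))
upper_csc = upper.tocsc()

idx, val = symmetric_row(upper, upper_csc, 1)
print(idx, val)  # [0 1 2 3] [2 5 6 7]
```

Both lookups only touch the slice belonging to index i, whereas slicing a column out of a CSR-only copy is the slow case the question describes.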
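On question 2: `scipy.sparse.triu` and `scipy.sparse.tril` take an existing sparse matrix and return its upper or lower triangle. Applied to the stored triangle, they can also rebuild the full symmetric matrix by mirroring the strictly upper part (`k=1`, so the diagonal is not added twice). This sketch doubles the stored off-diagonal entries, so it only helps if the full matrix fits in memory in CSR form, which may not hold for the real 250k x 250k data.

```python
import numpy as np
import scipy.sparse as sp

# Upper triangle (diagonal included) of the question's 4x4 example.
upper = sp.csr_matrix(np.array([
    [1, 2, 3, 4],
    [0, 5, 6, 7],
    [0, 0, 8, 9],
    [0, 0, 0, 10],
]))

# Mirror the strict upper triangle (k=1 excludes the diagonal) into the
# lower triangle to get the full symmetric matrix, then store it as CSR.
full = (upper + sp.triu(upper, k=1).T).tocsr()

print(full[1, :].toarray().ravel())  # [2 5 6 7]
print((full != full.T).nnz)          # 0 differing entries, so it is symmetric

# Going back: triu recovers the stored upper triangle.
print((sp.triu(full, format="csr") != upper).nnz)  # 0
```

With the full matrix in CSR, M[1] becomes a single row slice, at the cost of the extra memory.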
Many thanks.
python scipy matrix
gilesc