Pseudo-inverse of a sparse matrix in Python

I work with neuroimaging data, and because of the large amount of data I would like to use sparse matrices in my code (scipy.sparse.lil_matrix or csr_matrix).

In particular, I need to compute the pseudo-inverse of my matrix to solve a least-squares problem. I found the scipy.sparse.linalg.lsqr method, but it is not very efficient. Is there a way to calculate the Moore-Penrose pseudo-inverse (the equivalent of pinv for dense matrices)?
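For reference, a minimal sketch of the lsqr approach I have tried (the shapes and density here are small stand-ins, not my real data):

```python
# Sketch: solving the least-squares problem directly with lsqr,
# instead of forming the pseudo-inverse. Small stand-in shapes.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
A = sparse_random(600, 20, density=0.002, format='csr', random_state=rng)
b = rng.random(600)

# lsqr iteratively minimizes ||Ax - b||_2 without densifying A.
result = lsqr(A, b, atol=1e-8, btol=1e-8)
x_hat = result[0]
print(x_hat.shape)  # (20,)
```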

The size of my matrix A is about 600,000 x 2,000, and each row of the matrix has from 0 to 4 non-zero values. The shape of A is given by voxels x fibers (white-matter fiber tracts), and we expect a maximum of 4 tracts to cross in a voxel. In most white-matter voxels we expect to have at least 1 tract, but I would say that about 20% of the rows may be all zeros.

The vector b is not sparse; in fact, b contains one measurement per voxel, and none of them are zero.

I need to minimize the error, but there are also some conditions on the vector x. When I tried the model on smaller matrices, I never needed to constrain the system for these conditions to be satisfied (generally speaking, 0

Can anyone help? Is there a way to avoid computing the pseudo-inverse of A?

Thanks

June 1 update: thanks again for your help. I cannot show you anything about my data, because my Python code is giving me some problems. However, to understand how I can choose a good k, I tried to create a test case in Matlab.

The code is as follows:

    F = zeros(100000, 1000);
    for k = 1:150000
        p = rand(1);
        a = 0; b = 0;
        while a <= 0 || b <= 0
            a = random('Binomial', 100000, p);
            b = random('Binomial', 1000, p);
        end
        F(a, b) = rand(1);
    end
    solution = repmat([0.5, 0.5, 0.8, 0.7, 0.9, 0.4, 0.7, 0.7, 0.9, 0.6], 1, 100);
    size(solution)
    solution = solution';
    measure = F * solution;
    % check = pinvF * measure;
    k = 250;
    F = sparse(F);
    [U, S, V] = svds(F, k);
    s = svds(F, k);
    plot(s)
    max(max(U*S*V' - F))
    for s = 1:k
        if S(s,s) ~= 0
            S(s,s) = 1/S(s,s);
        end
    end
    inv = V*S'*U';
    inv*measure
    max(inv*measure - solution)

Do you have any idea how k should compare with the size of F? I took 250 (out of 1000), and the results are not satisfactory (the waiting time is acceptable, but not short). At least now I can compare the results against a known solution, but how should I choose k in general? I also attached a plot of the 250 singular values I get, and their normalized squares. I don't know exactly how best to make a scree plot in Matlab. I am now moving to a larger k to see whether the error gets a bit smaller.
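For reference, one heuristic I am considering for choosing k is the cumulative "energy" of the normalized squared singular values; a sketch in Python (the test matrix and the 0.95 threshold are only illustrative assumptions):

```python
# Sketch: picking an effective rank k from the singular-value spectrum.
# The 0.95 energy threshold is an illustrative assumption, not a rule.
# Note: the energy is relative to the top k_max values only, since svds
# does not compute the full spectrum.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

rng = np.random.default_rng(1)
A = sparse_random(2000, 200, density=0.01, format='csr', random_state=rng)

k_max = 100
s = svds(A, k=k_max, return_singular_vectors=False)
s = np.sort(s)[::-1]                      # ensure descending order

energy = np.cumsum(s**2) / np.sum(s**2)   # normalized squared singular values
k = int(np.searchsorted(energy, 0.95)) + 1  # smallest k reaching 95% energy
print(k)
```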

Thanks again, Jennifer

[Image: the 250 computed squared, normalized singular values]

2 answers

You could study the alternatives offered by scipy.sparse.linalg in more detail.

In any case, note that the pseudo-inverse of a sparse matrix is most likely (very) dense, so computing it is not really a fruitful avenue (in the general case) for solving sparse linear systems.

You might like to describe your particular problem in more detail (dot(A, x) = b + e). At least specify:

  • the 'typical' size of A
  • the 'typical' percentage of non-zero entries in A
  • least squares implies that norm(e) is minimized, but please indicate whether your main interest is in x_hat or in b_hat, where e = b - b_hat and b_hat = dot(A, x_hat)

Update: if you have some idea of the rank of A (and it is much less than the number of columns), you could try total least squares. Here is a simple implementation, where k is the number of leading singular values and vectors used (i.e., the "effective" rank).

    from scipy.sparse import hstack
    from scipy.sparse.linalg import svds

    def tls(A, b, k=6):
        """A TLS solution of Ax = b, for sparse A."""
        u, s, v = svds(hstack([A, b]), k)
        return v[-1, :-1] / -v[-1, -1]

Regardless of the answer to my comment, I think you could accomplish this fairly easily using the SVD representation of the Moore-Penrose pseudo-inverse. Find the SVD with scipy.sparse.linalg.svds, replace Sigma by its pseudo-inverse, and then multiply V * Sigma_pinv * U' to get the pseudo-inverse of the original matrix.
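A minimal sketch of this idea (the test matrix and the rank k are illustrative assumptions, not a tuned implementation; note that the result is a dense array):

```python
# Sketch: truncated pseudo-inverse via sparse SVD, as described above.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

def truncated_pinv(A, k):
    """Approximate Moore-Penrose pseudo-inverse from the top-k sparse SVD."""
    u, s, vt = svds(A, k=k)                    # top-k singular triplets
    s_inv = np.where(s > 1e-12, 1.0 / s, 0.0)  # invert non-negligible values
    return vt.T @ np.diag(s_inv) @ u.T         # V * Sigma_pinv * U'

rng = np.random.default_rng(3)
A = sparse_random(300, 40, density=0.05, format='csr', random_state=rng)
A_pinv = truncated_pinv(A, k=30)
print(A_pinv.shape)  # (40, 300)

# x_hat = A_pinv @ b then solves the least-squares problem approximately.
```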

