I work with data from neuroimaging and because of the large amount of data, I would like to use sparse matrices for my code (scipy.sparse.lil_matrix or csr_matrix).
In particular, I will need to calculate the pseudoinverse of my matrix to solve the least square problem. I found the sparse.lsqr method, but it is not very efficient. Is there a way to calculate the Moore-Penrose pseudoinverse (corresponding pinv for normal matrices).
The size of my matrix A is about 600'000x2000, and in each row of the matrix I will have from 0 to 4 non-zero values. The size of matrix A is given by a bundle of voxel x fibers (white fiber fibers), and we expect a maximum of 4 paths intersect in the voxel. In most cases of white voxels, we expect to have at least 1 treatise, but I will say that about 20% of the lines can be zeros.
Vector b should not be sparse, in fact b contains a measure for each voxel, which is not equal to zero at all.
I need to minimize the error, but there are also some conditions for the vector x. When I tried the model on smaller matrices, I never had to limit the system to satisfy these conditions (generally speaking, 0
Is this some kind of help? Is there a way to avoid using pseudo-inverse A?
thanks
June 1 Update: Thanks again for your help. I can not show you anything about my data, because the code in python gives me some problems. However, in order to understand how I can choose a good k, I tried to create a testing function in Matlab.
The code is as follows:
F=zeros(100000,1000); for k=1:150000 p=rand(1); a=0; b=0; while a<=0 || b<=0 a=random('Binomial',100000,p); b=random('Binomial',1000,p); end F(a,b)=rand(1); end solution=repmat([0.5,0.5,0.8,0.7,0.9,0.4,0.7,0.7,0.9,0.6],1,100); size(solution) solution=solution'; measure=F*solution; %check=pinvF*measure; k=250; F=sparse(F); [U,S,V]=svds(F,k); s=svds(F,k); plot(s) max(max(U*S*V'-F)) for s=1:k if S(s,s)~=0 S(s,s)=1/S(s,s); end end inv=V*S'*U'; inv*measure max(inv*measure-solution)
Do you have any idea what should compare with size F? I took 250 (more than 1000), and the results are not satisfactory (the waiting time is acceptable, but not short). Also now I can compare the results with a well-known solution, but how can I choose k at all? I also attached a graph of 250 unit values ββthat I get, and their squares returned to normal. I do not know exactly how best to make screeplot in Matlab. Now I move on to a larger k to see if it will be a little less.
Thanks again, Jennifer

