Search for proportional columns in a matrix

I have a large matrix (1000 rows and 50 000 columns). I know that some columns are correlated (the rank is only 100), and I suspect that some columns are even proportional. How to find such proportional columns? (one way will be cyclical corr(M(:,j),M(:,k))), but is there anything more efficient?

+4
source share
3 answers

If you normalize each column by dividing it by a maximum, proportionality becomes equality. This makes the task easier.

Now, to verify equality, you can use one (outer) loop over the columns; inner loop is easily vectorized with bsxfun. For greater speed, compare each column only with the columns on the right.

Also, to save some time, the results matrix is ​​pre-distributed to an approximate size (you should set this). If the approximate size is incorrect, the only punishment will be a little slower, but the code works.

As usual, tests for equality between floating point values ​​should include tolerance.

The result is defined as a matrix with two columns ( S), where each row contains the indices of two rows proportional.

A = [1 5 2 6 3 1
     2 5 4 7 6 1
     3 5 6 8 9 1]; %// example data matrix
tol = 1e-6; %// relative tolerance

A = bsxfun(@rdivide, A, max(A,[],1)); %// normalize A
C = size(A,2);
S = NaN(round(C^1.5),2); %// preallocate result to *approximate* size
used = 0; %// number of rows of S already used
for c = 1:C
    ind = c+find(all(abs(bsxfun(@rdivide, A(:,c), A(:,c+1:end))-1)<tol));
    u = numel(ind); %// number of columns proportional to column c
    S(used+1:used+u,1) = c; %// fill in result
    S(used+1:used+u,2) = ind; %// fill in result
    used = used + u; %// update number of results
end
S = S(1:used,:); %// remove unused rows of S

In this example, the result

S =
     1     3
     1     5
     2     6
     3     5

The value of column 1 is proportional to column 3; column 1 is proportional to column 5, etc.

+4

, , , , . , QR Decomposition. QR- : Q R. :

A = Q*R

Q - , , Q (Q^{T}*Q = I). R - . 1996 : Matrix Computations , R . - R, . , . , , , .

, , , , . , QR :

[Q,R] = qr(A, 0);

Q R - , , A . 0 - Q R, , ( ), , . , 5 x 8, 5 x 8, , 0, 8 x 8.

:

[Q,R,E] = qr(A, 0);

E , , :

A(:,E) = Q*R;

, , , Q R , , , "". , E , , " " . "" R, . , . , , R , , , , .

R, . , . , , , .

, , , , , , A:

%// Step #1 - Define tolerance
tol = 1e-10;

%// Step #2 - Do QR Factorization
[Q, R, E] = qr(A,0);
diag_R = abs(diag(R)); %// Extract diagonals of R

%// Step #3 -
%// Find the LAST column in the re-arranged result that
%// satisfies the linearly independent property
r = find(diag_R >= tol*diag_R(1), 1, 'last');

%// Step #4
%// Anything after r means that the columns are
%// linearly dependent, so let output those columns to the
%// user
idx = sort(E(r+1:end));

, E , , , , , . . , :

A =

     1     1     2     0
     2     2     4     9
     3     3     6     7
     4     4     8     3

, , . , . , , :

idx =

1    2

E, , :

E = 

4    3    2    1

, 4 "" , 3. [1,2] , , 1 2 [1,2,3,4] . 3, 1 2 3.

, !


QR, Echelon form, , A. , , , , . , , rref. rref, , , , . :

[B,RB] = rref(A);

RB , B A. , , , . , 1 , , RB, , , . :

[B,RB] = rref(A);
idx = 1 : size(A,2);
idx(RB) = [];

, :

idx = 

2    3

, 2 3 , , 1. QR-, QR- , , . 1 3 , , . .

rref QR-. , rref , QR-, . , QR-, rref , !

+6

, .

50 000 2 25 000 .

2 2.

: , .

Apply the determinant formula to the square starting with the 1st square on the left. Copy it for each row and column to get the answer in the following table.

Find columns with determinants of zero.

It is quite simple, not very time consuming and should be result oriented. Manual or Excel SpreadSheet (efficient)

0
source

All Articles