I'm assuming that you obtained your eigenvectors from the eig function. Going forward, I would recommend that you use the eigs function instead. Not only does it compute the eigenvalues and eigenvectors for you, it computes the k largest eigenvalues together with their associated eigenvectors. This can save computational overhead when you don't need all of the eigenvalues and associated eigenvectors of your matrix, since you only want a subset. You simply supply the covariance matrix of your data to eigs and it returns the k largest eigenvalues and eigenvectors for you.
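As a rough sketch (assuming sigma already holds your covariance matrix and k is the number of components you want), the call would look something like this:

%// Minimal sketch, assuming sigma is the covariance matrix and k = 800
k = 800;
[A, D] = eigs(sigma, k);   %// A: eigenvectors as columns, D: diagonal matrix of eigenvalues
vals = diag(D);            %// the k largest-magnitude eigenvalues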
Now, back to your problem. What you are describing is Principal Component Analysis (PCA). The mechanics of it would be to compute the covariance matrix of your data and find the eigenvalues and eigenvectors of that result. However, doing it this way is not recommended because of numerical instability when computing eigenvalues and eigenvectors of large matrices. The most canonical way to do this nowadays is via the Singular Value Decomposition (SVD) of the mean-centred data. Specifically, the columns of the matrix V give you the eigenvectors of the covariance matrix, i.e. the principal components, and the corresponding eigenvalues are the squares of the singular values found on the diagonal of the matrix S, divided by N - 1 for N samples.
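As a minimal sketch of the SVD route (assuming B is your N x d data matrix), this would look something like:

%// Minimal sketch of PCA via the SVD, assuming B is an N x d data matrix
Bm = bsxfun(@minus, B, mean(B, 1));     %// mean-centre each feature/column
[U, S, V] = svd(Bm, 'econ');            %// economy-size SVD
PCs = V;                                %// columns of V are the principal components
evals = diag(S).^2 / (size(B, 1) - 1);  %// eigenvalues of the covariance matrix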
See this informative post on Cross Validated for why this is preferred:
https://stats.stackexchange.com/questions/79043/why-pca-of-data-by-means-of-svd-of-the-data
I'll also add another link that covers the theory behind using the Singular Value Decomposition for Principal Component Analysis:
https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
Now, to answer your questions in turn.
Question number 1
MATLAB generates the eigenvalues and the corresponding ordering of the eigenvectors in an unsorted fashion. If you want to select the k largest eigenvalues and associated eigenvectors given the output of eig (800 in your example), you need to sort the eigenvalues in descending order, reorder the columns of the eigenvector matrix produced by eig accordingly, and then select the first k values.
I should also note that eigs does not guarantee sorted order either, so you will have to sort its output explicitly as well when it comes down to it.
In MATLAB, the steps above will look something like this:
sigma = cov(B);
[A,D] = eig(sigma);
vals = diag(D);
[~,ind] = sort(abs(vals), 'descend');
Asort = A(:,ind);
It's good to note that you sort on the absolute value of the eigenvalues, because scaled eigenvalues are also eigenvalues, and those scales include negatives. This means that if we had a component whose eigenvalue was, say, -10000, that is a very good indication that the component carries significant information about your data, and if we sorted purely on the numbers themselves, it would get pushed towards the bottom of the list.
The first line of code computes the covariance matrix of B. You said it's already stored in sigma, but let's make this reproducible. Next, we find the eigenvalues of your covariance matrix and the associated eigenvectors. Note that each column of the eigenvector matrix A represents one eigenvector; specifically, the i-th column / eigenvector of A corresponds to the i-th eigenvalue found in D.
However, the eigenvalues are stored in a diagonal matrix, so we extract the diagonal with the diag command, sort the values, and then rearrange the columns of A to respect this ordering. I use the second output of sort because it tells you where each value in the unsorted result ends up in the sorted result. This is the ordering we need in order to rearrange the columns of the eigenvector matrix A. It is imperative that you choose the 'descend' flag so that the largest eigenvalue and its associated eigenvector appear first, as we discussed earlier.
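A tiny toy example of why we sort on the absolute value (hypothetical numbers, purely for illustration):

%// Hypothetical eigenvalues, just to illustrate the sorting behaviour
vals = [2; -10000; 5];
sort(vals, 'descend')        %// gives  5, 2, -10000 -- the important component ends up last
sort(abs(vals), 'descend')   %// gives  10000, 5, 2  -- its magnitude keeps it at the top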
Then you can pick out the k largest vectors and values with:
k = 800;
Aq = Asort(:,1:k);
Question number 2
It is well known that the eigenvectors of the covariance matrix are the principal components. Specifically, the first principal component (i.e., the largest eigenvector and its associated largest eigenvalue) gives you the direction of maximum variability in your data. Each principal component after that gives you variability of a decreasing nature. It is also good to note that the principal components are orthogonal to each other.
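If you want to convince yourself of the orthogonality numerically (assuming Asort from the code above), a quick check is:

%// The principal components are orthonormal, so their Gram matrix should be
%// (numerically) the identity
G = Asort.' * Asort;
max(max(abs(G - eye(size(G)))))   %// should be on the order of machine precision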
Here is a good Wikipedia example for 2D data:

I pulled the above image from the Wikipedia article on Principal Component Analysis that I linked to above. It is a scatter plot of samples distributed according to a bivariate Gaussian distribution centred at (1,3), with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction. The component with standard deviation 3 is the first principal component, while the orthogonal one is the second component. The vectors shown are the eigenvectors of the covariance matrix, scaled by the square root of the corresponding eigenvalue, and shifted so that their tails sit at the mean.
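If you'd like to reproduce that kind of figure yourself, here is a rough sketch under the same assumptions as the caption (a 2D Gaussian centred at (1,3) with standard deviation 3 along roughly (0.878, 0.478) and 1 in the orthogonal direction):

%// Rough sketch of the Wikipedia-style figure: scatter plot plus eigenvectors
%// scaled by the square root of their eigenvalues, with tails at the mean
rng(0);
d1 = [0.878; 0.478]; d1 = d1 / norm(d1);   %// direction of largest variability
d2 = [-d1(2); d1(1)];                      %// orthogonal direction
X  = bsxfun(@plus, randn(5000, 2) * diag([3 1]) * [d1 d2].', [1 3]);
mu = mean(X, 1);
[V, D] = eig(cov(X));
[lam, ind] = sort(diag(D), 'descend');
V = V(:, ind);
scatter(X(:,1), X(:,2), 4, '.'); hold on; axis equal;
for i = 1:2
    plot(mu(1) + [0 sqrt(lam(i))*V(1,i)], mu(2) + [0 sqrt(lam(i))*V(2,i)], 'r', 'LineWidth', 2);
end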
Now back to your question. The reason why we look at the k largest eigenvalues is that this is how dimensionality reduction is performed. In essence, you are performing data compression: you take your higher dimensional data and project it onto a lower dimensional space. The more principal components you include in the projection, the more it will resemble the original data. The benefit actually begins to taper off at a certain point, but the first few principal components allow you to faithfully reconstruct your data for the most part.
A great visual example of performing PCA (or SVD, rather) and data reconstruction is found in this great Quora post I stumbled upon some time ago.
http://qr.ae/RAEU8a
Question number 3
You would use this matrix to reproject your higher dimensional data onto a lower dimensional space. The number of rows being 1000 is still important, as it means there were originally 1000 features in your dataset; 800 is what the reduced dimensionality of your data would be. Think of this matrix as a transformation from the original dimensionality of a feature (1000) down to its reduced dimensionality (800).
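A quick size check may make this concrete (assuming Aq from earlier, with 1000 original features and k = 800):

%// Hypothetical size check: Aq maps the original 1000-D feature space to 800-D
size(Aq)              %// -> [1000 800]
x = randn(1, 1000);   %// a single made-up sample with 1000 features
size(x * Aq)          %// -> [1 800], the sample expressed in the reduced space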
You would then use this matrix in conjunction with reconstructing what the original data was. Concretely, this gives you an approximation of what the original data looked like with the smallest amount of error. In this case, you don't need to use all of the principal components (i.e., just the k largest vectors), and you can create an approximation of your data with less information than you had before.
How you reconstruct your data is very simple. Let's talk about the forward and reverse operations with the full data first. The forward operation is to take your original data and reproject it, but instead of a lower dimensionality we will use all of the components. You first need to have your original data, but mean-subtracted:
Bm = bsxfun(@minus, B, mean(B,1));
Bm is a matrix where each feature of every sample has its mean subtracted. bsxfun lets you operate on two matrices of unequal dimensions, provided you can broadcast the dimensions so that they match. Specifically, what happens in this case is that the mean of each column/feature of B is computed and a temporary matrix, replicated to the size of B, is produced. When you subtract this replicated matrix from your original data, the effect is to subtract every data point by its respective feature mean, thus decentralizing your data so that the mean of each feature is 0.
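As a quick sanity check (and a note in passing), you can verify the centering like so; on MATLAB R2016b or newer, implicit expansion also lets you write the subtraction without bsxfun:

%// Every feature of Bm should now average to (numerically) zero
max(abs(mean(Bm, 1)))      %// on the order of machine precision
%// Equivalent on R2016b+ thanks to implicit expansion:
%// Bm = B - mean(B, 1);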
Once you do this, the operation to project is simply:
Bproject = Bm*Asort;
The above operation is quite simple: what you are doing is expressing each sample's features as a linear combination of the principal components. For example, given the first sample, or the first row of the decentralized data, the first sample's feature in the projected domain is the dot product of the row vector that pertains to the entire sample with the first principal component, which is a column vector. The first sample's second feature in the projected domain is a weighted dot product of the entire sample with the second component. You repeat this for all samples and all principal components. In effect, you are reprojecting the data so that it is expressed with respect to the principal components, which are orthogonal basis vectors that transform your data from one representation to another.
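To see the dot-product interpretation numerically (assuming Bm, Asort and Bproject from above), the following two values should agree:

%// Entry (1,1) of the projection is the dot product of the first (mean-subtracted)
%// sample with the first principal component
Bproject(1, 1)
dot(Bm(1, :), Asort(:, 1))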
The best description of what I just talked about can be found here. Look at Amro's answer:
Matlab Principal Component Analysis (eigenvalues order)
Now, to go backwards, you simply perform the inverse operation, but a special property of the eigenvector matrix is that its transpose is its inverse. To get back the original data, you undo the operation above and add the means back into the problem:
out = bsxfun(@plus, Bproject*Asort.', mean(B, 1));
You want to get back the original data, so you are solving for Bm with respect to the previous operation. However, the inverse of Asort is simply its transpose here. What happens after you perform this operation is that you get back the original data, but the data is still decentralized. To recover the original data, you must add the mean of each feature back into the data matrix to get the final result. That's why we use another bsxfun call here, so that you can do this for every sample and feature value.
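Because the columns of Asort are orthonormal, this round trip with all of the components should reproduce your data up to floating point error; a quick check:

%// Full-rank round trip: project forward, come back, add the means, compare
out = bsxfun(@plus, Bproject * Asort.', mean(B, 1));
max(abs(out(:) - B(:)))    %// on the order of machine precision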
You should be able to go back and forth between the original domain and the projected domain with the above two lines of code. Now, where the dimensionality reduction (or approximation of the original data) comes into play is in the reverse operation. You first need to project the data onto the basis of principal components (i.e., the forward operation), but now, to go back to the original domain while trying to reconstruct the data with a reduced number of principal components, you simply replace Asort in the above code with Aq and also reduce the number of features you use in Bproject. Specifically:
out = bsxfun(@plus, Bproject(:,1:k)*Aq.', mean(B, 1));
Doing Bproject(:,1:k) selects the k features in the projected domain of your data that correspond to the k largest eigenvectors. Interestingly, if you just want the representation of the data with respect to the reduced dimensionality, using Bproject(:,1:k) is enough. However, if you want to go further and compute an approximation of the original data, we need to compute the reverse step. The above code is simply what we had before with the full dimensionality of your data, except that we use Aq and also select the k features in Bproject. This gives you the original data as represented by the k largest eigenvectors/eigenvalues of your matrix.
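Incidentally, the eigenvalues you discard tell you exactly how much reconstruction error you incur. A small numerical check of that idea (assuming the variables defined earlier, with N samples in B):

%// The squared (Frobenius) reconstruction error of the mean-subtracted data
%// equals (N - 1) times the sum of the discarded eigenvalues
err = Bm - Bproject(:, 1:k) * Aq.';
norm(err, 'fro')^2
(size(B, 1) - 1) * sum(abs(vals(ind(k+1:end))))   %// should match the line above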
If you want to see an awesome example, I'll mimic the Quora post I linked to above, but using a different image. Consider doing this with a grayscale image, where each row is a "sample" and each column is a feature. Let's take the cameraman image, which is part of the Image Processing Toolbox:
im = imread('cameraman.tif');
imshow(im); %// Show the image
We get this image:

This is a 256 x 256 image, which means that we have 256 data points and each point has 256 features. What I'm going to do is convert the image to double for precision when computing the covariance matrix. Then I'm going to repeat the above code in a loop, gradually increasing k at each iteration through 3, 11, 15, 25, 45, 65, 125 and 155. For each k we introduce more principal components, and we should gradually start to see our data being reconstructed.
Here is some executable code that illustrates my point:
%%%%%%%// Pre-processing stage
clear all;
close all;

%// Read in image - make sure we cast to double
B = double(imread('cameraman.tif'));

%// Calculate covariance matrix
sigma = cov(B);

%// Find eigenvalues and eigenvectors of the covariance matrix
[A,D] = eig(sigma);
vals = diag(D);

%// Sort their eigenvalues
[~,ind] = sort(abs(vals), 'descend');

%// Rearrange eigenvectors
Asort = A(:,ind);

%// Find mean subtracted data
Bm = bsxfun(@minus, B, mean(B,1));

%// Reproject data onto principal components
Bproject = Bm*Asort;

%%%%%%%// Begin reconstruction logic
figure;
counter = 1;
for k = [3 11 15 25 45 65 125 155]
    %// Extract out highest k eigenvectors
    Aq = Asort(:,1:k);

    %// Project back onto original domain
    out = bsxfun(@plus, Bproject(:,1:k)*Aq.', mean(B, 1));

    %// Place projection onto right slot and show the image
    subplot(4, 2, counter);
    counter = counter + 1;
    imshow(out,[]);
    title(['k = ' num2str(k)]);
end
As you can see, most of the code is identical to what we had before. What differs is that I loop over all values of k, project back onto the original space (i.e., compute the approximation) using the k largest eigenvectors, then show the resulting image.
We get this pretty figure:

As you can see, starting at k=3 doesn't really do us any favours... we can see some general structure, but it wouldn't hurt to add more in. As we start increasing the number of components, we get a clearer picture of what the original data looks like. At k=25 we can actually see what the cameraman looks like perfectly fine, and we don't need components 26 and beyond to see what's happening. This is what I was getting at with respect to data compression: you don't need to work with all of the principal components to get a clear picture of what is going on.
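If you want a more quantitative way to choose k than eyeballing the reconstructions, a common rule of thumb is the fraction of total variance captured by the first k components (using vals and ind from the script above):

%// Cumulative fraction of variance explained by the k largest components
sorted_vals = abs(vals(ind));
explained = cumsum(sorted_vals) / sum(sorted_vals);
find(explained >= 0.95, 1)   %// smallest k that captures 95% of the variance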
I'd like to end this note by referring you to Chris Taylor's wonderful exposition on Principal Component Analysis, with code, graphs and a great explanation to boot! This is where I got started with PCA, but the Quora post is what solidified my knowledge.
Matlab - PCA analysis and reconstruction of multidimensional data