Scaling covariance matrices

Question

In his answer to the question "Ellipse around data in MATLAB", Amro says the following:

“If you want the ellipse to represent a certain level of standard deviation, the right way is to scale the covariance matrix”

and the code for the scaling was given as

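```matlab
STD = 2;                     %# 2 standard deviations
conf = 2*normcdf(STD)-1;     %# covers around 95% of population
scale = chi2inv(conf,2);     %# inverse chi-squared with dof=#dimensions

Cov = cov(X0) * scale;       %# X0 is the data matrix from the linked answer
[V, D] = eig(Cov);           %# eigenvectors V, eigenvalues on the diagonal of D
```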

I do not understand the first three lines of the above code snippet. How is scale calculated by chi2inv(conf,2), and what is the rationale for multiplying it with the covariance matrix?

Additional question:

I also found that if I scale it with 1.5 STD, i.e. the 86th percentile, the ellipse can cover all of the points; my point sets are clumped together in almost all cases. On the other hand, if I scale it with 3 STD, i.e. the 99th percentile, the ellipse is far too big. How can I choose an STD that just tightly covers the clumped points?

Here is an example:

The inner ellipse corresponds to 1.5 STD and the outer one to 2.5 STD. Why does 1.5 STD tightly cover the clumped white points? Is there any approach or reasoning for choosing it?

math matlab ellipse

Cheung Apr 6 '11 at 18:10

1 answer

abcd · Accepted Answer · 2011-04-06T19:15:29+0000

The purpose of displaying an ellipse around the data points is to show a confidence region, or in other words, "how much of the data lies within a certain number of standard deviations from the mean".

In the code above, he has chosen to display an ellipse that covers about 95% of the data points. For a normal distribution, ~68% of the data is within 1 s.d. of the mean, ~95% within 2 s.d. and ~99.7% within 3 s.d. (numbers off the top of my head, but you can easily verify this by calculating the area under the curve). Hence the value STD=2; you will find that conf is approximately 0.95.
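As a quick check of those coverage figures, here is a minimal sketch. It assumes base MATLAB only, so it uses erf in place of normcdf (for a standard normal, 2*normcdf(k)-1 equals erf(k/sqrt(2))); the variable names are illustrative and not taken from Amro's code.

```matlab
% Fraction of a 1-D normal population within k standard deviations of the mean.
% Identity used: 2*normcdf(k) - 1 == erf(k/sqrt(2)), so no toolbox is needed.
k = [1 1.5 2 3];                 % numbers of standard deviations (illustrative)
conf = erf(k ./ sqrt(2));        % two-sided coverage for each k
fprintf('%.1f s.d. -> %.2f%%\n', [k; 100*conf]);
```

The 1.5 s.d. entry comes out near the 86% mentioned in the question.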

The distance of a data point from the centroid of the data goes like (xi^2+yi^2)^0.5, ignoring coefficients. Sums of squares of standard normal random variables follow a chi-square distribution, so to get the corresponding 95th percentile he uses the inverse chi-square function with 2 degrees of freedom, since there are two variables.
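For two degrees of freedom the chi-square CDF has the closed form 1 - exp(-x/2), so the number that chi2inv(conf,2) returns can be reproduced by hand. The sketch below is only a cross-check under that assumption (dof = 2); it does not replace chi2inv for other dimensions.

```matlab
% For 2 degrees of freedom the chi-square CDF is F(x) = 1 - exp(-x/2),
% so its inverse is x = -2*log(1 - p). This mirrors chi2inv(p, 2).
STD   = 2;                        % number of standard deviations, as in the question
conf  = erf(STD / sqrt(2));       % same value as 2*normcdf(STD) - 1
scale = -2 * log(1 - conf);       % same value as chi2inv(conf, 2) for dof = 2
fprintf('conf = %.4f, scale = %.4f\n', conf, scale);
```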

Lastly, the rationale for multiplying by the scaling constant follows from the fact that for a square matrix A with eigenvalues a1,...,an, the eigenvalues of the matrix kA, where k is a scalar, are simply ka1,...,kan. The eigenvalues set the lengths of the major/minor axes of the ellipse (each semi-axis is the square root of the corresponding eigenvalue), so scaling the ellipse, i.e. its eigenvalues, to the 95th percentile is equivalent to multiplying the covariance matrix by the scaling factor.
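The eigenvalue property is easy to confirm numerically. In this sketch the matrix C and the factor k are made-up values for illustration; the semi-axis lengths are taken as the square roots of the eigenvalues of the scaled matrix.

```matlab
% Scaling a covariance matrix by k scales its eigenvalues by k and leaves
% the eigenvectors (the ellipse orientation) unchanged.
C  = [4 1.5; 1.5 1];              % example covariance matrix (made up for illustration)
k  = 6.18;                        % scale factor, e.g. the chi-square value for 2 s.d.
e0 = sort(eig(C));                % eigenvalues of the original matrix
e1 = sort(eig(k * C));            % eigenvalues of the scaled matrix
disp(max(abs(e1 - k * e0)));      % difference is at round-off level
semiAxes = sqrt(e1);              % semi-axis lengths of the scaled ellipse
```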

EDIT

Cheung, although you may already know this, I suggest you also read this answer to a question on randomness. Consider a Gaussian random variable with zero mean and unit variance. The PDF of a collection of such random variables looks like this:

[Figure: PDF of a zero-mean, unit-variance Gaussian]

Now, if I take two such collections of random variables, square them separately and add them to form a single collection of a new random variable, its distribution looks like this:

[Figure: PDF of the sum of two squared Gaussians]

This is a chi-square distribution with 2 degrees of freedom (since we added two collections).
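The squaring-and-adding step can be simulated directly. This is a rough Monte Carlo sketch with an arbitrary sample size; it compares the empirical distribution against the closed-form chi-square CDF for 2 degrees of freedom, 1 - exp(-t/2), at a few arbitrarily chosen points.

```matlab
% Square two independent standard normal samples and add them; the sum
% should follow a chi-square distribution with 2 degrees of freedom.
N = 1e5;                          % sample size (arbitrary)
x = randn(N, 1);
y = randn(N, 1);
k = x.^2 + y.^2;                  % one chi-square(2) draw per row
% Compare the empirical CDF with the closed form 1 - exp(-t/2) at a few points.
t = [1 2 5.99 6.18];
empirical   = arrayfun(@(v) mean(k <= v), t);
theoretical = 1 - exp(-t / 2);
disp([t; empirical; theoretical]);
```

The two bottom rows should agree to within sampling noise.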

The equation of the ellipse in the above code can be written as x^2/a^2 + y^2/b^2 = k, where x, y are the two random variables, a and b are the major/minor axes, and k is a scaling constant that we need to find. As you can see, this can be interpreted as squaring and adding two collections of Gaussian random variables, and we just saw above what that distribution looks like. So we can say that k is a random variable that is chi-square distributed with 2 degrees of freedom.

Now all that remains is to find a value of k such that 95% of the data is contained within it. Just like the 1 s.d., 2 s.d., 3 s.d. percentiles we are familiar with for Gaussians, the value for a chi-square with 2 degrees of freedom that corresponds to conf ≈ 0.9545 (2 s.d.) is about 6.18. This is what Amro gets from the chi2inv function. He could just as well have written scale=chi2inv(0.95,2), which gives about 5.99, and the ellipse would be nearly the same. It is just that talking in terms of n s.d. away from the mean is intuitive.

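Just to illustrate, here is the PDF of the chi-square distribution above, with about 95% of the area (everything below some x) shaded in red. This x is ~6.18.

[Figure: chi-square PDF with 2 degrees of freedom, area below x shaded in red]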
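One way to see how the choice of STD trades off against coverage is to count the points that fall inside the scaled ellipse. The sketch below assumes synthetic Gaussian data in X0 as a stand-in for the real point set (the mixing matrix is made up), and uses the closed forms for dof = 2 in place of normcdf and chi2inv.

```matlab
% Count the fraction of points inside the ellipse for several values of STD.
% X0 is synthetic Gaussian data standing in for the real point set.
N  = 5e4;
X0 = randn(N, 2) * [2 0.8; 0 0.5];        % correlated 2-D sample (made up)
Xc = X0 - repmat(mean(X0), N, 1);         % center the data
C  = cov(X0);
d2 = sum((Xc / C) .* Xc, 2);              % squared Mahalanobis distance per point
STD = [1.5 2 2.5 3];
for s = STD
    conf   = erf(s / sqrt(2));            % same as 2*normcdf(s) - 1
    scale  = -2 * log(1 - conf);          % same as chi2inv(conf, 2)
    inside = mean(d2 <= scale);           % fraction inside the scaled ellipse
    fprintf('STD = %.1f: target %.4f, observed %.4f\n', s, conf, inside);
end
```

For Gaussian data the observed fraction tracks the target. If the real points are not Gaussian, the two can differ noticeably, which is one plausible reason a smaller STD may already enclose every point.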

Hope this helped.
