The purpose of drawing an ellipse around the data points is to show a confidence region, or in other words, "how much of the data lies within a certain number of standard deviations of the mean".
In the code above, he chose to display an ellipse that covers about 95% of the data points. For a normal distribution, ~68% of the data lies within 1 s.d. of the mean, ~95% within 2 s.d. and ~99.7% within 3 s.d. (figures off the top of my head, but you can easily check them by computing the area under the curve). Hence the value STD=2; you will find that conf is approximately 0.95.
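As a quick check of those figures, here is a minimal Python sketch (not the MATLAB code being discussed; `coverage` is a helper name made up for illustration). It computes the probability mass of a standard normal within k standard deviations of the mean, which is the same quantity as 2*normcdf(k)-1:

```python
from statistics import NormalDist

def coverage(k):
    """Probability that a standard normal falls within k s.d. of its mean."""
    return 2 * NormalDist().cdf(k) - 1

for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

So STD=2 gives conf of roughly 0.9545, slightly more than a round 95%.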
The distance of the data points from the centroid of the data goes like (xi^2+yi^2)^0.5, ignoring the coefficients. Sums of squared independent standard normal random variables follow a chi-square distribution, and so, to get the corresponding 95th percentile, he uses the inverse chi-square function with 2 degrees of freedom, since there are two variables.
Finally, the justification for multiplying by the scaling constant follows from the fact that for a square matrix A with eigenvalues a1,...,an, the eigenvalues of the matrix kA, where k is a scalar, are simply ka1,...,kan. The eigenvalues determine the lengths of the major/minor axes of the ellipse (the semi-axis lengths are their square roots), and therefore scaling the ellipse out to the 95th percentile is equivalent to multiplying the covariance matrix by the scaling factor.
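The eigenvalue scaling can be checked numerically. This is only a sketch in Python with made-up covariance entries; `eig_sym2` is a hypothetical helper that uses the closed-form eigenvalues of a symmetric 2x2 matrix, not anything from the original code:

```python
import math

def eig_sym2(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius

cov = (2.0, 0.6, 1.0)  # made-up entries: var_x, cov_xy, var_y
k = 6.18               # example scale factor

l1, l2 = eig_sym2(*cov)
s1, s2 = eig_sym2(*(k * c for c in cov))

# Each eigenvalue of k*A is k times the matching eigenvalue of A.
print(s1 / l1, s2 / l2)
# The semi-axes are square roots of eigenvalues, so they grow by sqrt(k).
print(math.sqrt(s1 / l1), math.sqrt(k))
```

Both ratios on the first printed line should equal k up to rounding error.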
EDIT
Cheng, although you may already know this, I suggest you also read this answer to a probability-related question. Consider a Gaussian random variable with zero mean and unit variance. The PDF of a collection of such random variables looks like this:

Now, if I took two such collections of random variables, squared them separately and added them to form a single collection of a new random variable, its distribution would look like this:

This is a chi-square distribution with 2 degrees of freedom (since we added two collections).
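If you want to convince yourself without the plots, a small simulation does it. The sketch below is Python, with an arbitrary sample size and seed; `empirical_cdf` and `chi2_cdf_2dof` are names made up for illustration. It relies on the fact that the chi-square CDF with 2 degrees of freedom has the closed form 1 - exp(-x/2):

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000

# Square two independent standard normal draws and add them, n times.
samples = [random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 for _ in range(n)]

def empirical_cdf(x):
    """Fraction of simulated sums that are <= x."""
    return sum(s <= x for s in samples) / n

def chi2_cdf_2dof(x):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return 1 - math.exp(-x / 2)

for x in (1.0, 2.0, 6.0):
    print(x, round(empirical_cdf(x), 3), round(chi2_cdf_2dof(x), 3))
```

The simulated and theoretical columns should agree to within sampling noise.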
The ellipse equation in the above code can be written as x^2/a^2 + y^2/b^2 = k, where x, y are the two random variables, a and b are the major/minor axes, and k is some scaling constant that we need to figure out. As you can see, the above can be interpreted as squaring and adding two collections of Gaussian random variables, and we just saw above what its distribution looks like. So we can say that k is a random variable that is chi-square distributed with 2 degrees of freedom.
Now all you need to do is find the value of k such that the ellipse contains 95% of the data. Just like the 1 s.d., 2 s.d., 3 s.d. percentiles that we are familiar with for Gaussians, the corresponding value for a chi-square with 2 degrees of freedom is about 6.18 (strictly, that is the value at conf ≈ 0.9545, i.e. 2 s.d.; at exactly 0.95 it is about 5.99). This is what Amro gets from the chi2inv function. He could have written scale=chi2inv(0.95,2) instead, and the result would be nearly the same. He just expresses it in terms of n s.d. from the mean, which is more intuitive.
To illustrate, here is the pdf of the chi-square distribution above, with ~95% of the area (to the left of some x) shaded in red. This x is ~6.18.
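To see where that number comes from, here is a rough Python stand-in for the chi2inv call (an assumption-laden sketch: `chi2inv_2dof` is a made-up helper that only handles the 2-degrees-of-freedom case, where the inverse CDF has the closed form -2*ln(1-p)):

```python
import math
from statistics import NormalDist

def chi2inv_2dof(p):
    """Inverse CDF of chi-square with 2 dof: solve 1 - exp(-x/2) = p for x."""
    return -2 * math.log(1 - p)

conf = 2 * NormalDist().cdf(2) - 1   # mass within 2 s.d., about 0.9545
print(round(chi2inv_2dof(conf), 2))  # 6.18
print(round(chi2inv_2dof(0.95), 2))  # 5.99
```

The 2 s.d. convention gives a slightly larger scale than an exact 95% level, so the ellipse comes out a little bigger.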

Hope this helped.