Maximizing the variance captured by each component vector goes hand in hand with maximizing the "uniqueness" of these vectors: you end up with vectors that are as far apart from each other as possible. That way, if you keep only the first N component vectors, you cover more of the space with highly varying vectors than you would with similar ones. Think about what a principal component actually means.
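As a rough illustration of the point above, here is a minimal sketch of PCA done with NumPy's SVD. The helper name `pca`, the synthetic data, and the noise level are all assumptions made up for this example, not something from the question. It shows that the components come out mutually orthonormal, that they are ordered by variance, and that the first two capture almost all of the spread in data that really lives near a plane.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper: PCA via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    all_variance = (S ** 2) / (X.shape[0] - 1)   # variance along each component
    components = Vt[:n_components]               # rows are component vectors
    return components, all_variance[:n_components], all_variance

# Synthetic 3-D data that mostly lives in a 2-D plane (assumed for illustration).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])
mixing = rng.normal(size=(2, 3))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 3))

components, variance, all_variance = pca(X, 2)

# Gram matrix of the kept components: identity means they are orthonormal.
gram = components @ components.T
# Fraction of the total variance captured by the first two components.
ratio = variance.sum() / all_variance.sum()

print(np.round(gram, 6))
print(variance, ratio)
```

Because the components are orthogonal, none of the variance counted by one is counted again by another, which is why keeping the first N gives the most coverage for that N.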
Take, for example, two orthogonal lines in three-dimensional space. You can capture the space much more completely with these orthogonal lines than with two lines that are parallel (or almost parallel). When you describe a very high-dimensional space using only a few vectors, this relationship among the vectors you keep becomes much more important. In linear algebra terms, you want PCA to produce independent lines; otherwise some of those lines will be redundant.
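To make the redundancy concrete, here is a small sketch under assumed values (random isotropic 3-D points, a 2-degree angle for the "almost parallel" pair, and a hypothetical helper `coordinate_correlation`). It compares the coordinates you get by projecting the same points onto each pair of directions.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(1000, 3))  # assumed isotropic 3-D point cloud

# Rows are direction vectors.
orthogonal = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
theta = np.deg2rad(2.0)              # assumed small angle between the two lines
near_parallel = np.array([[1.0, 0.0, 0.0],
                          [np.cos(theta), np.sin(theta), 0.0]])
parallel = np.array([[1.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0]])

def coordinate_correlation(directions):
    """Correlation between the two coordinates obtained by projecting onto each direction."""
    coords = points @ directions.T
    return np.corrcoef(coords[:, 0], coords[:, 1])[0, 1]

corr_orth = coordinate_correlation(orthogonal)
corr_near = coordinate_correlation(near_parallel)
rank_orth = np.linalg.matrix_rank(orthogonal)
rank_parallel = np.linalg.matrix_rank(parallel)

print(corr_orth, corr_near)
print(rank_orth, rank_parallel)
```

With the orthogonal pair the two coordinates are essentially uncorrelated, so each one tells you something new. With the almost parallel pair the second coordinate is nearly a copy of the first, and with exactly parallel lines the pair has rank 1, so the second line adds nothing at all.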
See PDF from the Princeton Department of CS for an explanation.
Pyrce