EDIT: I have completely rewritten the answer now that I understand which assumptions were wrong.
Before explaining what doesn't work in the OP, let me make sure we're using the same terminology. In principal component analysis, the goal is to obtain a coordinate transformation that separates the observations well, and that may make it easier to describe the data, i.e. the various multidimensional observations, in a lower-dimensional space. Observations are multidimensional when they consist of several measurements. If there are fewer linearly independent observations than there are measurements, we expect at least one of the eigenvalues to be zero, because e.g. two linearly independent observation vectors in 3D space can be described by a 2D plane.
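As a minimal sketch of that last claim (assuming MATLAB's princomp, whose third output holds the eigenvalues of the covariance matrix; newer releases provide pca instead):

    % Two linearly independent observations (rows) in 3D: after centering,
    % all the variance lies along a single direction, so at least one
    % eigenvalue must be zero.
    y = [1 0 0
         0 1 0];
    [coeff, score, ev] = princomp(y);
    disp(ev)    % only the first eigenvalue is nonzero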
If we have an array
    x = [ 1  3  4
          2  4 -1
          4  6  9
          3  5 -2];
consisting of four observations with three measurements each, princomp(x) will find the lower-dimensional space spanned by the four observations. Since two of the measurements are co-dependent, one of the eigenvalues will be near zero, since the space of measurements is only 2D and not 3D, which is probably the result you wanted to find. Indeed, if you inspect the eigenvectors (coeff), you find that the first two components are quite obviously collinear:
    coeff = princomp(x)
    coeff =
          0.10124      0.69982      0.70711
          0.10124      0.69982     -0.70711
           0.9897     -0.14317   1.1102e-16
Since the first two components are, in fact, pointing in opposite directions, the values of the first two components of the transformed observations are, on their own, meaningless: [1 1 25] is equivalent to [1000 1000 25].
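To see the corresponding eigenvalues for this example, a short sketch (the original answer does not list these numbers, so the comment only indicates the expected pattern):

    [coeff, score, latent] = princomp(x);
    disp(latent)    % the third eigenvalue should be numerically zero,
                    % reflecting the 2D measurement space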
Now, if we want to find out whether any measurements are linearly dependent, and if we really want to use principal components for this, for example because in real life measurements may not be perfectly collinear and we are interested in finding good descriptor vectors for machine learning applications, it makes more sense to treat the three measurements as 'observations' and run princomp(x'). Since there are then only three 'observations' but four 'dimensions', the fourth eigenvector will be zero. However, since there are two linearly dependent observations, we are left with only two nonzero eigenvalues:
    eigenvalues =
           24.263
           3.7368
                0
                0
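For reference, a sketch of the call that should produce this output, assuming the x defined above:

    % Transpose x so that the measurements become the 'observations';
    % princomp's third output holds the eigenvalues.
    [coeff, score, eigenvalues] = princomp(x');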
To find out which of the measurements are so strongly correlated (not actually necessary if you use the eigenvector-transformed measurements as input for, e.g., machine learning), the best way is to look at the correlation between the measurements:
    corr(x)
    ans =
                1            1      0.35675
                1            1      0.35675
          0.35675      0.35675            1
Not surprisingly, each measurement is perfectly correlated with itself, and v1 is perfectly correlated with v2.
EDIT2
> but the eigenvalues tell us which vectors in the new space are the most important (cover the most variation), and the coefficients tell us how much of each variable is in each component. So I assume we can use this data to find out which of the original variables have the most variance and are therefore the most important (and to get rid of those that represent only a small amount)
This works if your observations show very little variance in one measurement variable (e.g. where x = [1 2 3; 1 4 22; 1 25 -25; 1 11 100];, and thus the first variable contributes nothing to the variance). With collinear measurements, however, both vectors carry equivalent information and contribute equally to the variance, so the eigenvectors (coefficients) are likely to be similar to one another.
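A quick sketch of that low-variance case (the variable name x2 is mine, not from the original):

    % Per-column variance: the first entry is 0, so the first variable
    % contributes nothing and could be dropped.
    x2 = [1 2 3; 1 4 22; 1 25 -25; 1 11 100];
    disp(var(x2))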
To keep @agnieszka's comments meaningful, I have left the original points 1-4 of my answer below. Note that #3 was in response to dividing the eigenvectors by the eigenvalues, which did not make much sense to me.
1. The vectors should be in rows, not in columns (each vector is an observation).
2. coeff returns the basis vectors of the principal components, and its order has little to do with the original input.
3. To see the importance of the principal components, you use eigenvalues/sum(eigenvalues).
4. If you have two collinear vectors, you can't say that the first one is important and the second one is not. How do you know it shouldn't be the other way around? If you want to test for collinearity, you should instead check the rank of the array, or call unique on normalized (i.e. norm equal to 1) vectors; see the sketch after this list.
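A hedged sketch of the checks from point 4, using the x from above. Note that in that x, v1 and v2 differ by a constant offset rather than a scale factor, so the columns must be centered first for the rank/uniqueness checks to reveal the dependence; the centering step is my addition, not part of the original points:

    % Center the columns, then check the rank: fewer independent
    % directions than columns indicates linearly dependent measurements.
    xc = bsxfun(@minus, x, mean(x));
    rank(xc)    % 2 here, although x has 3 columns

    % Normalize the centered columns to unit length; collinear
    % measurements collapse onto the same unit vector (up to sign),
    % so unique keeps only one copy.
    xn = bsxfun(@rdivide, xc, sqrt(sum(xc.^2, 1)));
    unique(xn', 'rows')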