Python: Why don't the eigenvectors match the first PCA weights?

Question


Generate the following array:

    import numpy as np
    data = np.arange(30).reshape(10,3)
    data = data*data

    array([[  0,   1,   4],
           [  9,  16,  25],
           [ 36,  49,  64],
           [ 81, 100, 121],
           [144, 169, 196],
           [225, 256, 289],
           [324, 361, 400],
           [441, 484, 529],
           [576, 625, 676],
           [729, 784, 841]])

Then we find the eigenvectors of the covariance matrix, sorted by decreasing eigenvalue:

    import numpy.linalg as la

    mn = np.mean(data, axis=0)
    data = data - mn  # the in-place form (data -= mn) fails on an integer array in recent NumPy
    C = np.cov(data.T)
    evals, evecs = la.eig(C)
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:,idx]
    print(evecs)

    array([[-0.53926461, -0.73656433,  0.40824829],
           [-0.5765472 , -0.03044111, -0.81649658],
           [-0.61382979,  0.67568211,  0.40824829]])

Now run the matplotlib.mlab.PCA function on the data:

    import matplotlib.mlab as mlab
    mpca = mlab.PCA(data)
    print(mpca.Wt)

    [[ 0.57731894  0.57740574  0.57732612]
     [ 0.72184459 -0.03044628 -0.69138514]
     [ 0.38163232 -0.81588947  0.43437443]]

Why are the two matrices different? I thought that to compute the PCA you first find the eigenvectors of the covariance matrix, and that these would be exactly equal to the weights.


python numpy matplotlib eigenvector pca

user2974839 Nov 12 '13 at 17:45


1 answer

Jaime · Accepted Answer · 2013-11-12T18:38:26+0000

You need to normalize your data, not just center it, and the output of np.linalg.eig has to be transposed to match mlab.PCA:

    >>> n_data = (data - data.mean(axis=0)) / data.std(axis=0)
    >>> evals, evecs = np.linalg.eig(np.cov(n_data.T))
    >>> evecs = evecs[:, np.argsort(evals)[::-1]].T
    >>> mlab.PCA(data).Wt
    array([[ 0.57731905,  0.57740556,  0.5773262 ],
           [ 0.72182079, -0.03039546, -0.69141222],
           [ 0.38167716, -0.8158915 ,  0.43433121]])
    >>> evecs
    array([[-0.57731905, -0.57740556, -0.5773262 ],
           [-0.72182079,  0.03039546,  0.69141222],
           [ 0.38167716, -0.8158915 ,  0.43433121]])
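The first two rows still differ in sign between the two results. That is expected, because an eigenvector is only defined up to a factor of -1. The sketch below is a NumPy-only check of the same idea and does not call mlab.PCA. It assumes, as the matching output above suggests, that the PCA weights correspond to the right singular vectors of the standardized data, and compares them with the covariance eigenvectors while ignoring sign. The use of np.linalg.eigh (instead of eig) and the names eig_rows and svd_rows are choices made for this illustration, not part of the original answer.

```python
import numpy as np

# Same data as in the question, cast to float so centering does not truncate.
data = (np.arange(30).reshape(10, 3) ** 2).astype(float)

# Standardize: subtract the column means, divide by the column standard deviations.
n_data = (data - data.mean(axis=0)) / data.std(axis=0)

# Route 1: eigenvectors of the covariance matrix of the standardized data.
# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(np.cov(n_data.T))
order = np.argsort(evals)[::-1]   # largest eigenvalue first
eig_rows = evecs[:, order].T      # transpose: one component per row

# Route 2: right singular vectors of the standardized data (rows of Vt).
_, _, svd_rows = np.linalg.svd(n_data, full_matrices=False)

# Compare absolute values, since each vector is only defined up to sign.
same = np.allclose(np.abs(eig_rows), np.abs(svd_rows), atol=1e-6)
print(same)
```

If the division by data.std(axis=0) is dropped, the first route reproduces the unscaled eigenvectors from the question instead, which is why the two matrices there did not agree.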

