Scikit-learn kernel PCA explained variance

I use the regular PCA from scikit-learn and get the explained variance ratio for each principal component without any problems:

    import sklearn.decomposition

    pca = sklearn.decomposition.PCA(n_components=3)
    pca_transform = pca.fit_transform(feature_vec)
    var_values = pca.explained_variance_ratio_

I want to experiment with different kernels using KernelPCA and also get the explained variance ratios, but I now see that KernelPCA does not have this attribute. Does anyone know how to get these values?

    kpca = sklearn.decomposition.KernelPCA(kernel=kernel, n_components=3)
    kpca_transform = kpca.fit_transform(feature_vec)
    var_values = kpca.explained_variance_ratio_

AttributeError: 'KernelPCA' object has no attribute 'explained_variance_ratio_'

+7
python scikit-learn
3 answers

I know this question is old, but I ran into the same "problem" and found an easy solution when I realized that pca.explained_variance_ is simply the variance of the components. You can calculate the explained variance (and ratio) yourself:

    import numpy

    kpca_transform = kpca.fit_transform(feature_vec)
    explained_variance = numpy.var(kpca_transform, axis=0)
    explained_variance_ratio = explained_variance / numpy.sum(explained_variance)

and as a bonus, to get the cumulative proportion of explained variance (often useful when choosing the number of components and evaluating the dimensionality of your space):

 numpy.cumsum(explained_variance_ratio) 
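A fitted KernelPCA also exposes the eigenvalues of the centered kernel matrix, which are proportional to these per-component variances, so they give the same ratios without transforming the data. Here is a minimal sketch, assuming a recent scikit-learn (the attribute is named eigenvalues_ from version 1.0 on and lambdas_ before that), with random placeholder data standing in for the question's feature_vec:

    import numpy
    from sklearn.decomposition import KernelPCA

    # placeholder data standing in for feature_vec from the question
    feature_vec = numpy.random.RandomState(0).rand(100, 5)

    kpca = KernelPCA(n_components=3, kernel='rbf')
    kpca.fit(feature_vec)
    eigenvalues = kpca.eigenvalues_  # kpca.lambdas_ on scikit-learn < 1.0
    explained_variance_ratio = eigenvalues / numpy.sum(eigenvalues)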
+9

The main reason KernelPCA does not have explained_variance_ratio_ is that after the kernel transformation your data/vectors live in a different feature space. Therefore the output of KernelPCA is not meant to be interpreted the way PCA output is.
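To see this concretely, here is a minimal sketch (make_moons used as placeholder data) showing that a ratio computed from the transformed data is relative only to the components you kept, so it shifts as n_components changes; with ordinary PCA the denominator is the fixed total variance of the input, so the ratios are stable:

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.decomposition import KernelPCA

    x, _ = make_moons(n_samples=100, random_state=123)
    for n in (2, 10, 50):
        z = KernelPCA(n_components=n, kernel='rbf', gamma=15).fit_transform(x)
        var = np.var(z, axis=0)
        # "ratio" of the first two components, relative to the kept ones only
        print(n, np.round(var[:2] / np.sum(var), 3))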

0

I was intrigued by this, so I ran some tests. Below is my code.

The plots show that the first KernelPCA component is a better discriminator of the dataset. However, when the explained variance ratios are calculated following @EelkeSpaak's explanation, we see only about 50% explained variance for it, which does not make sense. This inclines me to agree with @KrishnaKalyan's explanation.

    # get data
    from sklearn.datasets import make_moons
    import numpy as np
    import matplotlib.pyplot as plt

    x, y = make_moons(n_samples=100, random_state=123)
    plt.scatter(x[y==0, 0], x[y==0, 1], color='red', marker='^', alpha=0.5)
    plt.scatter(x[y==1, 0], x[y==1, 1], color='blue', marker='o', alpha=0.5)
    plt.show()

    ## seeing effect of linear-pca-------
    from sklearn.decomposition import PCA

    pca = PCA(n_components=2)
    x_pca = pca.fit_transform(x)
    x_tx = x_pca

    fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7, 3))
    ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
    ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
    ax[1].scatter(x_tx[y==0, 0], np.zeros((50, 1)) + 0.02, color='red', marker='^', alpha=0.5)
    ax[1].scatter(x_tx[y==1, 0], np.zeros((50, 1)) - 0.02, color='blue', marker='o', alpha=0.5)
    ax[0].set_xlabel('PC-1')
    ax[0].set_ylabel('PC-2')
    ax[0].set_ylim([-0.8, 0.8])
    ax[1].set_ylim([-0.8, 0.8])
    ax[1].set_yticks([])
    ax[1].set_xlabel('PC-1')
    plt.show()

    ## seeing effect of kernelized-pca------
    from sklearn.decomposition import KernelPCA

    kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
    x_kpca = kpca.fit_transform(x)
    x_tx = x_kpca

    fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7, 3))
    ax[0].scatter(x_tx[y==0, 0], x_tx[y==0, 1], color='red', marker='^', alpha=0.5)
    ax[0].scatter(x_tx[y==1, 0], x_tx[y==1, 1], color='blue', marker='o', alpha=0.5)
    ax[1].scatter(x_tx[y==0, 0], np.zeros((50, 1)) + 0.02, color='red', marker='^', alpha=0.5)
    ax[1].scatter(x_tx[y==1, 0], np.zeros((50, 1)) - 0.02, color='blue', marker='o', alpha=0.5)
    ax[0].set_xlabel('PC-1')
    ax[0].set_ylabel('PC-2')
    ax[0].set_ylim([-0.8, 0.8])
    ax[1].set_ylim([-0.8, 0.8])
    ax[1].set_yticks([])
    ax[1].set_xlabel('PC-1')
    plt.show()

    ## comparing the 2 pcas-------
    # get the transformer
    tx_pca = pca.fit(x)
    tx_kpca = kpca.fit(x)

    # transform the original data
    x_pca = tx_pca.transform(x)
    x_kpca = tx_kpca.transform(x)

    # for the transformed data, get the explained variances
    expl_var_pca = np.var(x_pca, axis=0)
    expl_var_kpca = np.var(x_kpca, axis=0)
    print('explained variance pca: ', expl_var_pca)
    print('explained variance kpca: ', expl_var_kpca)

    expl_var_ratio_pca = expl_var_pca / np.sum(expl_var_pca)
    expl_var_ratio_kpca = expl_var_kpca / np.sum(expl_var_kpca)
    print('explained variance ratio pca: ', expl_var_ratio_pca)
    print('explained variance ratio kpca: ', expl_var_ratio_kpca)
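As a sanity check on the computation itself, here is a short continuation of the script above (assuming scikit-learn >= 1.0, where the eigenvalue attribute is named eigenvalues_; older releases call it lambdas_): the variances obtained with np.var match the eigenvalues of the centered kernel matrix up to a constant factor, so @EelkeSpaak's formula is numerically correct, and the disagreement is only about what the total variance refers to.

    # continuation: kpca is fitted on x and expl_var_kpca holds the
    # per-component variances of the transformed data
    eig = kpca.eigenvalues_  # kpca.lambdas_ on scikit-learn < 1.0
    print('ratio from np.var:     ', expl_var_kpca / np.sum(expl_var_kpca))
    print('ratio from eigenvalues:', eig / np.sum(eig))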
0
