The explained_variance_ratio_ of TruncatedSVD is not in descending order, unlike scikit-learn's PCA. I looked at the source code, and it seems the two classes compute the explained variance ratio differently:
TruncatedSVD:

U, Sigma, VT = randomized_svd(X, self.n_components,
                              n_iter=self.n_iter,
                              random_state=random_state)
X_transformed = np.dot(U, np.diag(Sigma))
self.explained_variance_ = exp_var = np.var(X_transformed, axis=0)
if sp.issparse(X):
    _, full_var = mean_variance_axis(X, axis=0)
    full_var = full_var.sum()
else:
    full_var = np.var(X, axis=0).sum()
self.explained_variance_ratio_ = exp_var / full_var
PCA:

U, S, V = linalg.svd(X, full_matrices=False)
explained_variance_ = (S ** 2) / n_samples
explained_variance_ratio_ = explained_variance_ / explained_variance_.sum()
PCA computes the explained variance directly from the singular values, and since the singular values are in descending order, the explained variances are also in descending order. TruncatedSVD, on the other hand, computes the explained variance from the variances of the columns of the transformed matrix, so the values are not necessarily in descending order.
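As a sanity check on this reasoning, here is a minimal NumPy sketch (variable names are my own) showing that for column-centered data the two formulas coincide: the column variances of U·diag(Σ) equal Σ²/n_samples, so PCA's descending singular values automatically give descending explained variances. TruncatedSVD does not center X, which is why its column variances can fall out of order.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)            # center columns, as PCA does internally

U, S, VT = np.linalg.svd(Xc, full_matrices=False)
n_samples = Xc.shape[0]

# PCA-style: explained variance straight from the singular values
var_from_sigma = S ** 2 / n_samples

# TruncatedSVD-style: variance of the projected columns
X_transformed = U @ np.diag(S)
var_from_columns = np.var(X_transformed, axis=0)

# For centered data the two agree, and both are in descending order
print(np.allclose(var_from_sigma, var_from_columns))  # → True
```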
Does this mean that I need to sort explained_variance_ratio_ from TruncatedSVD first in order to find the top k principal components?
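If sorting is needed, one straightforward way (a sketch, with identifier names of my own) is to argsort explained_variance_ratio_ after fitting and reorder the components and the transformed columns accordingly:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((60, 8))            # non-negative, deliberately not centered

svd = TruncatedSVD(n_components=4, random_state=0)
X_reduced = svd.fit_transform(X)

# Reorder everything by explained variance ratio, largest first
order = np.argsort(svd.explained_variance_ratio_)[::-1]
ratio_sorted = svd.explained_variance_ratio_[order]
components_sorted = svd.components_[order]
X_reduced_sorted = X_reduced[:, order]

# After reordering, the ratios are guaranteed non-increasing
assert np.all(np.diff(ratio_sorted) <= 0)
```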
python scikit-learn pca svd
Xiangyu