PCA implementation in Java

I need a PCA implementation in Java. I am interested in finding something that is well documented, practical, and easy to use. Any recommendations?

+8
java pca
5 answers

There are now a number of principal component analysis (PCA) implementations for Java.

  • Apache Spark: https://spark.apache.org/docs/2.1.0/mllib-dimensionality-reduction.html#principal-component-analysis-pca

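    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.feature.PCA;
    import org.apache.spark.mllib.feature.PCAModel;
    import org.apache.spark.mllib.linalg.Matrix;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.rdd.RDD;

    SparkConf conf = new SparkConf().setAppName("PCAExample").setMaster("local");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
        //Create points as Spark Vectors
        List<Vector> vectors = Arrays.asList(
            Vectors.dense(-1.0, -1.0),
            Vectors.dense(-1.0, 1.0),
            Vectors.dense(1.0, 1.0));

        //Create Spark MLLib RDD
        JavaRDD<Vector> distData = sc.parallelize(vectors);
        RDD<Vector> vectorRDD = distData.rdd();

        //Execute PCA Projection to 2 dimensions
        PCA pca = new PCA(2);
        PCAModel pcaModel = pca.fit(vectorRDD);
        //Columns of this matrix are the principal components
        Matrix matrix = pcaModel.pc();
    }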
  • ND4J: http://nd4j.org/doc/org/nd4j/linalg/dimensionalityreduction/PCA.html

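    import java.util.Arrays;
    import java.util.List;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dimensionalityreduction.PCA;
    //NDArray is the backend's INDArray implementation (for example, from nd4j-native)

    //Create points as NDArray instances
    List<INDArray> ndArrays = Arrays.asList(
        new NDArray(new float[] {-1.0F, -1.0F}),
        new NDArray(new float[] {-1.0F, 1.0F}),
        new NDArray(new float[] {1.0F, 1.0F}));

    //Create matrix of points (rows are observations; columns are features)
    INDArray matrix = new NDArray(ndArrays, new int[] {3, 2});

    //Execute PCA - again to 2 dimensions
    INDArray factors = PCA.pca_factor(matrix, 2, false);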
  • Apache Commons Math (single-threaded, no framework)

    import org.apache.commons.math3.linear.EigenDecomposition;
    import org.apache.commons.math3.linear.MatrixUtils;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.stat.correlation.Covariance;

    //create points in a double array
    double[][] pointsArray = new double[][] {
        new double[] { -1.0, -1.0 },
        new double[] { -1.0, 1.0 },
        new double[] { 1.0, 1.0 } };

    //create real matrix
    RealMatrix realMatrix = MatrixUtils.createRealMatrix(pointsArray);

    //create covariance matrix of points, then find eigenvectors
    //see https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
    Covariance covariance = new Covariance(realMatrix);
    RealMatrix covarianceMatrix = covariance.getCovarianceMatrix();
    EigenDecomposition ed = new EigenDecomposition(covarianceMatrix);

Note: singular value decomposition (SVD), which can also be used to find the principal components, has equivalent implementations.
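
For intuition about what these libraries compute, here is a minimal, dependency-free sketch of the covariance-then-eigenvector route used in the Commons Math snippet above. The class and method names (`PcaSketch`, `covariance`, `leadingEigenvector`) are made up for illustration. Power iteration is swapped in for a full eigendecomposition, so it only recovers the first principal component, and it assumes the data are not degenerate (non-zero variance, distinct leading eigenvalue). For real work, one of the libraries above is probably the better choice.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical, not from any library.
final class PcaSketch {

    /** Sample covariance matrix; rows are observations, columns are features. */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);
                }
            }
        }
        return cov;
    }

    /** Approximates the leading unit eigenvector of a symmetric matrix by power iteration. */
    static double[] leadingEigenvector(double[][] m, int iterations) {
        int d = m.length;
        double[] v = new double[d];
        for (int i = 0; i < d; i++) {
            v[i] = 1.0 / (i + 1); // arbitrary non-zero starting vector (an assumption)
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[d];
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    next[i] += m[i][j] * v[j];
                }
            }
            double norm = 0.0;
            for (double x : next) {
                norm += x * x;
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < d; i++) {
                v[i] = next[i] / norm; // renormalize to unit length
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Same three points as in the library examples
        double[][] points = { { -1.0, -1.0 }, { -1.0, 1.0 }, { 1.0, 1.0 } };
        double[][] cov = covariance(points);
        double[] pc1 = leadingEigenvector(cov, 100);
        System.out.println("covariance = " + Arrays.deepToString(cov));
        System.out.println("first principal component = " + Arrays.toString(pc1));
    }
}
```

For the three sample points the covariance matrix works out to [[4/3, 2/3], [2/3, 4/3]], so the first component should come out close to (0.707, 0.707).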

+13

Here is one of them: PCA class.

This class contains the methods needed for a basic principal component analysis with a varimax rotation. Options are available for an analysis using either the covariance or the correlation matrix. A parallel analysis is performed using Monte Carlo simulations. Extraction criteria are available based on eigenvalues greater than one, greater than a Monte Carlo eigenvalue percentile, or greater than the Monte Carlo eigenvalue means.
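
To show roughly what two of those extraction criteria mean, here is a small sketch in plain Java. It is not that class's API: `ExtractionSketch`, `countLeading`, and `columnMeans` are hypothetical names, the eigenvalues are assumed to be sorted in descending order, and the numbers in `main` are made up purely for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only; not the API of any PCA library.
final class ExtractionSketch {

    /**
     * Counts the leading components whose eigenvalue exceeds the matching
     * threshold, stopping at the first one that does not.
     * Assumes eigenvalues are sorted in descending order.
     */
    static int countLeading(double[] eigenvalues, double[] thresholds) {
        int kept = 0;
        while (kept < eigenvalues.length && eigenvalues[kept] > thresholds[kept]) {
            kept++;
        }
        return kept;
    }

    /** Column-wise means; each row holds the eigenvalues of one simulated data set. */
    static double[] columnMeans(double[][] simulated) {
        double[] means = new double[simulated[0].length];
        for (double[] row : simulated) {
            for (int j = 0; j < means.length; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < means.length; j++) {
            means[j] /= simulated.length;
        }
        return means;
    }

    public static void main(String[] args) {
        // Made-up eigenvalues of a 4x4 correlation matrix (they sum to 4)
        double[] observed = { 2.1, 1.05, 0.6, 0.25 };

        // Criterion 1: eigenvalues greater than one
        double[] ones = new double[observed.length];
        Arrays.fill(ones, 1.0);
        System.out.println("greater than one: " + countLeading(observed, ones));

        // Criterion 2: greater than the Monte Carlo eigenvalue means.
        // These "simulated" rows are placeholders, not real simulation output.
        double[][] simulated = { { 1.3, 1.1, 0.9, 0.7 }, { 1.4, 1.1, 0.8, 0.7 } };
        System.out.println("greater than Monte Carlo mean: "
                + countLeading(observed, columnMeans(simulated)));
    }
}
```

With these placeholder numbers the first criterion keeps two components and the second keeps one, which illustrates why the choice of criterion matters.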

+7

Check out http://weka.sourceforge.net/doc.stable/weka/attributeSelection/PrincipalComponents.html. Weka also has many other algorithms that can be used together with PCA, and it adds more from time to time. So if you are working in Java, consider switching to the Weka API.

+2

Smile is a complete ML library for Java. Try its PCA implementation. See: https://haifengl.imtqy.com/smile/api/java/smile/projection/PCA.html

There is also a PCA tutorial using Smile, but the tutorial uses Scala.

+2

You can see several PCA implementations in the DataMelt project:

https://jwork.org/dmelt/code/index.php?keyword=PCA

(they are rewritten in Jython). They include several graphical examples of dimensionality reduction and show the use of several Java packages, such as JSAT, DatumBox, and others.

+1