SPARK, ML, Tuning, CrossValidator: access to metrics

Question

To build a multiclass NaiveBayes classifier, I use CrossValidator to select the best parameters for my pipeline:

    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEstimatorParamMaps(paramGrid)
      .setEvaluator(new MulticlassClassificationEvaluator)
      .setNumFolds(10)

    val cvModel = cv.fit(trainingSet)

The pipeline contains the usual transformers and estimators in the following order: Tokenizer, StopWordsRemover, HashingTF, IDF, and finally NaiveBayes.

Is it possible to access metrics calculated for the best model?

Ideally, I would like to access the metrics of all models to see how changing the parameters changes the quality of the classification. For now, though, the metrics of the best model alone would be enough.

FYI, I am using Spark 1.6.0

apache-spark apache-spark-mllib apache-spark-ml

Rami Jan 08 '16 at 13:59

2 answers

  cvModel.avgMetrics

This works in PySpark 2.2.0.

Donald vetal Nov 09 '17 at 21:24

Chris frregly · Accepted Answer · 2016-01-08T21:48:07+0000

Here is how I do it:

    val pipeline = new Pipeline()
      .setStages(Array(tokenizer, stopWordsFilter, tf, idf, word2Vec, featureVectorAssembler,
        categoryIndexerModel, classifier, categoryReverseIndexer))

    ...

    val paramGrid = new ParamGridBuilder()
      .addGrid(tf.numFeatures, Array(10, 100))
      .addGrid(idf.minDocFreq, Array(1, 10))
      .addGrid(word2Vec.vectorSize, Array(200, 300))
      .addGrid(classifier.maxDepth, Array(3, 5))
      .build()

    paramGrid.size // 16 entries

    ...

    // Print the average metrics per ParamGrid entry
    val avgMetricsParamGrid = crossValidatorModel.avgMetrics

    // Combine with paramGrid to see how they affect the overall metrics
    val combined = paramGrid.zip(avgMetricsParamGrid)

    ...

    val bestModel = crossValidatorModel.bestModel.asInstanceOf[PipelineModel]

    // Explain params for each stage
    val bestHashingTFNumFeatures = bestModel.stages(2).asInstanceOf[HashingTF].explainParams
    val bestIDFMinDocFrequency = bestModel.stages(3).asInstanceOf[IDFModel].explainParams
    val bestWord2VecVectorSize = bestModel.stages(4).asInstanceOf[Word2VecModel].explainParams
    val bestDecisionTreeDepth = bestModel.stages(7).asInstanceOf[DecisionTreeClassificationModel].explainParams
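Once paramGrid is zipped with avgMetrics, the pairs still need to be sorted to show which settings performed best. Below is a minimal sketch of that step in plain Scala, so it runs without a Spark dependency. MetricRanking and rankByMetric are hypothetical names, not part of Spark, and the strings and scores in main are made-up stand-ins for ParamMap entries and real metrics.

```scala
// Hypothetical helper, not part of the Spark API.
object MetricRanking {

  // Pairs each parameter setting with its metric and orders the pairs
  // from best to worst. Assumes metrics(i) belongs to params(i).
  def rankByMetric[P](
      params: Seq[P],
      metrics: Seq[Double],
      largerIsBetter: Boolean = true): Seq[(P, Double)] = {
    require(params.length == metrics.length,
      "expected exactly one metric per parameter setting")
    val ascending = params.zip(metrics).sortBy(_._2)
    if (largerIsBetter) ascending.reverse else ascending
  }

  def main(args: Array[String]): Unit = {
    // Stand-in data: strings instead of ParamMap entries, invented scores.
    val settings = Seq("numFeatures=10", "numFeatures=100", "numFeatures=1000")
    val scores = Seq(0.61, 0.74, 0.70)
    rankByMetric(settings, scores).foreach { case (setting, score) =>
      println(s"$score\t$setting")
    }
  }
}
```

With the names from the answer, the call would presumably look like MetricRanking.rankByMetric(paramGrid.toSeq, crossValidatorModel.avgMetrics.toSeq, evaluator.isLargerBetter), where evaluator stands for whichever evaluator was passed to the CrossValidator. Taking the sort direction from the evaluator instead of hard-coding it keeps the ranking correct for metrics where smaller values are better.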
