How to pass the prediction and label columns to BinaryClassificationMetrics for a Naive Bayes model

I am confused about the inputs to BinaryClassificationMetrics (MLlib). As of Apache Spark 1.6.0, it expects scores and labels as an RDD[(Double, Double)], which has to be built from the transformed DataFrame. That DataFrame contains the columns prediction, probability (vector) and rawPrediction (vector).

I created an RDD[(Double, Double)] from the prediction and label columns. After running BinaryClassificationMetrics on the NaiveBayesModel output, I can get ROC, PR, etc., but there are too few values to plot: ROC contains 4 values and PR contains 3.

Is this the correct way to prepare the scores and labels, or do I need to use the rawPrediction or probability column instead of the prediction column?

1 answer

Prepare the scores and labels from the probability column:

    import org.apache.spark.ml.classification.NaiveBayes
    import org.apache.spark.mllib.linalg.Vector

    val df = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    val predictions = new NaiveBayes().fit(df).transform(df)

    // Use the probability of the positive class (index 1) as the score
    val preds = predictions.select("probability", "label").rdd.map(row =>
      (row.getAs[Vector](0)(1), row.getAs[Double](1)))

And evaluate:

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    new BinaryClassificationMetrics(preds, 10).roc

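If the predictions are only 0 or 1, there are just two distinct scores to use as thresholds, so the curves contain very few points, as in your case: with the added endpoints, that gives 4 points for ROC and 3 for PR. Try scores that take more distinct values, for example: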

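    import org.apache.spark.sql.functions.rand
    import sqlContext.implicits._  // enables the $"label" column syntax

    // Random scores in [0, 1) stand in for a model's continuous output
    val anotherPreds = df.select(rand(), $"label").rdd.map(row => (row.getDouble(0), row.getDouble(1)))
    new BinaryClassificationMetrics(anotherPreds, 10).roc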
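To see why hard 0/1 predictions produce so few points, here is a minimal plain-Scala sketch (no Spark required) that builds a ROC curve with one point per distinct score plus the two endpoints. The object `RocSketch` and the function `rocPoints` are hypothetical names used for illustration only. This is not Spark's implementation, just the same idea in a few lines.

```scala
// Plain-Scala sketch: one ROC point per distinct score threshold,
// plus the (0,0) and (1,1) endpoints. Illustrative only, not Spark's code.
object RocSketch {
  // Returns (falsePositiveRate, truePositiveRate) points.
  def rocPoints(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scoreAndLabels.count(_._2 == 1.0).toDouble
    val negatives = scoreAndLabels.size - positives
    // Distinct scores, highest first: each one is a candidate threshold.
    val thresholds = scoreAndLabels.map(_._1).distinct.sorted.reverse
    val inner = thresholds.map { t =>
      val predictedPositive = scoreAndLabels.filter(_._1 >= t)
      val tp = predictedPositive.count(_._2 == 1.0)
      val fp = predictedPositive.size - tp
      (fp / negatives, tp / positives)
    }
    (0.0, 0.0) +: inner :+ (1.0, 1.0)
  }

  def main(args: Array[String]): Unit = {
    // Hard 0/1 predictions: only two distinct scores.
    val hard = Seq((1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0))
    // Probability-like scores: five distinct values.
    val soft = Seq((0.9, 1.0), (0.6, 0.0), (0.4, 1.0), (0.1, 0.0), (0.8, 1.0))
    println(rocPoints(hard).size) // prints 4
    println(rocPoints(soft).size) // prints 7
  }
}
```

With two distinct scores the curve has only 4 points, which matches the count reported in the question. Passing the probability of the positive class instead gives one point per distinct probability (before any down-sampling by the bin count).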