Prepare the predictions:
    import org.apache.spark.mllib.linalg.Vector  // on Spark 2.x and later, use org.apache.spark.ml.linalg.Vector
    import org.apache.spark.ml.classification.NaiveBayes  // fit/transform belong to the DataFrame-based ml API

    val df = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    val predictions = new NaiveBayes().fit(df).transform(df)

    // (score, label) pairs; the score is the probability of the positive class (index 1)
    val preds = predictions.select("probability", "label").rdd.map(row => (row.getAs[Vector](0)(1), row.getAs[Double](1)))
And evaluate:
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    new BinaryClassificationMetrics(preds, 10).roc
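If the predictions are only 0 or 1, there are just two distinct score values to use as thresholds, so the curve may have fewer points than the requested number of bins, as in your case. Try data with continuous scores, for example: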
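    import org.apache.spark.sql.functions.rand
    import sqlContext.implicits._  // enables the $"label" column syntax

    // df1 is your DataFrame with a Double "label" column; random scores in [0, 1) stand in for continuous predictions
    val anotherPreds = df1.select(rand(), $"label").rdd.map(row => (row.getDouble(0), row.getDouble(1)))
    new BinaryClassificationMetrics(anotherPreds, 10).roc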
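To see why hard 0/1 predictions give so few points, here is a minimal plain-Scala sketch that builds one ROC point per distinct score. `RocSketch`, `rocPoints`, `hardPreds` and `softPreds` are hypothetical names with made-up toy data; this is a simplified illustration of the idea, not Spark's actual implementation.

```scala
// Hypothetical helper, not part of Spark: one ROC point per distinct score threshold.
object RocSketch {
  // Input: (score, label) pairs with labels 0.0 or 1.0. Output: (FPR, TPR) points.
  def rocPoints(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scoreAndLabels.count(_._2 == 1.0).toDouble
    val negatives = scoreAndLabels.size - positives
    // Every distinct score is a candidate threshold, from highest to lowest.
    val thresholds = scoreAndLabels.map(_._1).distinct.sorted.reverse
    val inner = thresholds.map { t =>
      val predictedPositive = scoreAndLabels.filter(_._1 >= t)
      val tp = predictedPositive.count(_._2 == 1.0)
      val fp = predictedPositive.size - tp
      (fp / negatives, tp / positives)
    }
    // Add the curve's end points and drop exact duplicates.
    ((0.0, 0.0) +: inner :+ (1.0, 1.0)).distinct
  }

  // Toy data (assumed for illustration): hard 0/1 predictions vs. continuous scores.
  val hardPreds = Seq((1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0), (0.0, 0.0))
  val softPreds = Seq((0.9, 1.0), (0.8, 0.0), (0.4, 1.0), (0.1, 0.0), (0.7, 1.0), (0.3, 0.0))

  def main(args: Array[String]): Unit = {
    println(rocPoints(hardPreds).size) // only two distinct scores
    println(rocPoints(softPreds).size) // six distinct scores
  }
}
```

With the toy data above, the hard predictions produce 3 points and the continuous scores produce 7, regardless of how many bins you ask for.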
user6022341