I tried to create a logistic regression model on sampled data.
The only result we can get from the model is the array of weights for the features used to build it. I could not find a Spark API for the standard error of the estimates, Wald Chi-square statistics, p-values, etc.
My code is below as an example:
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.{BinaryClassificationMetrics, MulticlassMetrics}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("SparkTest").setMaster("local[*]"))
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Read the CSV and turn each line into a LabeledPoint:
// column 0 is the label, the remaining columns are the features.
val data: RDD[String] = sc.textFile("C:/Users/user/Documents/spark-1.5.1-bin-hadoop2.4/data/mllib/credit_approval_2_attr.csv")
val parsedData = data.map { line =>
  val parts = line.split(',').map(_.toDouble)
  LabeledPoint(parts(0), Vectors.dense(parts.tail))
}
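The model itself is fit on parsedData with LogisticRegressionWithLBFGS, roughly as follows (a minimal sketch; the numClasses setting and the printed fields are only illustrative of how I get the weights):

// Fit the model and print the only coefficient-level output the API exposes.
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(parsedData)

println(model.weights)    // array of feature weights
println(model.intercept)  // intercept term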
The output weights of the model are:
[-0.03335987643613915,0.025215092730373874,0.22617842810253946,0.29415985532104943,-0.0025559467210279694,4.5242237280512646E-4]
just an array of weights.
I did manage to calculate the accuracy, recall, precision, sensitivity and other diagnostics of the model.
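For reference, those diagnostics came from the metrics classes imported above, along these lines (a simplified sketch; the exact labels and thresholds may differ from my actual code):

// Predict on the training data and pair predictions with the true labels.
val predictionAndLabels = parsedData.map { point =>
  (model.predict(point.features), point.label)
}

val multiMetrics = new MulticlassMetrics(predictionAndLabels)
println(s"Overall precision (accuracy):    ${multiMetrics.precision}")
println(s"Precision for class 1:           ${multiMetrics.precision(1.0)}")
println(s"Recall/sensitivity for class 1:  ${multiMetrics.recall(1.0)}")

val binaryMetrics = new BinaryClassificationMetrics(predictionAndLabels)
println(s"Area under ROC: ${binaryMetrics.areaUnderROC()}")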
Is there a way to calculate the standard error of the estimates, the Wald Chi-square statistics, and the p-values in Spark? As far as I know, this is standard output in R or SAS.
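To make concrete what I am looking for: if the standard error of each coefficient were available, the Wald Chi-square statistic and p-value would follow directly, for example as below (the standard error here is a made-up placeholder, which is exactly the piece I cannot get from Spark; Apache Commons Math provides the chi-square CDF):

import org.apache.commons.math3.distribution.ChiSquaredDistribution

val beta = -0.03335987643613915   // first weight from the model output above
val se   = 0.01                   // hypothetical standard error -- the missing piece

// Wald Chi-square statistic (1 degree of freedom) and its p-value.
val waldChiSq = math.pow(beta / se, 2)
val pValue    = 1.0 - new ChiSquaredDistribution(1.0).cumulativeProbability(waldChiSq)

println(f"Wald Chi-Sq = $waldChiSq%.4f, p-value = $pValue%.6f")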
Does this depend on the optimization method we use in Spark?
Here we use L-BFGS or SGD.
Or maybe I am just not aware of the right evaluation methodology.
Any suggestion would be highly appreciated.