How to print forecasting probability in LogisticRegressionWithLBFGS for pyspark

I use Spark 1.5.1 and, In pyspark, after I fit the model using:

model = LogisticRegressionWithLBFGS.train(parsedData) 

I can print the prediction using:

 model.predict(p.features) 

Is there a function for printing probability estimates along with prediction?

0
machine-learning apache-spark pyspark apache-spark-mllib logistic-regression
source share
2 answers

You must clear the threshold first , and this only works for binary classification:

  from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel from pyspark.mllib.regression import LabeledPoint parsed_data = [LabeledPoint(0.0, [4.6,3.6,1.0,0.2]), LabeledPoint(0.0, [5.7,4.4,1.5,0.4]), LabeledPoint(1.0, [6.7,3.1,4.4,1.4]), LabeledPoint(0.0, [4.8,3.4,1.6,0.2]), LabeledPoint(1.0, [4.4,3.2,1.3,0.2])] model = LogisticRegressionWithLBFGS.train(sc.parallelize(parsed_data)) model.threshold # 0.5 model.predict(parsed_data[2].features) # 1 model.clearThreshold() model.predict(parsed_data[2].features) # 0.9873840020002339 
+7
source share

I assume that the question is to calculate the probability score for predicting the entire set of workouts. If so, I did the following to calculate it. Not sure if the message is still active, but it is, I did this:

 #get the original training data before it was converted to rows of LabelPoint. #let us assume it is otd ( of type spark DataFrame) #let us extract the featureset as rdd by: fs=otd.rdd.map(lambda x:x[1:]) # assuming label is col 0. #the below is just a sample way of creating a Labelpoint rows.. parsedData= otd.rdd.map(lambda x: reg.LabeledPoint(int(x[0]-1),x[1:])) # now convert otd to a panda DataFrame as: ptd= otd.toPandas() m= ptd.shape[0] # train and get the model model=LogisticRegressionWithLBFGS.train(trainingData,numClasses=10) #Now store the model.predict rdd structures predict=model.predict(fs) pr=predict.collect() correct=0 correct = ((ptd.label-1) == (pr)).sum() print((correct/m) *100) 

Note that this applies to the classification of several classes.

0
source share

All Articles