How to print prediction probability in LogisticRegressionWithLBFGS for pyspark

Question


I use Spark 1.5.1. In PySpark, after I fit the model using:

model = LogisticRegressionWithLBFGS.train(parsedData)

I can print the prediction using:

 model.predict(p.features)

Is there a function for printing probability estimates along with prediction?


machine-learning apache-spark pyspark apache-spark-mllib logistic-regression

Hemanth mahadevaiah Nov 06 '15 at 6:33


2 answers

I assume the question is about calculating a score for predictions over the entire training set. If so, this is how I calculated it. I am not sure whether this post is still active, but in case it is, here is what I did:

    from pyspark.mllib import regression as reg
    from pyspark.mllib.classification import LogisticRegressionWithLBFGS

    # otd is the original training data (a Spark DataFrame), before it was
    # converted to rows of LabeledPoint. The label is assumed to be in column 0.
    # Extract the feature set as an RDD:
    fs = otd.rdd.map(lambda x: x[1:])

    # A sample way of creating the LabeledPoint rows (labels shifted to start at 0):
    parsedData = otd.rdd.map(lambda x: reg.LabeledPoint(int(x[0] - 1), x[1:]))

    # Convert otd to a pandas DataFrame:
    ptd = otd.toPandas()
    m = ptd.shape[0]

    # Train and get the model:
    model = LogisticRegressionWithLBFGS.train(parsedData, numClasses=10)

    # Store the predictions and compare them with the true labels:
    predict = model.predict(fs)
    pr = predict.collect()
    correct = ((ptd.label - 1) == pr).sum()
    print(100.0 * correct / m)  # percentage of correct predictions

Note that this applies to multiclass classification.
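The snippet above yields predicted class labels and an accuracy figure rather than probabilities. As a rough sketch of how per-class probabilities relate to the margins in multinomial logistic regression, the pure-Python example below assumes the pivot-class parameterization, where class 0 has a fixed margin of 0 and every other class has its own margin. The function name `multiclass_probabilities` and the margin values are hypothetical and are not part of the PySpark API.

```python
import math


def multiclass_probabilities(margins):
    """Turn the margins of classes 1..K-1 into probabilities for classes 0..K-1.

    Assumption: class 0 is the pivot class with a fixed margin of 0.
    """
    all_margins = [0.0] + list(margins)
    top = max(all_margins)  # subtract the maximum for numerical stability
    exps = [math.exp(m - top) for m in all_margins]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical margins for a 3-class problem (classes 1 and 2).
probs = multiclass_probabilities([1.2, -0.4])

# The predicted class is the one with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

The probabilities sum to 1, and taking the class with the largest probability is equivalent to taking the class with the largest margin.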


sunny Jul 17 '17 at 7:12


desertnaut · Accepted Answer · 2015-11-06T08:41:08+0000

You must clear the threshold first, and this only works for binary classification:

    from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
    from pyspark.mllib.regression import LabeledPoint

    parsed_data = [LabeledPoint(0.0, [4.6,3.6,1.0,0.2]),
                   LabeledPoint(0.0, [5.7,4.4,1.5,0.4]),
                   LabeledPoint(1.0, [6.7,3.1,4.4,1.4]),
                   LabeledPoint(0.0, [4.8,3.4,1.6,0.2]),
                   LabeledPoint(1.0, [4.4,3.2,1.3,0.2])]

    model = LogisticRegressionWithLBFGS.train(sc.parallelize(parsed_data))
    model.threshold
    # 0.5
    model.predict(parsed_data[2].features)
    # 1
    model.clearThreshold()
    model.predict(parsed_data[2].features)
    # 0.9873840020002339
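To get both the label and the probability without switching the threshold on and off, one option is to compute the logistic function by hand. The sketch below is plain Python with no Spark dependency. It assumes a binary model whose raw score is the sigmoid of the dot product of weights and features plus the intercept. The coefficient values are made up for illustration; with a trained model you would presumably pass in `model.weights` and `model.intercept` instead. The helper names `predict_probability` and `predict_label` are hypothetical.

```python
import math


def predict_probability(weights, intercept, features):
    """Return P(label = 1) for a binary logistic regression model."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Use the numerically stable form of the sigmoid for each sign of the margin.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def predict_label(probability, threshold=0.5):
    """Apply a decision threshold to a probability."""
    return 1 if probability > threshold else 0


# Hypothetical coefficients, not taken from a trained model.
weights = [0.5, -1.0, 2.0, 0.3]
intercept = -4.0
features = [6.7, 3.1, 4.4, 1.4]

p = predict_probability(weights, intercept, features)
print(predict_label(p), p)  # label and probability side by side
```

Keeping the threshold step separate makes it possible to print the probability and the thresholded label for the same point in one pass.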
