How to use feature importances from a Spark random forest?

The documentation for Random Forests does not mention feature importances. However, the feature is listed on JIRA as resolved and is present in the source code. HERE it also says: "The main differences between this API and the original MLlib API are:

  • Support for DataFrames and ML Pipelines
  • Separation of classification vs. regression
  • Use of DataFrame metadata to distinguish between continuous and categorical features
  • More functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification."

However, I cannot figure out the working syntax for calling this new feature.

 scala> model
 res13: org.apache.spark.mllib.tree.model.RandomForestModel = TreeEnsembleModel classifier with 10 trees

 scala> model.featureImportances
 <console>:60: error: value featureImportances is not a member of org.apache.spark.mllib.tree.model.RandomForestModel
        model.featureImportances
1 answer

You have to use the new random forests from the spark.ml package. Check your imports. Old:

 import org.apache.spark.mllib.tree.RandomForest
 import org.apache.spark.mllib.tree.model.RandomForestModel

The new random forests use:

 import org.apache.spark.ml.classification.RandomForestClassificationModel
 import org.apache.spark.ml.classification.RandomForestClassifier
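
For reference, a minimal sketch of getting feature importances with the new API. It assumes a DataFrame named trainingData with a "label" column and a "features" vector column (those names are my own illustration, not from the question):

 import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}

 // Assumed input: a DataFrame `trainingData` with columns "label" (Double)
 // and "features" (Vector). Column names are illustrative.
 val rf = new RandomForestClassifier()
   .setLabelCol("label")
   .setFeaturesCol("features")
   .setNumTrees(10)

 val model: RandomForestClassificationModel = rf.fit(trainingData)

 // featureImportances is a Vector with one weight per feature,
 // normalized to sum to 1.
 println(model.featureImportances)

If you train inside a Pipeline instead, fetch the RandomForestClassificationModel from the fitted pipeline's stages first and then read featureImportances from it.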
