How to use feature importances from a Spark random forest?

The documentation for Random Forests does not mention feature importances. However, the feature is listed on JIRA as resolved and is present in the source code. HERE it also says: "The main differences between this API and the original MLlib API are:

  • Support for DataFrames and ML Pipelines
  • Separation of classification vs. regression
  • Use of DataFrame metadata to distinguish between continuous and categorical features
  • More functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification."

However, I cannot figure out the working syntax for calling this new feature.

 scala> model
 res13: org.apache.spark.mllib.tree.model.RandomForestModel = TreeEnsembleModel classifier with 10 trees

 scala> model.featureImportances
 <console>:60: error: value featureImportances is not a member of org.apache.spark.mllib.tree.model.RandomForestModel
        model.featureImportances
1 answer

You have to use the new random forests from the spark.ml package. Check your imports. Old:

 import org.apache.spark.mllib.tree.RandomForest
 import org.apache.spark.mllib.tree.model.RandomForestModel

The new random forests use:

 import org.apache.spark.ml.classification.RandomForestClassificationModel
 import org.apache.spark.ml.classification.RandomForestClassifier
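
For reference, a minimal sketch of getting feature importances with the new API. It assumes a DataFrame named trainingData with a "label" column and a "features" vector column (those names are my own illustration, not from the question):

 import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}

 // Assumed input: a DataFrame `trainingData` with columns "label" (Double)
 // and "features" (Vector). Column names are illustrative.
 val rf = new RandomForestClassifier()
   .setLabelCol("label")
   .setFeaturesCol("features")
   .setNumTrees(10)

 val model: RandomForestClassificationModel = rf.fit(trainingData)

 // featureImportances is a Vector with one weight per feature,
 // normalized to sum to 1.
 println(model.featureImportances)

If you train inside a Pipeline instead, fetch the RandomForestClassificationModel from the fitted pipeline's stages first and then read featureImportances from it.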
