The documentation for Random Forests does not mention feature importances. However, the feature is listed on Jira as resolved and is present in the source code. The documentation here also says: "The main differences between this API and the original MLlib API:
- Support for DataFrames and ML Pipelines
- Separation of classification vs. regression
- Use of DataFrame metadata to distinguish continuous and categorical features
- More functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification."
However, I cannot figure out the syntax for calling this new feature.
scala> model
res13: org.apache.spark.mllib.tree.model.RandomForestModel = TreeEnsembleModel classifier with 10 trees

scala> model.featureImportances
<console>:60: error: value featureImportances is not a member of org.apache.spark.mllib.tree.model.RandomForestModel
       model.featureImportances
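For comparison, the `model` in the session is an `org.apache.spark.mllib.tree.model.RandomForestModel`, while the quoted text describes the DataFrame-based API. Here is a minimal sketch of what the call might look like there, assuming Spark 2.x or later, the `org.apache.spark.ml` package, and a local `SparkSession`. The object name, the helper `computeImportances`, and the toy data are hypothetical and only for illustration.

```scala
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

object FeatureImportanceSketch {

  // Hypothetical helper: trains a small forest on toy data and
  // returns the estimated importance of each feature.
  def computeImportances(spark: SparkSession): Vector = {
    // Toy training set (assumption): a label column and a 3-element feature vector.
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(0.1, 1.0, 0.3)),
      (0.0, Vectors.dense(0.2, 1.3, 0.2)),
      (1.0, Vectors.dense(2.0, 1.0, 0.2)),
      (1.0, Vectors.dense(2.1, 1.2, 0.1)),
      (1.0, Vectors.dense(1.9, 0.9, 0.3))
    )).toDF("label", "features")

    // The DataFrame-based estimator, not mllib's RandomForest.trainClassifier.
    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(10)
      .setSeed(42L)

    val model: RandomForestClassificationModel = rf.fit(training)

    // featureImportances is a member of the spark.ml model class.
    model.featureImportances
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("feature-importance-sketch")
      .getOrCreate()

    println(computeImportances(spark))
    spark.stop()
  }
}
```

The returned vector has one entry per feature, in the order of the `features` column.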
scala random-forest apache-spark apache-spark-mllib
Climbs_lika_Spyder