How to visualize / plot a decision tree in Apache Spark (PySpark 1.4.1)?

I use Apache Spark MLlib 1.4.1 (PySpark, the Python API for Spark) to generate a decision tree from my LabeledPoint data. The tree is generated correctly, and I can print it to the terminal (extract the rules, as the user in "How to extract the rules from the MLlib spark of the decision tree" calls it) using:

    from pyspark.mllib.tree import DecisionTree
    model = DecisionTree.trainClassifier( ... )
    print(model.toDebugString())

But I want to visualize or plot the decision tree rather than print it to the terminal. Is there a way to plot the decision tree in PySpark, or can I save the decision tree data and use R to plot it? Thanks!

+6
3 answers

There is a project, Decision-Tree-Visualization-Spark, for visualizing a decision tree model.

It has two steps:

  • Parse the output of the Spark decision tree into JSON format.
  • Use the JSON file as input for a D3.js visualization.

For the parser, check out Dt.py.

The input to the function def tree_json(tree) is your model's toDebugString() output.
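The parsing step can be sketched roughly as follows. This is not the project's Dt.py, only a minimal illustration of the idea. The function name parse_debug_string is hypothetical, the SAMPLE text is a made-up example that assumes the usual If / Else / Predict layout of toDebugString(), and the "name" / "children" keys are an assumption about what a D3.js tree layout typically expects.

```python
import json

# Made-up sample, assuming the If / Else / Predict layout of toDebugString().
SAMPLE = """DecisionTreeModel classifier of depth 2 with 5 nodes
  If (feature 0 <= 0.5)
   If (feature 1 <= 1.0)
    Predict: 0.0
   Else (feature 1 > 1.0)
    Predict: 1.0
  Else (feature 0 > 0.5)
   Predict: 1.0
"""


def parse_debug_string(debug_string):
    """Turn a toDebugString()-style dump into a nested dict (hypothetical helper)."""
    stripped = (ln.strip() for ln in debug_string.splitlines())
    # Keep only rule lines; this drops the header and any blank lines.
    tokens = iter([ln for ln in stripped
                   if ln.startswith(("If", "Else", "Predict"))])

    def parse_node():
        line = next(tokens)
        if line.startswith("Predict"):
            return {"name": line}  # leaf node
        # Internal node: "If (condition)", then the subtree for the true case,
        # then the matching "Else (...)" line, then the subtree for the false case.
        condition = line[line.index("(") + 1:line.rindex(")")]
        yes_branch = parse_node()
        next(tokens)  # consume the matching "Else (...)" line
        no_branch = parse_node()
        return {"name": condition, "children": [yes_branch, no_branch]}

    return parse_node()


tree = parse_debug_string(SAMPLE)
print(json.dumps(tree, indent=2))
```

With a real model you would pass model.toDebugString() instead of SAMPLE and write the JSON to a file for the D3.js page to load.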

Answer to the question

+4

I also use Databricks, and for some reason the Graphviz binaries do not work. I tried giving it the path where the library is installed, but that did not work. Is there a workaround for this?

0

Although this is an old post, I am adding my answer so that others who come across it can benefit.

Alternatively, you can use the graphviz Python package with PySpark. It draws the decision tree model as a neat tree structure rather than the usual nested if/else text.

More information can be found at this link: https://pypi.python.org/pypi/graphviz
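As a rough sketch of that approach, the snippet below builds Graphviz DOT text from a tree that has already been parsed into nested dicts. The to_dot helper, the "name" / "children" keys, and the sample tree are all hypothetical and only illustrate the idea. The snippet itself uses only the standard library.

```python
import itertools

# Made-up example tree; in practice this would come from parsing
# the output of model.toDebugString().
sample_tree = {
    "name": "feature 0 <= 0.5",
    "children": [
        {"name": "feature 1 <= 1.0",
         "children": [{"name": "Predict: 0.0"}, {"name": "Predict: 1.0"}]},
        {"name": "Predict: 1.0"},
    ],
}


def to_dot(tree):
    """Return DOT source for a nested-dict tree (hypothetical helper)."""
    lines = ["digraph Tree {", "  node [shape=box];"]
    ids = itertools.count()

    def visit(node):
        node_id = next(ids)
        lines.append('  n%d [label="%s"];' % (node_id, node["name"]))
        # First child is the branch where the condition holds, second where it fails.
        for edge_label, child in zip(("yes", "no"), node.get("children", [])):
            child_id = visit(child)
            lines.append('  n%d -> n%d [label="%s"];' % (node_id, child_id, edge_label))
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(sample_tree)
print(dot)
```

If the graphviz package and the Graphviz binaries are both installed, something like graphviz.Source(dot).render("tree") should write the rendered file. The binaries have to be on the PATH for rendering to work.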

-1
