Python, PyDot, and DecisionTree

I am trying to visualize my DecisionTree, but I get an error. Code:

    X = [i[1:] for i in dataset]  # attribute
    y = [i[0] for i in dataset]
    clf = tree.DecisionTreeClassifier()
    dot_data = StringIO()
    tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("tree.pdf")

And the error:

    Traceback (most recent call last):
        if data.startswith(codecs.BOM_UTF8):
    TypeError: startswith first arg must be str or a tuple of str, not bytes

Can someone explain to me what the problem is? Thank you very much!

+8
python decision-tree pydot
3 answers

I had the exact same problem and spent a couple of hours trying to figure it out. I cannot guarantee that what I share here will work for others, but it may be worth a try.

  • I tried installing the official pydot packages, but I have Python 3 and they just don't work. Following a note in a thread on one of the many sites I was browsing, I ended up installing this forked pydot repository.
  • I went to graphviz.org and installed the software on my Windows 7 machine. If you are not on Windows, see the "Download" section for your system.
  • After a successful installation, I added Graphviz to the PATH: in Control Panel\All Control Panel Items\System\Advanced system settings, click the Environment Variables button, find the Path variable under System variables, click Edit..., and append ;C:\Program Files (x86)\Graphviz2.38\bin to the end of Variable value.
  • To confirm that I could now use dot commands on the command line (Windows Command Processor), I typed dot -V, which returned dot - graphviz version 2.38.0 (20140413.2041).
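The same PATH check can also be done from Python before calling pydot. This is only a minimal sketch using the standard library; `graphviz_version` is a hypothetical helper name, not part of pydot or Graphviz, and it assumes the executable is called `dot`.

```python
import shutil
import subprocess


def graphviz_version(executable="dot"):
    """Return the banner printed by `dot -V`, or None if it is not on PATH.

    Hypothetical helper for illustration only.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # Capture both streams, since the version banner may go to stderr
    result = subprocess.run([path, "-V"], capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


print(graphviz_version() or "dot not found on PATH")
```

If this prints "dot not found on PATH", the environment variable change has probably not taken effect yet (a new command prompt or a restarted notebook kernel is usually needed to pick it up).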

In the code below, keep in mind that I am reading a dataframe from my clipboard. You may be reading yours from a file or some other source.

In an IPython notebook:

    import pandas as pd
    import numpy as np
    from sklearn import tree
    import pydot
    from IPython.display import Image
    # io.StringIO replaces sklearn.externals.six.StringIO,
    # which newer scikit-learn releases no longer ship
    from io import StringIO

    # Last column is the target, all other columns are features
    df = pd.read_clipboard()
    X = df[df.columns[:-1]]
    y = df[df.columns[-1]]

    dtr = tree.DecisionTreeRegressor(max_depth=3)
    dtr.fit(X, y)

    # Write the tree as DOT source into an in-memory buffer, then render it
    dot_data = StringIO()
    tree.export_graphviz(dtr, out_file=dot_data, feature_names=X.columns)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    Image(graph.create_png())

[Figure: decision tree visualization]

Alternatively, if you are not using IPython, you can create the image yourself from the command line, provided you have graphviz installed (step 2 above). Using my previous code example, add this line after fitting the model:

    tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=X.columns)

Then open a command prompt in the directory containing treepic.dot and run this command:

    dot -T png treepic.dot -o treepic.png

This should create a .png file of your decision tree.
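If you would rather stay in Python than switch to a command prompt, the same dot call can be issued with subprocess. A rough sketch, assuming dot is on the PATH; `dot_render_command` and `render_dot` are hypothetical helper names, and the format flag is written in the attached form (`-Tpng`).

```python
import shutil
import subprocess


def dot_render_command(dot_file, out_file, fmt="png"):
    """Build the argument list for rendering a DOT file with Graphviz."""
    return ["dot", f"-T{fmt}", dot_file, "-o", out_file]


def render_dot(dot_file, out_file, fmt="png"):
    """Run dot if it is available; return True when the render was attempted."""
    if shutil.which("dot") is None:
        return False  # Graphviz is not installed or not on PATH
    # check=True raises CalledProcessError if dot exits with an error
    subprocess.run(dot_render_command(dot_file, out_file, fmt), check=True)
    return True


# Show the command that would be run for the file names used above
print(" ".join(dot_render_command("treepic.dot", "treepic.png")))
```

Calling `render_dot("treepic.dot", "treepic.png")` after the export step should then produce the image without leaving the script.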

+4

If using Python 3, use pydotplus instead of pydot. It is also easy to install.

    import pydotplus
    <your code>
    dot_data = StringIO()
    tree.export_graphviz(clf, out_file=dot_data)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("iris.pdf")
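If the same script has to run on machines where either package might be installed, something like the following could pick whichever is available. This is just a sketch: `pick_dot_backend` and `first_graph` are hypothetical helpers, and `first_graph` assumes that some newer pydot releases return a list of graphs from graph_from_dot_data while pydotplus returns a single graph.

```python
import importlib.util


def pick_dot_backend(candidates=("pydotplus", "pydot")):
    """Return the name of the first importable module, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def first_graph(result):
    """Normalize a graph_from_dot_data result to a single graph object."""
    if isinstance(result, (list, tuple)):
        return result[0]
    return result


backend = pick_dot_backend()
print("DOT backend:", backend or "none installed")
```

With a backend name in hand, `importlib.import_module(backend)` gives you the module, and `first_graph(module.graph_from_dot_data(...))` can be used in place of the direct call.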
+6

The line in question checks whether the stream/file starts with a UTF-8 byte order mark (BOM).

Instead of:

    if data.startswith(codecs.BOM_UTF8):

try:

    if codecs.BOM_UTF8 in data:

You will most likely have more success.
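For what it's worth, here is a quick check of both forms under Python 3, assuming data is a str as the traceback suggests. Under that assumption the `in` test raises a TypeError as well, because codecs.BOM_UTF8 is a bytes object, so the sketch also shows a comparison that matches the type of data. `has_utf8_bom` is a hypothetical helper, not pydot's actual code.

```python
import codecs


def has_utf8_bom(data):
    """Return True if data (str or bytes) starts with the UTF-8 BOM."""
    if isinstance(data, bytes):
        return data.startswith(codecs.BOM_UTF8)
    # In decoded text the BOM is the single character U+FEFF
    return data.startswith(codecs.BOM_UTF8.decode("utf-8"))


dot_text = "digraph Tree { }"  # stand-in for dot_data.getvalue()

checks = [
    ("startswith", lambda: dot_text.startswith(codecs.BOM_UTF8)),
    ("in", lambda: codecs.BOM_UTF8 in dot_text),
]
for label, check in checks:
    try:
        check()
        print(label, "-> no error")
    except TypeError as exc:
        # Mixing bytes and str is rejected in Python 3
        print(label, "-> TypeError:", exc)

print("type-matched check:", has_utf8_bom(dot_text))
```

Since the failing line lives inside the pydot package itself, patching it locally is fragile; switching to a Python 3 compatible package, as the other answers suggest, avoids the problem altogether.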

0