Exporting Python scikit-learn models to PMML

I want to export Python scikit-learn models to PMML.

Which Python package works best?

I read about Augustus, but I could not find any example that uses scikit-learn models.

python scikit-learn pmml
3 answers

SkLearn2PMML is a thin wrapper around the JPMML-SkLearn command-line application. For a list of supported Scikit-Learn Estimator and Transformer types, see the JPMML-SkLearn project documentation.

As @user1808924 notes, it supports Python 2.7 or 3.4+. It also requires Java 1.7+.
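Since the converter shells out to Java, it can save time to confirm the prerequisites before installing. The following is a minimal sketch, assuming Python 3.5+ and a `java` executable on the PATH; the function name `check_prerequisites` is made up for illustration and is not part of SkLearn2PMML.

```python
import shutil
import subprocess
import sys


def check_prerequisites(min_python=(3, 4)):
    """Report whether the interpreter and a Java runtime look usable.

    Hypothetical helper for illustration; it only inspects the local
    environment and does not touch sklearn2pmml itself.
    """
    java_path = shutil.which("java")  # None when java is not on PATH
    java_version = None
    if java_path is not None:
        try:
            # `java -version` writes its banner to stderr, not stdout
            result = subprocess.run(
                [java_path, "-version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                universal_newlines=True,
                timeout=30,
            )
            lines = (result.stderr or result.stdout).splitlines()
            java_version = lines[0] if lines else None
        except (OSError, subprocess.SubprocessError):
            java_version = None
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "java_path": java_path,
        "java_version": java_version,
    }


print(check_prerequisites())
```

If `java_path` comes back as `None`, install a Java runtime (1.7 or newer, per the requirement above) before running the conversion.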

Install it with (requires git):

 pip install git+https://github.com/jpmml/sklearn2pmml.git 

Here is an example of exporting a decision tree classifier to PMML. First, grow a tree:

 # example tree & viz from http://scikit-learn.org/stable/modules/tree.html
 from sklearn import datasets, tree

 iris = datasets.load_iris()
 clf = tree.DecisionTreeClassifier()
 clf = clf.fit(iris.data, iris.target)

There are two parts to the SkLearn2PMML conversion: the estimator (our clf) and the mapper (for preprocessing steps such as discretization or PCA). Our mapper is quite simple, because we do not apply any transformations.

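 from sklearn_pandas import DataFrameMapper
 from sklearn2pmml import sklearn2pmml

 # identity mapper: None means "pass this column through unchanged"
 default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])

 # call signature of the 2016 release used in this answer;
 # newer sklearn2pmml releases expect a PMMLPipeline instead
 sklearn2pmml(estimator=clf, mapper=default_mapper, pmml="D:/workspace/IrisClassificationTree.pmml")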

It is possible (although not documented) to pass mapper=None, but then the predictor names are lost (the output uses x1 instead of sepal length, and so on).
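One way to check whether the predictor names survived an export is to list the DataField names in the generated file. The sketch below uses only the standard library. The sample is a hand-trimmed excerpt of the data dictionary from the file produced in this answer, the PMML 4.3 namespace is taken from that file, and the `x<N>` pattern for placeholder names is an assumption based on the `x1` example above. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the file generated in this answer; other PMML versions differ.
NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Assumption: placeholder predictor names look like x1, x2, ...
GENERIC_NAME = re.compile(r"^x\d+$")

# Hand-trimmed excerpt: only the DataDictionary is kept.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
    <DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
    <DataField name="petal length (cm)" optype="continuous" dataType="float"/>
    <DataField name="petal width (cm)" optype="continuous" dataType="float"/>
    <DataField name="Species" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def data_field_names(pmml_text):
    """Return the DataField names in document order."""
    root = ET.fromstring(pmml_text)
    fields = root.iterfind("pmml:DataDictionary/pmml:DataField", NS)
    return [field.get("name") for field in fields]


def lost_names(names):
    """Return the names that look like generated placeholders."""
    return [name for name in names if GENERIC_NAME.match(name)]


names = data_field_names(SAMPLE)
print(names)
print(lost_names(names))
```

An empty list from `lost_names` suggests the mapper carried the column names through. For a real file, read its contents and pass the text to `data_field_names`.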

Take a look at the .pmml file:

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
   <Header>
     <Application name="JPMML-SkLearn" version="1.1.1"/>
     <Timestamp>2016-09-26T19:21:43Z</Timestamp>
   </Header>
   <DataDictionary>
     <DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
     <DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
     <DataField name="petal length (cm)" optype="continuous" dataType="float"/>
     <DataField name="petal width (cm)" optype="continuous" dataType="float"/>
     <DataField name="Species" optype="categorical" dataType="string">
       <Value value="setosa"/>
       <Value value="versicolor"/>
       <Value value="virginica"/>
     </DataField>
   </DataDictionary>
   <TreeModel functionName="classification" splitCharacteristic="binarySplit">
     <MiningSchema>
       <MiningField name="Species" usageType="target"/>
       <MiningField name="sepal length (cm)"/>
       <MiningField name="sepal width (cm)"/>
       <MiningField name="petal length (cm)"/>
       <MiningField name="petal width (cm)"/>
     </MiningSchema>
     <Output>
       <OutputField name="probability_setosa" dataType="double" feature="probability" value="setosa"/>
       <OutputField name="probability_versicolor" dataType="double" feature="probability" value="versicolor"/>
       <OutputField name="probability_virginica" dataType="double" feature="probability" value="virginica"/>
     </Output>
     <Node id="1">
       <True/>
       <Node id="2" score="setosa" recordCount="50.0">
         <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
         <ScoreDistribution value="setosa" recordCount="50.0"/>
         <ScoreDistribution value="versicolor" recordCount="0.0"/>
         <ScoreDistribution value="virginica" recordCount="0.0"/>
       </Node>
       <Node id="3">
         <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
         <Node id="4">
           <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.75"/>
           <Node id="5">
             <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.95"/>
             <Node id="6" score="versicolor" recordCount="47.0">
               <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.6500001"/>
               <ScoreDistribution value="setosa" recordCount="0.0"/>
               <ScoreDistribution value="versicolor" recordCount="47.0"/>
               <ScoreDistribution value="virginica" recordCount="0.0"/>
             </Node>
             <Node id="7" score="virginica" recordCount="1.0">
               <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.6500001"/>
               <ScoreDistribution value="setosa" recordCount="0.0"/>
               <ScoreDistribution value="versicolor" recordCount="0.0"/>
               <ScoreDistribution value="virginica" recordCount="1.0"/>
             </Node>
           </Node>
           <Node id="8">
             <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.95"/>
             <Node id="9" score="virginica" recordCount="3.0">
               <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.55"/>
               <ScoreDistribution value="setosa" recordCount="0.0"/>
               <ScoreDistribution value="versicolor" recordCount="0.0"/>
               <ScoreDistribution value="virginica" recordCount="3.0"/>
             </Node>
             <Node id="10">
               <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.55"/>
               <Node id="11" score="versicolor" recordCount="2.0">
                 <SimplePredicate field="sepal length (cm)" operator="lessOrEqual" value="6.95"/>
                 <ScoreDistribution value="setosa" recordCount="0.0"/>
                 <ScoreDistribution value="versicolor" recordCount="2.0"/>
                 <ScoreDistribution value="virginica" recordCount="0.0"/>
               </Node>
               <Node id="12" score="virginica" recordCount="1.0">
                 <SimplePredicate field="sepal length (cm)" operator="greaterThan" value="6.95"/>
                 <ScoreDistribution value="setosa" recordCount="0.0"/>
                 <ScoreDistribution value="versicolor" recordCount="0.0"/>
                 <ScoreDistribution value="virginica" recordCount="1.0"/>
               </Node>
             </Node>
           </Node>
         </Node>
         <Node id="13">
           <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
           <Node id="14">
             <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/>
             <Node id="15" score="virginica" recordCount="2.0">
               <SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/>
               <ScoreDistribution value="setosa" recordCount="0.0"/>
               <ScoreDistribution value="versicolor" recordCount="0.0"/>
               <ScoreDistribution value="virginica" recordCount="2.0"/>
             </Node>
             <Node id="16" score="versicolor" recordCount="1.0">
               <SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/>
               <ScoreDistribution value="setosa" recordCount="0.0"/>
               <ScoreDistribution value="versicolor" recordCount="1.0"/>
               <ScoreDistribution value="virginica" recordCount="0.0"/>
             </Node>
           </Node>
           <Node id="17" score="virginica" recordCount="43.0">
             <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
             <ScoreDistribution value="setosa" recordCount="0.0"/>
             <ScoreDistribution value="versicolor" recordCount="0.0"/>
             <ScoreDistribution value="virginica" recordCount="43.0"/>
           </Node>
         </Node>
       </Node>
     </Node>
   </TreeModel>
 </PMML>

The first split (directly under Node 1) is at a petal width of 0.8. Node 2 (petal width <= 0.8) captures all of the setosa samples, and nothing else.
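The same reading can be done programmatically. The sketch below walks the Node elements with the standard library and prints each node's split condition, score and class counts. The sample is a hand-trimmed excerpt of the file above (nodes 1 to 3 only, with the subtree under node 3 omitted), and the function name `describe_nodes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the file generated in this answer; other PMML versions differ.
NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Hand-trimmed excerpt: nodes 1-3 only, node 3's children are omitted.
PMML_EXCERPT = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <TreeModel functionName="classification" splitCharacteristic="binarySplit">
    <Node id="1">
      <True/>
      <Node id="2" score="setosa" recordCount="50.0">
        <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
        <ScoreDistribution value="setosa" recordCount="50.0"/>
        <ScoreDistribution value="versicolor" recordCount="0.0"/>
        <ScoreDistribution value="virginica" recordCount="0.0"/>
      </Node>
      <Node id="3">
        <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""


def describe_nodes(pmml_text):
    """Return (id, condition, score, class counts) for every Node, in document order."""
    root = ET.fromstring(pmml_text)
    rows = []
    for node in root.iterfind(".//pmml:Node", NS):
        # find() only looks at direct children, so nested nodes do not leak in
        predicate = node.find("pmml:SimplePredicate", NS)
        if predicate is None:
            condition = "True"  # the root node carries <True/> instead of a split
        else:
            condition = "{} {} {}".format(
                predicate.get("field"),
                predicate.get("operator"),
                predicate.get("value"),
            )
        counts = {
            dist.get("value"): float(dist.get("recordCount"))
            for dist in node.findall("pmml:ScoreDistribution", NS)
        }
        rows.append((node.get("id"), condition, node.get("score"), counts))
    return rows


for row in describe_nodes(PMML_EXCERPT):
    print(row)
```

Internal nodes have no score and an empty count dictionary, which makes the leaves easy to pick out.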

You can compare the PMML output with the graphviz output:

 # the original used sklearn.externals.six.StringIO, which newer scikit-learn releases no longer ship
 from io import StringIO

 import pydotplus  # this might be pydot for python 2.7

 dot_data = StringIO()
 tree.export_graphviz(clf, out_file=dot_data,
                      feature_names=iris.feature_names,
                      class_names=iris.target_names,
                      filled=True, rounded=True,
                      special_characters=True)
 graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
 graph.write_pdf("D:/workspace/iris.pdf")

 # for in-line display, you can also do:
 # from IPython.display import Image
 # Image(graph.create_png())


Feel free to try Nyoka. It exports scikit-learn models, and then some.


Nyoka is a Python library with support for scikit-learn, XGBoost, LightGBM, Keras and Statsmodels.

In addition to approximately 500 Python classes, each of which covers a PMML tag and all constructor parameters/attributes as defined in the standard, Nyoka also provides a growing number of convenience classes and functions that make a data scientist's life easier, for example by reading or writing any PMML file in one line of code in your favorite Python environment.

It can be installed from PyPI using:

 pip install nyoka 

Code example

Example 1

 import pandas as pd
 from sklearn import datasets
 from sklearn.pipeline import Pipeline
 # note: Imputer was removed in scikit-learn 0.22, so this example needs an older release
 from sklearn.preprocessing import StandardScaler, Imputer
 from sklearn_pandas import DataFrameMapper
 from sklearn.ensemble import RandomForestClassifier

 iris = datasets.load_iris()
 irisd = pd.DataFrame(iris.data, columns=iris.feature_names)
 irisd['Species'] = iris.target

 features = irisd.columns.drop('Species')
 target = 'Species'

 # scale the sepal columns, impute the petal columns, then fit a random forest
 pipeline_obj = Pipeline([
     ("mapping", DataFrameMapper([
         (['sepal length (cm)', 'sepal width (cm)'], StandardScaler()),
         (['petal length (cm)', 'petal width (cm)'], Imputer())
     ])),
     ("rfc", RandomForestClassifier(n_estimators=100))
 ])
 pipeline_obj.fit(irisd[features], irisd[target])

 from nyoka import skl_to_pmml
 skl_to_pmml(pipeline_obj, features, target, "rf_pmml.pmml")

Example 2

 from keras import applications
 from keras.layers import Flatten, Dense
 from keras.models import Model

 # MobileNet backbone without its classification head
 model = applications.MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

 # add a small two-class head on top of the backbone
 activType = 'sigmoid'
 x = model.output
 x = Flatten()(x)
 x = Dense(1024, activation="relu")(x)
 predictions = Dense(2, activation=activType)(x)
 model_final = Model(inputs=model.input, outputs=predictions, name='predictions')

 from nyoka import KerasToPmml
 cnn_pmml = KerasToPmml(model_final, dataSet='image', predictedClasses=['cats', 'dogs'])
 # the with-block closes the output file once the export is done
 with open('2classMBNet.pmml', "w") as pmml_file:
     cnn_pmml.export(pmml_file, 0)

More examples can be found on the Nyoka GitHub page.
