Sklearn: Is there a way to debug Pipelines?

I have created several pipelines for a classification problem, and I want to check what information is present/stored at each stage (for example, text_stats, ngram_tfidf). How can I do this?

    pipeline = Pipeline([
        ('features', FeatureUnion([
            ('text_stats', Pipeline([
                ('length', TextStats()),
                ('vect', DictVectorizer())
            ])),
            ('ngram_tfidf', Pipeline([
                ('count_vect', CountVectorizer(tokenizer=tokenize_bigram_stem, stop_words=stopwords)),
                ('tfidf', TfidfTransformer())
            ]))
        ])),
        ('classifier', MultinomialNB(alpha=0.1))
    ])
2 answers

I sometimes find it useful to temporarily add a debugging step that prints the information you are interested in. Building on the Pipeline example from the sklearn documentation, you can do this, for example, to print the first 5 rows, the shape, or anything else you need to see before the classifier is called:

    import pandas as pd
    from sklearn import svm
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.pipeline import Pipeline


    class Debug(BaseEstimator, TransformerMixin):
        """Pass-through step that prints the data reaching it."""

        def fit(self, X, y=None, **fit_params):
            # Nothing to learn; the step only observes the data.
            return self

        def transform(self, X):
            print(pd.DataFrame(X).head())  # first 5 rows
            print(X.shape)
            return X  # hand the data on unchanged


    X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

    anova_filter = SelectKBest(f_regression, k=5)
    clf = svm.SVC(kernel='linear')

    # The Debug step sits between feature selection and the classifier.
    anova_svm = Pipeline([('anova', anova_filter), ('dbg', Debug()), ('svc', clf)])
    anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

    prediction = anova_svm.predict(X)

You can traverse the Pipeline() tree using the steps and named_steps attributes. The former is a list of ('step_name', Step()) tuples, while the latter gives you a dictionary built from this list.

FeatureUnion() can be explored in the same way using the transformer_list attribute.
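
As a rough sketch of that traversal, the snippet below walks a nested pipeline through steps and transformer_list and then reaches one fitted step through named_steps. The describe helper, the step names and the choice of estimators (StandardScaler, PCA, SelectKBest, GaussianNB) are made up for illustration and stand in for the custom transformers in the question:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler


def describe(estimator, depth=0):
    """Hypothetical helper: one indented line per step, recursing into nested estimators."""
    if isinstance(estimator, Pipeline):
        children = estimator.steps              # list of (name, step) tuples
    elif isinstance(estimator, FeatureUnion):
        children = estimator.transformer_list   # list of (name, transformer) tuples
    else:
        children = []                           # leaf estimator, nothing to descend into
    lines = []
    for name, child in children:
        lines.append("  " * depth + f"{name}: {type(child).__name__}")
        lines.extend(describe(child, depth + 1))
    return lines


# Illustrative pipeline with the same shape as the one in the question.
pipeline = Pipeline([
    ("features", FeatureUnion([
        ("scaled_pca", Pipeline([
            ("scale", StandardScaler()),
            ("pca", PCA(n_components=2)),
        ])),
        ("kbest", SelectKBest(f_classif, k=3)),
    ])),
    ("classifier", GaussianNB()),
])

print("\n".join(describe(pipeline)))

# After fitting, the same attributes lead to the fitted steps.
X, y = make_classification(random_state=42)  # 100 samples, 20 features by default
pipeline.fit(X, y)

union = pipeline.named_steps["features"]
pca = dict(union.transformer_list)["scaled_pca"].named_steps["pca"]
print(pca.components_.shape)
```

Once a step has been reached this way, its fitted attributes (here components_ on the PCA step) show what that stage has learned.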

