I use recursive feature elimination (RFE) in my sklearn pipeline. The pipeline looks something like this:
    from sklearn.pipeline import FeatureUnion, Pipeline
    from sklearn import feature_selection
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    X = ['I am a sentence', 'an example']
    Y = [1, 2]
    X_dev = ['another sentence']
How can I get the names of the features selected by RFE? The RFE should select the top 500 features, but I really need to see which features were selected.
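To make the goal concrete, here is a minimal sketch of what I am after on a flat, non-nested pipeline. The toy data, the step name 'tfidf', and the choice of 3 features are assumptions made only for illustration; they are not my real setup. It applies the boolean mask in RFE's `support_` attribute to the vectorizer's feature names. Recent scikit-learn versions expose the names through `get_feature_names_out`, older ones through `get_feature_names`, so the sketch checks for both.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data, made up for illustration only.
X = ['I am a sentence', 'an example', 'another sentence', 'one more example']
Y = [1, 2, 1, 2]

# A flat pipeline: one vectorizer followed directly by RFE.
flat_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('rfe_feature_selection',
     RFE(estimator=LinearSVC(random_state=0), n_features_to_select=3, step=1)),
])
flat_pipeline.fit(X, Y)

# Boolean mask over the features that entered the RFE step.
support = flat_pipeline.named_steps['rfe_feature_selection'].support_

# Feature names produced by the step directly before RFE.
vectorizer = flat_pipeline.named_steps['tfidf']
if hasattr(vectorizer, 'get_feature_names_out'):
    feature_names = vectorizer.get_feature_names_out()  # newer scikit-learn
else:
    feature_names = vectorizer.get_feature_names()      # older scikit-learn

selected = np.array(feature_names)[support]
print(selected)
```

This only works here because the mask and the name list have the same length: the vectorizer feeds RFE directly, with nothing in between.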
EDIT:
I have a complex Pipeline that consists of several pipelines and feature unions, percentile feature selection, and, at the end, recursive feature elimination:
    fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=90)
    fs_vect = feature_selection.SelectPercentile(feature_selection.chi2, percentile=80)
    f5 = feature_selection.RFE(estimator=svc, n_features_to_select=600, step=3)

    countVecWord = TfidfVectorizer(ngram_range=(1, 3), max_features=2000, analyzer=u'word',
                                   sublinear_tf=True, use_idf=True, min_df=2, max_df=0.85,
                                   lowercase=True)
    countVecWord_tags = TfidfVectorizer(ngram_range=(1, 4), max_features=1000, analyzer=u'word',
                                        min_df=2, max_df=0.85, sublinear_tf=True, use_idf=True,
                                        lowercase=False)

    pipeline = Pipeline([
        ('union', FeatureUnion(
            transformer_list=[
                ('vectorized_pipeline', Pipeline([
                    ('union_vectorizer', FeatureUnion([
                        ('stem_text', Pipeline([
                            ('selector', ItemSelector(key='stem_text')),
                            ('stem_tfidf', countVecWord)
                        ])),
                        ('pos_text', Pipeline([
                            ('selector', ItemSelector(key='pos_text')),
                            ('pos_tfidf', countVecWord_tags)
                        ])),
                    ])),
                    ('percentile_feature_selection', fs_vect)
                ])),
                ('custom_pipeline', Pipeline([
                    ('custom_features', FeatureUnion([
                        ('pos_cluster', Pipeline([
                            ('selector', ItemSelector(key='pos_text')),
                            ('pos_cluster_inner', pos_cluster)
                        ])),
                        ('stylistic_features', Pipeline([
                            ('selector', ItemSelector(key='raw_text')),
                            ('stylistic_features_inner', stylistic_features)
                        ])),
                    ])),
                    ('percentile_feature_selection', fs),
                    ('inner_scale', inner_scaler)
                ])),
            ],
I will try to explain the steps. The first pipeline, "vectorized_pipeline", consists of vectorizers, all of which have a get_feature_names method. The second pipeline consists of my own feature extractors, which I implemented with fit, transform, and get_feature_names methods. When I use @Kevin's suggestion, I get an error saying that "union" (the name of the top-level element in my pipeline) does not have a get_feature_names method:
    support = pipeline.named_steps['rfe_feature_selection'].support_
    feature_names = pipeline.named_steps['union'].get_feature_names()
    print(np.array(feature_names)[support])
Also, when I try to get the feature names from the individual FeatureUnions, for example:
    support = pipeline.named_steps['rfe_feature_selection'].support_
    feature_names = pipeline_age.named_steps['union_vectorizer'].get_feature_names()
    print(np.array(feature_names)[support])
I get a key error:
    feature_names = pipeline.named_steps['union_vectorizer'].get_feature_names()
    KeyError: 'union_vectorizer'
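As far as I can tell, the KeyError arises because `named_steps` only lists the direct steps of the pipeline it is called on, while 'union_vectorizer' is nested inside 'union' and then 'vectorized_pipeline'. Below is a rough sketch of a lookup that walks the nesting. `find_step` is a hypothetical helper and not part of scikit-learn. It relies only on the `steps` attribute of Pipeline and the `transformer_list` attribute of FeatureUnion, both of which hold (name, estimator) pairs.

```python
def find_step(estimator, name):
    """Depth-first search for a nested step called `name`.

    Hypothetical helper, not a scikit-learn function. Returns the first
    match, or None if no step with that name exists.
    """
    # Pipeline keeps its children in `.steps`, FeatureUnion in
    # `.transformer_list`; leaf estimators have neither attribute.
    children = (getattr(estimator, 'steps', None)
                or getattr(estimator, 'transformer_list', None)
                or [])
    for child_name, child in children:
        if child_name == name:
            return child
        found = find_step(child, name)
        if found is not None:
            return found
    return None
```

With the pipeline above, `find_step(pipeline, 'union_vectorizer')` would return the inner FeatureUnion instead of raising a KeyError. Names that occur more than once, such as 'selector', resolve to the first match only.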