XGBoost plot_importance does not show feature names

I am using XGBoost with Python and have successfully trained a model by calling the XGBoost train() function on DMatrix data. The DMatrix was created from a Pandas DataFrame, which has feature names for its columns.

import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y,
                                              test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)
dval = xgb.DMatrix(Xval, label=yval)

# early_stopping_rounds requires an evaluation set
model = xgb.train(xgb_params, dtrain, num_boost_round=60,
                  evals=[(dval, "val")],
                  early_stopping_rounds=50, maximize=False, verbose_eval=10)

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
xgb.plot_importance(model, max_num_features=5, ax=ax)

Now I want to see the feature importances using xgboost.plot_importance(), but the resulting plot does not display the feature names. Instead, the features are listed as f1, f2, f3, etc., as shown below.

[feature importance plot with bars labeled f1, f2, f3, …]

I think the problem is that I converted the original Pandas DataFrame into a DMatrix. How can I attach the feature names so that they are displayed in the feature importance plot?


You can pass feature_names when constructing the xgb.DMatrix:

dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)
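
For completeness, a minimal end-to-end sketch of this fix (Xtrain, ytrain, feature_names and xgb_params are placeholders carried over from the question):

import xgboost as xgb
import matplotlib.pyplot as plt

# Attach the column names when building the DMatrix
dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)
model = xgb.train(xgb_params, dtrain, num_boost_round=60)

# The bars are now labeled with the real column names instead of f0, f1, ...
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
xgb.plot_importance(model, max_num_features=5, ax=ax)
plt.show()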

Note: train_test_split may convert the dataframe to a numpy array, which no longer has the column names.

Following @piRSquared's suggestion, I got the feature names into the DMatrix. Since train_test_split had converted my DataFrame to a numpy array, which no longer has the column names, I converted back to DataFrames first:

import pandas as pd

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y,
                                              test_size=0.2, random_state=42)

# Wrap the arrays back into DataFrames so the column names are restored
Xtrain = pd.DataFrame(data=Xtrain, columns=feature_names)
Xval = pd.DataFrame(data=Xval, columns=feature_names)

dtrain = xgb.DMatrix(Xtrain, label=ytrain)

If you are using the scikit-learn wrapper, you need to access the underlying XGBoost Booster and set the feature names on it, instead of on the scikit-learn model, like this:

import joblib
import xgboost

model = joblib.load("your_saved.model")
model.get_booster().feature_names = ["your", "feature", "name", "list"]
xgboost.plot_importance(model.get_booster())
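
The same idea works on a freshly fitted classifier, without the round trip through disk. A minimal sketch, assuming clf is fit on the columns listed in feature_names (both names are illustrative, not from the answer above):

import xgboost

clf = xgboost.XGBClassifier().fit(Xtrain, ytrain)
booster = clf.get_booster()
booster.feature_names = feature_names  # must match the training columns, in order
xgboost.plot_importance(booster)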

Another option is to save the feature_names alongside the model yourself, because, at least as of XGBoost v0.80 (which I tested), they are not preserved by save_model() / load_model():

## Saving the model to disk
model.save_model('foo.model')
with open('foo_fnames.txt', 'w') as f:
    f.write('\n'.join(model.feature_names))

## Later, when you want to retrieve the model...
model2 = xgb.Booster({"nthread": nThreads})  # nThreads: number of threads to use
model2.load_model("foo.model")

with open("foo_fnames.txt", "r") as f:
    feature_names2 = f.read().split("\n")

model2.feature_names = feature_names2
model2.feature_types = None
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
xgb.plot_importance(model2, max_num_features=5, ax=ax)

So this saves feature_names separately and adds them back later. For some reason, feature_types also needs to be initialized, even if the value is just None.
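
If you control the persistence format, pickling the Booster (e.g. with joblib) is a possible alternative: a pickle carries the Python-side attributes, feature_names included, so no separate file is needed. The trade-off is that pickles are not guaranteed to load across XGBoost versions the way save_model() files are. A minimal sketch:

import joblib

joblib.dump(model, "foo.pkl")    # Python attributes such as feature_names ride along
model2 = joblib.load("foo.pkl")
print(model2.feature_names)      # still set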


With the scikit-learn wrapper interface XGBClassifier, plot_importance returns a matplotlib Axes object, so we can call set_yticklabels on it:

plot_importance(model).set_yticklabels(['feature1','feature2'])
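
One caveat: plot_importance sorts the bars by importance, so a hard-coded label list only lines up by luck. A safer sketch, assuming the plot shows the default f0-style labels and feature_names holds the columns in training order:

ax = plot_importance(model)
# Map each default tick label ('f0', 'f1', ...) back to its real column name
name_map = {'f{0}'.format(i): name for i, name in enumerate(feature_names)}
ax.set_yticklabels([name_map[lbl.get_text()] for lbl in ax.get_yticklabels()])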

