I ran into this rather strange problem. In my case, it turned out to be the result of incorrect label encoding.
At first, using a string vector with N classes as labels, I could only get the algorithm to run by setting num_class = N + 1. However, the result was useless, because I only had N real classes but got N + 1 buckets of predicted probabilities.
I then encoded the labels as integers starting from 0, and num_class worked fine when set to N.
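    library(data.table)

    # Convert classes to integers for xgboost (labels start at 0)
    class <- data.table(interest_level = c("low", "medium", "high"),
                        class = c(0, 1, 2))
    # Left join so every row of t1 gets its integer class
    t1 <- merge(t1, class, by = "interest_level", all.x = TRUE, sort = FALSE)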
and
    param <- list(booster = "gbtree",
                  objective = "multi:softprob",
                  eval_metric = "mlogloss",
                  # nthread = 13,
                  num_class = 3,
                  eta_decay = .99,
                  eta = .005,
                  gamma = 1,
                  max_depth = 4,
                  min_child_weight = .9,  # 1,
                  subsample = .7,
                  colsample_bytree = .5
    )
For example.
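If you would rather not build a lookup table, the same encoding can be sketched in base R with a factor. This is only a minimal illustration: the `interest_level` vector below is made-up data, and `lvls`, `label` and the final check are names I am introducing here. It relies on labels for the multiclass objectives needing to fall in the range 0 to num_class - 1. My guess is that this is also why N + 1 appeared to work earlier (factor codes run from 1 to N), but I have not verified that.

```r
# Hypothetical labels; the values mirror the interest_level example above.
interest_level <- c("low", "high", "medium", "low", "high")

# Fix the level order explicitly so the mapping is reproducible.
lvls <- c("low", "medium", "high")
f <- factor(interest_level, levels = lvls)

# as.integer() on a factor yields 1..N, so subtract 1 to get 0..N-1.
label <- as.integer(f) - 1L
num_class <- length(lvls)

# Sanity check: no unmatched labels, and every label lies in [0, num_class).
stopifnot(!anyNA(label), min(label) >= 0L, max(label) < num_class)
```

The `label` vector can then be used as the training label, with `num_class` passed in the parameter list instead of a hard-coded 3.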
Hack-r