Understanding num_classes for xgboost in R

Question

I have had a lot of trouble figuring out how to set num_class for xgboost correctly.

Here is an example using the iris data:

    df <- iris
    y <- df$Species
    num.class = length(levels(y))
    levels(y) = 1:num.class
    head(y)
    df <- df[,1:4]
    y <- as.matrix(y)
    df <- as.matrix(df)
    param <- list("objective" = "multi:softprob",
                  "num_class" = 3,
                  "eval_metric" = "mlogloss",
                  "nthread" = 8,
                  "max_depth" = 16,
                  "eta" = 0.3,
                  "gamma" = 0,
                  "subsample" = 1,
                  "colsample_bytree" = 1,
                  "min_child_weight" = 12)
    model <- xgboost(param=param, data=df, label=y, nrounds=20)

This returns an error

 Error in xgb.iter.update(bst$handle, dtrain, i - 1, obj) : SoftmaxMultiClassObj: label must be in [0, num_class), num_class=3 but found 3 in label

If I change num_class to 2, I get the same error. If I increase num_class to 4, the model runs, but I get 600 predicted probabilities, which makes sense for 4 classes.

I am not sure if I am making a mistake or I do not understand how xgboost works. Any help would be greatly appreciated.

r xgboost

House Mar 18 '16 at 14:07

2 answers

Rustama · Answer 1 · 2016-03-18T15:13:49+0000

The label should be in [0, num_class), that is, one of 0, 1, ..., num_class - 1. In the script, add y <- y - 1 before the model <- ... line.
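A minimal sketch of that shift on the iris data, assuming the labels start out as a factor (as df$Species does in the question). It uses base R only; the variable names are illustrative, and the result would be passed as label to xgboost:

```r
# Convert the Species factor to 0-based numeric labels for xgboost.
y <- as.integer(iris$Species) - 1    # factor codes 1..3 become 0..2
num_class <- nlevels(iris$Species)   # number of distinct classes

# Every label must satisfy 0 <= label < num_class.
stopifnot(all(y >= 0 & y < num_class))
table(y)
```

With labels in this range, num_class can stay at the real number of classes instead of being bumped up by one.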

Hack-r · Answer 2 · 2017-05-10T20:14:47+0000

I ran into this rather strange problem. In my case, it seemed to be the result of incorrect label encoding.

At first, using a string vector with N classes as labels, I could get the algorithm to run by setting num_class = N + 1. However, this result was useless because I only had N real classes and N + 1 buckets of predicted probabilities.

I then encoded the labels as integers, and num_class worked fine when set to N.

    # Convert classes to integers for xgboost
    class <- data.table(interest_level=c("low", "medium", "high"), class=c(0,1,2))
    t1 <- merge(t1, class, by="interest_level", all.x=TRUE, sort=F)

and

    param <- list(booster="gbtree",
                  objective="multi:softprob",
                  eval_metric="mlogloss",
                  #nthread=13,
                  num_class=3,
                  eta_decay = .99,
                  eta = .005,
                  gamma = 1,
                  max_depth = 4,
                  min_child_weight = .9,#1,
                  subsample = .7,
                  colsample_bytree = .5
    )

For example.
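The same encoding can also be sketched in base R without a merge. This is only an illustration: the interest_level vector below is made-up sample data and class_levels is a hypothetical name, not something from the original code.

```r
# Made-up sample of string labels (illustration only).
interest_level <- c("low", "high", "medium", "low", "high")

# Fix the class order explicitly so the integer codes are reproducible.
class_levels <- c("low", "medium", "high")
label <- match(interest_level, class_levels) - 1   # low = 0, medium = 1, high = 2

num_class <- length(class_levels)

# Fail early if a label is unmatched or falls outside [0, num_class).
stopifnot(!anyNA(label), all(label >= 0 & label < num_class))
```

Running a check like this before training surfaces an encoding problem directly, rather than through the SoftmaxMultiClassObj error.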
