R gbm logistic regression

I was hoping to use the gbm package to perform logistic regression, but it gives answers slightly outside the 0-1 range. I tried the distribution parameters suggested for 0-1 outcomes (bernoulli and adaboost), but these actually make things worse than using gaussian.

    GBM_NTREES = 150
    GBM_SHRINKAGE = 0.1
    GBM_DEPTH = 4
    GBM_MINOBS = 50
    > GBM_model <- gbm.fit(
    +     x = trainDescr
    +     ,y = trainClass
    +     ,distribution = "gaussian"
    +     ,n.trees = GBM_NTREES
    +     ,shrinkage = GBM_SHRINKAGE
    +     ,interaction.depth = GBM_DEPTH
    +     ,n.minobsinnode = GBM_MINOBS
    +     ,verbose = TRUE)
    Iter   TrainDeviance   ValidDeviance   StepSize   Improve
       1          0.0603             nan     0.1000    0.0019
       2          0.0588             nan     0.1000    0.0016
       3          0.0575             nan     0.1000    0.0013
       4          0.0563             nan     0.1000    0.0011
       5          0.0553             nan     0.1000    0.0010
       6          0.0546             nan     0.1000    0.0008
       7          0.0539             nan     0.1000    0.0007
       8          0.0533             nan     0.1000    0.0006
       9          0.0528             nan     0.1000    0.0005
      10          0.0524             nan     0.1000    0.0004
     100          0.0484             nan     0.1000    0.0000
     150          0.0481             nan     0.1000   -0.0000
    > prediction <- predict.gbm(object = GBM_model
    +     ,newdata = testDescr
    +     ,GBM_NTREES)
    > hist(prediction)
    > range(prediction)
    [1] -0.02945224  1.00706700

Bernoulli:

    GBM_model <- gbm.fit(
        x = trainDescr
        ,y = trainClass
        ,distribution = "bernoulli"
        ,n.trees = GBM_NTREES
        ,shrinkage = GBM_SHRINKAGE
        ,interaction.depth = GBM_DEPTH
        ,n.minobsinnode = GBM_MINOBS
        ,verbose = TRUE)
    prediction <- predict.gbm(object = GBM_model
    +     ,newdata = testDescr
    +     ,GBM_NTREES)
    > hist(prediction)
    > range(prediction)
    [1] -4.699324  3.043440

And adaboost:

    GBM_model <- gbm.fit(
        x = trainDescr
        ,y = trainClass
        ,distribution = "adaboost"
        ,n.trees = GBM_NTREES
        ,shrinkage = GBM_SHRINKAGE
        ,interaction.depth = GBM_DEPTH
        ,n.minobsinnode = GBM_MINOBS
        ,verbose = TRUE)
    > prediction <- predict.gbm(object = GBM_model
    +     ,newdata = testDescr
    +     ,GBM_NTREES)
    > hist(prediction)
    > range(prediction)
    [1] -3.0374228  0.9323279

Am I doing something wrong, do I need to pre-process (scale, center) the data, or do I need to manually clamp the values with something like:

    prediction <- ifelse(prediction < 0, 0, prediction)
    prediction <- ifelse(prediction > 1, 1, prediction)
1 answer

From ?predict.gbm :

Returns a vector of predictions. By default the predictions are on the scale of f(x). For example, for the Bernoulli loss the returned value is on the log odds scale, poisson loss on the log scale, and coxph is on the log hazard scale.

If type = "response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson. For the other distributions "response" and "link" return the same.

So if you use distribution = "bernoulli", the predicted values are on the log-odds scale, and you need to transform them back to [0, 1], e.g. p <- plogis(predict.gbm(model)), or pass type = "response" to predict.gbm. Using distribution = "gaussian" is valid for regression but not for classification, although I am surprised that the predictions are not in [0, 1]: my understanding was that gbm is still tree-based, so the predicted values should not be able to fall outside the values present in the training data.
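To make the rescaling concrete, here is a minimal sketch using only base R; the log-odds values are illustrative, taken from the range printed for the bernoulli fit above, and stand in for the output of predict.gbm:

    # Bernoulli-loss gbm predictions come back on the log-odds scale.
    # plogis() is base R's inverse logit, 1 / (1 + exp(-x)), so it maps
    # any real-valued log-odds back into [0, 1].
    log_odds <- c(-4.699324, 0, 3.043440)  # illustrative values from the range above
    p <- plogis(log_odds)
    print(round(p, 4))
    stopifnot(all(p >= 0 & p <= 1))

Equivalently, predict.gbm(model, newdata, n.trees, type = "response") applies this transformation for you when distribution = "bernoulli".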

