R gbm logistic regression

I was hoping to use the gbm package to perform logistic regression, but it gives answers slightly outside the 0-1 range. I tried the distribution parameters suggested for 0-1 outcomes (bernoulli and adaboost), but these actually make things worse than using gaussian.

    GBM_NTREES = 150
    GBM_SHRINKAGE = 0.1
    GBM_DEPTH = 4
    GBM_MINOBS = 50
    > GBM_model <- gbm.fit(
    +     x = trainDescr
    +     ,y = trainClass
    +     ,distribution = "gaussian"
    +     ,n.trees = GBM_NTREES
    +     ,shrinkage = GBM_SHRINKAGE
    +     ,interaction.depth = GBM_DEPTH
    +     ,n.minobsinnode = GBM_MINOBS
    +     ,verbose = TRUE)
    Iter   TrainDeviance   ValidDeviance   StepSize   Improve
       1          0.0603             nan     0.1000    0.0019
       2          0.0588             nan     0.1000    0.0016
       3          0.0575             nan     0.1000    0.0013
       4          0.0563             nan     0.1000    0.0011
       5          0.0553             nan     0.1000    0.0010
       6          0.0546             nan     0.1000    0.0008
       7          0.0539             nan     0.1000    0.0007
       8          0.0533             nan     0.1000    0.0006
       9          0.0528             nan     0.1000    0.0005
      10          0.0524             nan     0.1000    0.0004
     100          0.0484             nan     0.1000    0.0000
     150          0.0481             nan     0.1000   -0.0000
    > prediction <- predict.gbm(object = GBM_model
    +     ,newdata = testDescr
    +     ,GBM_NTREES)
    > hist(prediction)
    > range(prediction)
    [1] -0.02945224  1.00706700

Bernoulli:

    GBM_model <- gbm.fit(
        x = trainDescr
        ,y = trainClass
        ,distribution = "bernoulli"
        ,n.trees = GBM_NTREES
        ,shrinkage = GBM_SHRINKAGE
        ,interaction.depth = GBM_DEPTH
        ,n.minobsinnode = GBM_MINOBS
        ,verbose = TRUE)
    prediction <- predict.gbm(object = GBM_model
    +     ,newdata = testDescr
    +     ,GBM_NTREES)
    > hist(prediction)
    > range(prediction)
    [1] -4.699324  3.043440

And adaboost:

    GBM_model <- gbm.fit(
        x = trainDescr
        ,y = trainClass
        ,distribution = "adaboost"
        ,n.trees = GBM_NTREES
        ,shrinkage = GBM_SHRINKAGE
        ,interaction.depth = GBM_DEPTH
        ,n.minobsinnode = GBM_MINOBS
        ,verbose = TRUE)
    > prediction <- predict.gbm(object = GBM_model
    +     ,newdata = testDescr
    +     ,GBM_NTREES)
    > hist(prediction)
    > range(prediction)
    [1] -3.0374228  0.9323279

Am I doing something wrong, do I need to pre-process (scale, center) the data, or do I need to manually clamp the values with something like:

    prediction <- ifelse(prediction < 0, 0, prediction)
    prediction <- ifelse(prediction > 1, 1, prediction)
1 answer

From ?predict.gbm :

Returns a vector of predictions. By default the predictions are on the scale of f(x). For example, for the Bernoulli loss the returned value is on the log odds scale, poisson loss on the log scale, and coxph is on the log hazard scale.

If type = "response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson. For the other distributions "response" and "link" return the same.

So if you use distribution = "bernoulli", the predicted values are on the log-odds scale, and you need to transform them back to [0, 1], e.g. p <- plogis(predict.gbm(model)), or pass type = "response" to predict.gbm. Using distribution = "gaussian" is valid for regression but not for classification, although I am surprised that the predictions are not in [0, 1]: my understanding was that gbm is still tree-based, so the predicted values should not be able to fall outside the values present in the training data.
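To make the rescaling concrete, here is a minimal sketch using only base R; the log-odds values are illustrative, taken from the range printed for the bernoulli fit above, and stand in for the output of predict.gbm:

    # Bernoulli-loss gbm predictions come back on the log-odds scale.
    # plogis() is base R's inverse logit, 1 / (1 + exp(-x)), so it maps
    # any real-valued log-odds back into [0, 1].
    log_odds <- c(-4.699324, 0, 3.043440)  # illustrative values from the range above
    p <- plogis(log_odds)
    print(round(p, 4))
    stopifnot(all(p >= 0 & p <= 1))

Equivalently, predict.gbm(model, newdata, n.trees, type = "response") applies this transformation for you when distribution = "bernoulli".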

