What is the mechanism for using param 'scale_pos_weight' in xgboost?

In my dataset, 90% of the samples are negative and 10% are positive, so it is very unbalanced. I am trying to use the scale_pos_weight parameter and have set it to 9, but what is the mechanism behind this parameter? Does it mean repeating the positive samples 9 times? Or does it take 1/9 of the negative samples each time and train the model many times? Also, if I have a dataset whose negative samples only slightly outnumber the positive ones, do I still need to set this parameter?
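For context, here is a minimal sketch of how the parameter is typically set from the class counts, using the sum(negative instances) / sum(positive instances) heuristic suggested in the xgboost documentation. The synthetic dataset and the hyperparameter values are illustrative assumptions, not part of the original question:

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary dataset with roughly 90% negatives / 10% positives
X, y = make_classification(
    n_samples=10_000, n_features=20, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Heuristic from the xgboost docs: sum(negative) / sum(positive)
ratio = float(np.sum(y_train == 0)) / np.sum(y_train == 1)  # ~9 here

clf = xgb.XGBClassifier(
    n_estimators=200,           # illustrative value
    scale_pos_weight=ratio,     # up-weight the positive class
)
clf.fit(X_train, y_train)

# AUC is more informative than accuracy on imbalanced data
proba = clf.predict_proba(X_test)[:, 1]
print("test AUC:", roc_auc_score(y_test, proba))
```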

1 answer

I have never seen anything in the documentation that explicitly spells out what this option does. Nevertheless, I am fairly confident it is the latter, i.e. it builds the trees based on 1/9 of the negative samples. Although both should have roughly the same effect if the data is well-behaved, taking a subset of the negatives is a better arrangement for modelling, since it makes cross-validation easier: you now have 9 training sets that you can test against each other.
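As an illustration of the arrangement described above (manually training on disjoint subsets of the negatives), here is a minimal sketch. The function name, the split count, and the hyperparameters are assumptions for illustration; this is a manual re-creation of the idea, not a claim about xgboost's internals:

```python
import numpy as np
import xgboost as xgb

def balanced_subsample_models(X, y, n_splits=9, seed=0):
    """Hypothetical sketch: split the negatives into n_splits disjoint
    chunks, pair each chunk with all of the positives, and train one
    model per chunk, yielding n_splits comparable fits."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = rng.permutation(np.flatnonzero(y == 0))
    models = []
    for chunk in np.array_split(neg_idx, n_splits):
        idx = np.concatenate([pos_idx, chunk])
        model = xgb.XGBClassifier(n_estimators=100)  # illustrative value
        model.fit(X[idx], y[idx])
        models.append(model)
    return models

# Each model can then be scored against data it never saw, e.g. the
# negative chunks held by the other models, which is the cross-validation
# convenience the answer alludes to.
```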

As a side note, I would not consider a 90/10 split to be that unbalanced. It is much better than many situations you encounter in practice, and there is ongoing debate about whether rebalancing always helps.
