The adabag boosting function throws an error when given mfinal > 10

I have a strange problem: when I try to increase the mfinal argument of the adabag package's boosting function beyond 10, I get an error. Even with mfinal = 9, I get warnings.

My training data has a dependent variable with 7 classes, 100 independent variables, and about 22,000 samples (one class was SMOTEd using DMwR). The dependent variable is the last column of the training data set.

    library(adabag)
    gc()
    exp_recog_boo <- boosting(V1 ~ ., data = train_dataS, boos = TRUE, mfinal = 9)

    Error in 1:nrow(object$splits) : argument of length 0
    In addition: Warning messages:
    1: In acum + acum1 : longer object length is not a multiple of shorter object length

Thanks in advance.

6 answers

My mistake was that I had not converted the TARGET variable to a factor first.

Try the following:

    train$TARGET <- as.factor(train$TARGET)

and check:

    str(train$TARGET)
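For context, here is the whole sequence as a minimal sketch, assuming a hypothetical data frame train with the class label in a TARGET column:

    library(adabag)

    # boosting() does classification, so the response must be a factor
    train$TARGET <- as.factor(train$TARGET)
    str(train$TARGET)  # should now report "Factor w/ ... levels"

    model <- boosting(TARGET ~ ., data = train, boos = TRUE, mfinal = 10)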

This worked for me:

    modelADA <- boosting(lettr ~ ., data = trainAll, boos = TRUE, mfinal = 10,
                         control = rpart.control(minsplit = 0))

Essentially, I just told rpart to allow splits down to a minimum node size of zero when generating the trees, and that fixed the error. I haven't tested this extensively, so I can't guarantee it's the correct solution (what does a tree with a zero-length leaf really mean?), but it does prevent the error from being thrown.


I think I found the problem.

Ignore the fix below: if you set cp = 0 in your control, this will not happen. I think that if the first split of the tree does not improve the fit (or at least not by more than cp), the tree is left with zero splits, so you get an empty tree, and that makes the algorithm crash.

EDIT: the problem is that rpart generates trees with only one leaf (node), and the boosting method uses the line "k <- varImp(arboles[[m]], surrogates = FALSE, competes = FALSE)"; when arboles[[m]] is a tree with only one node, it gives you an error.

To solve this problem, you can modify the boosting method:

Type fix(boosting) and add the lines marked with **:

    if (boos == TRUE) {
        **k <- 1**
        **while (k == 1) {**
            boostrap <- sample(1:n, replace = TRUE, prob = pesos)
            fit <- rpart(formula, data = data[boostrap, -1], control = control)
            **k <- length(fit$frame$var)**
        **}**
        flearn <- predict(fit, newdata = data[, -1], type = "class")
        ind <- as.numeric(vardep != flearn)
        err <- sum(pesos * ind)
    }

This will prevent the algorithm from using single-node trees, but you must set the cp control parameter to 0 to avoid an infinite loop.
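To see the failure mode this guards against, here is a small sketch of my own (not from the answer) that forces rpart to return a single-node tree:

    library(rpart)

    # With cp = 1, no split can improve the fit enough, so rpart
    # returns a root-only tree (a single leaf).
    stump <- rpart(Species ~ ., data = iris, control = rpart.control(cp = 1))
    length(stump$frame$var)  # 1: exactly the k == 1 case the while loop resamples away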


Just ran into the same problem; setting the complexity parameter to -1 or the minimum split to 0 both worked for me via rpart.control, e.g.:

    library(adabag)
    r1 <- boosting(Y ~ ., data = data, boos = TRUE, mfinal = 10,
                   control = rpart.control(cp = -1))
    r2 <- boosting(Y ~ ., data = data, boos = TRUE, mfinal = 10,
                   control = rpart.control(minsplit = 0))

I also ran into this problem recently, and this R script example completely solved it!

The basic idea is that you need to set the control for rpart (which adabag uses to build the trees; see rpart.control) accordingly, so that at least one split is attempted in every tree.

I'm not quite sure, but it seems that your "argument of length 0" error may be the result of an empty tree, which can happen because the complexity parameter has a default value that tells the function not to attempt a split if the decrease in heterogeneity / lack of fit is below a certain threshold.
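To make that threshold concrete, this quick check (a sketch of mine) shows rpart's default and how to override it through boosting():

    library(rpart)

    rpart.control()$cp  # 0.01: splits improving the fit by less than this are not made
    # Passing cp = 0 through boosting()'s control argument removes the
    # threshold, so every tree attempts at least one split:
    # boosting(Y ~ ., data = data, boos = TRUE, mfinal = 10,
    #          control = rpart.control(cp = 0))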


Use str() to see the structure of your data frame. For me, I just converted the myclass variable to a factor, and then everything worked.

