R caret predict returns fewer outputs than inputs

I used caret to train the rpart model below.

    library(caret)

    trainIndex <- createDataPartition(d$Happiness, p=.8, list=FALSE)
    dtrain <- d[trainIndex, ]
    dtest <- d[-trainIndex, ]
    fitControl <- trainControl(## 10-fold CV
                               method = "repeatedcv", number=10, repeats=10)
    fitRpart <- train(Happiness ~ ., data=dtrain, method="rpart", trControl = fitControl)
    testRpart <- predict(fitRpart, newdata=dtest)

dtest contains 1296 observations, so I expected testRpart to be a vector of length 1296. Instead, its length is 1077, i.e. 219 short.

When I ran the prediction on only the first 220 rows of dtest, I got a single predicted value, so the output is consistently 219 short.

Is there any explanation for why this happens, and what can I do to get output whose length matches the input?

Edit: d can be downloaded from here to reproduce the above.

2 answers

I downloaded your data and found what explains the discrepancy: the data contain missing values.

If you simply remove the missing values from your dataset, the output length will match:

    testRpart <- predict(fitRpart, newdata = na.omit(dtest))

Note that nrow(na.omit(dtest)) is 1103, and length(testRpart) is also 1103. So you need a strategy for handling missing values. See ?predict.rpart and the options for the na.action argument to choose what you want.
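If you want one prediction per input row rather than a shorter vector, one possible approach is to predict only on the complete rows and write the results back by position, leaving NA where a row could not be scored. This is a minimal sketch under assumptions: the data are made up (not your d), and lm() stands in for the caret/rpart fit so that the example runs with base R only. caret's predict method for train objects also accepts an na.action argument, so passing na.action = na.pass may be an alternative for models such as rpart that can handle missing predictors.

```r
# Toy data (hypothetical; not the asker's d). lm() is a stand-in for the caret/rpart fit.
set.seed(42)
train <- data.frame(x = rnorm(50))
train$y <- 2 * train$x + rnorm(50)
fit <- lm(y ~ x, data = train)

# Test set with missing predictor values in rows 2 and 4
test <- data.frame(x = c(1, NA, 3, NA, 5))

# Predict only on complete rows, then write the results back by position
ok <- complete.cases(test)
pred <- rep(NA_real_, nrow(test))
pred[ok] <- predict(fit, newdata = test[ok, , drop = FALSE])

length(pred) == nrow(test)  # one slot per input row
which(is.na(pred))          # rows that could not be scored
```

The same pattern should carry over to testRpart by replacing fit and test with fitRpart and dtest; the NA entries then mark exactly the rows that na.omit would have dropped.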


I had a similar problem when I used "newx" instead of "newdata" in the predict function. Using "newdata" (or leaving the argument unnamed) solved my problem. I hope this helps someone else who used newx and ran into the same issue.
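The reason a misnamed argument changes the output length can be sketched as follows. Many predict methods accept `...`, so an unknown name such as newx is silently absorbed and the method behaves as if no new data were supplied, returning predictions for the training rows. The example is an illustration under assumptions: it uses made-up data and base R's lm(), not the caret model from the question. (newx is the argument name used by some other packages' predict methods, for example glmnet, which is probably where the mix-up comes from.)

```r
# Toy data (hypothetical) with 30 training rows and 3 test rows
set.seed(1)
train <- data.frame(x = rnorm(30))
train$y <- 3 * train$x + rnorm(30)
fit <- lm(y ~ x, data = train)
test <- data.frame(x = c(0, 1, 2))

# "newx" is not an argument of predict.lm, so it is swallowed by `...`
# and predict() falls back to the training data
wrong <- predict(fit, newx = test)

# "newdata" is the argument predict.lm actually looks for
right <- predict(fit, newdata = test)

length(wrong)  # matches nrow(train), not nrow(test)
length(right)  # matches nrow(test)
```

A quick check that length(predictions) equals nrow(newdata) right after predicting catches both this mistake and the dropped-NA case.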
