Using k-fold cross-validation with the caret package

I have been reading a lot of cross-validation posts, and there seems to be a lot of confusion. My understanding is that it works like this:

  1. Perform k-fold cross-validation, e.g. with k = 10, to estimate the average error across the 10 folds.
  2. If the error is acceptable, train the model on the complete dataset.

I am trying to build a decision tree using rpart in R and taking advantage of the caret package. Below is the code I'm using.

 # load libraries
 library(caret)
 library(rpart)
 # define training control
 train_control <- trainControl(method = "cv", number = 10)
 # train the model
 model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")
 # make predictions
 predictions <- predict(model, mydat)
 # append predictions
 mydat <- cbind(mydat, predictions)
 # summarize results
 confusionMatrix <- confusionMatrix(mydat$predictions, mydat$resp)

I have one question about how caret works. I read the short introduction to the train function in the caret package, which states that the "optimal set of parameters" is determined during resampling.

In my example, did I code this correctly? Do I need to define rpart parameters in my code or is my code enough?

+8
3 answers

When you perform k-fold cross-validation, you are already making a prediction for each sample, just across 10 different models in total (assuming k = 10). There is no need to make predictions on the complete data again, since you already have predictions from the k different models.

What you can do is the following:

 train_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

Then

 model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

If you want to see the observed values and the predictions in a nice format, you simply type:

 model$pred 
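
For example, if resp is a factor, here is a sketch of how you could summarize those held-out predictions directly instead of re-predicting on the training data; it assumes the model object from above and filters on cp, the parameter caret tunes for rpart:

 # keep only rows for the selected tuning parameter, since model$pred
 # stores the held-out predictions for every candidate cp value
 best_pred <- model$pred[model$pred$cp == model$bestTune$cp, ]
 # confusion matrix built from the out-of-fold predictions
 confusionMatrix(best_pred$pred, best_pred$obs)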

Also, for the second part of your question, caret should handle all the parameters for you. You can try adjusting the tuning parameters manually if you want.
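
As a sketch of what a manual adjustment could look like (reusing train_control and mydat from the question), you could ask caret to evaluate more candidate values of rpart's complexity parameter cp via tuneLength:

 # try 10 candidate cp values instead of the default 3
 model <- train(resp ~ ., data = mydat, method = "rpart",
                trControl = train_control, tuneLength = 10)
 # the cp value that was selected during resampling
 model$bestTune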

+17

It is important to note that one should not confuse model selection and model error estimation.

You can use cross-validation to select model hyperparameters (e.g. a regularization parameter).

Usually this is done with 10-fold cross-validation, because it is a good compromise in the bias-variance trade-off (2-fold can lead to models with high bias, while leave-one-out CV can lead to models with high variance / overfitting).
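
In caret these choices map directly onto trainControl; a quick illustrative sketch:

 # 10-fold CV: the usual bias/variance compromise
 ctrl_10fold <- trainControl(method = "cv", number = 10)
 # 2-fold CV: cheap, but each model is trained on only half the data (more bias)
 ctrl_2fold <- trainControl(method = "cv", number = 2)
 # leave-one-out CV: n models, low bias but high variance and expensive
 ctrl_loocv <- trainControl(method = "LOOCV")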

After that, if you do not have an independent test set, you can estimate an empirical distribution of some performance metric using cross-validation: once you have found the best hyperparameters, you can use them to estimate the CV error.

Note that at this stage the hyperparameters are fixed, but the model parameters will likely differ across the cross-validation models.
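
As an illustrative sketch (reusing the rpart example from the question and assuming a cp value that was already chosen), the hyperparameter can be fixed with a one-row tuneGrid so that cross-validation just reports the per-fold performance for that fixed setting:

 # estimate the CV error for one fixed hyperparameter value
 fixed_cp <- data.frame(cp = 0.01)  # assumed value, chosen beforehand
 model_fixed <- train(resp ~ ., data = mydat, method = "rpart",
                      trControl = trainControl(method = "cv", number = 10),
                      tuneGrid = fixed_cp)
 # per-fold performance metrics for the fixed hyperparameter
 model_fixed$resample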

+4

On the first page of the short introduction document for the caret package, it is mentioned that the optimal model is selected across the tuning parameters. As a starting point, you need to understand that cross-validation is a procedure for selecting the best modelling approach, not the final model itself. caret provides a grid search option through tuneGrid, where you can provide a list of parameter values to test. The final model will have the optimized parameter after training is complete.
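
For example, a sketch of a grid search over rpart's cp parameter with tuneGrid, reusing mydat and train_control from the question:

 # supply an explicit grid of cp values for caret to evaluate
 cp_grid <- expand.grid(cp = seq(0.001, 0.1, length.out = 10))
 model <- train(resp ~ ., data = mydat, method = "rpart",
                trControl = train_control, tuneGrid = cp_grid)
 # the winning cp value and the model refit on all the data with it
 model$bestTune
 model$finalModel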

+2
