How to create a decision boundary graph for kNN models in Caret?

Question


I would like to plot a decision boundary for a model created by Caret. Ideally, I need a generic method that works for any Caret classifier model. However, I am currently working with the kNN method. I have included the code below, which uses the wine quality dataset from UCI.

I found this method that works with the general kNN method in R, but I cannot figure out how to map it to Caret: https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o/21602#21602

    library(caret)

    set.seed(300)

    wine.r <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep=';')
    wine.w <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv', sep=';')

    wine.r$style <- "red"
    wine.w$style <- "white"

    wine <- rbind(wine.r, wine.w)
    wine$style <- as.factor(wine$style)

    formula <- as.formula(quality ~ .)

    dummies <- dummyVars(formula, data = wine)
    dummied <- data.frame(predict(dummies, newdata = wine))
    dummied$quality <- wine$quality
    wine <- dummied

    numCols <- !colnames(wine) %in% c('quality', 'style.red', 'style.white')

    low <- wine$quality <= 6
    high <- wine$quality > 6
    wine$quality[low] = "low"
    wine$quality[high] = "high"
    wine$quality <- as.factor(wine$quality)

    indxTrain <- createDataPartition(y = wine[, names(wine) == "quality"], p = 0.7, list = F)
    train <- wine[indxTrain,]
    test <- wine[-indxTrain,]

    corrMat <- cor(train[, numCols])
    correlated <- findCorrelation(corrMat, cutoff = 0.6)

    ctrl <- trainControl(
      method="repeatedcv",
      repeats=5,
      number=10,
      classProbs = T
    )

    t1 <- train[, -correlated]
    grid <- expand.grid(.k = c(1:20))

    knnModel <- train(formula,
                      data = t1,
                      method = 'knn',
                      trControl = ctrl,
                      tuneGrid = grid,
                      preProcess = 'range'
    )

    t2 <- test[, -correlated]
    knnPred <- predict(knnModel, newdata = t2)

    # How do I render the decision boundary?


r machine-learning r-caret graphing

James kyle Sep 08 '15 at 4:28


1 answer

chappers · Accepted Answer · 2015-09-08T06:30:19+0000

The first step is to understand what the code you linked actually does! Indeed, you can create such a plot without having anything to do with kNN.

For example, let's make up some sample data where we simply "colour" the bottom-left quadrant.

Step 1

Create a grid. The plot works by evaluating a point at every coordinate, so that we know which group each point belongs to. In R, this is done with expand.grid, which enumerates all possible points.

    x1 <- 1:200
    x2 <- 50:250

    cgrid <- expand.grid(x1=x1, x2=x2)
    # our "prediction" colours the bottom left quadrant
    cgrid$prob <- 1
    cgrid[cgrid$x1 < 100 & cgrid$x2 < 170, c("prob")] <- 0

If this were kNN, then prob would be the model's prediction for that particular point.
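To make that last remark concrete, here is a minimal sketch in which prob really does come from a nearest-neighbour vote. It uses a hand-rolled kNN in base R rather than Caret's implementation; the function name `knn_prob` and the toy training points are made up for illustration.

```r
# Hypothetical helper: fraction of the k nearest training points
# that belong to the class `positive` (not Caret's implementation).
knn_prob <- function(train_x, train_y, query, k, positive) {
  apply(query, 1, function(q) {
    # Euclidean distance from the query point to every training row
    d <- sqrt(rowSums(sweep(train_x, 2, q)^2))
    nearest <- order(d)[seq_len(k)]
    mean(train_y[nearest] == positive)
  })
}

# Toy two-class training data (invented for this example)
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1), c(8, 8), c(8, 9), c(9, 8))
train_y <- c("low", "low", "low", "high", "high", "high")

x1 <- seq(0, 10, by = 0.5)
x2 <- seq(0, 10, by = 0.5)
cgrid <- expand.grid(x1 = x1, x2 = x2)

# prob is now the kNN estimate of P(class == "high") at each grid point
cgrid$prob <- knn_prob(train_x, train_y, as.matrix(cgrid), k = 3, positive = "high")
```

Contouring this prob at level 0.5, as in the steps that follow, would then trace the boundary between the two classes.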

Step 2

Now the plotting is relatively easy. The contour function expects a matrix, so first reshape the probabilities into one.

 matrix_val <- matrix(cgrid$prob, length(x1), length(x2))
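The reshape works because expand.grid varies its first argument fastest and matrix fills column by column, so row i, column j of the matrix lines up with (x1[i], x2[j]). A small sketch on a deliberately tiny grid (the values here are chosen only for illustration) makes this easy to check:

```r
# Tiny grid so the layout can be inspected by eye
x1 <- 1:3
x2 <- 10:11

cgrid <- expand.grid(x1 = x1, x2 = x2)
# mark a single corner point with 0, everything else with 1
cgrid$prob <- ifelse(cgrid$x1 < 2 & cgrid$x2 < 11, 0, 1)

# rows index x1, columns index x2
m <- matrix(cgrid$prob, nrow = length(x1), ncol = length(x2))

# look up the same point both ways: via the data frame and via the matrix
i <- 2; j <- 1
from_grid   <- cgrid$prob[cgrid$x1 == x1[i] & cgrid$x2 == x2[j]]
from_matrix <- m[i, j]
```

If the arguments to expand.grid were swapped, the matrix would need its dimensions swapped as well (or a transpose) before being passed to contour.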

Step 3

Then you can proceed as the linked answer did:

    contour(x1, x2, matrix_val, levels=0.5, labels="", xlab="", ylab="",
            main="Some Picture", lwd=2, axes=FALSE)
    gd <- expand.grid(x=x1, y=x2)
    # colour each grid point by its "prediction"
    points(gd, pch=".", cex=1.2, col=ifelse(cgrid$prob==1, "coral", "cornflowerblue"))
    box()


So, back to your specific example. I'm going to use the iris dataset because your data is not very interesting, but the same principle applies. To create the grid, you need to choose the two variables for the x and y axes and hold everything else fixed!

    library(caret)

    knnModel <- train(Species ~., data = iris, method = 'knn')

    lgrid <- expand.grid(Petal.Length=seq(1, 5, by=0.1),
                         Petal.Width=seq(0.1, 1.8, by=0.1),
                         Sepal.Length = 5.4,
                         Sepal.Width=3.1)

Then just use the predict function, as you did above.

    knnPredGrid <- predict(knnModel, newdata=lgrid)
    knnPredGrid = as.numeric(knnPredGrid)  # 1 2 3

And then plot the graph:

    pl = seq(1, 5, by=0.1)
    pw = seq(0.1, 1.8, by=0.1)

    probs <- matrix(knnPredGrid, length(pl), length(pw))

    contour(pl, pw, probs, labels="", xlab="", ylab="",
            main="X-nearest neighbour", axes=FALSE)

    gd <- expand.grid(x=pl, y=pw)

    points(gd, pch=".", cex=5, col=probs)
    box()

This should produce a plot of the grid coloured by predicted class, with contour lines drawn between the class regions.
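One caveat with more than two classes: contour is applied to the numeric class codes 1, 2, 3, so its default levels fall at arbitrary values between the codes. A possible refinement, sketched below, is to draw one contour per class on a 0/1 indicator at level 0.5. The function name `plot_class_boundaries` is hypothetical, and `pred` is a made-up stand-in for `as.numeric(predict(knnModel, newdata = lgrid))`.

```r
# Hypothetical helper: one boundary per class, drawn by contouring
# a 0/1 indicator matrix at 0.5. `pred` must hold integer class codes
# in expand.grid order (x varies fastest).
plot_class_boundaries <- function(x, y, pred) {
  classes <- sort(unique(pred))
  # empty plotting region spanning the grid
  plot(range(x), range(y), type = "n", xlab = "", ylab = "")
  gd <- expand.grid(x = x, y = y)
  points(gd, pch = ".", cex = 5, col = pred)
  for (cl in classes) {
    ind <- matrix(as.numeric(pred == cl), length(x), length(y))
    contour(x, y, ind, levels = 0.5, drawlabels = FALSE, add = TRUE, lwd = 2)
  }
  invisible(length(classes))
}

pl <- seq(1, 5, by = 0.1)
pw <- seq(0.1, 1.8, by = 0.1)
lgrid <- expand.grid(Petal.Length = pl, Petal.Width = pw)

# Stand-in predictions (assumption: invented rule, not a fitted model)
pred <- ifelse(lgrid$Petal.Length < 2.5, 1L,
               ifelse(lgrid$Petal.Width < 1.6, 2L, 3L))
```

Calling `plot_class_boundaries(pl, pw, pred)` then draws the coloured grid with a separate outline around each class region.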

To add the test/train results from your model, you can follow what I did above. The only difference is that you also need to add the predicted points (these are not the same as the grid that was used to generate the boundary).

    library(caret)
    data(iris)

    indxTrain <- createDataPartition(y = iris[, names(iris) == "Species"], p = 0.7, list = F)

    train <- iris[indxTrain,]
    test <- iris[-indxTrain,]

    knnModel <- train(Species ~., data = train, method = 'knn')

    pl = seq(min(test$Petal.Length), max(test$Petal.Length), by=0.1)
    pw = seq(min(test$Petal.Width), max(test$Petal.Width), by=0.1)

    # generates the boundaries for your graph
    lgrid <- expand.grid(Petal.Length=pl,
                         Petal.Width=pw,
                         Sepal.Length = 5.4,
                         Sepal.Width=3.1)

    knnPredGrid <- predict(knnModel, newdata=lgrid)
    knnPredGrid = as.numeric(knnPredGrid)

    # get the points from the test data...
    testPred <- predict(knnModel, newdata=test)
    testPred <- as.numeric(testPred)
    # this gets the points for the testPred...
    test$Pred <- testPred

    probs <- matrix(knnPredGrid, length(pl), length(pw))

    contour(pl, pw, probs, labels="", xlab="", ylab="",
            main="X-Nearest Neighbor", axes=F)
    gd <- expand.grid(x=pl, y=pw)

    points(gd, pch=".", cex=5, col=probs)

    # add the test points to the graph
    points(test$Petal.Length, test$Petal.Width, col=test$Pred, cex=2)
    box()


Alternatively, you can use ggplot2, which may be simpler:

    library(ggplot2)

    # store the grid predictions in the data frame so aes() can refer to them by name
    lgrid$Pred <- knnPredGrid

    ggplot(data=lgrid, aes(x=Petal.Length, y=Petal.Width)) +
      stat_contour(aes(z=Pred), bins=2) +
      geom_point(aes(colour=as.factor(Pred))) +
      geom_point(data=test, aes(colour=as.factor(Pred)),
                 size=5, alpha=0.5, shape=1) +
      theme_bw()

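Since the question asked for something generic, the grid construction can be wrapped in a small helper instead of hard-coding Sepal.Length and Sepal.Width. The sketch below is one way to do it, not an established Caret facility: the name `make_boundary_grid` is hypothetical, and holding the remaining numeric predictors at their median (and factors at their most frequent level) is an assumption; the answer above used arbitrary fixed values.

```r
# Hypothetical helper: grid over two chosen predictors, with every other
# column of `data` held at a single representative value.
make_boundary_grid <- function(data, xvar, yvar, n = 50) {
  stopifnot(xvar %in% names(data), yvar %in% names(data))
  xs <- seq(min(data[[xvar]]), max(data[[xvar]]), length.out = n)
  ys <- seq(min(data[[yvar]]), max(data[[yvar]]), length.out = n)
  grid <- expand.grid(xs, ys)          # xs varies fastest
  names(grid) <- c(xvar, yvar)
  for (nm in setdiff(names(data), c(xvar, yvar))) {
    col <- data[[nm]]
    if (is.numeric(col)) {
      # assumption: median as the "fixed" value
      grid[[nm]] <- rep(median(col), nrow(grid))
    } else {
      # assumption: most frequent level, keeping the original factor levels
      top <- names(which.max(table(col)))
      grid[[nm]] <- rep(factor(top, levels = levels(factor(col))), nrow(grid))
    }
  }
  grid
}

# Predictors only: drop the outcome column before building the grid
predictors <- iris[, names(iris) != "Species"]
g <- make_boundary_grid(predictors, "Petal.Length", "Petal.Width", n = 20)
```

The result can be passed as newdata to predict on a fitted model, and the predictions reshaped with matrix(..., n, n) for contour, exactly as in the steps above.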
