A variation on the question "How to plot the decision boundary of a k-nearest neighbor classifier from the training elements?"

This is a question related to https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-ak-nearest-neighbor-classifier-from-elements-o

For completeness, here is the original example from this link:

    library(ElemStatLearn)
    require(class)
    x <- mixture.example$x
    g <- mixture.example$y
    xnew <- mixture.example$xnew
    mod15 <- knn(x, xnew, g, k=15, prob=TRUE)
    prob <- attr(mod15, "prob")
    prob <- ifelse(mod15=="1", prob, 1-prob)
    px1 <- mixture.example$px1
    px2 <- mixture.example$px2
    prob15 <- matrix(prob, length(px1), length(px2))
    par(mar=rep(2,4))
    contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="",
            main="15-nearest neighbour", axes=FALSE)
    points(x, col=ifelse(g==1, "coral", "cornflowerblue"))
    gd <- expand.grid(x=px1, y=px2)
    points(gd, pch=".", cex=1.2, col=ifelse(prob15>0.5, "coral", "cornflowerblue"))
    box()

I have played with this example and would like to get it to work with three classes. I can change some of the g values with something like

 g[8:16] <- 2 

just to pretend that some patterns belong to a third class. However, I can't get the plot to work. I think I need to change the lines that deal with the proportion of votes for the winning class:

    prob <- attr(mod15, "prob")
    prob <- ifelse(mod15=="1", prob, 1-prob)

as well as the levels in the contour call:

    contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="",
            main="15-nearest neighbour", axes=FALSE)

I'm also not sure whether contour is the right tool for this. One option that works is to create a matrix of data covering the region of interest, classify each point of that matrix, and plot those points with a large marker and different colors, similar to what is done with the points(gd, ...) bit.
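That grid-classification idea can be sketched directly with base R and the class package. The three-class data below is made up purely for illustration (the seed, means, and color names are arbitrary, not part of the mixture example):

```r
library(class)  # knn()

set.seed(1)
# Made-up three-class training data (a stand-in for mixture.example)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 3), ncol = 2),
           matrix(rnorm(20, mean = 6), ncol = 2))
g <- factor(rep(1:3, each = 10))

# Grid covering the region of interest
grid <- expand.grid(x = seq(min(x[, 1]), max(x[, 1]), length.out = 50),
                    y = seq(min(x[, 2]), max(x[, 2]), length.out = 50))

# Classify every grid point, then color the grid by predicted class;
# the decision boundaries appear where the colors change
pred <- knn(x, grid, g, k = 15)
cols <- c("coral", "cornflowerblue", "darkolivegreen3")
plot(grid, pch = ".", cex = 2, col = cols[pred])
points(x, col = cols[g], pch = 19)
```

Indexing `cols[pred]` works because a factor used as an index is converted to its integer codes, so each predicted class picks its own color.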

The ultimate goal is to show the different decision boundaries produced by different classifiers. Can someone point me in the right direction?

Thanks, Raphael

1 answer

Breaking the code into its main parts helps show how to do this:

Training data with three classes

    train <- rbind(iris3[1:25,1:2,1], iris3[1:25,1:2,2], iris3[1:25,1:2,3])
    cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))

A grid of test points covering the training data

    require(MASS)
    test <- expand.grid(x=seq(min(train[,1]-1), max(train[,1]+1), by=0.1),
                        y=seq(min(train[,2]-1), max(train[,2]+1), by=0.1))

Classification of that grid (3 classes, just as an example)

    require(class)
    classif <- knn(train, test, cl, k = 3, prob=TRUE)
    prob <- attr(classif, "prob")
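One subtlety worth noting: the prob attribute returned by knn is only the share of the k votes won by the *predicted* class, not a full per-class probability, which is why a separate 0/1 indicator per class is built in the next step rather than contouring prob directly. A minimal self-contained check (same iris3 setup, predicting the training points themselves):

```r
library(class)

train <- rbind(iris3[1:25, 1:2, 1], iris3[1:25, 1:2, 2], iris3[1:25, 1:2, 3])
cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

classif <- knn(train, train, cl, k = 3, prob = TRUE)
prob <- attr(classif, "prob")

# With k = 3 the winning class can get 1, 2, or 3 of the votes,
# so prob only ever takes values in {1/3, 2/3, 1} -- it never says
# which of the *other* classes received the remaining votes
range(prob)
```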

Data structure for plotting

    require(dplyr)
    dataf <- bind_rows(mutate(test, prob=prob, cls="c", prob_cls=ifelse(classif==cls, 1, 0)),
                       mutate(test, prob=prob, cls="v", prob_cls=ifelse(classif==cls, 1, 0)),
                       mutate(test, prob=prob, cls="s", prob_cls=ifelse(classif==cls, 1, 0)))
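If pulling in dplyr is undesirable, the same stacked data frame can be built with base R alone. This sketch repeats the earlier setup so it runs standalone:

```r
library(class)

train <- rbind(iris3[1:25, 1:2, 1], iris3[1:25, 1:2, 2], iris3[1:25, 1:2, 3])
cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))
test <- expand.grid(x = seq(min(train[, 1] - 1), max(train[, 1] + 1), by = 0.1),
                    y = seq(min(train[, 2] - 1), max(train[, 2] + 1), by = 0.1))
classif <- knn(train, test, cl, k = 3, prob = TRUE)
prob <- attr(classif, "prob")

# One copy of the grid per class; prob_cls is 1 where that class is
# the predicted one and 0 elsewhere, so contouring it traces that
# class's decision boundary
dataf <- do.call(rbind, lapply(c("c", "v", "s"), function(cls) {
  data.frame(test, prob = prob, cls = cls,
             prob_cls = ifelse(classif == cls, 1, 0))
}))
```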

Plot

    require(ggplot2)
    ggplot(dataf) +
      geom_point(aes(x=x, y=y, col=cls), data=mutate(test, cls=classif), size=1.2) +
      geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls), bins=2, data=dataf) +
      geom_point(aes(x=x, y=y, col=cls), size=3,
                 data=data.frame(x=train[,1], y=train[,2], cls=cl))

[plot: grid points colored by predicted class, per-class decision boundary contours, and the training points on top]

We can also get a little fancier and use the probability of class membership as an indication of the "confidence" of the prediction.

    ggplot(dataf) +
      geom_point(aes(x=x, y=y, col=cls, size=prob), data=mutate(test, cls=classif)) +
      scale_size(range=c(0.8, 2)) +
      geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls), bins=2, data=dataf) +
      geom_point(aes(x=x, y=y, col=cls), size=3,
                 data=data.frame(x=train[,1], y=train[,2], cls=cl)) +
      geom_point(aes(x=x, y=y), size=3, shape=1,
                 data=data.frame(x=train[,1], y=train[,2], cls=cl))

[plot: as above, with grid point size scaled by the vote proportion for the predicted class]
