The boundaries of drawing solutions in R

Question

The boundaries of drawing solutions in R

I have a series of simulated class labels from a knn function. I have a data frame with basic numerical training data and another data frame for test data. How could I draw a solution border for return values from the knn function? I will have to repeat my findings on a locked machine, so please limit the use of third-party libraries, if possible.

I have only two class labels: orange and blue. They are built on simple 2D graphics with training data. Again, I just want to draw a border around the results of the knn function.

the code:

 library(class) n <- 100 set.seed(1) x <- round(runif(n, 1, n)) set.seed(2) y <- round(runif(n, 1, n)) train.df <- data.frame(x, y) set.seed(1) x.test <- round(runif(n, 1, n)) set.seed(2) y.test <- round(runif(n, 1, n)) test.df <- data.frame(x.test, y.test) k <- knn(train.df, test.df, classes, k=25) plot(test.df, col=k)

classes is simply a class label vector derived from earlier code.

If you need it, below is the full code of my work:

 library(class) n <- 100 set.seed(1) x <- round(runif(n, 1, n)) set.seed(2) y <- round(runif(n, 1, n)) # ============================================================ # Bayes Classifier + Decision Boundary Code # ============================================================ classes <- "null" colours <- "null" for (i in 1:n) { # P(C = j | X = x, Y = y) = prob # "The probability that the class (C) is orange (j) when X is some x, and Y is some y" # Two predictors that influence classification: x, y # If x and y are both under 50, there is a 90% chance of being orange (grouping) # If x and y and both over 50, or if one of them is over 50, grouping is blue # Algorithm favours whichever grouping has a higher chance of success, then plots using that colour # When prob (from above) is 50%, the boundary is drawn percentChance <- 0 if (x[i] < 50 && y[i] < 50) { # 95% chance of orange and 5% chance of blue # Bayes Decision Boundary therefore assigns to orange when x < 50 and y < 50 # "colours" is the Decision Boundary grouping, not the plotted grouping percentChance <- 95 colours[i] <- "orange" } else { percentChance <- 10 colours[i] <- "blue" } if (round(runif(1, 1, 100)) > percentChance) { classes[i] <- "blue" } else { classes[i] <- "orange" } } boundary.x <- seq(0, 100, by=1) boundary.y <- 0 for (i in 1:101) { if (i > 49) { boundary.y[i] <- -10 # just for the sake of visual consistency, real value is 0 } else { boundary.y[i] <- 50 } } df <- data.frame(boundary.x, boundary.y) plot(x, y, col=classes) lines(df, type="l", lty=2, lwd=2, col="red") # ============================================================ # K-Nearest neighbour code # ============================================================ #library(class) #n <- 100 #set.seed(1) #x <- round(runif(n, 1, n)) #set.seed(2) #y <- round(runif(n, 1, n)) train.df <- data.frame(x, y) set.seed(1) x.test <- round(runif(n, 1, n)) set.seed(2) y.test <- round(runif(n, 1, n)) test.df <- data.frame(x.test, y.test) k <- knn(train.df, test.df, classes, k=25) plot(test.df, col=k)

+2

r machine-learning nearest-neighbor

Kingdan Oct 2 '16 at 22:44

source share

1 answer

Hong ooi · Accepted Answer · 2016-10-03T14:28:34+0000

Get class probability predictions in the grid and draw a contour line at P = 0.5 (or whatever you want the cutoff point to be). This is also the method used in the classic MASS tutorial from Venables and Ripley, and in Elements of Statistical Learning from Hastie, Tibshirani, and Friedman.

 # class labels: simple distance from origin classes <- ifelse(x^2 + y^2 > 60^2, "blue", "orange") classes.test <- ifelse(x.test^2 + y.test^2 > 60^2, "blue", "orange") grid <- expand.grid(x=1:100, y=1:100) classes.grid <- knn(train.df, grid, classes, k=25, prob=TRUE) # note last argument prob.grid <- attr(classes.grid, "prob") prob.grid <- ifelse(classes.grid == "blue", prob.grid, 1 - prob.grid) # plot the boundary contour(x=1:100, y=1:100, z=matrix(prob.grid, nrow=100), levels=0.5, col="grey", drawlabels=FALSE, lwd=2) # add points from test dataset points(test.df, col=classes.test)

See also basically the same question about CrossValidated.

The boundaries of drawing solutions in R

More articles: