Find the K nearest neighbors starting with the distance matrix

I am looking for a well-optimized function that takes a matrix of distances n X n and returns a matrix n X k with indices k nearest neighbors of the i-th datapoint in the i-th row.

I find gazillion different R packages that allow you to do KNN, but they all seem to include distance calculations along with a sorting algorithm within the same function. In particular, for most procedures, the main argument is the original data matrix, not the distance matrix. In my case, I use non-standard distance for mixed types of variables, so I need to separate the sorting problem from the distance calculations.

This is not a completely frightening problem - I could obviously just use the order function inside the loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k , when k is small (less than 11), goes much faster, but, unfortunately, returns only the sorted values, and not the desired indexes.

+7
sorting matrix r knn distance
source share
2 answers

Try using the FastKNN CRAN package (although it is poorly documented). It offers the function k.nearest.neighbors , where an arbitrary distance matrix can be specified. Below is an example that calculates the matrix you need.

 # arbitrary data train <- matrix(sample(c("a","b","c"),12,replace=TRUE), ncol=2) # nx 2 n = dim(train)[1] distMatrix <- matrix(runif(n^2,0,1),ncol=n) # nxn # matrix of neighbours k=3 nn = matrix(0,n,k) # nxk for (i in 1:n) nn[i,] = k.nearest.neighbors(i, distMatrix, k = k) 

Note. You can always check the list of Cran packages for Ctrl + F = 'knn' related functions: https://cran.r-project.org/web/packages/available_packages_by_name.html

+4
source share

For the record (I will not mark this as an answer), this is a quick and dirty solution. Suppose sd.dist is a special distance matrix. Suppose k.for.nn is the number of nearest neighbors.

 n = nrow(sd.dist) knn.mat = matrix(0, ncol = k.for.nn, nrow = n) knd.mat = knn.mat for(i in 1:n){ knn.mat[i,] = order(sd.dist[i,])[1:k.for.nn] knd.mat[i,] = sd.dist[i,knn.mat[i,]] } 

Now knn.mat is a matrix with indices of the nearest neighbors k in each row, and for the convenience of knd.mat , the corresponding distances are stored.

0
source share

All Articles