Computing the minimum Euclidean distance between each row of one data frame and all rows of another data frame

I need to create a data frame containing the minimum Euclidean distance between each row of one data frame and all rows of another data frame. Both of my data frames are large (about 40,000 rows). This is what I have worked out so far.

    x <- matrix(c(3,6,3,4,8), nrow=5, ncol=7, byrow = TRUE)
    y <- matrix(c(1,4,4,1,9), nrow=5, ncol=7, byrow = TRUE)

    sed.dist <- numeric(5)
    for (i in 1:(length(sed.dist))) {
      sed.dist[i] <- (sqrt(sum((y[i,1:7] - x[i,1:7])^2)))
    }

But this only compares row i of y with row i of x (that is, the case i = j). What I actually need is to take each row of "y" in turn (y[1, 1:7], then y[2, 1:7], and so on up to i = 5), compute its Euclidean distance to every row of "x" (x[j, 1:7]), and keep the minimum. The minimum found for each row of y should be saved in another data frame.

2 answers

Expanding on my comment on the question, a fairly quick approach would be the following, although with 40,000 rows I think you will still have to wait a bit:

    unlist(lapply(seq_len(nrow(y)), function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
    #[1] 5.196152 5.385165 4.898979 4.898979 5.385165
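To spell out why the one-liner above works, and to get the result into a data frame as the question asks, here is a minimal sketch. The helper name min_row_dist and the column names y_row and min_dist are my own placeholders, not from any package. It relies on R recycling the vector y[i, ] down each column of t(x), so column j of the difference holds y[i, ] - x[j, ].

```r
# Example matrices from the question
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

# min_row_dist() is a hypothetical helper name used only for this sketch.
min_row_dist <- function(y, x) {
  tx <- t(x)  # transpose once: each column of tx is one row of x
  d <- vapply(seq_len(nrow(y)), function(i) {
    # y[i, ] is recycled down every column of tx, so colSums() returns the
    # squared distance from row i of y to each row of x.
    # sqrt() is monotone, so taking it after min() gives the same answer.
    sqrt(min(colSums((y[i, ] - tx)^2)))
  }, numeric(1))
  data.frame(y_row = seq_len(nrow(y)), min_dist = d)
}

res <- min_row_dist(y, x)
res
```

On the example matrices, res$min_dist should match the five values printed above. Transposing x once outside the loop, instead of on every iteration, is a small saving that may matter at 40,000 rows.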

And a comparative benchmark of the two answers:

    x = matrix(runif(1e2*5), 1e2)
    y = matrix(runif(1e2*5), 1e2)

    library(microbenchmark)

    alex = function() unlist(lapply(seq_len(nrow(y)), function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
    jlhoward = function() apply(y,1,function(y) min(apply(x,1,function(x,y)dist(rbind(x,y)),y)))

    all.equal(alex(), jlhoward())
    #[1] TRUE

    microbenchmark(alex(), jlhoward(), times = 20)
    #Unit: milliseconds
    #       expr        min         lq     median         uq        max neval
    #     alex()   3.369188   3.479011   3.600354   4.513114   4.789592    20
    # jlhoward() 422.198621 431.565643 436.561057 442.643181 602.929742    20
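If the row-by-row version is still too slow at 40,000 rows, one alternative worth trying is to compute squared distances with matrix algebra, using the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, and to process y in blocks so the intermediate matrix stays small. This is only a sketch of a different technique, not what either answer does. The function name min_dist_chunked and the chunk argument are my own assumptions, and I have not benchmarked it against the versions above.

```r
# min_dist_chunked() and its 'chunk' argument are hypothetical names
# introduced for this sketch.
min_dist_chunked <- function(y, x, chunk = 1000L) {
  if (nrow(y) == 0L) return(numeric(0))
  x_sq <- rowSums(x^2)            # |x_j|^2 for every row of x
  out <- numeric(nrow(y))
  for (s in seq(1L, nrow(y), by = chunk)) {
    idx <- s:min(s + chunk - 1L, nrow(y))
    yb <- y[idx, , drop = FALSE]  # current block of y rows
    # Squared distances between every row in the block and every row of x:
    # |y_i|^2 + |x_j|^2 - 2 * (y_i . x_j)
    d2 <- outer(rowSums(yb^2), x_sq, "+") - 2 * tcrossprod(yb, x)
    # Floating-point cancellation can leave tiny negative values, so clamp
    # at zero before taking the square root.
    out[idx] <- sqrt(pmax(apply(d2, 1, min), 0))
  }
  out
}

# Check against the example from the question
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)
min_dist_chunked(y, x)
```

Each intermediate matrix has chunk rows and nrow(x) columns, so with chunk = 1000 and 40,000 rows in x it holds 4e7 doubles (about 320 MB). Lower chunk if memory is tight, and check the result against the loop-based version on a subset before trusting it on the full data.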

Try the following:

    apply(y,1,function(y) min(apply(x,1,function(x,y)dist(rbind(x,y)),y)))
    # [1] 5.196152 5.385165 4.898979 4.898979 5.385165

Working from the inside out, we bind a row of x to a row of y and compute the distance between them with the dist(...) function (which is written in C). We do this for a given row of y against each row of x in turn, using the inner apply(...), and then take the minimum of the results. The outer call to apply(...) then repeats this for every row of y.
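For anyone who finds the nested apply(...) hard to read, the same steps can be written out as explicit loops. This is only an illustrative sketch on the small example from the question. The variable names d, min_dist and nearest are my own, and it stores the full distance matrix, which would not be practical for 40,000 x 40,000 rows (1.6e9 doubles, roughly 12.8 GB).

```r
# Example matrices from the question
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

# d[i, j] will hold the distance between row i of y and row j of x
d <- matrix(NA_real_, nrow = nrow(y), ncol = nrow(x))
for (i in seq_len(nrow(y))) {      # outer apply(): one row of y at a time
  for (j in seq_len(nrow(x))) {    # inner apply(): each row of x in turn
    # dist() on a two-row matrix returns the single distance between them
    d[i, j] <- as.numeric(dist(rbind(x[j, ], y[i, ])))
  }
}

min_dist <- apply(d, 1, min)        # minimum distance for each row of y
nearest  <- apply(d, 1, which.min)  # index of the closest row of x
min_dist
```

Here min_dist should match the output shown above, and nearest additionally tells you which row of x gave the minimum.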
