Distance between vectors with missing values

Question

For vectors A and B of length n, the Euclidean distance is sqrt((A1-B1)^2 + (A2-B2)^2 + ... + (An-Bn)^2).

    A <- c(5, 4, 3, 2, 1, 1, 2, 3, 5)
    B <- c(1, 0, 6, 4, 3, 2, 3, 1, 3)
    dist(rbind(A, B), method = "euclidean")
    # 7.681146
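
As a quick sanity check (a minimal sketch, not part of the original question), the formula above can be written out directly in base R and compared with what dist() returns for the complete vectors. The variable names manual and via_dist are introduced here only for illustration.

```r
# Euclidean distance written out term by term, to compare with dist()
A <- c(5, 4, 3, 2, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, 4, 3, 2, 3, 1, 3)

manual <- sqrt(sum((A - B)^2))   # square root of the sum of squared differences
via_dist <- as.numeric(dist(rbind(A, B), method = "euclidean"))

print(manual)                    # 7.681146
print(all.equal(manual, via_dist))  # TRUE
```

Here the squared differences are 16, 16, 9, 4, 4, 1, 1, 4, 4, which sum to 59, and sqrt(59) is about 7.681146.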

How is the distance calculated when vectors A and B contain missing values? In the example below, R reports a distance of 8.485281, but how is that value obtained?

    A <- c(5, NA, NA, NA, 1, 1, 2, 3, 5)
    B <- c(1, 0, 6, NA, NA, NA, NA, 1, 3)
    dist(rbind(A, B), method = "euclidean")
    # 8.485281

Filly Apr 19 '14 at 23:38

1 answer

flodel · Accepted Answer · 2014-04-20T00:00:34+0000

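Positions where either vector is NA are dropped first, and the distance is then scaled up to account for the full number of positions. As the ?dist help page puts it, when some columns are excluded from a Euclidean distance, the sum of squares is scaled up proportionally to the number of columns used, so the distance itself is multiplied by sqrt(total positions / positions used):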
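
Written out for the example in the question, the rule works as follows. Here p is the vector length and C the set of positions where both A and B are observed; these symbols are introduced only for this illustration. Only positions 1, 8 and 9 are complete, so m = 3 of the p = 9 positions are used:

```latex
d(A,B) = \sqrt{\frac{p}{m}\sum_{k \in C} (A_k - B_k)^2}
       = \sqrt{\frac{9}{3}\left[(5-1)^2 + (3-1)^2 + (5-3)^2\right]}
       = \sqrt{3 \cdot 24} = \sqrt{72} \approx 8.485281
```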

    i <- is.na(A) | is.na(B)   # TRUE where either vector is missing
    # distance over the complete pairs, scaled by sqrt(p / m)
    dist(rbind(A[!i], B[!i])) * sqrt(length(A) / length(A[!i]))
    # 8.485281
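
The same rule can be wrapped in a small function for any pair of equal-length vectors. This is a minimal sketch that assumes the NA handling described in ?dist; scaled_euclidean is a hypothetical name, not a function from base R or any package. When every pair is excluded it returns NA, which matches what dist() does in that case.

```r
# Hypothetical helper: reproduce dist()'s NA handling for two vectors
scaled_euclidean <- function(a, b) {
  stopifnot(length(a) == length(b))
  keep <- !(is.na(a) | is.na(b))     # positions observed in both vectors
  m <- sum(keep)
  if (m == 0) return(NA_real_)       # no complete pairs to compare
  ss <- sum((a[keep] - b[keep])^2)   # sum of squares over complete pairs
  sqrt(ss * length(a) / m)           # scale the sum up to all p positions
}

A <- c(5, NA, NA, NA, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, NA, NA, NA, NA, 1, 3)

d_manual <- scaled_euclidean(A, B)
d_dist <- as.numeric(dist(rbind(A, B)))
print(d_manual)                      # 8.485281
print(all.equal(d_manual, d_dist))   # TRUE
```

With no missing values the scaling factor is 1, so the function reduces to the ordinary Euclidean distance from the question.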
