I am trying to understand what is going on with my calculation of the Canberra distance. I wrote my own simple canberra.distance function, but its results are not consistent with those of the dist function. I added na.rm = T to the sum() call so that terms with a zero denominator (which evaluate to NaN) are dropped from the sum. From ?dist I understand that it uses a similar approach: "Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing."
```r
canberra.distance <- function(a, b){
  sum( (abs(a - b)) / (abs(a) + abs(b)), na.rm = T )
}

a <- c(0, 1, 0, 0, 1)
b <- c(1, 0, 1, 0, 1)

canberra.distance(a, b)
# [1] 3
```
The pairs 0-0 and 1-1 seem to be the problematic ones. In the first case (0-0), both the numerator and the denominator are zero, so this pair should be omitted. In the second case (1-1), the numerator is 0 but the denominator is not, so the term is 0 and the sum should not change.
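To make the per-pair reasoning concrete, the sketch below prints the individual terms of the sum for the same a and b (the variable name `terms` is only for illustration). It shows the 0-0 pair becoming NaN, the 1-1 pair becoming 0, and how many positions actually contribute once NaN is dropped.

```r
a <- c(0, 1, 0, 0, 1)
b <- c(1, 0, 1, 0, 1)

# Per-position terms of the Canberra sum; 0/0 evaluates to NaN in R
terms <- abs(a - b) / (abs(a) + abs(b))
terms
# [1]   1   1   1 NaN   0

# na.rm = TRUE drops NaN as well as NA, so only four terms are summed
sum(terms, na.rm = TRUE)
# [1] 3

# Number of positions that actually contributed to the sum
sum(!is.na(terms))
# [1] 4
```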
What am I missing here?
EDIT: To match the definition given in ?dist, the canberra.distance function can be modified as follows:
```r
canberra.distance <- function(a, b){
  sum( abs(a - b) / abs(a + b), na.rm = T )
}
```
However, the results are the same as before.
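The denominator change makes no difference here because, for non-negative vectors such as a and b, abs(a + b) and abs(a) + abs(b) are equal. A likely source of the remaining gap is the missing-value rule in ?dist, which states that when some columns are excluded from a Canberra distance, "the sum is scaled up proportionally to the number of columns used". The helper `canberra_scaled()` below is a hypothetical sketch that applies that documented rescaling; it is not R's internal implementation.

```r
# Hypothetical helper: applies the column-rescaling rule described in ?dist
canberra_scaled <- function(a, b) {
  terms <- abs(a - b) / (abs(a) + abs(b))  # 0/0 pairs become NaN
  used <- !is.na(terms)                    # is.na() is TRUE for NaN too
  if (!any(used)) return(NA_real_)         # every pair excluded -> NA
  # Scale the partial sum up by (total columns / columns used)
  sum(terms[used]) * length(terms) / sum(used)
}

a <- c(0, 1, 0, 0, 1)
b <- c(1, 0, 1, 0, 1)

canberra_scaled(a, b)
# [1] 3.75

# Compare with R's own result for the same two rows
as.numeric(dist(rbind(a, b), method = "canberra"))
```

With four of the five columns used, the partial sum of 3 is multiplied by 5/4.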
Adela