I'm playing with the following code from Programming Collective Intelligence. It is a function from the book that calculates a Euclidean-distance-based similarity score between two film critics.
This function sums the squared differences between the two critics' ratings in the dictionary, but Euclidean distance in n dimensions also takes the square root of that sum.
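For comparison, here is a minimal sketch of the full Euclidean distance over the items two critics have both rated, with the square root applied. It assumes the same `prefs` layout as the book's code (a dict mapping each critic to a dict of item → rating); the function name `euclidean_distance` and the sample ratings are hypothetical.

```python
from math import sqrt


def euclidean_distance(prefs, person1, person2):
    """Euclidean distance between two critics over their shared items.

    Hypothetical helper: assumes prefs maps critic -> {item: rating}.
    Returns None when the critics have no items in common.
    """
    shared = set(prefs[person1]) & set(prefs[person2])
    if not shared:
        return None
    # Sum the squared rating differences, then take the square root
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared)
    return sqrt(sum_of_squares)


# Hypothetical sample ratings
prefs = {
    "Alice": {"Film A": 1.0, "Film B": 2.0},
    "Bob": {"Film A": 4.0, "Film B": 6.0},
}
print(euclidean_distance(prefs, "Alice", "Bob"))  # 5.0
```

The differences are 3 and 4, so the sum of squares is 25 and the distance is its square root, 5.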
AFAIK, since we use the same function to rank everyone, it doesn't matter whether we take the square root or not, but I wondered whether there is a particular reason for leaving it out.
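A quick way to check the ranking argument is to score several critics against one person both ways and compare the resulting orders. This is a sketch under assumptions: the critics, ratings, and helper names (`sum_of_squares`, `score_without_sqrt`, `score_with_sqrt`) are hypothetical, and `score_without_sqrt` mirrors the book's `1/(1+sum_of_squares)` formula.

```python
from math import sqrt


def sum_of_squares(prefs, person1, person2):
    """Sum of squared rating differences over shared items (0 if none shared)."""
    shared = set(prefs[person1]) & set(prefs[person2])
    return sum((prefs[person1][i] - prefs[person2][i]) ** 2 for i in shared)


def score_without_sqrt(s):
    # Same shape as the book's formula: 1 / (1 + sum_of_squares)
    return 1 / (1 + s)


def score_with_sqrt(s):
    # Variant that uses the actual Euclidean distance
    return 1 / (1 + sqrt(s))


# Hypothetical ratings; "Me" is the critic everyone is compared against
prefs = {
    "Me":    {"A": 3.0, "B": 4.0, "C": 2.0},
    "Ann":   {"A": 3.5, "B": 4.0, "C": 2.5},
    "Ben":   {"A": 1.0, "B": 5.0, "C": 4.0},
    "Carla": {"A": 3.0, "B": 2.0, "C": 2.0},
}
others = [p for p in prefs if p != "Me"]
sums = {p: sum_of_squares(prefs, "Me", p) for p in others}

# Highest similarity first
rank_plain = sorted(others, key=lambda p: score_without_sqrt(sums[p]), reverse=True)
rank_sqrt = sorted(others, key=lambda p: score_with_sqrt(sums[p]), reverse=True)
print(rank_plain)  # ['Ann', 'Carla', 'Ben']
print(rank_sqrt)   # ['Ann', 'Carla', 'Ben']
```

The order matches because the square root is monotonically increasing on non-negative numbers. The scores themselves differ, though (Carla gets 1/5 = 0.2 without the root and 1/3 ≈ 0.33 with it), so the choice only stops mattering when nothing but the ranking is used.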
    from math import sqrt

    # Returns a distance-based similarity score for person1 and person2
    def sim_distance(prefs, person1, person2):
        # Get the list of shared_items
        si = {}
        for item in prefs[person1]:
            if item in prefs[person2]:
                si[item] = 1

        # if they have no ratings in common, return 0
        if len(si) == 0:
            return 0

        # Add up the squares of all the differences
        sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                              for item in prefs[person1] if item in prefs[person2]])

        return 1 / (1 + sum_of_squares)
python euclidean distance
Hamza yerlikaya