OK. I recently discovered that the scipy.spatial.distance.cdist command is very fast for computing the COMPLETE distance matrix between two arrays of vectors, source and destination. See: How can I calculate the Euclidean distance using numpy? I wanted to try to duplicate that performance when computing the distances between two arrays of equal size. The distance between two SINGLE vectors is pretty straightforward to calculate, as shown in the previous link. We can take vectors:
import numpy as np
A=np.random.normal(size=(3))
B=np.random.normal(size=(3))
and then use 'numpy.linalg.norm':
np.linalg.norm(A-B)
or, equivalently,
temp = A-B
np.sqrt(temp[0]**2+temp[1]**2+temp[2]**2)
Both work fine for a single pair of vectors. When I want to know the distance between two sets of vectors, where my_distance = distance_between( A[i], B[i] ) for all i, the second solution still works fine. In that case, as expected:
A=np.random.normal(size=(3,42))
B=np.random.normal(size=(3,42))
temp = A-B
np.sqrt(temp[0]**2+temp[1]**2+temp[2]**2)
gives me a set of 42 distances, one from the ith element of A to the ith element of B. The norm function, on the other hand, computes the norm of the whole matrix and gives me a single value, which is not what I'm looking for. The 42-distance behavior is what I want to keep, hopefully at almost the same speed I get from cdist for complete matrices. So the question is: what is the most efficient way to use python and numpy/scipy to calculate the distances between data with the form (n,i)?
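To make the target behavior concrete, here is a minimal sketch of the explicit formula next to two candidate vectorized forms. It assumes a NumPy version in which np.linalg.norm accepts an axis argument and np.random.default_rng is available. The seeded generator and the names diff, d_explicit, d_norm and d_einsum are illustrative and not part of the original post. Nothing here is benchmarked, so which form is fastest is left open.

```python
import numpy as np

# Seeded generator (an assumption for reproducibility; the post uses np.random.normal).
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 42))
B = rng.normal(size=(3, 42))

diff = A - B

# Explicit component-wise formula from the question: one distance per column.
d_explicit = np.sqrt(diff[0]**2 + diff[1]**2 + diff[2]**2)

# Candidate 1: norm taken along axis 0 (the component axis) instead of over the whole matrix.
d_norm = np.linalg.norm(diff, axis=0)

# Candidate 2: per-column sum of squares via einsum, followed by a square root.
d_einsum = np.sqrt(np.einsum('ij,ij->j', diff, diff))

print(d_explicit.shape, d_norm.shape, d_einsum.shape)
```

All three arrays hold 42 values, one per column pair A[:, i], B[:, i], and should agree up to floating-point rounding.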
Thanks Sloan