Find the closest vector from the list of vectors | python

Suppose you are given a list A of 10 vectors that represent different groups, and a time series v1, v2, ..., vn, each element of which is also a vector. Given some distance metric, is there a way to find the vector in A that is "closest" to each of v1, v2, ..., vn?

Is there a quick way to do this, other than looping over and simply comparing all the entries?

Edit: No, I am not asking how to do k-means or anything like that.

+8
python vector distance
3 answers

You can use scipy.spatial.KDTree. It uses a fast tree algorithm to find nearby points for vectors of arbitrary dimension.

Edit: sorry, if you are looking for arbitrary distance metrics, a tree-like structure may still be an option.

Here is an example:

    >>> from scipy import spatial
    >>> A = [[0,1,2,3,4], [4,3,2,1,0], [2,5,3,7,1], [1,0,1,0,1]]
    >>> tree = spatial.KDTree(A)

This builds a KDTree containing all the points in A, which lets you run fast spatial searches on it. A query takes a vector and returns its nearest neighbor in A:

    >>> tree.query([0.5,0.5,0.5,0.5,0.5])
    (1.1180339887498949, 3)

The first return value is the distance to the nearest neighbor, and the second is its position in A, so you can retrieve the vector itself, for example, like this:

    >>> A[tree.query([0.5,0.5,0.5,0.5,0.5])[1]]
    [1, 0, 1, 0, 1]
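Since the question involves a whole time series rather than a single vector, here is a minimal sketch of querying all of them in one call. `KDTree.query` accepts a 2-D array with one query vector per row. The array `V` and its values are made up for illustration and are not from the question.

```python
import numpy as np
from scipy import spatial

# Group vectors from the example above.
A = np.array([[0, 1, 2, 3, 4],
              [4, 3, 2, 1, 0],
              [2, 5, 3, 7, 1],
              [1, 0, 1, 0, 1]])

# Hypothetical time series v1..vn, one vector per row (made-up values).
V = np.array([[0.5, 0.5, 0.5, 0.5, 0.5],
              [4.0, 3.0, 2.0, 1.0, 0.5],
              [0.0, 1.0, 2.0, 3.0, 5.0]])

tree = spatial.KDTree(A)

# One call handles every row of V; both results have length len(V).
distances, indices = tree.query(V)

for v, d, i in zip(V, distances, indices):
    print(v, "-> group", i, "at distance", round(float(d), 3))
```

The default distance is Euclidean. `query` also takes a `p` argument that selects another Minkowski norm (for example `p=1` for Manhattan distance), which may be enough if the metric you need is in that family.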
+12

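If you define a metric, you can use it as the key of the min function, where distance is a one-argument function that returns the distance from a candidate in A to the vector you are matching:

    closest = min(A, key=distance)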
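As a sketch of how that one-liner might be applied to each vector of the series, the example below binds the query vector with a lambda. The names `manhattan`, `closest`, and `series` are hypothetical helpers chosen for illustration, and Manhattan distance is only a stand-in for whatever metric you define.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences (one possible metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def closest(groups, v, metric=manhattan):
    """Return the member of `groups` nearest to `v` under `metric`."""
    return min(groups, key=lambda g: metric(g, v))


A = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [2, 5, 3, 7, 1], [1, 0, 1, 0, 1]]

# Made-up time series of two vectors.
series = [[0.5, 0.5, 0.5, 0.5, 0.5], [4, 3, 2, 1, 1]]

nearest = [closest(A, v) for v in series]
print(nearest)
```

This still compares every vector in the series against every group, so it is the plain linear scan, but with only 10 groups that cost is usually small.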
+1

So, some sample code:

    import scipy.spatial

    # build a KD-tree to compare to some array of vectors 'centall'
    # (centall, group1 and group2 are assumed to be defined already)
    tree = scipy.spatial.KDTree(centall)
    print('shape of tree is', tree.data.shape)

    # loop through different regions and identify any clusters that belong to a different region
    d1, i1 = tree.query(group1)
    d2, i2 = tree.query(group2)

This returns the variables d and i: d stores the distance to the closest point, and i stores the index at which it occurs.
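If the metric is not one a KD-tree supports, a similar pair of distance and index arrays can be computed by brute force. The sketch below uses `scipy.spatial.distance.cdist` with the `"cityblock"` metric as an example. The contents of `centall` and `group1` are made up here, since the original arrays are not shown.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Made-up stand-ins for the arrays used above.
centall = np.array([[0, 1, 2, 3, 4],
                    [4, 3, 2, 1, 0],
                    [1, 0, 1, 0, 1]], dtype=float)
group1 = np.array([[0.5, 0.5, 0.5, 0.5, 0.5],
                   [4, 3, 2, 1, 1]], dtype=float)

# Pairwise distance matrix of shape (len(group1), len(centall)).
D = cdist(group1, centall, metric="cityblock")

i1 = D.argmin(axis=1)                 # index of the nearest row of centall
d1 = D[np.arange(len(group1)), i1]    # distance to that nearest row

print(d1, i1)
```

This computes every pairwise distance, so it does not have the tree's speed advantage, but for a small reference set such as 10 group vectors it is likely to be fast enough.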

Hope this helps.

+1
