Possible optimizations for calculating squared Euclidean distance

I need to do several hundred million Euclidean distance calculations every day in a Python project.

Here is what I started with:

    import numpy as np

    def euclidean_dist_square(x, y):
        diff = np.array(x) - np.array(y)
        return np.dot(diff, diff)
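To see where the time goes, here is a quick micro-benchmark sketch of that function (the random 128-dimensional vectors and iteration count are made up for illustration; absolute timings will vary by machine):

```python
import timeit
import numpy as np

def euclidean_dist_square(x, y):
    diff = np.array(x) - np.array(y)
    return np.dot(diff, diff)

x = np.random.rand(128)
y = np.random.rand(128)

# Time single pairwise calls; much of the cost is per-call Python/NumPy
# overhead rather than the 128 multiply-adds themselves.
n = 10_000
per_call = timeit.timeit(lambda: euclidean_dist_square(x, y), number=n) / n
print(f"{per_call * 1e6:.2f} microseconds per call")
```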

This is pretty fast, and I have already dropped the sqrt, since I only need to rank objects (nearest-neighbor search). However, it is still the bottleneck of the script, so I wrote a C extension that calculates the distance. The calculation is always performed on 128-dimensional vectors.

    #include "euclidean.h"
    #include <math.h>

    double euclidean(double x[128], double y[128])
    {
        double Sum = 0.0;  /* must be initialized before accumulating */
        for (int i = 0; i < 128; i++) {
            Sum += pow(x[i] - y[i], 2.0);
        }
        return Sum;
    }

The full code for the extension is here: https://gist.github.com/herrbuerger/bd63b73f3c5cf1cd51de

It gives a good speedup over the NumPy version.

But is there a way to speed it up further (this is my first C extension, so I suspect there is)? Given how often this function is called every day, every microsecond saved is a real gain.

Some of you may suggest porting the whole thing from Python to another language; unfortunately, this is a larger project and that is not an option :(

Thanks.

Edit

I posted this question on CodeReview: https://codereview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance

I will delete this question here in an hour unless someone has started writing an answer.

1 answer

The fastest way I know to compute Euclidean distances with NumPy is the one in scikit-learn, which can be summarized as:

    def squared_distances(X, Y):
        """Return a matrix of squared distances for each pair of rows i, j in X, Y."""
        # http://stackoverflow.com/a/19094808/166749
        X_row_norms = np.einsum('ij,ij->i', X, X)
        Y_row_norms = np.einsum('ij,ij->i', Y, Y)
        distances = np.dot(X, Y.T)
        distances *= -2
        distances += X_row_norms[:, np.newaxis]
        distances += Y_row_norms
        np.maximum(distances, 0, distances)  # get rid of negatives from rounding; optional
        return distances
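As a quick sanity check, the vectorized formula can be compared against a brute-force double loop on small random matrices (the shapes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 128))
Y = rng.random((7, 128))

# Vectorized: ||x||^2 + ||y||^2 - 2 x.y for all row pairs at once.
D = (np.einsum('ij,ij->i', X, X)[:, np.newaxis]
     + np.einsum('ij,ij->i', Y, Y)
     - 2 * np.dot(X, Y.T))

# Brute force, one pair at a time, for comparison.
D_ref = np.array([[np.sum((x - y) ** 2) for y in Y] for x in X])

print(np.allclose(D, D_ref))  # True
```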

The bottleneck in this snippet is the matrix multiplication (np.dot), so make sure your NumPy is linked against a good BLAS implementation; with a multi-threaded BLAS on a multi-core machine and sufficiently large input matrices, it should be faster than anything you can hack together in C. Note that it relies on the binomial formula

    ‖x − y‖² = ‖x‖² + ‖y‖² − 2 x⋅y

and that either X_row_norms or Y_row_norms can be cached across calls, e.g. in the k-NN use case.
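For the k-NN case, that caching can look like the following sketch (the database size and `query_squared_distances` helper are hypothetical; only the dot products are recomputed per query):

```python
import numpy as np

rng = np.random.default_rng(1)
database = rng.random((1000, 128))  # fixed reference set (hypothetical)

# Precompute the database row norms once; they never change between queries.
db_norms = np.einsum('ij,ij->i', database, database)

def query_squared_distances(q):
    """Squared distances from one 128-d query vector to every database row."""
    # ||q||^2 is a scalar; db_norms is reused from the cache above.
    return np.dot(q, q) + db_norms - 2 * np.dot(database, q)

q = rng.random(128)
d = query_squared_distances(q)
nearest = np.argmin(d)  # index of the nearest neighbor
```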

(I am a co-author of this code, and I spent a lot of time optimizing both it and the SciPy implementation; scikit-learn's version trades some accuracy for speed, which should not matter for k-NN. The SciPy implementation, available in scipy.spatial.distance, is effectively an optimized version of the code you just wrote, and is more accurate.)
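For reference, the SciPy routine can be called as below; the 'sqeuclidean' metric skips the final sqrt, which matches the ranking use case (shapes are arbitrary for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
X = rng.random((4, 128))
Y = rng.random((6, 128))

# Squared Euclidean distance for every pair of rows of X and Y.
D = cdist(X, Y, metric='sqeuclidean')
```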

