For large arrays there is a much faster way using only NumPy; it's the same approach scikit-learn uses internally:
import numpy as np

def squared_row_norms(X):
    # Squared L2 norm of each row, without building an X ** 2 temporary.
    return np.einsum('ij,ij->i', X, X)
This is based on the identity (x - y)² = x² + y² - 2xy, which also holds for vectors if xy is read as the dot product.
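The distance function called in the tests below then needs only one matrix-vector product plus the precomputed norms. A minimal sketch along those lines (the name squared_euclidean_distances is taken from the test session; the body just spells out the identity, reusing squared_row_norms from above):

def squared_euclidean_distances(data, vec):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y for every row x of data,
    # without ever materialising the (n_samples, n_features) difference array.
    data2 = squared_row_norms(data)       # shape (n_samples,)
    vec2 = squared_row_norms(vec)         # shape (1,)
    d = np.dot(data, vec.T).ravel()       # x.y for each row, shape (n_samples,)
    d *= -2.0
    d += data2 + vec2
    return d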
Test:
>>> data = np.random.randn(10, 40)
>>> vec = np.random.randn(1, 40)
>>> ((data - vec) ** 2).sum(axis=1)
array([  96.75712686,   69.45894306,  100.71998244,   80.97797154,
         84.8832107 ,   82.28910021,   67.48309433,   81.94813371,
         64.68162331,   77.43265692])
>>> squared_euclidean_distances(data, vec)
array([  96.75712686,   69.45894306,  100.71998244,   80.97797154,
         84.8832107 ,   82.28910021,   67.48309433,   81.94813371,
         64.68162331,   77.43265692])
>>> from sklearn.metrics.pairwise import euclidean_distances
>>> euclidean_distances(data, vec, squared=True).ravel()
array([  96.75712686,   69.45894306,  100.71998244,   80.97797154,
         84.8832107 ,   82.28910021,   67.48309433,   81.94813371,
         64.68162331,   77.43265692])
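Instead of comparing the printouts by eye, np.allclose can verify the agreement:

>>> np.allclose(((data - vec) ** 2).sum(axis=1),
...             squared_euclidean_distances(data, vec))
True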
Profile:
>>> data = np.random.randn(1000, 40)
>>> vec = np.random.randn(1, 40)
>>> %timeit ((data - vec)**2).sum(axis=1)
10000 loops, best of 3: 114 us per loop
>>> %timeit squared_euclidean_distances(data, vec)
10000 loops, best of 3: 52.5 us per loop
Using numexpr is also possible, but it does not seem to give any speedup at 1000 points (and at 10000 points it is not much better):
>>> import numexpr as ne
>>> %timeit ne.evaluate("sum((data - vec) ** 2, axis=1)")
10000 loops, best of 3: 142 us per loop
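The same identity generalises from one query vector to a full pairwise distance matrix, which is essentially what sklearn's euclidean_distances(X, Y, squared=True) computes. A sketch, again reusing squared_row_norms from above (the function name here is made up):

def squared_pairwise_distances(X, Y):
    # x^2 + y^2 - 2xy, broadcast over all pairs of rows of X and Y.
    X2 = squared_row_norms(X)[:, np.newaxis]   # shape (n_x, 1)
    Y2 = squared_row_norms(Y)[np.newaxis, :]   # shape (1, n_y)
    d = X2 + Y2 - 2.0 * np.dot(X, Y.T)         # shape (n_x, n_y)
    # Floating-point cancellation can leave tiny negative entries;
    # clip them to zero, as scikit-learn does.
    return np.maximum(d, 0.0)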