Here is a Cython implementation that runs more than 3x faster for this example on my computer. The timing should be re-checked for large arrays, because BLAS routines will probably scale much better than this rather naive code.
I know that you asked for something inside scipy / numpy / scikit-learn, but maybe this will open up new possibilities for you:
my_cython.pyx file:
    import numpy as np
    cimport numpy as np
    import cython
    # fabs is the C absolute value for doubles; C's abs() takes an int
    from libc.math cimport fabs

    @cython.wraparound(False)
    @cython.boundscheck(False)
    def pairwise_distance(np.ndarray[np.double_t, ndim=1] r):
        cdef int i, j, c, size
        cdef np.ndarray[np.double_t, ndim=1] ans
        # Number of pairs (i, j) with i <= j, i.e. n * (n + 1) / 2
        size = sum(range(1, r.shape[0]+1))
        ans = np.empty(size, dtype=r.dtype)
        c = -1
        for i in range(r.shape[0]):
            for j in range(i, r.shape[0]):
                c += 1
                ans[c] = fabs(r[i] - r[j])
        return ans
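The result is a one-dimensional array containing each pairwise distance only once: the pairs (i, j) with i <= j, in row-major order, including the zero distance of each element to itself.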
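To make that layout concrete, here is a minimal pure-NumPy sketch that should produce the same values in the same order as the Cython function. The name `pairwise_distance_reference` is my own and is not part of the code above; it builds the full matrix first, so it is only meant for checking results, not for speed.

```python
import numpy as np

def pairwise_distance_reference(r):
    """Hypothetical reference: |r[i] - r[j]| for all i <= j, row-major order."""
    r = np.asarray(r, dtype=float)
    full = np.abs(r[:, None] - r)        # full symmetric n x n distance matrix
    i, j = np.triu_indices(r.shape[0])   # upper triangle, diagonal included
    return full[i, j]

r_small = np.array([3.0, 1.0, 4.5])
flat = pairwise_distance_reference(r_small)

# One entry per pair (i, j) with i <= j, so n * (n + 1) / 2 entries in total.
n = r_small.shape[0]
assert flat.shape[0] == n * (n + 1) // 2

# Pairs (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)
print(flat.tolist())  # [0.0, 2.0, 1.5, 0.0, 3.5, 0.0]
```

Once the Cython module is compiled, `np.allclose(pairwise_distance(r), pairwise_distance_reference(r))` would be one way to check that both agree.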
To import into Python:
    import numpy as np
    import random

    import pyximport; pyximport.install()
    from my_cython import pairwise_distance

    r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)], dtype=float)

    def solOP(r):
        return np.abs(r - r[:, None])
Timing with IPython:
    In [2]: timeit solOP(r)
    100 loops, best of 3: 7.38 ms per loop

    In [3]: timeit pairwise_distance(r)
    1000 loops, best of 3: 1.77 ms per loop
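If you are not in IPython, a rough equivalent can be sketched with the standard `timeit` module. The helper `best_time_ms` is a name I made up for this example, and the numbers will of course differ from machine to machine; only the NumPy baseline is timed here, so the snippet runs without the compiled module.

```python
import timeit
import numpy as np

# Same kind of input as above: 1000 integer-valued floats in [1, 1000).
rng = np.random.default_rng(0)
r = rng.integers(1, 1000, size=1000).astype(float)

def solOP(r):
    return np.abs(r - r[:, None])

def best_time_ms(func, *args, number=20, repeat=3):
    """Hypothetical helper: best average time per call, in milliseconds."""
    timings = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(timings) / number * 1e3

print(f"solOP: {best_time_ms(solOP, r):.2f} ms per call")
```

With the Cython module imported, `best_time_ms(pairwise_distance, r)` should give the comparable figure for the compiled version.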
Saullo castro