Optimization Stage No. 1
My first level of optimization is a direct translation of the loop code into a broadcasting-based one by introducing a new axis. As such, it is not very memory efficient, as shown below -
p1 = (-((X[:,None] - Y)**2)/c1) - c2
p11 = p1.sum(2)
p2 = np.exp(p11 + c3)
out = np.log(p2.sum(0)).mean()
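To see where the memory cost comes from: the broadcasted subtraction materializes a full `(k, m, d)` intermediate, where `k = len(X)`, `m = len(Y)` and `d` is the feature count. A quick shape check (array sizes here are made up, just for illustration):

```python
import numpy as np

k, m, d = 4, 5, 3            # made-up sizes, just for illustration
X = np.random.rand(k, d)
Y = np.random.rand(m, d)

# X[:,None] has shape (k,1,d); broadcasting against Y's (m,d)
# materializes all k*m*d pairwise differences at once
diff = X[:, None] - Y
print(diff.shape)            # (4, 5, 3)
```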
Optimization Stage No. 2
Adding a few more optimizations, keeping in mind that operations on constants can be factored out of the array expressions, I got the following:
c10 = -c1
c20 = X.shape[1]*c2
subs = (X[:,None] - Y)**2
p00 = subs.sum(2)
p10 = p00/c10
p11 = p10 - c20
p2 = np.exp(p11 + c3)
out = np.log(p2.sum(0)).mean()
Optimization Stage No. 3
Going further, and seeing the places where operations could still be optimized, I ended up using Scipy's cdist to replace the heavy squaring and sum-reduction. This should be pretty memory efficient and gives us the final implementation, as shown below -
from scipy.spatial.distance import cdist
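Only the import survived in this copy; piecing the rest together from Stage 2, with the squared-differences-and-sum step swapped for `cdist(..., 'sqeuclidean')`, the final chain would look roughly like this (random data and a hypothetical `sigma` added just to make it runnable):

```python
import numpy as np
from scipy.spatial.distance import cdist

np.random.seed(1)
X = np.random.rand(6, 3)     # made-up data, just to exercise the code
Y = np.random.rand(4, 3)
sigma = 0.75                 # hypothetical value

c1 = 2.0*sigma**2
c2 = 0.5*np.log(np.pi*c1)
c3 = np.log(1.0/X.shape[0])
c10 = -c1
c20 = X.shape[1]*c2

# cdist yields the (k, m) matrix of squared distances directly,
# never materializing the (k, m, d) broadcasted intermediate
p00 = cdist(X, Y, 'sqeuclidean')
p11 = p00/c10 - c20
p2 = np.exp(p11 + c3)
out = np.log(p2.sum(0)).mean()

# cross-check against the Stage 1 broadcasting version
ref = np.log(np.exp(((-((X[:, None] - Y)**2)/c1) - c2).sum(2) + c3).sum(0)).mean()
print(np.isclose(out, ref))
```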
Runtime test
Approaches -
import numpy as np
from scipy.spatial.distance import cdist

def loopy_app(X, Y, sigma):
    k, d = X.shape
    m = Y.shape[0]
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    L_B = np.zeros((m,))
    for i in range(m):
        L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
                        np.power(X-Y[i,:],2), c1)-c2,1)+c3)))
    return np.mean(L_B)

def vectorized_app(X, Y, sigma):
    k, d = X.shape
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    p2 = np.exp(cdist(X, Y, 'sqeuclidean')/(-c1) - d*c2 + c3)
    return np.log(p2.sum(0)).mean()
Timing and Verification -
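The original timing output did not survive this copy, so no numbers are reproduced here; a harness along these lines verifies the two approaches agree and times them (functions condensed from the approaches above; array sizes and `sigma` are made up):

```python
import timeit
import numpy as np
from scipy.spatial.distance import cdist

def loopy_app(X, Y, sigma):
    k, d = X.shape
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    # one log-sum-exp per row of Y
    return np.mean([np.log(np.sum(np.exp(np.sum(-((X - Y[i])**2)/c1 - c2, 1) + c3)))
                    for i in range(Y.shape[0])])

def vectorized_app(X, Y, sigma):
    k, d = X.shape
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    return np.log(np.exp(cdist(X, Y, 'sqeuclidean')/(-c1) - d*c2 + c3).sum(0)).mean()

np.random.seed(0)
X = np.random.rand(200, 5)   # made-up sizes
Y = np.random.rand(300, 5)
sigma = 0.8                  # hypothetical value

# verify agreement first, then time both
assert np.isclose(loopy_app(X, Y, sigma), vectorized_app(X, Y, sigma))
for f in (loopy_app, vectorized_app):
    t = timeit.timeit(lambda: f(X, Y, sigma), number=10)
    print(f.__name__, t)
```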
Around 10x speedup there!
Divakar