Effective double amount of products

Consider two ndarrays length n , arr1 and arr2 . I calculate the following sum of products and do it num_runs times for comparison:

 import numpy as np import time num_runs = 1000 n = 100 arr1 = np.random.rand(n) arr2 = np.random.rand(n) start_comp = time.clock() for r in xrange(num_runs): sum_prods = np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] ) print "total time for comprehension = ", time.clock() - start_comp start_loop = time.clock() for r in xrange(num_runs): sum_prod = 0.0 for i in xrange(n): for j in xrange(i+1, n): sum_prod += arr1[i]*arr2[j] print "total time for loop = ", time.clock() - start_loop 

Output signal

 total time for comprehension = 3.23097066953 total time for comprehension = 3.9045544426 

so that understanding the list appears faster.

Is there a more efficient implementation, perhaps using Numpy routines to calculate this amount of products?

+7
python numpy
source share
3 answers

Rearrange the operation in the O (n) runtime algorithm instead of O (n ^ 2) and use NumPy for products and sums:

 # arr1_weights[i] is the sum of all terms arr1[i] gets multiplied by in the # original version arr1_weights = arr2[::-1].cumsum()[::-1] - arr2 sum_prods = arr1.dot(arr1_weights) 

Timing shows that this is about 200 times faster than list comprehension for n == 100 .

 In [21]: %%timeit ....: np.sum([arr1[i] * arr2[j] for i in range(n) for j in range(i+1, n)]) ....: 100 loops, best of 3: 5.13 ms per loop In [22]: %%timeit ....: arr1_weights = arr2[::-1].cumsum()[::-1] - arr2 ....: sum_prods = arr1.dot(arr1_weights) ....: 10000 loops, best of 3: 22.8 µs per loop 
+12
source share

Vector: np.sum(np.triu(np.multiply.outer(arr1,arr2),1)) .

to improve 30 times:

 In [9]: %timeit np.sum(np.triu(np.multiply.outer(arr1,arr2),1)) 1000 loops, best of 3: 272 µs per loop In [10]: %timeit np.sum( [arr1[i]*arr2[j] for i in range(n) for j in range(i+1, n)] 100 loops, best of 3: 7.9 ms per loop In [11]: allclose(np.sum(np.triu(np.multiply.outer(arr1,arr2),1)), np.sum(np.triu(np.multiply.outer(arr1,arr2),1))) Out[11]: True 

Another quick way is to use numba:

 from numba import jit @jit def t(arr1,arr2): s=0 for i in range(n): for j in range(i+1,n): s+= arr1[i]*arr2[j] return s 

for a 10x new factor:

 In [12]: %timeit t(arr1,arr2) 10000 loops, best of 3: 21.1 µs per loop 

And using the minimum answer @ user2357112 answer,

 @jit def t2357112(arr1,arr2): s=0 c=0 for i in range(n-2,-1,-1): c += arr2[i+1] s += arr1[i]*c return s 

for

 In [13]: %timeit t2357112(arr1,arr2) 100000 loops, best of 3: 2.33 µs per loop 

just by performing the necessary operations.

+8
source share

You can use the following broadcast trick:

 a = np.sum(np.triu(arr1[:,None]*arr2[None,:],1)) b = np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] ) print a == b # True 

Basically, I pay the price of computing the product of all elements in pairs in arr1 and arr2 , in order to use the numpy baud rate / vectorization, which is much faster in low-level code.

And timings:

 %timeit np.sum(np.triu(arr1[:,None]*arr2[None,:],1)) 10000 loops, best of 3: 55.9 µs per loop %timeit np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] ) 1000 loops, best of 3: 1.45 ms per loop 
+3
source share

All Articles