How to efficiently sum numpy arrays after multiplying them?

Actually I need to calculate:

S_i = sum(U_j * U_j.transpose) * K_i

Where

U_j is a n * k dim matrix, 
K_i is a n * n dim matrix, 
j != i, 
i = 1, 2, ..., n

And I used these loops:

import numpy as np
for i in xrange(n):
    temp = np.zeros((n, n))
    for j in xrange (n):
        if j != i:
            temp += np.dot(U[j], U[j].T)
    S[i] = np.dot(temp, K[i])

Is there a more efficient method for this?

+4
source share
1 answer
import numpy as np

n, k = 30, 40

U = np.random.random((n, n, k))
K = np.random.random((n, n, n))

def using_loops(U, K):
    S = np.empty((n, n, n))
    for i in xrange(n):
        temp = np.zeros((n, n))
        for j in xrange (n):
            if j != i:
                temp += np.dot(U[j], U[j].T)
        S[i] = np.dot(temp, K[i])
    return S

def using_einsum(U, K):
    uut = np.einsum('ijk,ilk->ijl', U, U)
    total = uut.sum(axis=0)
    total = total - uut
    S = np.einsum('ijk,ikl->ijl', total, K)
    return S

This verifies that using_loopsthey using_einsumgive the same result.

In [260]: np.allclose(using_loops(U, K), using_einsum(U, K))
Out[260]: True

This shows that using_einsum is faster; how much the size depends on nand k:

In [262]: %timeit using_loops(U, K)
100 loops, best of 3: 17.1 ms per loop

In [263]: %timeit using_einsum(U, K)
1000 loops, best of 3: 1.92 ms per loop

In general, whenever you see the sums of products, there is a good chance that np.einsum will be a fairly quick way to get results. It will almost certainly beat Python for-loops.

+3
source

All Articles