You can use numpy.einsum:
np.einsum('ji,jk,ki->i', x, a, x)
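As a quick sanity check, here is a minimal sketch (assuming x has shape (k, n) and a has shape (k, k), as in the benchmark below) comparing the einsum call against an explicit per-column loop:

import numpy as np

# small random inputs, shaped as in the benchmark below
k, n = 5, 3
x = np.random.random((k, n))
a = np.random.random((k, k))

# for each column i: x[:, i].T @ a @ x[:, i]
res_einsum = np.einsum('ji,jk,ki->i', x, a, x)
res_loop = np.array([x[:, i].dot(a).dot(x[:, i]) for i in range(n)])

print(np.allclose(res_einsum, res_loop))  # True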
This will give the same result. Let's see if it will be much faster:

It seems that dot is still the fastest, probably because it uses a multithreaded BLAS, unlike einsum, which runs on a single core. Here is the benchmark code:
import perfplot
import numpy as np


def setup(n):
    k = n
    x = np.random.random((k, n))
    A = np.random.random((k, k))
    return x, A


def loop(data):
    # explicit per-column quadratic form x[:, i].T @ A @ x[:, i]
    x, A = data
    n = x.shape[1]
    out = np.empty(n)
    for i in range(n):
        out[i] = x[:, i].T.dot(A).dot(x[:, i])
    return out


def einsum(data):
    x, A = data
    return np.einsum('ji,jk,ki->i', x, A, x)


def dot(data):
    # row-wise sums of (x.T @ A) * x.T give the same quadratic forms
    x, A = data
    return (x.T.dot(A) * x.T).sum(axis=1)


perfplot.show(
    setup=setup,
    kernels=[loop, einsum, dot],
    n_range=[2**k for k in range(10)],
    logx=True,
    logy=True,
    xlabel='n, k'
)
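For reference on why the dot kernel works: (x.T.dot(A) * x.T).sum(axis=1) computes the diagonal of x.T @ A @ x without forming the full n-by-n product. A minimal sketch of that identity, using small random arrays purely for illustration:

import numpy as np

k, n = 5, 3
x = np.random.random((k, n))
A = np.random.random((k, k))

# row-wise sums of (x.T @ A) * x.T equal the diagonal of x.T @ A @ x
quad = (x.T.dot(A) * x.T).sum(axis=1)
full = np.diag(x.T.dot(A).dot(x))

print(np.allclose(quad, full))  # True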