Is this use of numpy.outer() faster than transposition?

I am computing a distance matrix, and I ended up with the following code:

In [83]: import numpy as np

In [84]: np.set_printoptions(linewidth=120,precision=2)

In [85]: n = 7 ; a = np.arange(n) ; o = np.ones(n) ; np.sqrt(np.outer(o,a*a)+np.outer(a*a,o))
Out[85]: 
array([[ 0.  ,  1.  ,  2.  ,  3.  ,  4.  ,  5.  ,  6.  ],
       [ 1.  ,  1.41,  2.24,  3.16,  4.12,  5.1 ,  6.08],
       [ 2.  ,  2.24,  2.83,  3.61,  4.47,  5.39,  6.32],
       [ 3.  ,  3.16,  3.61,  4.24,  5.  ,  5.83,  6.71],
       [ 4.  ,  4.12,  4.47,  5.  ,  5.66,  6.4 ,  7.21],
       [ 5.  ,  5.1 ,  5.39,  5.83,  6.4 ,  7.07,  7.81],
       [ 6.  ,  6.08,  6.32,  6.71,  7.21,  7.81,  8.49]])

I said to myself: "You are wasting an outer product, you fool! Save one of them and use transposition!" With that in mind, I wrote

In [86]: n = 7 ; a = np.outer(np.arange(n)**2, np.ones(n)) ; np.sqrt(a+a.T)
Out[86]: 
array([[ 0.  ,  1.  ,  2.  ,  3.  ,  4.  ,  5.  ,  6.  ],
       [ 1.  ,  1.41,  2.24,  3.16,  4.12,  5.1 ,  6.08],
       [ 2.  ,  2.24,  2.83,  3.61,  4.47,  5.39,  6.32],
       [ 3.  ,  3.16,  3.61,  4.24,  5.  ,  5.83,  6.71],
       [ 4.  ,  4.12,  4.47,  5.  ,  5.66,  6.4 ,  7.21],
       [ 5.  ,  5.1 ,  5.39,  5.83,  6.4 ,  7.07,  7.81],
       [ 6.  ,  6.08,  6.32,  6.71,  7.21,  7.81,  8.49]])

So far, so good: I have two (slightly) different implementations of the same idea, and one of them is obviously faster than the other, isn't it?

In [87]: %timeit n = 1001 ; a = np.arange(n) ; o = np.ones(n) ; np.sqrt(np.outer(o,a*a)+np.outer(a*a,o))
100 loops, best of 3: 13.7 ms per loop

In [88]: %timeit n = 1001 ; a = np.outer(np.arange(n)**2, np.ones(n)) ; np.sqrt(a+a.T)
10 loops, best of 3: 19.7 ms per loop


No! The supposedly faster implementation is almost 50% slower!

Question

What amazes me is the behavior I just discovered; should I really be surprised? In other words, what is the rationale for the different timings?

3 answers

Below are some timings with a small n = 7:

In [784]: timeit np.outer(o,a*a)
10000 loops, best of 3: 24.2 µs per loop

In [785]: timeit np.outer(a*a,o)
10000 loops, best of 3: 25.7 µs per loop

In [786]: timeit np.outer(a*a,o)+np.outer(o,a*a)
10000 loops, best of 3: 52.7 µs per loop

The two outer calls take roughly the same time each, and computing both plus the addition costs about the sum of the individual times.

In [787]: timeit a2=np.outer(a*a,o); a2+a2.T
10000 loops, best of 3: 33.2 µs per loop

In [788]: timeit a2=np.outer(a*a,o); a2+a2
10000 loops, best of 3: 27.9 µs per loop

In [795]: timeit a2=np.outer(a*a,o); a2.T+a2.T
10000 loops, best of 3: 29.4 µs per loop

In the last two timings, adding a2.T to a2 is slower than adding a2 to a2 or even a2.T to a2.T. a2.T is just a view with swapped strides, so the mixed sum traverses one operand along rows and the other along columns, which is less cache-friendly. That's my guess, at least.
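A quick way to see the layout point: the transpose of a C-ordered array is only a strided view of the same buffer, not a copy. A minimal sketch:

```python
import numpy as np

n = 7
a = np.arange(n)
o = np.ones(n)

a2 = np.outer(a * a, o)

# a2 is C-contiguous (rows stored consecutively); its transpose is
# only a strided view of the same buffer, so a2 + a2.T has to walk
# one operand along rows and the other along columns.
print(a2.flags['C_CONTIGUOUS'])    # True
print(a2.T.flags['C_CONTIGUOUS'])  # False
print(np.shares_memory(a2, a2.T))  # True: transposing copies nothing
```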

So at this small n, the version with two outer calls loses: computing outer a second time costs more than reusing the first result via transposition.

With a large n, two full (n, n) arrays have to be built and traversed, and memory-access patterns start to dominate; the strided addition a2 + a2.T becomes relatively more expensive, which can flip the ranking in favor of the two calls to outer.


Note also that outer is not really needed here: since a is 1-D, a.T is the same as a, and the sums a_i² + a_j² can be produced directly by broadcasting a*a against a column view of itself.
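A minimal sketch of that broadcasting equivalence (variable names are mine):

```python
import numpy as np

n = 7
a = np.arange(n)
o = np.ones(n)
b = a * a

# np.outer(o, b) repeats b along rows; np.outer(b, o) repeats b along
# columns. Broadcasting b against b[:, None] builds the same sums
# without materializing either outer product explicitly.
via_outer = np.sqrt(np.outer(o, b) + np.outer(b, o))
via_bcast = np.sqrt(b[None, :] + b[:, None])

print(np.allclose(via_outer, via_bcast))  # True
```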


Taking the construction of a and o out of the timed statement, I get:

import numpy as np
from timeit import Timer

n = 1001
a = np.arange(n)
o = np.ones(n)

def g(a, o):
    return np.sqrt(np.outer(o, a*a) + np.outer(a*a, o))

def f(a, o):
    b = np.outer(a**2, o)
    return np.sqrt(b + b.T)

assert np.all(f(a, o) == g(a, o))

t = Timer('g(a, o)', 'from __main__ import a, o, np, f, g')
print('g:', t.timeit(100)/100)    # g: 0.0166591598767
t = Timer('f(a, o)', 'from __main__ import a, o, np, f, g')
print('f:', t.timeit(100)/100)    # f: 0.0200494056252

It's funny that when I run the example from the question, I get the opposite results:

In [7]: %timeit n = 1001 ; a = np.arange(n) ; o = np.ones(n) ; np.sqrt(np.outer(o,a*a)+np.outer(a*a,o))
100 loops, best of 3: 17.2 ms per loop

In [8]: %timeit n = 1001 ; a = np.outer(np.arange(n)**2, np.ones(n)) ; np.sqrt(a+a.T)
100 loops, best of 3: 12.8 ms per loop

But this is the fastest and easiest way I could think of:

In [139]: %timeit n = 1001 ; a = np.arange(n); np.sqrt((a**2)[:, np.newaxis]+a**2)
100 loops, best of 3: 10.8 ms per loop
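A small variation on that (my own tweak, not from the answer): square a once and reuse the result, instead of computing a**2 twice inside the expression:

```python
import numpy as np

n = 1001
a = np.arange(n)
b = a * a                      # square once instead of twice
d = np.sqrt(b[:, None] + b)    # broadcast to an (n, n) sum, then sqrt

print(d.shape)        # (1001, 1001)
print(d[3, 4])        # 5.0  (sqrt of 9 + 16)
```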

As an aside, if you work with distances, you may find the scipy.spatial.distance module and the scipy.spatial.distance_matrix function useful.
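For illustration, the question's matrix can be reproduced with scipy.spatial.distance_matrix by placing one set of points on the x-axis and the other on the y-axis (a sketch; assumes SciPy is installed):

```python
import numpy as np
from scipy.spatial import distance_matrix

n = 7
a = np.arange(n, dtype=float)

# The distance between (a_i, 0) and (0, a_j) is sqrt(a_i**2 + a_j**2),
# which is exactly the matrix built in the question.
x = np.column_stack([a, np.zeros(n)])
y = np.column_stack([np.zeros(n), a])

d = distance_matrix(x, y)
b = a * a
print(np.allclose(d, np.sqrt(b[:, None] + b)))  # True
```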
