I am building a distance matrix, and I ended up with the following code
In [83]: import numpy as np
In [84]: np.set_printoptions(linewidth=120,precision=2)
In [85]: n = 7 ; a = np.arange(n) ; o = np.ones(n) ; np.sqrt(np.outer(o,a*a)+np.outer(a*a,o))
Out[85]:
array([[ 0. , 1. , 2. , 3. , 4. , 5. , 6. ],
[ 1. , 1.41, 2.24, 3.16, 4.12, 5.1 , 6.08],
[ 2. , 2.24, 2.83, 3.61, 4.47, 5.39, 6.32],
[ 3. , 3.16, 3.61, 4.24, 5. , 5.83, 6.71],
[ 4. , 4.12, 4.47, 5. , 5.66, 6.4 , 7.21],
[ 5. , 5.1 , 5.39, 5.83, 6.4 , 7.07, 7.81],
[ 6. , 6.08, 6.32, 6.71, 7.21, 7.81, 8.49]])
I said to myself: “You are wasting an outer product, you fool! Save one of them and use the transpose!” That said, I wrote
In [86]: n = 7 ; a = np.outer(np.arange(n)**2, np.ones(n)) ; np.sqrt(a+a.T)
Out[86]:
array([[ 0. , 1. , 2. , 3. , 4. , 5. , 6. ],
[ 1. , 1.41, 2.24, 3.16, 4.12, 5.1 , 6.08],
[ 2. , 2.24, 2.83, 3.61, 4.47, 5.39, 6.32],
[ 3. , 3.16, 3.61, 4.24, 5. , 5.83, 6.71],
[ 4. , 4.12, 4.47, 5. , 5.66, 6.4 , 7.21],
[ 5. , 5.1 , 5.39, 5.83, 6.4 , 7.07, 7.81],
[ 6. , 6.08, 6.32, 6.71, 7.21, 7.81, 8.49]])
So far, so good: I now had two (slightly) different implementations of the same idea, one of which is obviously faster than the other, isn't it?
In [87]: %timeit n = 1001 ; a = np.arange(n) ; o = np.ones(n) ; np.sqrt(np.outer(o,a*a)+np.outer(a*a,o))
100 loops, best of 3: 13.7 ms per loop
In [88]: %timeit n = 1001 ; a = np.outer(np.arange(n)**2, np.ones(n)) ; np.sqrt(a+a.T)
10 loops, best of 3: 19.7 ms per loop
No! The supposedly faster implementation is almost 50% slower!
Question
What puzzles me is the behavior I have just discovered; or should I not be surprised? In other words, what is the rationale for the different timings?
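For reference, one way to narrow this down would be to time the individual building blocks rather than the whole expressions. The following is a rough sketch (the names `m`, `mc` and the decomposition are mine, not part of the session above); it separates the cost of one outer product, the two kinds of addition, and the square root:

    import numpy as np
    import timeit

    n = 1001
    sq = np.arange(n) ** 2
    o = np.ones(n)
    m = np.outer(sq, o)              # the single outer product used in the second variant
    mc = np.ascontiguousarray(m.T)   # same values as m.T, but stored contiguously

    # time each building block in isolation
    pieces = {
        "np.outer(sq, o)          ": lambda: np.outer(sq, o),
        "m + mc  (both contiguous)": lambda: m + mc,
        "m + m.T (transposed view)": lambda: m + m.T,
        "np.sqrt(m)               ": lambda: np.sqrt(m),
    }

    for name, f in pieces.items():
        t = min(timeit.repeat(f, number=10, repeat=5)) / 10
        print(f"{name}: {t * 1e3:.2f} ms per call")

The only step that exists in the second variant but not in the first is the `m + m.T` addition, where `m.T` is a strided view rather than a contiguous array, so comparing that line with `m + mc` and with the extra `np.outer` call of the first variant should show where the extra time goes.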