Why is there such a big speed difference between the following L2-norm calculations:
    a = np.arange(1200.0).reshape((-1,3))

    %timeit [np.sqrt((a*a).sum(axis=1))]
    100000 loops, best of 3: 12 µs per loop

    %timeit [np.sqrt(np.dot(x,x)) for x in a]
    1000 loops, best of 3: 814 µs per loop

    %timeit [np.linalg.norm(x) for x in a]
    100 loops, best of 3: 2 ms per loop
All three give the same results, as far as I can see.
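To confirm the equivalence, here is a minimal sketch that collects all three results into arrays and compares them (variable names `v1`/`v2`/`v3` are just for illustration):

    import numpy as np

    a = np.arange(1200.0).reshape((-1, 3))

    # Three ways to compute the per-row L2 norm
    v1 = np.sqrt((a * a).sum(axis=1))                  # fully vectorized
    v2 = np.array([np.sqrt(np.dot(x, x)) for x in a])  # Python loop, one dot() per row
    v3 = np.array([np.linalg.norm(x) for x in a])      # Python loop, one norm() per row

    print(np.allclose(v1, v2) and np.allclose(v1, v3))  # → True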
Here is the relevant part of the source code of the numpy.linalg.norm function:
    x = asarray(x)

    # Check the default case first and handle it immediately.
    if ord is None and axis is None:
        x = x.ravel(order='K')
        if isComplexType(x.dtype.type):
            sqnorm = dot(x.real, x.real) + dot(x.imag, x.imag)
        else:
            sqnorm = dot(x, x)
        return sqrt(sqnorm)
EDIT: Someone suggested that one version could be parallelized, but I checked, and it is not: all three versions consume 12.5% of the processor (as is usually the case with single-threaded Python code on my Xeon with 4 physical / 8 logical cores).
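For reference, recent NumPy versions (1.8 and later) let np.linalg.norm take an axis argument, which avoids the Python-level loop entirely; a sketch:

    import numpy as np

    a = np.arange(1200.0).reshape((-1, 3))

    # One vectorized call computing the L2 norm of every row at once,
    # instead of calling norm() 400 times from a Python loop.
    norms = np.linalg.norm(a, axis=1)

    print(np.allclose(norms, np.sqrt((a * a).sum(axis=1))))  # → True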