numpy.sum may be slower than a Python for loop

When summing an array along a specific axis, the dedicated array method array.sum(ax) can actually be slower than a for loop:

    v = np.random.rand(3, 10**4)

    %timeit v.sum(0)                        # vectorized method
    1000 loops, best of 3: 183 us per loop

    %timeit for row in v[1:]: v[0] += row   # python loop
    10000 loops, best of 3: 39.3 us per loop

The vectorized method is more than 4 times slower than the plain for loop! What is going (wr)on(g) here? Can't I trust vectorized methods in numpy to be faster than for loops?

1 answer

No, you can’t. As your interesting example shows, numpy.sum can be suboptimal, and a better arrangement of the operations via explicit for loops can be more efficient.

Let me show you another example:

    >>> N, M = 10**4, 10**4
    >>> v = np.random.randn(N,M)
    >>> r = np.empty(M)
    >>> timeit.timeit('v.sum(axis=0, out=r)', 'from __main__ import v,r', number=1)
    1.2837879657745361
    >>> r = np.empty(N)
    >>> timeit.timeit('v.sum(axis=1, out=r)', 'from __main__ import v,r', number=1)
    0.09213519096374512

Here you clearly see that numpy.sum is optimal when summing over the fast-running index (v is C-contiguous) and suboptimal when summing over the slow-running axis. Interestingly, the opposite pattern holds for for loops:

    >>> r = np.zeros(M)
    >>> timeit.timeit('for row in v[:]: r += row', 'from __main__ import v,r', number=1)
    0.11945700645446777
    >>> r = np.zeros(N)
    >>> timeit.timeit('for row in v.T[:]: r += row', 'from __main__ import v,r', number=1)
    1.2647287845611572

I have not had time to inspect the numpy code, but I suspect that what makes the difference is contiguous memory access versus strided access.
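To make the layout of v and v.T concrete, here is a minimal sketch that inspects the flags and strides of a small array. The 4 x 6 shape and the names vt, row_contig and row_strided are my own choices for illustration. It only shows how the data is laid out, not what numpy.sum does internally.

```python
import numpy as np

# Small hypothetical array; float64 and C order are the NumPy defaults.
v = np.zeros((4, 6))

print(v.flags['C_CONTIGUOUS'])   # True: each row is stored as one contiguous block
print(v.strides)                 # (48, 8): 6 * 8 bytes to the next row, 8 bytes to the next column

# Transposing returns a view with swapped strides; no data is copied.
vt = v.T
print(vt.flags['C_CONTIGUOUS'])  # False
print(vt.flags['F_CONTIGUOUS'])  # True
print(vt.strides)                # (8, 48)

# A row of v walks through adjacent memory, a row of v.T jumps 48 bytes per element.
row_contig = v[0]
row_strided = vt[0]
print(row_contig.strides, row_strided.strides)   # (8,) (48,)
```

So `for row in v` hands out contiguous rows, while `for row in v.T` hands out strided ones, which matches the direction of the timings above.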

As these examples show, when implementing a numerical algorithm, the right memory layout is of great importance. Vectorized code does not necessarily solve every problem.
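If you want to check this on your own machine, a rough sketch like the following compares both approaches on a C-ordered and a Fortran-ordered copy of the same data. The helper sum_axis0_loop, the 1000 x 1000 size and the repeat count are assumptions of mine, not anything from numpy itself. Timings depend heavily on the NumPy version and the hardware, so newer releases may not reproduce the gap shown above.

```python
import timeit
import numpy as np

def sum_axis0_loop(a):
    """Sum over axis 0 by accumulating rows one at a time (hypothetical helper)."""
    out = np.zeros(a.shape[1], dtype=a.dtype)
    for row in a:       # rows are contiguous only if a is C-ordered
        out += row
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 1000))   # C-contiguous (row-major)
f = np.asfortranarray(a)                # same values, column-major copy

# Both approaches must give the same result before timing them.
assert np.allclose(sum_axis0_loop(a), a.sum(axis=0))

for name, arr in [("C order", a), ("Fortran order", f)]:
    t_sum = timeit.timeit(lambda: arr.sum(axis=0), number=10)
    t_loop = timeit.timeit(lambda: sum_axis0_loop(arr), number=10)
    print(f"{name}: sum(axis=0) {t_sum:.4f} s, row loop {t_loop:.4f} s")
```

Measuring both layouts side by side is usually more reliable than assuming either the vectorized call or the loop will win.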
