np.sum returns a scalar. np.abs returns a new array of the same size, and allocating memory for that new array is what takes most of the time. Compare:

>>> timeit("np.abs(a)", "import numpy as np; a = np.random.rand(10000000)", number=100)
3.565487278989167
>>> timeit("np.abs(a, out=a)", "import numpy as np; a = np.random.rand(10000000)", number=100)
0.9392949139873963
The out=a argument tells NumPy to write the result into the array a itself, overwriting the old data there. Hence the speedup.
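A small sketch of what out= means in practice: no new array is allocated, and the returned object is a itself.

```python
import numpy as np

a = np.array([-1.5, 2.0, -3.0])
res = np.abs(a, out=a)  # result written into a's own buffer

print(res is a)  # True: same array object, nothing new allocated
print(a)         # [1.5 2.  3. ]
```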
np.sum is even a little faster:

>>> timeit("np.sum(a)", "import numpy as np; a = np.random.rand(10000000)", number=100)
0.6874654769926565

since it reduces to a scalar and does not have to write a full-size output array at all.
If you do not want to overwrite a, you can provide a separate output array for abs; this pays off when you repeatedly take abs of arrays of the same size and dtype. After allocating

b = np.empty_like(a)

once, np.abs(a, out=b) followed by np.sum(b) runs in about half the time of np.linalg.norm(a, 1).
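Putting the pieces together, a minimal sketch of an allocation-free L1 norm with a caller-supplied buffer (the helper name l1_norm_prealloc is mine, not a NumPy API):

```python
import numpy as np

def l1_norm_prealloc(a, buf):
    """L1 norm using a caller-supplied scratch buffer of the same
    shape/dtype as a, so the hot path allocates no new array."""
    np.abs(a, out=buf)
    return np.sum(buf)

a = np.random.rand(10_000_000)
b = np.empty_like(a)  # allocated once, reused across calls

print(np.isclose(l1_norm_prealloc(a, b), np.linalg.norm(a, 1)))  # True
```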
For reference, np.linalg.norm computes the L1 norm as
add.reduce(abs(x), axis=axis, keepdims=keepdims)
which includes memory allocation for the new abs(x) array.
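That this is the same computation is easy to check directly; the abs(x) call below is exactly the full-size temporary being discussed:

```python
import numpy as np

x = np.random.rand(1000) - 0.5

# Mirror of the norm's internal reduction: abs(x) materializes a
# temporary array of x's size, which add.reduce then sums.
ref = np.add.reduce(np.abs(x))

print(np.isclose(ref, np.linalg.norm(x, 1)))  # True
```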
Ideally, it would be possible to compute the sum (or maximum, or minimum) of all absolute values (or of the results of another ufunc) without ever writing the full intermediate array to RAM, accumulating the sum / max / min on the fly. There has been some discussion of this in the NumPy repo, most recently in "add max_abs ufunc", but it has not reached implementation.
The ufunc.reduce method is available for ufuncs with two inputs, such as add or logaddexp, but there is no addabs ufunc (x, y : x + abs(y)) to reduce with.
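In the absence of such a fused reduction, one workaround is to process the array in cache-sized chunks through a small reusable buffer, so the full-size abs(a) temporary is never allocated. A sketch (the helper name sum_abs_chunked and the chunk size are my choices, not a NumPy API):

```python
import numpy as np

def sum_abs_chunked(a, chunk=1 << 16):
    """Sum of absolute values of a 1-D array using a small scratch
    buffer, bounding the temporary at `chunk` elements instead of
    a.size. A workaround, not a true fused ufunc reduction."""
    buf = np.empty(min(chunk, a.size), dtype=a.dtype)
    total = 0.0
    for start in range(0, a.size, chunk):
        block = a[start:start + chunk]
        np.abs(block, out=buf[:block.size])
        total += buf[:block.size].sum()
    return total

a = np.random.rand(1_000_003) - 0.5
print(np.isclose(sum_abs_chunked(a), np.abs(a).sum()))  # True
```

The per-chunk Python overhead is negligible as long as the chunks are large, and the scratch buffer stays small enough to remain cache-resident.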