There is a trace function that gives the sum of the diagonal. You can specify the offset and 2 axes (0 and 1 are the default values). And to get anti-diagonal, you just need to flip one dimension. np.flipud does this, although it simply indexes [::-1,...] .
Putting them together
np.array([np.trace(np.flipud(array),offset=k) for k in range(-1,3)])
matches your new_array .
It still goes over the possible values of l (in this case 4). trace itself compiled.
In this small case, it is actually slower than your double loop (2x3 steps). Even if I move flipud from the inner loop, it is still slower. I do not know how this scales for large arrays.
Part of the problem with vectorizing this even further is the fact that each diagonal has a different length.
In [331]: %%timeit array1 = array[::-1] np.array([np.trace(array1,offset=k) for k in range(-1,3)]) .....: 10000 loops, best of 3: 87.4 µs per loop In [332]: %%timeit new_array = np.zeros([array.shape[0] + array.shape[1] - 1] + list(array.shape[2:])) for i in range(2): for j in range(3): new_array[i+j] += array[i,j] .....: 10000 loops, best of 3: 43.5 µs per loop
scipy.sparse has a dia format in which values of nonzero diagonals are stored. It saves a populated array of values along with offsets.
array([[12, 0, 0, 0], [ 8, 13, 0, 0], [ 4, 9, 14, 0], [ 0, 5, 10, 15], [ 0, 1, 6, 11], [ 0, 0, 2, 7], [ 0, 0, 0, 3]]) array([-3, -2, -1, 0, 1, 2, 3])
Although this is a way around the problem of variable diagonal lengths, I don’t think it helps in this case when you just need their sums.