I need to calculate the trace of the matrix across all its diagonals. That is, for an nxm matrix, the operation should produce n + m-1 'traces'. Here is an example program:
import numpy as np A=np.arange(12).reshape(3,4) def function_1(A): output=np.zeros(A.shape[0]+A.shape[1]-1) for i in range(A.shape[0]+A.shape[1]-1): output[i]=np.trace(A,A.shape[1]-1-i) return output A array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) function_1(A) array([ 3., 9., 18., 15., 13., 8.])
My hope is to find a way to replace the loop in the program, since I need to do this calculation on very large matrices many times. One promising prospect is to use numpy.einsum, but I cannot figure out how to do this. As an alternative, I considered rewriting the problem completely using loops in the ziton:
%load_ext cythonmagic %%cython import numpy as np cimport numpy as np cimport cython @cython.boundscheck(False) @cython.wraparound(False) def function_2(long [:,:] A): cdef int n=A.shape[0] cdef int m=A.shape[1] cdef long [::1] output = np.empty(n+m-1,dtype=np.int64) cdef size_t l1 cdef int i,j, k1 cdef long out it_list1=range(m) it_list2=range(m,m+n-1) for l1 in range(len(it_list1)): k1=it_list1[l1] i=0 j=m-1-k1 out=0 while (i<n)&(j<m): out+=A[i,j] i+=1 j+=1 output[k1]=out for l1 in range(len(it_list2)): k1=it_list2[l1] i=k1-m+1 j=0 out=0 while (i<n)&(j<m): out+=A[i,j] i+=1 j+=1 output[k1]=out return np.array(output)
The cython program is superior to the program passing through np.trace:
%timeit function_1(A) 10000 loops, best of 3: 62.7 µs per loop %timeit function_2(A) 100000 loops, best of 3: 9.66 µs per loop
So basically, I want to get feedback on whether there was a more efficient way to use numpy / scipy procedures, or if I probably reached a quick way to use cython.