Calculate matrix trace across all diagonals

I need to calculate the trace of a matrix across all of its diagonals. That is, for an n×m matrix, the operation should produce n + m - 1 'traces'. Here is an example program:

    import numpy as np

    A = np.arange(12).reshape(3, 4)

    def function_1(A):
        output = np.zeros(A.shape[0] + A.shape[1] - 1)
        for i in range(A.shape[0] + A.shape[1] - 1):
            output[i] = np.trace(A, A.shape[1] - 1 - i)
        return output

    >>> A
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>> function_1(A)
    array([  3.,   9.,  18.,  15.,  13.,   8.])

My hope is to find a way to replace the loop in this program, since I need to perform this calculation on very large matrices many times. One promising prospect is numpy.einsum, but I cannot figure out how to apply it here. As an alternative, I considered rewriting the problem completely with loops in Cython:

    %load_ext cythonmagic

    %%cython
    import numpy as np
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def function_2(long [:,:] A):
        cdef int n = A.shape[0]
        cdef int m = A.shape[1]
        cdef long [::1] output = np.empty(n + m - 1, dtype=np.int64)
        cdef size_t l1
        cdef int i, j, k1
        cdef long out
        it_list1 = range(m)
        it_list2 = range(m, m + n - 1)
        for l1 in range(len(it_list1)):
            k1 = it_list1[l1]
            i = 0
            j = m - 1 - k1
            out = 0
            while (i < n) & (j < m):
                out += A[i, j]
                i += 1
                j += 1
            output[k1] = out
        for l1 in range(len(it_list2)):
            k1 = it_list2[l1]
            i = k1 - m + 1
            j = 0
            out = 0
            while (i < n) & (j < m):
                out += A[i, j]
                i += 1
                j += 1
            output[k1] = out
        return np.array(output)

The Cython program is clearly faster than the version that goes through np.trace:

    %timeit function_1(A)
    10000 loops, best of 3: 62.7 µs per loop

    %timeit function_2(A)
    100000 loops, best of 3: 9.66 µs per loop

So basically, I would like feedback on whether there is a more efficient way to do this with numpy/scipy routines, or whether this is already about as fast as it gets with Cython.

+7
python numpy cython
7 answers

Here's an improved version of your Cython function. Honestly, this is what I would do if Cython were an option.

    import numpy as np
    from libc.stdint cimport int64_t as i64
    from cython cimport boundscheck, wraparound

    @boundscheck(False)
    @wraparound(False)
    def all_trace_int64(i64[:,::1] A):
        cdef:
            int i, j
            i64[:] t = np.zeros(A.shape[0] + A.shape[1] - 1, dtype=np.int64)
        for i in range(A.shape[0]):
            for j in range(A.shape[1]):
                t[A.shape[0] - i + j - 1] += A[i, j]
        return np.array(t)

This will be significantly faster than the version in your question, because it iterates over the array in the order in which it is stored in memory. For small arrays the two approaches give nearly identical timings, although this one is still a little faster on my machine.

I wrote this function so that it requires a C-contiguous array. If you have a Fortran-contiguous array, transpose it first, then reverse the order of the output.

This returns the answers in the reverse order of the function shown in your example, so you will need to reverse the array if the order matters.
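For reference, here is a minimal sketch of both adjustments (my own, assuming the function above has been compiled via the %%cython magic and that the input holds int64 data, as the signature requires):

    import numpy as np

    A = np.arange(12).reshape(3, 4)   # C-contiguous int64
    F = np.asfortranarray(A)          # Fortran-contiguous copy of the same data

    t = all_trace_int64(A)            # offsets -(n-1) .. (m-1), i.e. reversed
    t_question_order = t[::-1]        # same ordering as function_1 in the question

    # For a Fortran-contiguous array: .T is a C-contiguous view, and reversing
    # the output recovers what all_trace_int64 would return for A itself.
    assert (all_trace_int64(F.T)[::-1] == t).all()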

You can also improve performance by compiling with heavier optimizations. For example, you can build your Cython code in an IPython notebook with extra compiler flags by replacing

 %%cython 

with something like

 %%cython -c=-O3 -c=-march=native -c=-funroll-loops -f 

Edit: You will also want to check whether your values are being generated by an outer product. If they are, this operation can be combined with the outer product into a single np.convolve call.
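To illustrate that last point, a small sketch of my own: if A is the outer product of two vectors u and v, the diagonal sums of A are exactly the full convolution of the reversed u with v, so A never needs to be formed at all.

    import numpy as np

    u = np.array([1, 2])
    v = np.array([3, 4])
    A = np.outer(u, v)                # [[3, 4], [6, 8]]

    # Diagonal sums at offsets -(n-1) .. (m-1), without materializing A:
    traces = np.convolve(u[::-1], v)  # array([ 6, 11,  4])

    # Check against the direct computation: offset -1 -> 6, 0 -> 3+8, 1 -> 4
    assert list(traces) == [6, 11, 4]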

+2

If you want to stay away from Cython, building a diagonal index matrix and using np.bincount can do the trick:

    >>> import numpy as np
    >>> a = np.arange(12).reshape(3, 4)
    >>> a
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>> rows, cols = a.shape
    >>> rows_arr = np.arange(rows)
    >>> cols_arr = np.arange(cols)
    >>> diag_idx = rows_arr[:, None] - (cols_arr - (cols - 1))
    >>> diag_idx
    array([[3, 2, 1, 0],
           [4, 3, 2, 1],
           [5, 4, 3, 2]])
    >>> np.bincount(diag_idx.ravel(), weights=a.ravel())
    array([  3.,   9.,  18.,  15.,  13.,   8.])

According to my timings, on your example input this is 4x faster than your original pure Python method. So I don't think it will beat your Cython code, but it may be worth timing.
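Since the question mentions repeating this calculation many times, one natural refinement (my own sketch; make_diag_idx is just an illustrative name) is to build the index array once per matrix shape and reuse it:

    import numpy as np

    def make_diag_idx(rows, cols):
        # Flattened diagonal labels: entry (i, j) belongs to diagonal i - j + cols - 1
        return (np.arange(rows)[:, None] - (np.arange(cols) - (cols - 1))).ravel()

    diag_idx = make_diag_idx(3, 4)    # computed once, reused for every 3x4 matrix
    a = np.arange(12).reshape(3, 4)
    traces = np.bincount(diag_idx, weights=a.ravel())  # same result as above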

+5

If your matrix shape is far enough from square, i.e. if it is very tall or very wide, then you can use stride tricks for this. You can use stride tricks in any case, but they may not be super memory-efficient if the matrix is close to square.

What you need to do is create a new view onto the same data, laid out so that a step from one row to the next also advances by one column. This is achieved by modifying the strides of the array.

The issue to take care of is at the borders of the array, where zero padding is required. If the array is far from square, this hardly matters. If it is square, we need twice the array's size for padding.

If you don't need the shorter traces around the edges, you don't need to zero-pad at all (see the sketch after the example below).

Here (assuming more columns than rows, but easy to adapt):

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    A = np.arange(30).reshape(3, 10)
    A_embedded = np.hstack([np.zeros([3, 2]), A, np.zeros([3, 2])])
    # We are now sure the memory around A is padded with zeros,
    # but we never actually need A itself again
    A = A_embedded[:, 2:-2]

    new_strides = (A.strides[0] + A.strides[1], A.strides[1])
    B = as_strided(A_embedded, shape=A_embedded[:, :-2].shape, strides=new_strides)
    traces = B.sum(0)

    print(A)
    print(B)
    print(traces)

To match the output shown in your example, you need to reverse it (see @larsmans' comment):

 traces = traces[::-1] 

This is a concrete example with specific numbers. If it is useful for your use case, I can turn it into a general function.
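On the note above about skipping the padding: if you only need the full-length diagonals, a minimal sketch (my own, under the same more-columns-than-rows assumption) works directly on A with no extra memory:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    A = np.arange(30).reshape(3, 10)
    rows, cols = A.shape

    # A view whose column j walks down diagonal j of A: B[i, j] = A[i, i + j]
    B = as_strided(A, shape=(rows, cols - rows + 1),
                   strides=(A.strides[0] + A.strides[1], A.strides[1]))
    full_traces = B.sum(0)  # offsets 0 .. cols - rows only, each summing `rows` elements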

+3

This is competitive if the array is large:

    import numpy as np

    def f5(A):
        rows, cols = A.shape
        N = rows + cols - 1
        out = np.zeros(N, A.dtype)
        for idx in range(rows):
            out[N-idx-cols:N-idx] += A[idx]
        return out[::-1]

Although it uses a Python loop, it is faster than the bincount solution (for large arrays, on my system).


This method is highly sensitive to the column/row ratio of the array, since that ratio determines how much looping is done in Python relative to Numpy. As @Jaime pointed out, it is efficient to iterate over the smallest dimension, like this:

    def f6(A):
        rows, cols = A.shape
        N = rows + cols - 1
        out = np.zeros(N, A.dtype)
        if rows > cols:
            for idx in range(cols):
                out[N-idx-rows:N-idx] += A[:, idx]
        else:
            for idx in range(rows):
                out[N-idx-cols:N-idx] += A[idx]
            out = out[::-1]
        return out

But it should be noted that for large array sizes (for example, 100000 x 500 on my system), accessing the array row by row, as in the first code I posted, can still be faster, probably because of how the array is laid out in RAM (it is faster to fetch contiguous chunks than scattered bits).

+2

This can be done (with a slight abuse) using scipy.sparse.dia_matrix in two ways, one sparser in memory than the other.

The first, which gives the exact result, uses the data vector stored by dia_matrix:

    import numpy as np
    from scipy.sparse import dia_matrix

    A = np.arange(30).reshape(3, 10)
    traces = dia_matrix(A).data.sum(1)[::-1]

The less memory-intensive one works the other way around:

    import numpy as np
    from scipy.sparse import dia_matrix

    A = np.arange(30).reshape(3, 10)
    A_dia = dia_matrix((A, range(len(A))), shape=(A.shape[1],) * 2)
    traces = np.array(A_dia.sum(1)).ravel()[::-1]

Note that this solution is missing two entries. That can probably be fixed in a principled way, but I haven't figured out how yet.


@moarningsun found a solution:

    rows, cols = A.shape

    A_dia = dia_matrix((A, np.arange(rows)), shape=(cols,)*2)
    traces1 = A_dia.sum(1).A.ravel()

    A_dia = dia_matrix((A, np.arange(-rows+1, 1)), shape=(rows,)*2)
    traces2 = A_dia.sum(1).A.ravel()

    traces = np.concatenate((traces1[::-1], traces2[-2::-1]))
+1

np.trace does what you want:

    import numpy as np

    A = np.array([[ 0,  1,  2,  3],
                  [ 4,  5,  6,  7],
                  [ 8,  9, 10, 11]])
    rows, cols = A.shape
    [np.trace(A, i) for i in range(-rows + 1, cols)]

Edit: changed np.sum(np.diag()) to np.trace() as suggested by @user2357112.

-1

Use the numpy array trace method:

    import numpy as np

    A = np.array([[ 0,  1,  2,  3],
                  [ 4,  5,  6,  7],
                  [ 8,  9, 10, 11]])
    A.trace()

returns:

 15 
-2
