NumPy matrix multiplication efficiency for a matrix with a known structure

I have two NxN matrices that I want to multiply together: A and B. In NumPy, I used:

    import numpy as np

    C = np.dot(A, B)

However, I know that in matrix B only row n and column n are nonzero (this comes directly from the analytical formula that produced the matrix, and is guaranteed to always hold).

Hoping to take advantage of this fact and reduce the number of multiplications needed to form C, I replaced the code above with:

    import numpy as np

    C = np.empty((N, N))  # preallocate the result
    for row in range(N):
        for col in range(N):
            if col != n:
                # B[k, col] is nonzero only for k == n, so the dot
                # product collapses to one scalar multiplication.
                C[row, col] = A[row, n] * B[n, col]
            else:
                # Column n of B is fully nonzero: full dot product.
                C[row, col] = np.dot(A[row], B[:, n])

Analytically, this should reduce the overall complexity as follows. In the general case (without any fancy tricks, just basic matrix multiplication), C = AB with A and B both NxN is O(N^3): all N rows must multiply all N columns, and each of these dot products involves N multiplications, giving O(N·N·N) = O(N^3).

Using B's structure as I did above should instead be O(N^2 + N^2) = O(2N^2) = O(N^2). That is, all N rows must still multiply all N columns, but all of these dot products except the ones involving B[:, n] require only one scalar multiplication, since only one element of B[:, m] is nonzero for m != n. When m == n, which occurs N times (once for each row of A that multiplies column n of B), N scalar multiplications must be performed.
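To make the savings concrete, here is the rough multiplication count for an illustrative N = 100 (plain arithmetic based on the analysis above, not a benchmark):

    N = 100
    naive = N**3                    # N rows x N cols, N multiplications each
    structured = N*(N - 1) + N*N    # 1 mult per off-column entry, N each for column n
    print(naive, structured)        # 1000000 vs 19900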

However, the first block of code (using np.dot(A, B)) is much faster. I know (from questions such as Why is matrix multiplication faster with numpy than with ctypes in Python?) that the low-level implementation details of np.dot are responsible. So my question is this: how can I use the structure of matrix B to make the multiplication more efficient, without sacrificing the efficiency of the NumPy implementation and without writing my own low-level matrix multiplication in C?

This method is part of a numerical optimization over many variables, so O(N^3) is intractable, while O(N^2) will likely do the job.

Thanks for any help. Also, I'm new to SO, so please forgive any newbie errors.

2 answers

If I've understood A and B correctly, then I don't see the need for the for loops; why not just multiply by the two nonzero vectors:

    import numpy as np

    # say A & B are like this:
    n, N = 3, 5
    A = np.random.randn(N, N)
    B = np.zeros_like(A)
    B[n] = np.random.randn(N)
    B[:, n] = np.random.randn(N)

Take the nonzero row and column of B, zeroing the shared element B[n, n] in the column copy so it is not counted twice:

    rowb, colb = B[n, :], np.copy(B[:, n])
    colb[n] = 0

Multiply A by these two vectors. Since B is the sum of an outer product involving its row n and one involving its column n, A·B splits into an outer product plus an update to column n:

    X = np.outer(A[:, n], rowb)   # contribution of B's row n
    X[:, n] += np.dot(A, colb)    # contribution of B's column n

To verify:

    X - np.dot(A, B)   # should be (numerically) zero

with N=100:

    %timeit np.dot(A, B)
    1000 loops, best of 3: 1.39 ms per loop

    %timeit colb = np.copy(B[:, n]); colb[n] = 0; X = np.outer(A[:, n], B[n, :]); X[:, n] += np.dot(A, colb)
    10000 loops, best of 3: 98.5 µs per loop
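Wrapped up as a reusable helper (a sketch of the steps above; the name structured_dot is introduced here, it is not anything from NumPy):

    import numpy as np

    def structured_dot(A, B, n):
        """Compute A @ B when B is nonzero only in row n and column n."""
        colb = np.copy(B[:, n])
        colb[n] = 0                      # avoid counting B[n, n] twice
        X = np.outer(A[:, n], B[n, :])   # contribution of B's row n
        X[:, n] += np.dot(A, colb)       # contribution of B's column n
        return X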

I timed it, and using scipy.sparse is faster:

    import numpy as np
    from scipy import sparse
    from timeit import timeit

    A = np.random.rand(100, 100)
    B = np.zeros(A.shape, dtype=float)
    B[3] = np.random.rand(100)
    B[:, 3] = np.random.rand(100)
    sparse_B = sparse.csr_matrix(B)

    n = 1000

    t1 = timeit('np.dot(A, B)', 'from __main__ import np, A, B', number=n)
    print('dense way : {}'.format(t1))
    t2 = timeit('A * sparse_B', 'from __main__ import A, sparse_B', number=n)
    print('sparse way : {}'.format(t2))

Result:

    dense way : 1.15117192268
    sparse way : 0.113152980804

    >>> np.allclose(np.dot(A, B), A * sparse_B)
    True

As the size of B increases, so does the speed advantage of multiplying with the sparse matrix.
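If the row and column values come directly from your analytical formula, you can also build B in sparse form without ever materializing the dense matrix. A sketch, where rowb and colb are hypothetical vectors holding the nonzero values and k is the special row/column index:

    import numpy as np
    from scipy import sparse

    N, k = 100, 3
    rowb = np.random.rand(N)          # stand-in for the analytical row values
    colb = np.random.rand(N)          # stand-in for the analytical column values

    rows = np.concatenate([np.full(N, k), np.arange(N)])
    cols = np.concatenate([np.arange(N), np.full(N, k)])
    vals = np.concatenate([rowb, colb])
    vals[k] = 0.0                     # the (k, k) entry appears in both halves;
                                      # COO sums duplicates, so zero one copy
    sparse_B = sparse.coo_matrix((vals, (rows, cols)), shape=(N, N)).tocsr()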

