I have two NxN matrices that I want to multiply together: A and B. In NumPy, I used:
import numpy as np

C = np.dot(A, B)
However, I know that in matrix B only row n and column n are nonzero (this comes directly from the analytical formula that produced the matrix and is guaranteed to always hold).
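For concreteness, a matrix with this structure looks like the following (the sizes and values here are just placeholders; my real B comes from the analytical formula):

import numpy as np

N, n = 5, 2                      # hypothetical sizes, just for illustration
B = np.zeros((N, N))
B[n, :] = np.arange(1, N + 1)    # row n is nonzero
B[:, n] = np.arange(1, N + 1)    # column n is nonzero
# every other entry of B is exactly zero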
Hoping to take advantage of this fact and reduce the number of multiplications needed to build C, I replaced the call above with:
import numpy as np

C = np.zeros((N, N))
for row in range(0, N):
    for col in range(0, N):
        if col != n:
            # only B[n, col] is nonzero in this column
            C[row, col] = A[row, n] * B[n, col]
        else:
            # column n of B is dense: this entry needs a full dot product
            C[row, col] = np.dot(A[row, :], B[:, col])
Analytically, this should reduce the overall complexity as follows. In the general case (no clever algorithms, just basic matrix multiplication), C = AB with A and B both NxN is O(N^3): each of the N rows of A must multiply each of the N columns of B, and each of those dot products involves N scalar multiplications, so O(N*N*N) = O(N^3).
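In other words, plain schoolbook multiplication is the following triple loop (just a sketch to show where the N^3 scalar multiplications come from):

import numpy as np

# naive matrix multiplication: N rows x N columns x N terms per dot product
C = np.zeros((N, N))
for row in range(N):
    for col in range(N):
        for k in range(N):
            C[row, col] += A[row, k] * B[k, col]   # N*N*N multiplications in total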
Using the structure of B as I did above should instead be O(N^2 + N^2) = O(2N^2) = O(N^2). That is, all N rows still have to multiply all N columns, but every column except B[:, n] requires only one scalar multiplication, since only one element of B[:, m] is nonzero when m != n. When m == n, which happens N times (once for each row of A that multiplies column n of B), N scalar multiplications are needed.
However, the first block of code (using np.dot(A, B)) is much faster. I know (from information such as: Why is matrix multiplication faster with numpy than with ctypes in Python?) that the low-level implementation details of np.dot are responsible. So my question is: how can I use the structure of matrix B to make the multiplication more efficient without sacrificing the efficiency of the NumPy implementation, and without writing my own low-level matrix multiplication in C?
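For concreteness, here is a vectorized sketch of what I have in mind, based on the analysis above (it assumes n and N are defined as before); I don't know whether this is the idiomatic way to keep everything inside NumPy:

import numpy as np

# Columns m != n of C only see the single nonzero entry B[n, m],
# so they collapse to an outer product of A[:, n] with row n of B.
C = np.outer(A[:, n], B[n, :])

# Column n of C needs the full dot products with the dense column B[:, n].
C[:, n] = np.dot(A, B[:, n])

Both steps look like O(N^2) to me, but I'd like to know whether this is the right approach or whether there is a better way.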
This method is part of a numerical optimization over many variables, so O(N^3) is intractable while O(N^2) will likely do the job.
Thanks for any help. Also, I'm new to SO, so please forgive any newbie errors.