It turns out you're getting burned two levels deep in the numpy.linalg code. If you look at numpy.linalg.inv, you can see it's just a call to numpy.linalg.solve(A, numpy.identity(A.shape[0])), which means the identity matrix gets reconstructed on every iteration of your loop. Since your arrays are all the same size, that's a waste of time. Skipping this step by pre-allocating the identity matrix shaves ~20% off the time (fast_inverse). My testing suggests that pre-allocating the output array versus building it from a list of results doesn't really matter.
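To see the equivalence for yourself, here is a minimal sketch (the variable names are my own) showing that inv() agrees with solve() against an identity right-hand side:

```python
import numpy as np

# inv(A) is equivalent to solving A @ X = I for X
rng = np.random.default_rng(0)
A = rng.random((3, 3))
I = np.identity(3, dtype=A.dtype)
X = np.linalg.solve(A, I)

assert np.allclose(X, np.linalg.inv(A))
assert np.allclose(A @ X, I)
```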
Looking one level deeper, you find the call to the LAPACK routine, but it's wrapped in several sanity checks. If you strip all of those out and just call LAPACK in your for loop (since you already know the size of your matrices, and perhaps know that they are real, not complex), things run MUCH faster (note that I've made my array larger):
```python
import numpy as np
from numpy.linalg import lapack_lite

A = np.random.rand(1000, 3, 3)

def slow_inverse(A):
    """Invert each matrix with numpy.linalg.inv."""
    Ainv = np.zeros_like(A)
    for i in range(A.shape[0]):
        Ainv[i] = np.linalg.inv(A[i])
    return Ainv

def fast_inverse(A):
    """Pre-allocate the identity matrix and call solve() directly."""
    identity = np.identity(A.shape[2], dtype=A.dtype)
    Ainv = np.zeros_like(A)
    for i in range(A.shape[0]):
        Ainv[i] = np.linalg.solve(A[i], identity)
    return Ainv

def fast_inverse2(A):
    """Same idea, but collect the results from a list comprehension."""
    identity = np.identity(A.shape[2], dtype=A.dtype)
    return np.array([np.linalg.solve(x, identity) for x in A])

lapack_routine = lapack_lite.dgesv

def faster_inverse(A):
    """Call LAPACK's dgesv directly, skipping numpy's sanity checks.
    dgesv overwrites its arguments: a gets the LU factors, b the solution."""
    n_eq = A.shape[1]
    n_rhs = A.shape[2]
    identity = np.eye(n_eq, dtype=A.dtype)

    def lapack_inverse(a):
        b = np.copy(identity)
        pivots = np.zeros(n_eq, np.intc)
        results = lapack_routine(n_eq, n_rhs, a.copy(), n_eq, pivots, b, n_eq, 0)
        if results['info'] > 0:
            raise np.linalg.LinAlgError('Singular matrix')
        return b

    return np.array([lapack_inverse(a) for a in A])
```
The results are impressive:
```
slow_inverse:   20 loops, best of 3: 45.1 ms per loop
fast_inverse:   20 loops, best of 3: 38.1 ms per loop
fast_inverse2:  20 loops, best of 3: 38.9 ms per loop
faster_inverse: 20 loops, best of 3: 13.8 ms per loop
```
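As an aside: since NumPy 1.8 the linalg routines are generalized ufuncs, so np.linalg.inv accepts the stacked array directly and runs the loop in compiled code. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 3, 3))

# inv() broadcasts over the leading axis: one call inverts all 1000 matrices
Ainv = np.linalg.inv(A)

assert Ainv.shape == (1000, 3, 3)
assert np.allclose(A @ Ainv, np.identity(3), atol=1e-5)
```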
EDIT: I did not look closely enough at what gets overwritten in the dgesv call. It turns out that the 'b' matrix is overwritten and contains the solution on return. This code now gives consistent results.