Why does X.dot (XT) require so much memory in numpy?

Question

Why does X.dot (XT) require so much memory in numpy?

X is the matrix nxp, where p is much larger than n. Let say n = 1000 and p = 500000. When I run:

X = np.random.randn(1000,500000) S = X.dot(XT)

This operation ends up taking up most of the memory, despite the fact that the result is 1000 x 1000. The memory usage is returned after the operation is completed. Is there any way around this?

+6

python numpy scipy linear-algebra

bluecat Jan 18 '14 at 18:51

source share

1 answer

ali_m · Accepted Answer · 2014-01-19T03:01:35+0000

The problem is not that X and XT are representations of the same memory space as such, but rather that XT is F-adjacent and not C-adjacent. Of course, this must necessarily be true for at least one of the input arrays in the case where you multiply the array in order to transpose it.

At numpy <1.8, np.dot will create a C-ordered copy of any F-ordered input arrays, not just those that are views of the same memory block.

For instance:

 X = np.random.randn(1000,50000) Y = np.random.randn(50000, 100) # X and Y are both C-order, no copy %memit np.dot(X, Y) # maximum of 1: 485.554688 MB per loop # make X Fortran order and Y C-order, now the larger array (X) gets # copied X = np.asfortranarray(X) %memit np.dot(X, Y) # maximum of 1: 867.070312 MB per loop # make X C-order and Y Fortran order, now the smaller array (Y) gets # copied X = np.ascontiguousarray(X) Y = np.asfortranarray(Y) %memit np.dot(X, Y) # maximum of 1: 523.792969 MB per loop # make both of them F-ordered, both get copied! X = np.asfortranarray(X) %memit np.dot(X, Y) # maximum of 1: 905.093750 MB per loop

If copying is a problem (for example, when X very large), what can you do with it?

The best option is probably to upgrade to a newer version of numpy, as @perimosocordiae points out, this performance issue was addressed in this pull request .

If for some reason you cannot update numpy, there is also a trick that allows you to execute fast, BLAS-based point products without having to copy, by calling the corresponding BLAS function directly through scipy.linalg.blas (shamelessly stolen from this answer ):

 from scipy.linalg import blas X = np.random.randn(1000,50000) %memit res1 = np.dot(X, XT) # maximum of 1: 845.367188 MB per loop %memit res2 = blas.dgemm(alpha=1., a=XT, b=XT, trans_a=True) # maximum of 1: 471.656250 MB per loop print np.all(res1 == res2) # True

Why does X.dot (XT) require so much memory in numpy?

More articles: