The problem is not that X and X.T are views of the same memory as such, but rather that X.T is F-contiguous rather than C-contiguous. Of course, this is necessarily true of at least one of the input arrays whenever you multiply an array by a view of its transpose.
In numpy < 1.8, np.dot will create a C-ordered copy of any F-ordered input array, not just those that happen to be views of the same block of memory.
For instance:
    import numpy as np

    # Both arrays are freshly allocated, so both are C-ordered
    X = np.random.randn(1000, 50000)
    Y = np.random.randn(50000, 100)
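To see which operands fall into the F-ordered case, you can inspect the array flags directly. The sketch below uses small stand-in arrays (the shapes are arbitrary and chosen only to keep it cheap) and checks layout only; it does not measure memory use.

```python
import numpy as np

# Small stand-ins for the arrays above; the shapes are arbitrary.
X = np.random.randn(10, 500)
Y = np.random.randn(500, 4)

# Freshly allocated arrays are C-contiguous (row-major).
print(X.flags['C_CONTIGUOUS'], Y.flags['C_CONTIGUOUS'])    # True True

# X.T is a view of X's buffer, not a copy, and it is F-contiguous.
XT = X.T
print(np.shares_memory(X, XT))                             # True
print(XT.flags['C_CONTIGUOUS'], XT.flags['F_CONTIGUOUS'])  # False True

# Y.T is F-contiguous as well, although it shares nothing with X.
print(Y.T.flags['F_CONTIGUOUS'])                           # True
```

By the rule stated above, both X.T and Y.T would be copied by np.dot in numpy < 1.8, whether or not the other operand views the same memory.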
If copying is a problem (for example, when X is very large), what can you do about it?
The best option is probably to upgrade to a newer version of numpy. As @perimosocordiae points out, this performance issue was addressed in this pull request.
If for some reason you cannot upgrade numpy, there is also a trick that lets you perform fast, BLAS-based dot products without forcing a copy, by calling the relevant BLAS function directly through scipy.linalg.blas (shamelessly stolen from this answer):
    import numpy as np
    from scipy.linalg import blas

    X = np.random.randn(1000, 50000)
    # %memit is an IPython magic from memory_profiler (%load_ext memory_profiler)
    %memit res1 = np.dot(X, X.T)
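A minimal sketch of what such a direct BLAS call could look like, assuming float64 data and using scipy.linalg.blas.dgemm. The array here is deliberately small, and the choice of dgemm with trans_a=True is one way to express X times its transpose, not necessarily the only one. The idea is to hand the wrapper X.T, which is already Fortran-ordered, and let BLAS apply the transpose itself.

```python
import numpy as np
from scipy.linalg import blas

# Small float64 array for illustration; np.random.randn returns C-ordered data.
X = np.random.randn(100, 2000)

# Reference result. On numpy < 1.8 this may copy the F-ordered operand X.T.
res1 = np.dot(X, X.T)

# dgemm computes alpha * op(a) @ op(b). Both inputs are passed as X.T,
# which is F-contiguous. trans_a=True makes op(a) = (X.T).T = X,
# so the product is X @ X.T with shape (100, 100).
res2 = blas.dgemm(alpha=1.0, a=X.T, b=X.T, trans_a=True)

print(res2.shape)
print(np.allclose(res1, res2))
```

To compare memory use on your own setup, you could wrap each of the two calls in %memit as above; no figures are given here because they depend on the numpy and BLAS builds in use.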