I am writing some scientific code in Fortran 77 and I'm trying to decide which of two approaches will be faster.
Basically, I have an MxN matrix; let's call it A. M is greater than N. Later in the code I need to multiply the transpose of A by a bunch of vectors.
My question is: would it be faster to transpose A myself once and store the result, or to keep A as it is and pass the transpose flag when I call BLAS?
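For concreteness, here is a minimal sketch of the two options in plain Fortran 77 (fixed-form source) with no BLAS dependency. `ATXCOL`, `TRANSP` and `BXCOL` are hypothetical helper names for illustration, not BLAS routines. `ATXCOL` computes y = A**T x directly from A by taking the dot product of each column of A with x. `TRANSP` stores B = A**T once, and `BXCOL` then computes y = B x column by column. Because Fortran arrays are column-major, both product loops read memory with unit stride in their inner loop. This is roughly how the reference BLAS `DGEMV` orders its loops for `TRANS='T'` and `TRANS='N'`. With BLAS, the two options would be `CALL DGEMV('T', M, N, 1.0D0, A, LDA, X, 1, 0.0D0, Y, 1)` versus a one-time transpose followed by `CALL DGEMV('N', N, M, 1.0D0, B, LDB, X, 1, 0.0D0, Y, 1)` for each vector.

```fortran
C     Hypothetical helpers for illustration only (not BLAS routines).
C     A is M-by-N, stored column-major with leading dimension LDA.
C
C     Option 1: Y = A**T * X without forming A**T.
C     Y(J) is the dot product of column J of A with X(1:M), so the
C     inner loop walks down a column with unit stride.
      SUBROUTINE ATXCOL(M, N, A, LDA, X, Y)
      IMPLICIT NONE
      INTEGER M, N, LDA, I, J
      DOUBLE PRECISION A(LDA,*), X(*), Y(*), S
      DO 20 J = 1, N
         S = 0.0D0
         DO 10 I = 1, M
            S = S + A(I,J)*X(I)
   10    CONTINUE
         Y(J) = S
   20 CONTINUE
      RETURN
      END
C
C     Option 2, step 1: store B = A**T once (B is N-by-M, LDB >= N).
      SUBROUTINE TRANSP(M, N, A, LDA, B, LDB)
      IMPLICIT NONE
      INTEGER M, N, LDA, LDB, I, J
      DOUBLE PRECISION A(LDA,*), B(LDB,*)
      DO 20 J = 1, N
         DO 10 I = 1, M
            B(J,I) = A(I,J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END
C
C     Option 2, step 2: Y(1:NR) = B * X(1:NC) for an NR-by-NC matrix
C     B (called with NR = N, NC = M). Each column of B is scaled by
C     one element of X and added to Y (AXPY style, unit stride).
      SUBROUTINE BXCOL(NR, NC, B, LDB, X, Y)
      IMPLICIT NONE
      INTEGER NR, NC, LDB, I, J
      DOUBLE PRECISION B(LDB,*), X(*), Y(*)
      DO 10 I = 1, NR
         Y(I) = 0.0D0
   10 CONTINUE
      DO 30 J = 1, NC
         DO 20 I = 1, NR
            Y(I) = Y(I) + B(I,J)*X(J)
   20    CONTINUE
   30 CONTINUE
      RETURN
      END
```

Both paths produce the same y; the stored transpose costs an extra N*M array and one copy up front. Which one wins in practice would depend on the BLAS implementation, the matrix sizes and how many vectors are multiplied, so timing both on the target machine is the safe way to decide.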
Thanks! -Patrick