Asymptotically, there is currently no known way to exploit the special structure of this particular multiplication.
The obvious optimization is to exploit the symmetry of the product: the [i][j] entry is equal to the [j][i] entry, so only about half of the entries actually have to be computed.
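A minimal sketch of the symmetry trick, assuming the product in question is C = A * A^T with A stored row-major as an n x m array (the function name and memory layout here are illustrative, not from the original code):

    #include <vector>
    #include <cstddef>

    // Sketch: C = A * A^T for an n x m row-major matrix A, exploiting the
    // symmetry of the result. Only entries with j >= i are computed; the
    // lower triangle is filled in by mirroring, roughly halving the work.
    std::vector<double> multiply_by_transpose(const std::vector<double>& A,
                                              std::size_t n, std::size_t m) {
        std::vector<double> C(n * n, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = i; j < n; ++j) {           // upper triangle only
                double sum = 0.0;
                for (std::size_t k = 0; k < m; ++k)
                    sum += A[i * m + k] * A[j * m + k];     // dot product of rows i and j
                C[i * n + j] = sum;
                C[j * n + i] = sum;                         // mirror into [j][i]
            }
        return C;
    }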
For implementation-oriented optimization, there is a lot to gain from cache-aware access. When multiplying large matrices, a significant fraction of the time is spent moving data between main memory and the CPU. For this reason, processor designers added a caching system: recently used memory is kept in a small, fast store called the cache, and memory adjacent to a cached address is loaded along with it. This pays off because most memory traffic comes from reading and writing arrays, which are stored sequentially.
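As a rough illustration of the cache-line effect, here is a plain square product with the hypothetical name multiply_ikj, where simply reordering the loops makes the innermost accesses sequential:

    #include <vector>
    #include <cstddef>
    #include <algorithm>

    // Sketch: plain n x n product C = A * B with the loops ordered i-k-j
    // instead of i-j-k. With row-major storage the innermost loop then walks
    // B and C sequentially, so each fetched cache line is fully used before
    // it is evicted. The arithmetic is identical to the textbook version.
    void multiply_ikj(const std::vector<double>& A,
                      const std::vector<double>& B,
                      std::vector<double>& C, std::size_t n) {
        std::fill(C.begin(), C.end(), 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < n; ++k) {
                const double a = A[i * n + k];              // reused across the inner loop
                for (std::size_t j = 0; j < n; ++j)
                    C[i * n + j] += a * B[k * n + j];       // sequential reads and writes
            }
    }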
Since the transpose is just the same matrix with its indices swapped, caching a value from the matrix can have more than twice as much impact: the cached value serves both the original operand and the transposed one.
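A sketch of how the symmetry and caching points combine, under the same assumptions as above plus a hypothetical tile size B chosen so a few tiles of A fit in cache:

    #include <vector>
    #include <cstddef>
    #include <algorithm>

    // Sketch combining both points for C = A * A^T (A is n x m, row-major).
    // Only upper-triangle tiles are computed (symmetry), the innermost loop
    // streams memory sequentially (spatial locality), and because the
    // transposed operand is the same array, every tile of A brought into
    // cache is used for both sides of the product.
    void multiply_by_transpose_tiled(const std::vector<double>& A,
                                     std::vector<double>& C,
                                     std::size_t n, std::size_t m,
                                     std::size_t B = 64) {      // B: assumed tile size
        std::fill(C.begin(), C.end(), 0.0);
        for (std::size_t ii = 0; ii < n; ii += B)
            for (std::size_t jj = ii; jj < n; jj += B)          // upper triangle of tiles
                for (std::size_t kk = 0; kk < m; kk += B)
                    for (std::size_t i = ii; i < std::min(ii + B, n); ++i)
                        for (std::size_t j = std::max(jj, i); j < std::min(jj + B, n); ++j) {
                            double sum = 0.0;
                            for (std::size_t k = kk; k < std::min(kk + B, m); ++k)
                                sum += A[i * m + k] * A[j * m + k];
                            C[i * n + j] += sum;
                        }
        for (std::size_t i = 0; i < n; ++i)                     // mirror the upper triangle
            for (std::size_t j = i + 1; j < n; ++j)
                C[j * n + i] = C[i * n + j];
    }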