The word you are looking for is shredding . Searching for matrix multiplication markup on Google gives more results .
The standard multiplication algorithm for c = a * b will look like
void multiply(double[,] a, double[,] b, double[,] c) { for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) for (int k = 0; k < n; k++) C[i, j] += a[i, k] * b[k, j]; }
Basically, moving memory quickly in large steps is bad for performance. The access pattern for k in B [ k , j] does just that. Therefore, instead of jumping in memory, we can reorder the operations so that most of the internal loops work only with the second matrix access index:
void multiply(double[,] a, double[,] B, double[,] c) { for (i = 0; i < n; i++) { double t = a[i, 0]; for (int j = 0; j < n; j++) c[i, j] = t * b[0, j]; for (int k = 1; k < n; k++) { double s = 0; for (int j = 0; j < n; j++ ) s += a[i, k] * b[k, j]; c[i, j] = s; } } }
This was the example given on this page. However, another option is to copy the contents of B [k, *] to the array in advance and use this array in internal loop calculations. This approach is usually much faster than alternatives, even if it involves copying data. Even if it may seem contradictory, try it yourself.
void multiply(double[,] a, double[,] b, double[,] c) { double[] Bcolj = new double[n]; for (int j = 0; j < n; j++) { for (int k = 0; k < n; k++) Bcolj[k] = b[k, j]; for (int i = 0; i < n; i++) { double s = 0; for (int k = 0; k < n; k++) s += a[i,k] * Bcolj[k]; c[j, i] = s; } } }