Is it preferable to iterate over the first dimension of a 2-dimensional array in the outer loop, rather than the second?

Here is the code

 int array[X][Y] = {0,};

 // one way to access the data
 for (int x = 0; x < X; x++)
     for (int y = 0; y < Y; y++)
         array[x][y] = compute();

 // the other way to access the data
 for (int y = 0; y < Y; y++)
     for (int x = 0; x < X; x++)
         array[x][y] = compute();

Is it true that the first method is more efficient than the second because it makes better use of the CPU caches (L1, L2)? In other words, is a sequential access pattern preferable even for RAM?

+4
5 answers

You will understand this better if you draw a picture of how your array is laid out in memory:

     Y ->
 X   xxxxx ...
 |   xxxxx
 v   xxxxx
     .
     .

Accesses in the Y direction hit consecutive addresses (345, 345 + 1, 345 + 2, ...), but accesses in the X direction jump by a whole row at a time (345, 345 + Y, 345 + 2 * Y), which is a wild jump when Y is large. Since the cache loads memory in blocks, a large stride jumps out of the loaded block almost immediately, whereas walking in the Y direction stays within cached data until the next block has to be fetched.
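To see the strides concretely, here is a minimal sketch (the dimensions are made up for illustration); on a typical platform with 4-byte int it prints 4 and 20:

 #include <cstdio>

 int main() {
     const int X = 4, Y = 5;  // small, arbitrary dimensions
     int array[X][Y] = {0,};

     // Stepping in the Y direction advances by sizeof(int) bytes;
     // stepping in the X direction advances by a whole row, Y * sizeof(int) bytes.
     std::printf("Y stride: %td bytes\n",
                 (char*)&array[0][1] - (char*)&array[0][0]);
     std::printf("X stride: %td bytes\n",
                 (char*)&array[1][0] - (char*)&array[0][0]);
 }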

Also note that this effect can be even more extreme with dynamic allocation. Using the program at the bottom of this answer, compiled with full optimization, gives me the following results (times in seconds):

 0.615000 9.878000 

EDIT: Some other interesting measurements:

Replacing the dynamic allocation with a local int array[X][Y]; uses the (limited) stack, so you cannot test much larger X/Y values, but both loops are also very fast:

 0.000000 0.000000 

Declaring int array[X][Y]; as a global variable uses static memory instead of the stack, and the second access pattern is again slow. So even without dynamic allocation, the first case is much better:

 0.929000 8.944000 

Using X = 1500, Y = 1500 shows that the effect is measurable even with smaller arrays:

 0.008000 0.059000 

EDIT2: Also note that there are further possible optimizations, as jalf mentioned in a comment on your question. Applying the one below almost doubles the speed (0.453 seconds for X = Y = 10000):

 // an even faster way to access the array
 for (int x = 0; x < X; x++) {
     int* arrayptr = array[x];
     for (int y = 0; y < Y; y++, arrayptr++)
         *arrayptr = x;
 }

The code (note that you can adapt it to measure your own case; the difference will not be as extreme unless X and Y are large. And, as others have already mentioned: measure it, and you will be enlightened):

 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 #define X 10000
 #define Y 10000

 int main() {
     int** array = new int*[X];
     for (int x = 0; x < X; x++) {
         array[x] = new int[Y];
     }

     double c = clock();
     // one way to access the data
     for (int x = 0; x < X; x++)
         for (int y = 0; y < Y; y++)
             array[x][y] = x;
     printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);

     c = clock();
     // the other way to access the data
     for (int y = 0; y < Y; y++)
         for (int x = 0; x < X; x++)
             array[x][y] = x;
     printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);

     // rows were allocated with new[], so they must be freed with delete[]
     for (int x = 0; x < X; x++) {
         delete[] array[x];
     }
     delete[] array;
 }
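A variation on the program above (a sketch, not included in the measurements): allocating one contiguous block instead of X separate rows removes the pointer indirection and keeps all rows adjacent in memory, which should help the cache even more:

 int* block = new int[(size_t)X * Y];

 // same row-by-row access pattern; element (x, y) lives at index x * Y + y
 for (int x = 0; x < X; x++)
     for (int y = 0; y < Y; y++)
         block[(size_t)x * Y + y] = x;

 delete[] block;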
+5

Yes, especially if a whole row fits into a cache line. With the second method, once your array has a sufficiently large number of rows, there is no cache locality, and cache lines are constantly evicted and refetched (thrashed).

+3

Yes, the first is faster. The matrix is stored in memory row by row (row-major order), so there is a high probability that neighboring elements of a row are on the same virtual-memory page, and whole blocks are brought into the cache at once, so access time is lower.

The other approach generates many more cache misses for a large matrix.
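As a rough worked example (assuming a 64-byte cache line and 4-byte int, both of which are platform-dependent):

 #include <cstdio>

 int main() {
     const int line_bytes = 64;                        // assumed cache-line size
     const int elems = line_bytes / (int)sizeof(int);  // 16 elements per line
     // Row-major walk: roughly 1 miss per 'elems' accesses.
     // Column walk over rows longer than one line: roughly 1 miss per access,
     // i.e. about 'elems' times as many misses.
     std::printf("approx. ratio of cache misses: %dx\n", elems);
 }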

+2

Measure it.

Sequential access is preferable, certainly. How much it matters should depend to a large extent on the values of X and Y; for certain values I would expect the difference to be significant.
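For example, a minimal timing sketch using std::chrono (the sizes are placeholders; vary them for your case):

 #include <chrono>
 #include <cstdio>
 #include <vector>

 int main() {
     const int X = 2000, Y = 2000;  // placeholder sizes; vary these
     std::vector<int> a((size_t)X * Y);

     auto t0 = std::chrono::steady_clock::now();
     for (int x = 0; x < X; x++)
         for (int y = 0; y < Y; y++)
             a[(size_t)x * Y + y] = x;  // row-major, sequential
     auto t1 = std::chrono::steady_clock::now();
     for (int y = 0; y < Y; y++)
         for (int x = 0; x < X; x++)
             a[(size_t)x * Y + y] = x;  // column order, strided
     auto t2 = std::chrono::steady_clock::now();

     std::printf("row-major:    %f s\n",
                 std::chrono::duration<double>(t1 - t0).count());
     std::printf("column order: %f s\n",
                 std::chrono::duration<double>(t2 - t1).count());
 }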

You should use a container such as std::vector, std::valarray, or boost::matrix anyway; C-style matrices are error-prone and annoying.
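A minimal sketch of what such a wrapper could look like (the class name and interface are illustrative, not a standard or Boost API):

 #include <cstddef>
 #include <vector>

 // Row-major matrix over one contiguous std::vector buffer,
 // cache-friendly to walk row by row.
 class Matrix {
 public:
     Matrix(int rows, int cols)
         : rows_(rows), cols_(cols), data_((std::size_t)rows * cols) {}

     int& at(int x, int y)             { return data_[(std::size_t)x * cols_ + y]; }
     const int& at(int x, int y) const { return data_[(std::size_t)x * cols_ + y]; }

     int rows() const { return rows_; }
     int cols() const { return cols_; }

 private:
     int rows_, cols_;
     std::vector<int> data_;
 };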

+2

Well-known saying: "It probably won't make a noticeable difference, given the speed of modern computers."

-1
