Is it preferable to iterate over the first dimension of a 2-dimensional array in the outer loop, rather than the second?

Here is the code

 int array[X][Y] = {0,};

 // one way to access the data
 for (int x = 0; x < X; x++)
     for (int y = 0; y < Y; y++)
         array[x][y] = compute();

 // the other way to access the data
 for (int y = 0; y < Y; y++)
     for (int x = 0; x < X; x++)
         array[x][y] = compute();

Is it true that the first method is more efficient than the second because it makes better use of the CPU caches (L1, L2)? In other words, is a sequential access pattern preferable even for RAM?

+4
5 answers

You will understand this better if you draw a picture of how your array is laid out in memory:

     Y ->
 X   xxxxx ...
 |   xxxxx
 v   xxxxx
     .
     .

Accesses in the Y direction hit consecutive addresses (345, 345 + 1, 345 + 2, ...), but accesses in the X direction jump by a whole row at a time (345, 345 + Y, 345 + 2 * Y), which is a wild jump when Y is large. Since the cache loads memory in blocks, a large stride jumps out of the loaded block almost immediately, whereas walking in the Y direction stays within cached data until the next block has to be fetched.
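To see the strides concretely, here is a minimal sketch (the dimensions are made up for illustration); on a typical platform with 4-byte int it prints 4 and 20:

 #include <cstdio>

 int main() {
     const int X = 4, Y = 5;  // small, arbitrary dimensions
     int array[X][Y] = {0,};

     // Stepping in the Y direction advances by sizeof(int) bytes;
     // stepping in the X direction advances by a whole row, Y * sizeof(int) bytes.
     std::printf("Y stride: %td bytes\n",
                 (char*)&array[0][1] - (char*)&array[0][0]);
     std::printf("X stride: %td bytes\n",
                 (char*)&array[1][0] - (char*)&array[0][0]);
 }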

Also note that this effect can be even more extreme with dynamic allocation. Using the program at the bottom of this answer, compiled with full optimization, gives me the following results (times in seconds):

 0.615000 9.878000 

EDIT: Some other interesting measurements:

Replacing the dynamic allocation with a local int array[X][Y]; uses the (limited) stack, so you cannot test much larger X/Y values, but both loops are also very fast:

 0.000000 0.000000 

Declaring int array[X][Y]; as a global variable uses static memory instead of the stack, and the second access pattern is again slow. So even without dynamic allocation, the first case is much better:

 0.929000 8.944000 

Using X = 1500, Y = 1500 shows that the effect is measurable even with smaller arrays:

 0.008000 0.059000 

EDIT2: Also note that there are further possible optimizations, as jalf mentioned in a comment on your question. Applying the one below almost doubles the speed (0.453 seconds for X = Y = 10000):

 // an even faster way to access the array
 for (int x = 0; x < X; x++) {
     int* arrayptr = array[x];
     for (int y = 0; y < Y; y++, arrayptr++)
         *arrayptr = x;
 }

The code (note that you can adapt it to measure your own case; the difference will not be as extreme unless X and Y are large. And, as others have already mentioned: measure it, and you will be enlightened):

 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 #define X 10000
 #define Y 10000

 int main() {
     int** array = new int*[X];
     for (int x = 0; x < X; x++) {
         array[x] = new int[Y];
     }

     double c = clock();
     // one way to access the data
     for (int x = 0; x < X; x++)
         for (int y = 0; y < Y; y++)
             array[x][y] = x;
     printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);

     c = clock();
     // the other way to access the data
     for (int y = 0; y < Y; y++)
         for (int x = 0; x < X; x++)
             array[x][y] = x;
     printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);

     // rows were allocated with new[], so they must be freed with delete[]
     for (int x = 0; x < X; x++) {
         delete[] array[x];
     }
     delete[] array;
 }
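A variation on the program above (a sketch, not included in the measurements): allocating one contiguous block instead of X separate rows removes the pointer indirection and keeps all rows adjacent in memory, which should help the cache even more:

 int* block = new int[(size_t)X * Y];

 // same row-by-row access pattern; element (x, y) lives at index x * Y + y
 for (int x = 0; x < X; x++)
     for (int y = 0; y < Y; y++)
         block[(size_t)x * Y + y] = x;

 delete[] block;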
+5

Yes, especially if a whole row fits into a cache line. With the second method, once your array has a sufficiently large number of rows, there is no cache locality, and cache lines are constantly evicted and refetched (thrashed).

+3

Yes, the first is faster. The matrix is stored in memory row by row (row-major order), so there is a high probability that neighboring elements of a row are on the same virtual-memory page, and whole blocks are brought into the cache at once, so access time is lower.

The other approach generates many more cache misses for a large matrix.
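As a rough worked example (assuming a 64-byte cache line and 4-byte int, both of which are platform-dependent):

 #include <cstdio>

 int main() {
     const int line_bytes = 64;                        // assumed cache-line size
     const int elems = line_bytes / (int)sizeof(int);  // 16 elements per line
     // Row-major walk: roughly 1 miss per 'elems' accesses.
     // Column walk over rows longer than one line: roughly 1 miss per access,
     // i.e. about 'elems' times as many misses.
     std::printf("approx. ratio of cache misses: %dx\n", elems);
 }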

+2

Measure it.

Sequential access is preferable, certainly. How much it matters should depend to a large extent on the values of X and Y; for certain values I would expect the difference to be significant.
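For example, a minimal timing sketch using std::chrono (the sizes are placeholders; vary them for your case):

 #include <chrono>
 #include <cstdio>
 #include <vector>

 int main() {
     const int X = 2000, Y = 2000;  // placeholder sizes; vary these
     std::vector<int> a((size_t)X * Y);

     auto t0 = std::chrono::steady_clock::now();
     for (int x = 0; x < X; x++)
         for (int y = 0; y < Y; y++)
             a[(size_t)x * Y + y] = x;  // row-major, sequential
     auto t1 = std::chrono::steady_clock::now();
     for (int y = 0; y < Y; y++)
         for (int x = 0; x < X; x++)
             a[(size_t)x * Y + y] = x;  // column order, strided
     auto t2 = std::chrono::steady_clock::now();

     std::printf("row-major:    %f s\n",
                 std::chrono::duration<double>(t1 - t0).count());
     std::printf("column order: %f s\n",
                 std::chrono::duration<double>(t2 - t1).count());
 }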

You should use a container such as std::vector, std::valarray, or boost::matrix anyway; C-style matrices are error-prone and annoying.
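A minimal sketch of what such a wrapper could look like (the class name and interface are illustrative, not a standard or Boost API):

 #include <cstddef>
 #include <vector>

 // Row-major matrix over one contiguous std::vector buffer,
 // cache-friendly to walk row by row.
 class Matrix {
 public:
     Matrix(int rows, int cols)
         : rows_(rows), cols_(cols), data_((std::size_t)rows * cols) {}

     int& at(int x, int y)             { return data_[(std::size_t)x * cols_ + y]; }
     const int& at(int x, int y) const { return data_[(std::size_t)x * cols_ + y]; }

     int rows() const { return rows_; }
     int cols() const { return cols_; }

 private:
     int rows_, cols_;
     std::vector<int> data_;
 };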

+2

Well-known saying: "It probably won't make a noticeable difference, given the speed of modern computers."

-1
