You might understand this better if you draw a picture of the array in memory:
```
    Y -->
X   xxxxx
|   xxxxx
v   xxxxx
    ...
```
Access is linear in the Y direction (345, 345 + 1, 345 + 2, ...), but jumps by a whole row in the X direction (345, 345 + Y, 345 + 2·Y, ...), which is a wild jump when Y is large. Since the cache loads memory in fixed-size blocks (cache lines), a large jump in the X direction leaves the loaded line almost immediately, whereas going in the Y direction you stay within the cached data until the line is exhausted and the cache is refilled.
Also note that this effect can be even more extreme with dynamic allocation. Running the following program with full optimization gives me this result (times in seconds):
0.615000 9.878000
EDIT: Some other interesting measurements:
Replacing the dynamically allocated array with a local `int array[X][Y];` uses the (limited) stack, so you cannot test much larger X / Y values, but it is also very fast:
0.000000 0.000000
Declaring `int array[X][Y];` as a global variable uses static storage instead of the heap, and the slowdown appears again. So even without dynamic allocation, the first access order is much better:
0.929000 8.944000
Using X = 1500 and Y = 1500 shows that the effect is measurable even with smaller arrays:
0.008000 0.059000
EDIT2: Also note that there are further possible code optimizations, as jalf mentioned in a comment on your question. Applying this one really does almost double the speed (0.453 seconds for X = Y = 10000):
```
// an even faster way to access the array
for (int x = 0; x < X; x++)
{
    int *arrayptr = array[x];
    for (int y = 0; y < Y; y++, arrayptr++)
        *arrayptr = x;
}
```
The code (note that you can also use it to measure your own case, where the difference should not be as extreme except for big X and Y values; as others have already said, measure it and you will be enlightened):
#include <stdio.h>