This is due to locality of reference. If you access the elements in the same order in which they are stored in memory, it will be much faster than accessing them in any other pattern, because the memory caches and the memory bandwidth are used far more efficiently.
The explanation above implies that the second version should be faster than the first, and that is exactly what happens on my box:
    aix@aix:~$ time ./ver1
    real    0m29.421s
    aix@aix:~$ time ./ver2
    real    0m2.198s
Here is the code I'm using to allocate an array:
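    double a = 0.5;
    int width = 2048;
    int height = 2048;
    // One contiguous block holds every pixel, row after row.
    double* data = new double[height * width];
    // Row-pointer table: image[i] points to the start of row i.
    double** image = new double*[height];
    for (int i = 0; i < height; i++) {
        image[i] = data + i * width;
    }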
Version 1 times the following loop:
    for (int iter = 0; iter < 100; iter++) {
        for (int w = 0; w < width; w++) {
            for (int h = 1; h < height; h++) {
                image[h][w] = (1 - a) * image[h][w] + a * image[h - 1][w];
            }
        }
    }
Version 2:
    for (int iter = 0; iter < 100; iter++) {
        for (int h = 0; h < height; h++) {
            for (int w = 1; w < width; w++) {
                image[h][w] = (1 - a) * image[h][w] + a * image[h][w - 1];
            }
        }
    }
Compiled with g++ 4.4.3 using -O3 and run on a Xeon box of some description (64-bit Ubuntu).
If you are still 100% sure that you are seeing the opposite effect, there must be something fundamentally different between what you are doing and what I am doing. It might help if you told us the size of your image and how it is allocated (to help establish the memory layout).