Performing memory operations on iPhone

Here is the code I use to create a differently ordered copy of an array:

    const unsigned int height = 1536;
    const unsigned int width = 2048;

    uint32_t* buffer1 = (uint32_t*)malloc(width * height * BPP);
    uint32_t* buffer2 = (uint32_t*)malloc(width * height * BPP);

    int i = 0;
    for (int x = 0; x < width; x++)
        for (int y = 0; y < height; y++)
            buffer1[x+y*width] = buffer2[i++];

Can someone explain why using the following assignment instead:

 buffer1[i++] = buffer2[x+y*width]; 

takes twice as long in my code?

2 answers

Probably down to the behavior of the processor cache: at 12 MB apiece, your images far exceed the 256 KB L2 cache of the ARM Cortex-A8 inside the iPhone 3GS.

The first example accesses the read array in sequential order, which is fast, but it has to access the write array out of order, which is slow.

The second example is the opposite: the array being written to is written in fast, sequential order, and the array being read is accessed out of order, which is slower. Write misses are evidently less expensive under this workload than read misses.
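
As a rough illustration (my own sketch, not part of the original answer), the following program times both variants with the standard C clock() function. It assumes 4 bytes per pixel (BPP == sizeof(uint32_t)); the absolute numbers will of course depend on the device and compiler:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    enum { WIDTH = 2048, HEIGHT = 1536 };

    /* Variant A (the question's code): sequential reads, out-of-order writes. */
    static void copy_strided_writes(uint32_t *dst, const uint32_t *src)
    {
        int i = 0;
        for (int x = 0; x < WIDTH; x++)
            for (int y = 0; y < HEIGHT; y++)
                dst[x + y * WIDTH] = src[i++];
    }

    /* Variant B (the alternative): out-of-order reads, sequential writes. */
    static void copy_strided_reads(uint32_t *dst, const uint32_t *src)
    {
        int i = 0;
        for (int x = 0; x < WIDTH; x++)
            for (int y = 0; y < HEIGHT; y++)
                dst[i++] = src[x + y * WIDTH];
    }

    int main(void)
    {
        size_t bytes = (size_t)WIDTH * HEIGHT * sizeof(uint32_t);   /* ~12 MB per buffer */
        uint32_t *a = malloc(bytes);
        uint32_t *b = malloc(bytes);
        if (!a || !b) return 1;
        memset(b, 0x5a, bytes);   /* give the source buffer defined contents */

        clock_t t0 = clock();
        copy_strided_writes(a, b);
        clock_t t1 = clock();
        copy_strided_reads(a, b);
        clock_t t2 = clock();

        printf("strided writes: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("strided reads:  %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);

        free(a);
        free(b);
        return 0;
    }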

Ulrich Drepper's paper What Every Programmer Should Know About Memory is recommended reading if you want to know more about this.

Note that if you wrap this operation in a function, you will help the optimizer generate more efficient code by using the restrict qualifier on the pointer arguments, for example:

    void reorder(uint32_t * restrict buffer1, uint32_t * restrict buffer2)
    {
        int i = 0;
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                buffer1[x+y*width] = buffer2[i++];
    }

(The restrict qualifier promises the compiler that the data pointed to by the two pointers does not overlap, which this function requires in order to make sense anyway.)
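
As a usage note (my addition, reusing the width, height and BPP definitions from the question): the restrict promise only holds if the caller really does pass buffers that do not overlap, e.g.:

    uint32_t* src = (uint32_t*)malloc(width * height * BPP);
    uint32_t* dst = (uint32_t*)malloc(width * height * BPP);

    reorder(dst, src);     /* fine: dst and src never overlap */
    /* reorder(buf, buf); would break the restrict promise, and the
       compiler would then be free to reorder the loads and stores */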

Each pixel access in the first one has linear locality of reference; the second one blows out your cache on every read and has to go to main memory for each one.

A processor can handle writes with poor locality much more efficiently than reads: if a write has to go out to main memory, it can happen in parallel with other read and arithmetic operations, whereas a read that misses the cache can stall the processor completely while it waits for the data to filter down through the cache hierarchy.
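
To make that asymmetry concrete, here is a small sketch of my own (same 2048 x 1536 geometry as the question) that separates the two costs: the first function issues only the out-of-order stores, the second only the out-of-order loads feeding a running sum:

    #include <stdint.h>

    enum { W = 2048, H = 1536 };

    /* Out-of-order stores only: a store that misses can sit in the write
       buffer while the core moves on to the next iteration. */
    void scattered_stores(uint32_t *buf)
    {
        for (int x = 0; x < W; x++)
            for (int y = 0; y < H; y++)
                buf[x + y * W] = 0;
    }

    /* Out-of-order loads only: the running sum depends on every load, so a
       miss leaves the core waiting on the cache hierarchy / main memory. */
    uint32_t scattered_loads(const uint32_t *buf)
    {
        uint32_t sum = 0;
        for (int x = 0; x < W; x++)
            for (int y = 0; y < H; y++)
                sum += buf[x + y * W];
        return sum;
    }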
