C - cache lines and association

Context

Read the cache optimization docs (linking to the cache line in loops).

The question is related to this context: an array of as many as 1024.

Sizes: cpu cache 64k, cache line 32 bytes, integer size: 4 bytes.

intel core 2 duo

Question

According to my processor, 8 integers fit into the cache line.

[0,1,2,3,4,5,6,7,8,9,10,...,1023] ^ If I want to access 4 and go downward, 3,2,1 and 0 will be loaded already. 5,6,7 are loaded uselessly. [0,1,2,3,4,5,6,7,8,..,1023] ^ If I want to access 7 and go downward, all the next elements will be in cache already. if I want to go upward, according to my cpu I will have to load another cache line immediatly after the arr[7] read. 

Am I right?

Further

But what tells me that arr [4] does not have an address that will lead to loading the cache line instead of arr [7]? If my statement is true, we must not only consider alignment in the array, but all alignment of the program memory to minimize cache waste, right?

+7
optimization arrays caching memory computer-architecture
source share
2 answers

Regarding your main question, yes, in both cases you are right.

In the second case, when arr[7] loads and can continue upward, you should remember that perhaps either the compiler or some kind of prefetching mechanism takes into account the spatial locality of this kind of data, thereby improving performance.

Further, indeed, reading some other address in the array can lead to loading the cache line instead of arr[7] if the array is incorrectly aligned in memory, but in this case the alignment does not depend on you, but up to the compiler.

+2
source share

But what tells me that arr [4] does not have an address that will lead to loading the cache line instead of arr [7]?

int arrays are usually aligned at 4 byte boundaries (assuming int is 32 bits and 8 bits), so you won’t know where the border of the cache border will be.

The lesson to be learned is that you should not worry that the random cached line is lost (when using 2 cache lines, even if the required data is less than 32 bytes), because it is mainly from your hands when encoding in C.

What can you worry if you have performance issues by choosing algorithms that reduce cache misses.

A typical example is a loop:

 int array[N][M]; // Assume N * M * sizeof (int) is much larger than the cache. // Example 1 for (i=0; i<N; i++) { for (j=0; j<M; j++) { <do something with array[i][j]> } } // Example 2 int array[N][M]; for (j=0; j<M; j++) { for (i=0; i<N; i++) { <do something with array[i][j]> } } 

One example will give about 8 times more misses for the cache, since it accesses the elements in the wrong order.

+3
source share

All Articles