I have an Intel Core Ivy Bridge processor, an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz (L1: 32 KB, L2: 256 KB, L3: 8 MB). I know that L3 is inclusive and shared between several cores. I want to know the following about my system:
PART 1:
- Is L1 inclusive or exclusive?
- Is L2 inclusive or exclusive?
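One way to see what the processor itself reports is CPUID leaf 4 (Deterministic Cache Parameters). In Intel's documentation, bit 1 of EDX for each cache means "inclusive of lower cache levels" (1) or "not inclusive" (0). The following is a minimal sketch, assuming an x86 build with GCC or Clang (which provide `<cpuid.h>` and `__cpuid_count`). The names `query_caches`, `print_caches` and `struct cache_info` are hypothetical. CPUID only distinguishes inclusive from not inclusive, so a 0 does not by itself mean "exclusive". Under a hypervisor the reported values may also differ from the real hardware.

```c
#include <stdint.h>
#include <stdio.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang wrapper for the CPUID instruction */
#endif

/* One cache as described by CPUID leaf 4 (hypothetical helper type). */
struct cache_info {
    unsigned level;      /* 1 = L1, 2 = L2, 3 = L3 */
    unsigned type;       /* 1 = data, 2 = instruction, 3 = unified */
    unsigned inclusive;  /* EDX bit 1: 1 = inclusive of lower levels, 0 = not inclusive */
    uint64_t size_bytes; /* ways * partitions * line size * sets */
};

/* Fills up to `max` entries; returns the count, or -1 when not built for x86. */
static int query_caches(struct cache_info *out, int max)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    int n = 0;
    if (__get_cpuid_max(0, NULL) < 4)
        return 0; /* leaf 4 not supported */
    for (unsigned sub = 0; n < max; ++sub) {
        __cpuid_count(4, sub, eax, ebx, ecx, edx);
        unsigned type = eax & 0x1F;
        if (type == 0)
            break; /* no more caches */
        out[n].level = (eax >> 5) & 0x7;
        out[n].type = type;
        out[n].inclusive = (edx >> 1) & 1;
        out[n].size_bytes = (uint64_t)(((ebx >> 22) & 0x3FF) + 1) /* ways */
                          * (((ebx >> 12) & 0x3FF) + 1)            /* partitions */
                          * ((ebx & 0xFFF) + 1)                    /* line size */
                          * ((uint64_t)ecx + 1);                   /* sets */
        ++n;
    }
    return n;
#else
    (void)out;
    (void)max;
    return -1;
#endif
}

/* Prints one line per cache level reported by CPUID leaf 4. */
static void print_caches(void)
{
    static const char *names[] = {"?", "data", "instruction", "unified"};
    struct cache_info c[16];
    int n = query_caches(c, 16);
    for (int i = 0; i < n; ++i)
        printf("L%u %-11s %8llu KB  %s\n", c[i].level,
               c[i].type <= 3 ? names[c[i].type] : "?",
               (unsigned long long)(c[i].size_bytes / 1024),
               c[i].inclusive ? "inclusive" : "not inclusive");
}
```

Calling `print_caches()` from `main` lists each cache level with its size and the inclusiveness bit as the CPU reports it.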
PART 2:
To measure the L2 access time with both L1 and L2 enabled, we first declare an array (1 MB) that is larger than the L2 cache (256 KB), and then access the entire array once to load it into the L2 cache. After that, we access the array elements from the first to the last index in steps of 64 B, since the cache line size is 64 B. To get a more accurate result, we repeat this traversal many times, say 1 million times (after reaching the last element, we start again from the first), and take the average.
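The procedure described above could be sketched in C roughly as follows. This is a minimal sketch, assuming Linux/POSIX `clock_gettime` and the 64 B line size from the question. `avg_stride_access_ns` and `LINE_SIZE` are hypothetical names. It reports what this exact traversal costs per access. It is not guaranteed to equal the true L2 latency: a sequential 64 B stride is easy for hardware prefetchers to predict, and the time also includes loop overhead.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define LINE_SIZE 64 /* cache line size assumed in the question */

/* Traverses buf[0..size) with a LINE_SIZE stride `reps` times (reps > 0)
 * and returns the average wall-clock nanoseconds per access. */
static double avg_stride_access_ns(volatile unsigned char *buf, size_t size, long reps)
{
    struct timespec t0, t1;
    unsigned sink = 0;

    /* Warm-up: write one byte per line so every page is actually backed by
     * memory (untouched malloc/mmap pages may all map to one shared zero page)
     * and every line has been pulled through the cache hierarchy once. */
    for (size_t i = 0; i < size; i += LINE_SIZE)
        buf[i] = (unsigned char)i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long r = 0; r < reps; ++r)
        for (size_t i = 0; i < size; i += LINE_SIZE)
            sink += buf[i]; /* volatile read: the compiler cannot drop it */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    double total_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec);
    double accesses = (double)reps * (double)((size + LINE_SIZE - 1) / LINE_SIZE);
    return total_ns / accesses;
}
```

For a 1 MB buffer, each pass makes 1 MB / 64 B = 16,384 accesses, so 1,000,000 repetitions amount to about 1.6 × 10^10 reads. A smaller repetition count may be enough for a first experiment. If prefetching turns out to hide the latency, the usual alternative is a pointer-chasing traversal, where each load's address depends on the previous load's result. That is a different technique from the one described here.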
My understanding of why this approach gives the correct result is as follows. When we access an array larger than the L2 cache, the entire array is loaded from main memory into L3, then from L3 into L2, and then from L2 into L1. The last 32 KB of the array are in L1, since they were accessed most recently. The entire array is also present in the L2 and L3 caches because of the inclusive property and cache coherency. Now, when I access the array again from the first index, those elements are not in the L1 cache but are in the L2 cache, so each access misses in L1 and is served from L2. Thus every access in the traversal pays the L2 access time, and at the end I get the total access time for the entire array. To get the time of a single access, I divide that total by the number of accesses.
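The averaging step is plain arithmetic: total measured time divided by the total number of accesses, optionally converted to core cycles. The small C helpers below make that bookkeeping explicit. The function names are hypothetical. The cycle conversion assumes the core runs at a fixed 3.40 GHz, which Turbo Boost or power-saving states can change.

```c
#include <stddef.h>

/* Accesses per pass when `bytes` are walked with a `line`-byte stride. */
static size_t accesses_per_pass(size_t bytes, size_t line)
{
    return (bytes + line - 1) / line;
}

/* Average time of one access: total measured time / total number of accesses. */
static double avg_ns(double total_ns, long reps, size_t per_pass)
{
    return total_ns / ((double)reps * (double)per_pass);
}

/* Nanoseconds -> core cycles at a fixed clock of `ghz` GHz
 * (3.40 for the i7-3770 at its nominal frequency). */
static double ns_to_cycles(double ns, double ghz)
{
    return ns * ghz;
}
```

For the 1 MB array, one pass is 16,384 accesses. The last 32 KB that the reasoning places in L1 correspond to 512 of those lines.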
- Is this approach, and my understanding of why it works, correct?