In Part 3 ("CPU caches") of Ulrich Drepper's article What Every Programmer Should Know About Memory, there is a graph showing the relationship between the size of the working set and the CPU cycles consumed per operation (in this case, sequential reads). Two jumps in the graph mark the sizes of the L1 and L2 caches. I wrote my own program in C to reproduce the effect. It simply reads an int[] array sequentially from head to tail, and I tried different array sizes (from 1 KB to 1 MB). When I plot the data there is no jump; the graph is a straight line.
My questions:
- Is there something wrong with my method? How can I reproduce the processor-cache effect (i.e., see the jumps)?
- I thought sequential reads should work like this: the first element misses the cache, then accesses within the same cache line (64 bytes) hit. With prefetching, the latency of loading the next cache line is hidden. Data is streamed into the L1 cache continuously; even when the working set exceeds the L1 size, the least recently used lines are evicted and prefetching continues. So most cache misses are hidden, and the time to fetch data from L2 overlaps with the reads, i.e., they happen concurrently. Associativity (8-way, in my case) also helps hide the latency of reading from L2. So the behavior of my program may actually be correct; am I missing something?
- Is it possible to get the same effect in Java?
By the way, I am doing this on Linux.
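To check the cache-line reasoning above, one experiment (my sketch, not from Drepper's article) is to read only one int per cache line: with a stride of 16 ints (assuming 64-byte lines and 4-byte ints), every access touches a new line, so line reuse no longer masks the per-line cost:

```c
#include <assert.h>

/* Sum one int per 64-byte cache line instead of every int, so each
   iteration forces a new line to be touched. A stride of 16 assumes
   64-byte cache lines and 4-byte ints. */
long strided_sum(const int *a, long len)
{
    long i, sum = 0;
    for (i = 0; i < len; i += 16)
        sum += a[i];
    return sum;
}
```

Comparing this against the dense sequential read, for the same working-set sizes, should make the miss cost per line much more visible.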
UPDATE 1
Thanks for Steven C.'s suggestion. Here is some additional info; this is my code:
```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

int *arrayInt;

void initInt(long len)
{
    arrayInt = (int *)malloc(len * sizeof(int));
    memset(arrayInt, 0, len * sizeof(int));
}

long sreadInt(long len)
{
    long i;
    int sum = 0;
    struct timespec tsStart, tsEnd;

    initInt(len);

    clock_gettime(CLOCK_REALTIME, &tsStart);
    for (i = 0; i < len; i++) {
        sum += arrayInt[i];
    }
    clock_gettime(CLOCK_REALTIME, &tsEnd);

    free(arrayInt);
    /* elapsed nanoseconds per element; tv_sec is included so the
       result stays correct across a second boundary */
    return ((tsEnd.tv_sec - tsStart.tv_sec) * 1000000000L
            + (tsEnd.tv_nsec - tsStart.tv_nsec)) / len;
}
```
In the main() function, I tried array sizes from 1 KB to 100 MB; the result is always the same, about 2 nanoseconds per element on average. I think this is the L1d access time.
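One caveat worth checking with the code above: sum is computed but never used, so an optimizing compiler is free to delete the whole loop, which would make the measured time per element flat regardless of array size. A common guard (my sketch, not part of the original code) is to store the result into a volatile sink:

```c
/* Writing the loop's result to a volatile variable gives it an
   observable side effect, so the compiler cannot eliminate the
   summation as dead code. */
volatile long sink;

long guarded_sum(const int *a, long len)
{
    long i, sum = 0;
    for (i = 0; i < len; i++)
        sum += a[i];
    sink = sum;
    return sum;
}
```

Printing the result at the end of main(), as done in UPDATE 2 below with globalSum, achieves the same thing.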
My cache sizes:
L1d == 32 KB
L2 == 256 KB
L3 == 6144 KB
UPDATE 2
I changed my code to use a linked list.
```c
#include <stdlib.h>
#include <time.h>

// element type
struct l {
    struct l *n;
    long int pad[NPAD];  // NPAD can be changed; in my case I set it to 1
};

struct l *array;
long globalSum;

// initialize the list
void init(long len)
{
    long i, j;
    struct l *ptr;

    array = (struct l *)malloc(sizeof(struct l));
    ptr = array;
    for (j = 0; j < NPAD; j++) {
        ptr->pad[j] = j;
    }
    ptr->n = NULL;

    for (i = 1; i < len; i++) {
        ptr->n = (struct l *)malloc(sizeof(struct l));
        ptr = ptr->n;
        for (j = 0; j < NPAD; j++) {
            ptr->pad[j] = i + j;
        }
        ptr->n = NULL;
    }
}

// free the list when the run is done
void release()
{
    struct l *ptr = array;
    struct l *tmp = NULL;
    while (ptr) {
        tmp = ptr;
        ptr = ptr->n;
        free(tmp);
    }
}

double sread(long len)
{
    int i;
    long sum = 0;
    struct l *ptr;
    struct timespec tsStart, tsEnd;

    init(len);
    ptr = array;

    clock_gettime(CLOCK_REALTIME, &tsStart);
    while (ptr) {
        for (i = 0; i < NPAD; i++) {
            sum += ptr->pad[i];
        }
        ptr = ptr->n;
    }
    clock_gettime(CLOCK_REALTIME, &tsEnd);

    release();
    globalSum += sum;
    /* elapsed nanoseconds per node, including tv_sec */
    return (double)((tsEnd.tv_sec - tsStart.tv_sec) * 1000000000L
                    + (tsEnd.tv_nsec - tsStart.tv_nsec)) / (double)len;
}
```
Finally, I print globalSum to prevent the compiler from optimizing the loop away. As you can see, this is still a sequential read. I tried array sizes up to 500 MB, and the average time per element is about 4 nanoseconds (perhaps because it has to access both the pad data and the pointer n, two accesses), the same as with a 1 KB array. So I think cache optimizations such as prefetching really do hide the latency. Am I right? I will try random access and post the results later.
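For the random-access step, the key is to link the nodes in an order the hardware prefetcher cannot predict. One way (my sketch, assuming the list would then be rebuilt by linking nodes in this permuted order) is a Fisher-Yates shuffle of the node indices:

```c
#include <stdlib.h>

/* Fill idx[0..n-1] with a random permutation of 0..n-1 (Fisher-Yates),
   so list nodes can be linked in an order the prefetcher cannot
   follow. The caller may seed with srand() for different runs. */
void shuffle_indices(long *idx, long n)
{
    long i, j, tmp;
    for (i = 0; i < n; i++)
        idx[i] = i;
    for (i = n - 1; i > 0; i--) {
        j = rand() % (i + 1);
        tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
}
```

With the nodes scattered like this, each ptr->n dereference is a dependent load to an unpredictable address, so the cache-level jumps should become visible.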
UPDATE 3
I tried random access on the linked list; here is the result:
The first red line is my L1 cache size, the second is L2. We can see small jumps there, and sometimes the latency is still well hidden.