Linux perf events for cache references

I want to measure the cache throughput of my code. We can use perf list to display the supported events. My desktop has an Intel(R) Core(TM) i5-2400 CPU running at 3.10 GHz, and perf list shows cache-references and cache-misses, for example:

 cpu-cycles OR cycles                               [Hardware event]
 stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
 stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
 instructions                                       [Hardware event]
 cache-references                                   [Hardware event]
 cache-misses                                       [Hardware event]

I think cache-misses is mapped to the LLC-misses hardware event described in the Intel architecture software developer's manual (I confirmed this by comparing perf stat -e r412e with perf stat -e cache-misses; they gave almost the same result). But how are cache-references counted? I did not find an event, or a way to derive overall cache references from the existing hardware events. So I wonder how accurate the cache-references count is on my machine.

+3
3 answers

On Intel, I don't think perf provides an event for measuring overall cache references, since no such event exists at the hardware level. You have to compute this information yourself from the hardware cache events listed by perf list:

 L1-dcache-loads                [Hardware cache event]
 L1-dcache-load-misses          [Hardware cache event]
 L1-dcache-stores               [Hardware cache event]
 L1-dcache-store-misses         [Hardware cache event]
 L1-dcache-prefetches           [Hardware cache event]
 L1-dcache-prefetch-misses      [Hardware cache event]
 L1-icache-loads                [Hardware cache event]
 L1-icache-load-misses          [Hardware cache event]
 L1-icache-prefetches           [Hardware cache event]
 L1-icache-prefetch-misses      [Hardware cache event]
 LLC-loads                      [Hardware cache event]
 LLC-load-misses                [Hardware cache event]
 LLC-stores                     [Hardware cache event]
 LLC-store-misses               [Hardware cache event]
 LLC-prefetches                 [Hardware cache event]
 LLC-prefetch-misses            [Hardware cache event]

Events whose names do not end in -misses give the number of references to the associated cache.
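As a rough illustration of "calculate it yourself", here is a minimal Python sketch that sums the reference events and the -misses events for one cache level and derives a miss ratio. The function name summarize_cache and the counter values are made up for illustration (nothing here was measured), and whether prefetch events should be counted as references is an assumption you may want to revisit for your workload.

```python
def summarize_cache(counts, prefix):
    """Return (references, misses, miss_ratio) for one cache level.

    counts maps perf event names to counter values; prefix selects the
    cache level, e.g. "LLC" or "L1-dcache". Hypothetical helper.
    """
    refs = 0
    misses = 0
    for name, value in counts.items():
        if not name.startswith(prefix + "-"):
            continue  # event belongs to another cache level
        if name.endswith("-misses"):
            misses += value
        else:
            refs += value  # loads, stores, prefetches: counted as references
    ratio = misses / refs if refs else 0.0
    return refs, misses, ratio


# Made-up counter values, for illustration only.
sample = {
    "LLC-loads": 1000,
    "LLC-load-misses": 100,
    "LLC-stores": 500,
    "LLC-store-misses": 50,
    "L1-dcache-loads": 20000,
}

refs, misses, ratio = summarize_cache(sample, "LLC")
print(refs, misses, ratio)
```

In practice the dictionary would be filled from the counts that perf stat reports for the events you asked for.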

Note: this previous question and this man page are about perf_event_open (which perf uses internally).

+3

If you look at arch/x86/kernel/cpu/perf_event_intel.c in the kernel source, you will see that

 PERF_COUNT_HW_CACHE_REFERENCES = 0x4f2e

and

 PERF_COUNT_HW_CACHE_MISSES = 0x412e

The Intel x86 architecture manual describes 0x4f2e as: "This event counts requests originating from the core that reference a cache line in the last level cache." Therefore, I assume the cache-references count is correct.
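To make the two codes easier to compare, here is a small Python sketch that splits a raw code into its event-select and unit-mask bytes, assuming the usual x86 perf layout (low byte is the event select, next byte is the umask). The helper names decode_raw_event and encode_raw_event are hypothetical, not part of perf.

```python
def decode_raw_event(code):
    """Split a raw x86 perf event code into (event_select, umask)."""
    event_select = code & 0xFF      # bits 0-7
    umask = (code >> 8) & 0xFF      # bits 8-15
    return event_select, umask


def encode_raw_event(event_select, umask):
    """Build the rNNNN string that perf stat -e accepts for a raw event."""
    return f"r{(umask << 8) | event_select:x}"


for name, code in (("cache-references", 0x4F2E), ("cache-misses", 0x412E)):
    event_select, umask = decode_raw_event(code)
    print(f"{name}: event=0x{event_select:02x} umask=0x{umask:02x} "
          f"raw={encode_raw_event(event_select, umask)}")
```

Both codes share event select 0x2e and differ only in the umask (0x4f versus 0x41), which fits the reading that they are the reference and miss variants of the same last-level-cache event.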

+4

I tried Intel's VTune and picked up some hints on how to measure overall cache references. It can count micro-ops and filter for those that are loads or stores, which gives the overall number of cache references. But I'm not sure whether this tool uses the same method.

+1