Programmatically Counting Cache Misses

I need to measure the time a C++ function takes under a number of hypotheses about memory hierarchy efficiency (for example, the time taken when we get a cache miss, a cache hit, or a page fault while reading part of an array), so I would like a library that lets me count cache misses and page faults in order to generate a performance summary automatically.

I know there are tools such as cachegrind that report related statistics about the execution of a particular application, but as I said, I would like a library that I can call from my own code.

Edit: Oh, I forgot: I'm using Linux and I'm not interested in portability; this is an academic thing.

Any suggestion is welcome!

+8
c++ memory-management linux
4 answers

It looks like there is now exactly what I was looking for: perf_event_open.

It lets you do interesting things such as initializing, enabling, and disabling performance counters and later retrieving their values through a uniform and intuitive API (it gives you a special file descriptor from which you read a structure containing the previously requested data).

This solution is for Linux only, and its functionality depends on the kernel version, so be careful :)
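A minimal sketch of how this could look, assuming a kernel with perf events enabled and permission to open a counter for the calling process. The helper names `perf_event_open_fd` and `count_cache_misses` are made up for illustration; the syscall, the `perf_event_attr` fields, and the `PERF_*` constants come from `<linux/perf_event.h>`. The function reports failure instead of a count when the counter is unavailable (for example inside some virtual machines or with a restrictive `perf_event_paranoid` setting).

```cpp
#include <linux/perf_event.h>  // perf_event_attr, PERF_* constants
#include <sys/ioctl.h>         // ioctl
#include <sys/syscall.h>       // SYS_perf_event_open
#include <sys/types.h>         // pid_t, ssize_t
#include <unistd.h>            // syscall, read, close
#include <cstdint>
#include <cstring>

// glibc ships no wrapper for perf_event_open, so go through syscall(2).
// (Hypothetical helper name.)
static int perf_event_open_fd(perf_event_attr* attr, pid_t pid, int cpu,
                              int group_fd, unsigned long flags) {
    return static_cast<int>(
        syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags));
}

// Hypothetical helper: runs `work` with a hardware cache-miss counter enabled.
// Returns true and stores the count in `misses` on success; returns false
// (without running `work`) if the counter could not be opened.
template <typename Work>
bool count_cache_misses(Work&& work, std::uint64_t& misses) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;        // start stopped, enable explicitly below
    attr.exclude_kernel = 1;  // count user-space activity only
    attr.exclude_hv = 1;      // ignore hypervisor activity

    // pid = 0, cpu = -1: measure the calling thread on whichever CPU it runs.
    int fd = perf_event_open_fd(&attr, 0, -1, -1, 0);
    if (fd == -1) return false;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    // With the default read format, the descriptor yields one 64-bit count.
    std::uint64_t count = 0;
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    if (n != static_cast<ssize_t>(sizeof(count))) return false;
    misses = count;
    return true;
}
```

A call such as `count_cache_misses([&] { sum_array(data); }, misses)` (where `sum_array` stands for the code under test) would wrap only the region of interest, which is the main advantage over whole-program tools.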

+4

Recent processors (both AMD and Intel) have performance monitoring registers that can be used for this kind of work. For Intel, they are described in the programmer's reference manual, Volume 3B, Chapter 30. For AMD, they are covered in the BIOS and Kernel Developer's Guide.

Either way, you can read things like cache hits, cache misses, memory requests, data prefetches, etc. The selectors are quite specific, so you could get (for example) the number of reads of the L2 cache made to fill lines in the L1 instruction cache, while excluding L2 reads made to fill lines in the L1 data cache.

There is a Linux kernel module for accessing MSRs (model-specific registers). Offhand, I don't know whether it gives access to the performance monitoring registers, but I would expect it to.
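As a rough sketch of what going through that module could look like: when the `msr` module is loaded, it exposes one device file per CPU at `/dev/cpu/N/msr`, and an 8-byte read at a file offset equal to the register number returns that register's value. The helper name `read_msr` is made up for illustration, and reading normally requires root. Which register numbers correspond to which performance events is processor-specific and has to be taken from the vendor manuals, so none are assumed here.

```cpp
#include <fcntl.h>      // open
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // pread, close
#include <cstdint>
#include <cstdio>       // snprintf

// Hypothetical helper: reads model-specific register `reg` on logical CPU
// `cpu` through the msr kernel module. Returns false if the device file
// cannot be opened (module not loaded, no permission) or the read fails.
bool read_msr(int cpu, std::uint32_t reg, std::uint64_t& value) {
    char path[64];
    std::snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd == -1) return false;

    // The file offset selects the register; each register is 8 bytes wide.
    std::uint64_t raw = 0;
    ssize_t n = pread(fd, &raw, sizeof(raw), static_cast<off_t>(reg));
    close(fd);
    if (n != static_cast<ssize_t>(sizeof(raw))) return false;

    value = raw;
    return true;
}
```

For example, `read_msr(0, 0x10, tsc)` would attempt to read register 0x10 (the time stamp counter on x86) on CPU 0. Programming the event-select registers would additionally require writes, which this sketch leaves out.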

+5

Intel VTune is a performance tuning tool that does exactly what you ask for. Of course, it works with Intel processors, since it reads the processor's internal counters, as Jerry Coffin explained, so it probably will not work on an AMD processor. It exposes a very large number of counters, such as cache hits/misses, branch prediction rates, etc. The real problem with it is figuring out which counters to check ;)

+3

Cache misses cannot simply be counted. Most tools or profilers simulate memory accesses by redirecting them to a function that models the cache. This means these tools instrument the code at every place where memory is accessed, which makes your code far too slow. I guess this is not what you intend.

However, depending on the hardware, you may have other options. But even then, the OS has to support it (otherwise you will get system-wide statistics rather than ones tied to a process or thread).

EDIT: I found this interesting article that may help you: http://lwn.net/Articles/417979/
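For the page-fault half of the question, the OS already keeps per-process statistics of the kind mentioned above. A minimal sketch, assuming Linux and using the standard `getrusage` call; the `PageFaults` struct and the helper names are made up for illustration. It samples the minor and major fault counters before and after a region of code and reports the difference.

```cpp
#include <sys/resource.h>  // getrusage, struct rusage, RUSAGE_SELF

// Hypothetical result type: faults served without I/O (minor) and
// faults that required I/O (major).
struct PageFaults {
    long minor;
    long major;
};

// Hypothetical helper: page-fault totals for the calling process so far.
inline PageFaults current_page_faults() {
    struct rusage usage = {};
    getrusage(RUSAGE_SELF, &usage);
    return PageFaults{usage.ru_minflt, usage.ru_majflt};
}

// Hypothetical helper: runs `work` and returns the page faults the process
// incurred while it ran (all threads are included, since RUSAGE_SELF is
// process-wide).
template <typename Work>
PageFaults page_faults_during(Work&& work) {
    PageFaults before = current_page_faults();
    work();
    PageFaults after = current_page_faults();
    return PageFaults{after.minor - before.minor, after.major - before.major};
}
```

Because the counters are process-wide, faults caused by other threads during the measured region are included, so this is best used in a single-threaded benchmark.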

+1