Why is cachegrind not completely deterministic?

Inspired by SQLite, I am looking at the valgrind tool "cachegrind" as a way to get reproducible performance benchmarks. The numbers it reports are much more stable than any other timing method I have found, but they are still not deterministic. As an example, here is a simple C program:

int main() {
    volatile int x;
    while (x < 1000000) {
        x++;
    }
}

If I compile it and run it in cachegrind, I get the following results:

 $ gcc -O2 x.c -o x
 $ valgrind --tool=cachegrind ./x
 ==11949== Cachegrind, a cache and branch-prediction profiler
 ==11949== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
 ==11949== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
 ==11949== Command: ./x
 ==11949==
 --11949-- warning: L3 cache found, using its data for the LL simulation.
 ==11949==
 ==11949== I   refs:      11,158,333
 ==11949== I1  misses:         3,565
 ==11949== LLi misses:         2,611
 ==11949== I1  miss rate:       0.03%
 ==11949== LLi miss rate:       0.02%
 ==11949==
 ==11949== D   refs:       4,116,700  (3,552,970 rd   + 563,730 wr)
 ==11949== D1  misses:        21,119  (   19,041 rd   +   2,078 wr)
 ==11949== LLd misses:         7,487  (    6,148 rd   +   1,339 wr)
 ==11949== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
 ==11949== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
 ==11949==
 ==11949== LL refs:           24,684  (   22,606 rd   +   2,078 wr)
 ==11949== LL misses:         10,098  (    8,759 rd   +   1,339 wr)
 ==11949== LL miss rate:         0.1% (      0.1%     +     0.2%  )
 $ valgrind --tool=cachegrind ./x
 ==11982== Cachegrind, a cache and branch-prediction profiler
 ==11982== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
 ==11982== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
 ==11982== Command: ./x
 ==11982==
 --11982-- warning: L3 cache found, using its data for the LL simulation.
 ==11982==
 ==11982== I   refs:      11,159,225
 ==11982== I1  misses:         3,611
 ==11982== LLi misses:         2,611
 ==11982== I1  miss rate:       0.03%
 ==11982== LLi miss rate:       0.02%
 ==11982==
 ==11982== D   refs:       4,117,029  (3,553,176 rd   + 563,853 wr)
 ==11982== D1  misses:        21,174  (   19,090 rd   +   2,084 wr)
 ==11982== LLd misses:         7,496  (    6,154 rd   +   1,342 wr)
 ==11982== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
 ==11982== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
 ==11982==
 ==11982== LL refs:           24,785  (   22,701 rd   +   2,084 wr)
 ==11982== LL misses:         10,107  (    8,765 rd   +   1,342 wr)
 ==11982== LL miss rate:         0.1% (      0.1%     +     0.2%  )
 $

In this case, "I refs" differs by just 0.008% between the two runs, but I still wonder why they differ at all. In more complex programs (tens of milliseconds of run time) the variation can be larger. Is there a way to make the runs fully reproducible?

1 answer

At the end of a thread on gmane.comp.debugging.valgrind, Nicholas Nethercote (a Mozilla developer who works on the Valgrind development team) says that minor variations like this are common with Cachegrind, and that they do not cause serious problems.
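One practical way to see where the run-to-run variation comes from is to compare the per-function counts of two runs. Cachegrind writes its raw data to cachegrind.out.<pid>, and the cg_diff tool shipped with Valgrind can subtract one output file from another; the result can then be viewed with cg_annotate. A rough sketch, using the file names the two runs in the question would have produced:

 $ cg_diff cachegrind.out.11949 cachegrind.out.11982 > diff.out
 $ cg_annotate diff.out

Functions whose counts come out as zero in the diff were identical across the two runs, so the variation is confined to whatever remains (often startup and dynamic-linker code rather than the loop you are measuring).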

The Cachegrind manual mentions that the results are very sensitive. For example, on Linux, address space randomization (used to improve security) can be a source of non-determinism:

Another thing worth noting is that the results are very sensitive. Changing the size of the executable being profiled, or the sizes of any of the shared libraries it uses, or even the length of their file names, can perturb the results. Variations will be small, but don't expect perfectly repeatable results if your program changes at all.

More recent GNU/Linux distributions do address space randomization, in which identical runs of the same program have their shared libraries loaded at different locations, as a security measure. This also perturbs the results.

While these factors mean that you shouldn't trust the results to be super-accurate, they should be close enough to be useful.
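If address space randomization is indeed the main source of the variation, it can be disabled for a single run without touching the system-wide setting. A minimal sketch, assuming a Linux system with util-linux's setarch available (whether this removes all of the remaining jitter depends on the other factors mentioned above):

 $ setarch "$(uname -m)" -R valgrind --tool=cachegrind ./x

The system-wide policy can be inspected via /proc/sys/kernel/randomize_va_space (0 means randomization is off).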

