Looking for an accurate way to micro-benchmark small C++ code paths running on Linux/OS X

I want to do some very simple micro-benchmarking of small code paths, such as tight loops I've written in C++. I work on Linux and OS X and use GCC. What tools exist for sub-millisecond accuracy? I think a simple test that runs a code path many times (several tens of millions of iterations?) will give me enough consistency to get a good reading. If anyone knows of preferred methods, feel free to suggest them.

+4
4 answers

You can use the "rdtsc" processor instruction on x86/x86_64. For multi-core systems, check for the "constant_tsc" capability in the CPUID flags (/proc/cpuinfo on Linux); it means that all cores share the same tick counter, even across dynamic frequency changes and sleep states.
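For example, on Linux you can check for the flag like this:

    grep constant_tsc /proc/cpuinfo

If the flag is supported, the "flags" line is printed once per core.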

If the processor does not support constant_tsc, be sure to bind the program to a single core (the taskset utility on Linux).
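For example, to pin the benchmark to core 0 (./bench is just a placeholder for your binary):

    taskset -c 0 ./bench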

When using rdtsc on out-of-order processors (everything except Intel Atom, and perhaps other low-end in-order processors), add a serializing instruction such as "cpuid" before the read; this temporarily prevents instruction reordering around it.

In addition, Mac OS X has "Shark", which can measure some hardware events in your code.

RDTSC and out-of-order processors: see section 18 of Agner Fog's excellent guide (his main site is http://www.agner.org/optimize/):

http://www.scribd.com/doc/1548519/optimizing-assembly

On all processors with out-of-order execution, you need to insert XOR EAX,EAX / CPUID before and after each read of the counter to prevent it from executing in parallel with anything else. CPUID is a serializing instruction, which means it flushes the pipeline and waits for all pending operations to complete before continuing. This is very useful for testing purposes.
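A minimal sketch of such a serialized counter read, assuming GCC/Clang inline assembly on x86_64 (the helper name rdtsc_serialized is mine, not from the quoted guide):

    #include <stdint.h>

    /* Sketch: zero EAX, run CPUID to serialize the pipeline, then read the TSC. */
    static inline uint64_t rdtsc_serialized(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("xor %%eax, %%eax\n\t"
                             "cpuid\n\t"
                             "rdtsc"
                             : "=a"(lo), "=d"(hi)
                             :                   /* no inputs */
                             : "%rbx", "%rcx");  /* CPUID also clobbers EBX/ECX */
        return ((uint64_t)hi << 32) | lo;
    }

Usage would then look like:

    uint64_t start = rdtsc_serialized();
    /* code path under test */
    uint64_t cycles = rdtsc_serialized() - start;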

+5

This is what I used in the past:

    #include <sys/time.h>

    inline double gettime() {
        timeval tv;
        gettimeofday(&tv, NULL);
        return double(tv.tv_sec) + 0.000001 * tv.tv_usec;
    }

And then:

    double startTime = gettime();
    // your code here
    double runTime = gettime() - startTime;

This will give you timings with microsecond resolution.

0

Cachegrind/KCachegrind are good for very fine-grained profiling. I do not believe they are available for OS X, but the results you get on Linux should be representative.
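An example invocation (./bench is a placeholder for your benchmark binary; Cachegrind writes its results to a file named after the process id):

    valgrind --tool=cachegrind ./bench
    kcachegrind cachegrind.out.<pid>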

0

A microbenchmark should run the same code in a loop, preferably for many iterations. I used the following and ran it with the time(1) utility.

The following caveats were observed:

  • if the result of the tested function is not summed up (or otherwise used), the code is eliminated by the optimizer; gcc with -O3 does this.

  • the test functions test() and lookup() should be implemented in a different source file than the iteration loop; if they are in the same file and the lookup function returns a constant value, the optimizer will not call it RUN_COUNT times, in fact not even once: it will simply multiply the returned value by the number of iterations!

main.c file

    #include <stdio.h>

    #define RUN_COUNT 10000000

    void init();
    int lookup();

    int main(void)
    {
        int sum = 0;
        int i;

        init();
        for (i = 0; i < RUN_COUNT; i++) {
            sum += lookup();
        }
        printf("%d\n", sum);
        return 0;
    }
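The companion source file is not shown in the answer; a minimal sketch of what it could look like, with a purely hypothetical table-lookup body standing in for the code path being measured:

    /* lookup.c - kept in a separate translation unit so the compiler cannot
       inline lookup() into the benchmark loop or constant-fold it away. */
    static int table[256];

    void init()
    {
        int i;
        for (i = 0; i < 256; i++)
            table[i] = i;
    }

    int lookup()
    {
        /* placeholder for the code path under test */
        return table[42];
    }

Built and run, for example, as (bench is an arbitrary binary name):

    gcc -O3 main.c lookup.c -o bench
    time ./bench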
0
