Looking for an accurate way to micro-benchmark small C++ code paths running on Linux/OS X

I want to do some very simple micro-benchmarking of small code paths, such as tight loops I've written in C++. I work on Linux and OS X and use GCC. What tools exist for sub-millisecond accuracy? I think a simple test that runs a code path many times (several tens of millions of iterations?) will give me enough consistency to get a good reading. If anyone knows of preferred methods, feel free to suggest them.

+4
4 answers

You can use the "rdtsc" processor instruction on x86/x86_64. For multi-core systems, check for the "constant_tsc" capability in the CPUID flags (/proc/cpuinfo on Linux); it means that all cores share the same tick counter, even across dynamic frequency changes and sleep states.
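For example, on Linux you can check for the flag like this:

    grep constant_tsc /proc/cpuinfo

If the flag is supported, the "flags" line is printed once per core.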

If the processor does not support constant_tsc, be sure to bind the program to a single core (the taskset utility on Linux).
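For example, to pin the benchmark to core 0 (./bench is just a placeholder for your binary):

    taskset -c 0 ./bench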

When using rdtsc on out-of-order processors (everything except Intel Atom, and perhaps other low-end in-order processors), add a serializing instruction such as "cpuid" before the read; this temporarily prevents instruction reordering around it.

In addition, Mac OS X has "Shark", which can measure some hardware events in your code.

RDTSC and out-of-order processors: see section 18 of Agner Fog's excellent guide (his main site is http://www.agner.org/optimize/):

http://www.scribd.com/doc/1548519/optimizing-assembly

On all processors with out-of-order execution, you need to insert XOR EAX,EAX / CPUID before and after each read of the counter to prevent it from executing in parallel with anything else. CPUID is a serializing instruction, which means it flushes the pipeline and waits for all pending operations to complete before continuing. This is very useful for testing purposes.
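A minimal sketch of such a serialized counter read, assuming GCC/Clang inline assembly on x86_64 (the helper name rdtsc_serialized is mine, not from the quoted guide):

    #include <stdint.h>

    /* Sketch: zero EAX, run CPUID to serialize the pipeline, then read the TSC. */
    static inline uint64_t rdtsc_serialized(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("xor %%eax, %%eax\n\t"
                             "cpuid\n\t"
                             "rdtsc"
                             : "=a"(lo), "=d"(hi)
                             :                   /* no inputs */
                             : "%rbx", "%rcx");  /* CPUID also clobbers EBX/ECX */
        return ((uint64_t)hi << 32) | lo;
    }

Usage would then look like:

    uint64_t start = rdtsc_serialized();
    /* code path under test */
    uint64_t cycles = rdtsc_serialized() - start;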

+5

This is what I used in the past:

    #include <sys/time.h>

    inline double gettime() {
        timeval tv;
        gettimeofday(&tv, NULL);
        return double(tv.tv_sec) + 0.000001 * tv.tv_usec;
    }

And then:

    double startTime = gettime();
    // your code here
    double runTime = gettime() - startTime;

This will give you timings with microsecond resolution.

0

Cachegrind/KCachegrind are good for very fine-grained profiling. I do not believe they are available for OS X, but the results you get on Linux should be representative.
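An example invocation (./bench is a placeholder for your benchmark binary; Cachegrind writes its results to a file named after the process id):

    valgrind --tool=cachegrind ./bench
    kcachegrind cachegrind.out.<pid>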

0

A microbenchmark should run the same code in a loop, preferably for many iterations. I used the following and ran it with the time(1) utility.

The following caveats were observed:

  • if the result of the tested function is not summed up (or otherwise used), the code is eliminated by the optimizer; gcc with -O3 does this.

  • the test functions test() and lookup() should be implemented in a different source file than the iteration loop; if they are in the same file and the lookup function returns a constant value, the optimizer will not call it RUN_COUNT times, in fact not even once: it will simply multiply the returned value by the number of iterations!

main.c file

    #include <stdio.h>

    #define RUN_COUNT 10000000

    void init();
    int lookup();

    int main(void)
    {
        int sum = 0;
        int i;

        init();
        for (i = 0; i < RUN_COUNT; i++) {
            sum += lookup();
        }
        printf("%d\n", sum);
        return 0;
    }
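The companion source file is not shown in the answer; a minimal sketch of what it could look like, with a purely hypothetical table-lookup body standing in for the code path being measured:

    /* lookup.c - kept in a separate translation unit so the compiler cannot
       inline lookup() into the benchmark loop or constant-fold it away. */
    static int table[256];

    void init()
    {
        int i;
        for (i = 0; i < 256; i++)
            table[i] = i;
    }

    int lookup()
    {
        /* placeholder for the code path under test */
        return table[42];
    }

Built and run, for example, as (bench is an arbitrary binary name):

    gcc -O3 main.c lookup.c -o bench
    time ./bench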
0
