You can use the processor instruction "rdtsc" on x86 / x86_64. On multi-core systems, check for the "constant_tsc" capability in CPUID (/proc/cpuinfo on Linux): it means that all cores' tick counters run at the same constant rate, even across dynamic frequency changes and sleep.
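The flag check can be sketched in C as below. This is only an illustration: the helper names line_has_flag and cpuinfo_has_flag are made up for this example, and it assumes the Linux /proc/cpuinfo layout, where each CPU has a line starting with "flags" followed by space-separated flag names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `flag` appears as a whole word in `line`. */
static int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        /* Accept the match only if it is delimited on both sides. */
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}

/* Hypothetical helper: return 1 if `flag` is listed on the first "flags" line
 * of the file at `path`, 0 if it is not, and -1 if the file cannot be opened.
 * Assumes the flags line fits into the buffer. */
static int cpuinfo_has_flag(const char *path, const char *flag)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            found = line_has_flag(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling cpuinfo_has_flag("/proc/cpuinfo", "constant_tsc") should then tell you whether the first listed CPU reports the flag.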
If the processor does not support constant_tsc, be sure to bind your program to a single core (the taskset utility on Linux).
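From the shell, something like "taskset -c 0 ./program" does the binding. If you would rather do it from inside the program, a minimal Linux-only sketch using sched_getaffinity / sched_setaffinity might look like this. The function name pin_to_first_allowed_cpu is invented for the example, and _GNU_SOURCE has to be defined before any header is included.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>

/* Hypothetical helper: pin the calling thread to the first CPU it is
 * currently allowed to run on. Returns that CPU's index, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Picking the first allowed CPU rather than hard-coding CPU 0 is just a precaution for environments where the process is already restricted to a subset of cores.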
When using rdtsc on out-of-order processors (everything except Intel Atom and perhaps some other low-end CPUs), add a serializing instruction before it, for example "cpuid"; this temporarily prevents instruction reordering.
In addition, Mac OS X has "Shark", which can measure some hardware events in your code.
RDTSC and out-of-order processors: see section 18 of this great guide by Agner Fog (his main site is http://www.agner.org/optimize/ ):
http://www.scribd.com/doc/1548519/optimizing-assembly
On all processors with out-of-order execution, you need to insert XOR EAX,EAX / CPUID before and after each read of the counter, to prevent it from executing in parallel with anything else. CPUID is a serializing instruction, which means that it flushes the pipeline and waits for all pending operations to finish before proceeding. This is very useful for testing purposes.
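A rough sketch of that sequence in C, assuming GCC or Clang inline assembly on x86 / x86_64. The function name rdtsc_serialized is made up for this example, and this is one possible way to write it, not the code from the guide.

```c
#include <stdint.h>

/* Hypothetical helper: read the time stamp counter with XOR EAX,EAX / CPUID
 * before and after the RDTSC. Assumes GCC/Clang inline asm on x86 / x86_64. */
static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi, a, b, c, d;

    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"               /* serialize before reading the counter */
        "rdtsc\n\t"               /* counter value in EDX:EAX */
        "movl %%eax, %%esi\n\t"   /* save it, the next CPUID overwrites EAX/EDX */
        "movl %%edx, %%edi\n\t"
        "xorl %%eax, %%eax\n\t"
        "cpuid"                   /* serialize after reading the counter */
        : "=S"(lo), "=D"(hi), "=a"(a), "=b"(b), "=c"(c), "=d"(d)
        :
        : "memory");

    /* CPUID results are not needed; they are listed as outputs only so the
     * compiler knows these registers are overwritten. */
    (void)a; (void)b; (void)c; (void)d;
    return ((uint64_t)hi << 32) | lo;
}
```

To time a piece of code, call rdtsc_serialized() before and after it and subtract the two values. The CPUID instructions themselves take a noticeable number of cycles, so it is worth measuring an empty region first and subtracting that overhead.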