With inline assembly, you can use the rdtsc instruction to read the CPU's time-stamp counter: the least significant 32 bits go into eax and the most significant 32 bits into edx. If the code you are measuring is short, the eax value alone is enough to count processor cycles. Once the counter exceeds the maximum 32-bit value, eax wraps around and edx increments by one for each such overflow.
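To make the eax/edx split concrete, here is a minimal sketch of how the two halves fit together. The helper names `combine_tsc` and `elapsed_low` are hypothetical, chosen only for illustration; they are not part of any library.

```cpp
#include <cstdint>

// Join the two 32-bit halves that rdtsc leaves in edx:eax into one 64-bit count.
// 'lo' is the eax value, 'hi' is the edx value.
inline std::uint64_t combine_tsc(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Elapsed cycles from the low halves only. Unsigned subtraction wraps
// modulo 2^32, so the result stays correct across a single eax overflow,
// provided the measured interval is shorter than 2^32 cycles.
inline std::uint32_t elapsed_low(std::uint32_t start_lo, std::uint32_t end_lo) {
    return end_lo - start_lo;
}
```

Storing the halves in unsigned variables is a deliberate choice here: with signed int, a difference that crosses the wrap-around point is not well defined.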
    int cpu_clk1a=0;
    int cpu_clk1b=0;
    int cpu_clk2a=0;
    int cpu_clk2b=0;
    int max=0;
    std::cin>>max; //loop limit

    __asm
    {
        push eax
        push edx
        rdtsc //gets current cpu-clock-counter into eax&edx
        mov [cpu_clk1a],eax
        mov [cpu_clk1b],edx
        pop edx
        pop eax
    }

    long temp=0;
    for(int i=0;i<max;i++)
    {
        temp+=clock(); //needed to defy optimization to actually measure something
                       //even the smartest compiler cannot know what
                       //the clock would be
    }

    __asm
    {
        push eax
        push edx
        rdtsc //gets current cpu-clock-counter into eax&edx
        mov [cpu_clk2a],eax
        mov [cpu_clk2b],edx
        pop edx
        pop eax
    }

    std::cout<<(cpu_clk2a-cpu_clk1a)<<std::endl;
    //if your loop takes more than ~2 billion cpu clocks, use cpu_clk1b and cpu_clk2b as well
    getchar();
    getchar();
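The `__asm { ... }` block syntax is only accepted by some compilers (it is the 32-bit MSVC form). As a rough sketch of the same measurement without inline assembly, the `__rdtsc()` intrinsic is available from `<intrin.h>` on MSVC and `<x86intrin.h>` on GCC and Clang. The names `read_cycles`, `measure_cycles` and `TSC_SKETCH_HAVE_RDTSC` are hypothetical, and the `std::chrono` fallback for non-x86 targets is my assumption: it returns clock ticks, not CPU cycles.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>        // __rdtsc on MSVC
  #define TSC_SKETCH_HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
  #include <x86intrin.h>     // __rdtsc on GCC and Clang
  #define TSC_SKETCH_HAVE_RDTSC 1
#else
  #include <chrono>          // assumption: fall back to a monotonic clock
  #define TSC_SKETCH_HAVE_RDTSC 0
#endif

// Read the full 64-bit counter in one call (no manual eax/edx handling).
inline std::uint64_t read_cycles() {
#if TSC_SKETCH_HAVE_RDTSC
    return __rdtsc();
#else
    // Not CPU cycles: ticks of steady_clock, used only so the sketch compiles.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Run 'work' once and return the counter difference around it.
template <class F>
std::uint64_t measure_cycles(F&& work) {
    const std::uint64_t start = read_cycles();
    work();
    return read_cycles() - start;
}
```

A call such as `measure_cycles([&]{ for (int i = 0; i < max; i++) temp += clock(); })` would correspond to the loop above, with `temp` and `max` declared as in the original snippet.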
Output on my machine: about 74,000 processor cycles for 1,000 iterations and 800,000 processor cycles for 10,000 iterations, because clock() is expensive to call.
Measurement resolution on my machine: ~1000 cycles. So you need more than a few thousand additions or subtractions (fast instructions) to measure them with reasonable accuracy.
Assuming the processor's operating frequency is constant, 1000 CPU cycles correspond to about 1 microsecond on a 1 GHz processor. Warm up the processor before measuring.
huseyin tugrul buyukisik