How to compare the performance of two pieces of code

I have a friendly contest going with a couple of guys in the programming field, and lately we have become very interested in writing efficient code. Our challenge is to optimize the code (in terms of CPU time and complexity) at all costs (readability, reuse, etc.).

The problem is that we now need to compare our code and see which approach is better than the others, but we do not know of any tools for this purpose.

My question is: are there any (any!) tools that take a piece of code as input and calculate the number of FLOPs or processor instructions needed to run it? Is there any tool that can measure how optimal code is?

PS: The target language is C++, but it would also be nice to know whether such tools exist for Java.

+7
7 answers

Here is a little C++11 stopwatch class I like to roll out when I need to time something:

    #include <chrono>
    #include <ctime>

    template <typename T>
    class basic_stopwatch
    {
        typedef T clock;
        typename clock::time_point p;
        typename clock::duration   d;

    public:
        void tick()  { p  = clock::now();            }
        void tock()  { d += clock::now() - p;        }
        void reset() { d  = clock::duration::zero(); }

        template <typename S>
        unsigned long long int report() const
        {
            return std::chrono::duration_cast<S>(d).count();
        }

        unsigned long long int report_ms() const
        {
            return report<std::chrono::milliseconds>();
        }

        basic_stopwatch() : p(), d() { }
    };

    struct c_clock
    {
        typedef std::clock_t time_point;
        typedef std::clock_t duration;
        static time_point now() { return std::clock(); }
    };

    template <>
    unsigned long long int basic_stopwatch<c_clock>::report_ms() const
    {
        return 1000. * double(d) / double(CLOCKS_PER_SEC);
    }

    typedef basic_stopwatch<std::chrono::high_resolution_clock> stopwatch;
    typedef basic_stopwatch<c_clock> cstopwatch;

Usage:

    stopwatch sw;
    sw.tick();
    run_long_code();
    sw.tock();
    std::cout << "This took " << sw.report_ms() << "ms.\n";

With any decent implementation, the default high_resolution_clock should provide very accurate timing information.

+8

There is the std::clock() function from <ctime>, which returns how much CPU time was spent on the current process (meaning it does not count the time the program spent idle while the CPU was running other tasks). This function can be used to measure the execution time of algorithms quite precisely. Use the constant CLOCKS_PER_SEC (also from <ctime>) to convert the return value to seconds.
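A minimal sketch of how this might look in practice (run_long_code() here is just a placeholder workload, not something from the question):

    #include <ctime>
    #include <iostream>

    // Placeholder workload standing in for the code being measured.
    static void run_long_code()
    {
        volatile double x = 0.0;
        for (long i = 0; i < 10000000; ++i)
            x = x + i * 0.5;
    }

    int main()
    {
        std::clock_t start = std::clock();
        run_long_code();
        std::clock_t end = std::clock();

        // CLOCKS_PER_SEC converts clock ticks to seconds of CPU time.
        double cpu_seconds = double(end - start) / CLOCKS_PER_SEC;
        std::cout << "CPU time: " << cpu_seconds << " s\n";
    }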

+3

Using inline assembly, you can use the rdtsc instruction to load the 32 least significant bits of the CPU's cycle counter into eax and the 32 most significant bits into edx. If your code is small enough, you can track the elapsed CPU cycles using only the eax register; if the count exceeds the maximum 32-bit value, edx is incremented each time the lower 32 bits wrap around.

    int cpu_clk1a=0;
    int cpu_clk1b=0;
    int cpu_clk2a=0;
    int cpu_clk2b=0;
    int max=0;
    std::cin>>max;   //loop limit
    __asm
    {
        push eax
        push edx
        rdtsc                  //gets current cpu-clock-counter into eax & edx
        mov [cpu_clk1a],eax
        mov [cpu_clk1b],edx
        pop edx
        pop eax
    }
    long temp=0;
    for(int i=0;i<max;i++)
    {
        temp+=clock();   //needed to defy optimization so something is actually measured;
                         //even the smartest compiler cannot know what clock() will return
    }
    __asm
    {
        push eax
        push edx
        rdtsc                  //gets current cpu-clock-counter into eax & edx
        mov [cpu_clk2a],eax
        mov [cpu_clk2b],edx
        pop edx
        pop eax
    }
    std::cout<<(cpu_clk2a-cpu_clk1a)<<std::endl;
    //if your loop takes more than ~2 billion cpu clocks, use cpu_clk1b and cpu_clk2b as well
    getchar();
    getchar();

Output: 74000 CPU cycles for 1000 iterations and 800000 CPU cycles for 10,000 iterations on my machine, since clock() itself is time-consuming.

The cycle-counter resolution on my machine is about 1000 cycles, so you need at least a few thousand additions/subtractions (fast instructions) to measure them with reasonable accuracy.

Assuming the CPU's operating frequency is constant, 1000 CPU cycles correspond to roughly one microsecond on a 1 GHz processor. You should warm up the CPU before measuring.
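As a side note (my addition, not part of the answer above): the __asm block is specific to 32-bit MSVC builds. The __rdtsc() compiler intrinsic, available via <intrin.h> on MSVC and <x86intrin.h> on GCC/Clang, reads the same counter as a full 64-bit value without inline assembly; a rough sketch:

    #include <iostream>
    #if defined(_MSC_VER)
    #include <intrin.h>
    #else
    #include <x86intrin.h>
    #endif

    int main()
    {
        volatile long long temp = 0;          // volatile keeps the loop from being optimized away

        unsigned long long start = __rdtsc(); // read the time-stamp counter before the loop
        for (int i = 0; i < 10000; ++i)
            temp = temp + i;
        unsigned long long end = __rdtsc();   // ...and again after it

        std::cout << (end - start) << " cpu clocks\n";
        return 0;
    }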

+1

It is difficult to compute detailed CPU-time figures from a block of code alone. The usual way is to construct worst-, average-, and best-case inputs as test cases and profile your real code against them. No tool can tell you the FLOP count without detailed knowledge of the data and the input conditions.
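As an illustration only (solve() below is a hypothetical stand-in, a plain std::sort, for whatever routine you are actually comparing), the best/average/worst setup could look like this:

    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <numeric>
    #include <random>
    #include <vector>

    // Hypothetical routine under test; substitute your own code here.
    void solve(std::vector<int> v) { std::sort(v.begin(), v.end()); }

    // Time a single call in milliseconds.
    template <typename F>
    long long time_ms(F f)
    {
        auto t0 = std::chrono::high_resolution_clock::now();
        f();
        auto t1 = std::chrono::high_resolution_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    }

    int main()
    {
        const std::size_t n = 1000000;
        std::vector<int> best(n), average, worst;
        std::iota(best.begin(), best.end(), 0);                          // best case: already sorted
        average = best;
        std::shuffle(average.begin(), average.end(), std::mt19937{42});  // average case: random order
        worst = best;
        std::reverse(worst.begin(), worst.end());                        // worst case: reverse sorted

        std::cout << "best:    " << time_ms([&]{ solve(best); })    << " ms\n";
        std::cout << "average: " << time_ms([&]{ solve(average); }) << " ms\n";
        std::cout << "worst:   " << time_ms([&]{ solve(worst); })   << " ms\n";
    }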

0

There are pieces of software called profilers that do exactly what you want.

An example on Windows is AMD CodeAnalyst; on POSIX systems there is gprof.

0

For your purposes, valgrind/callgrind is the best fit.

0

Measuring the number of CPU instructions is pretty useless.

Performance is relative to the bottleneck; depending on the problem, the bottleneck could be the network, disk I/O, memory, or the CPU.

For a friendly competition, I would suggest measuring elapsed time. That implies providing test cases large enough to give meaningful measurements, of course.

On Unix, you can use gettimeofday for relatively accurate measurements.
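A minimal sketch, assuming the timed workload is a placeholder function of your own:

    #include <sys/time.h>
    #include <cstdio>

    // Placeholder workload standing in for the code being compared.
    static void run_long_code()
    {
        volatile double x = 0.0;
        for (long i = 0; i < 10000000; ++i)
            x = x + i * 0.5;
    }

    int main()
    {
        struct timeval start, end;

        gettimeofday(&start, NULL);
        run_long_code();
        gettimeofday(&end, NULL);

        // Combine seconds and microseconds into a single wall-clock figure.
        double elapsed = (end.tv_sec  - start.tv_sec)
                       + (end.tv_usec - start.tv_usec) / 1e6;
        std::printf("Elapsed: %f s\n", elapsed);
        return 0;
    }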

0
