Efficient performance measurement

In this question I would like to discuss how to measure the performance of Java code. The usual approach works along the following lines:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    long start = System.nanoTime();
    for (int i = 0; i < SOME_VERY_LARGE_NUMBER; i++) {
        // ...do something...
    }
    long duration = System.nanoTime() - start;
    System.out.println("Performance: " + new BigDecimal(duration)
            .divide(new BigDecimal(SOME_VERY_LARGE_NUMBER), 3, RoundingMode.HALF_UP));

The "optimized" versions transfer calls to System.nanoTime() in a loop, increasing the error field, since System.nanoTime() takes much more time (and is less predictable during execution) than i ++ and comparison.

My criticism is this:

This gives me an average runtime, but that value depends on factors I am really not interested in, such as the system load during the measurement loop or the JIT/GC kicking in.

Wouldn't this approach be (much) better in most cases?

  • Run the code under measurement often enough to trigger JIT compilation
  • Run the code in a loop and measure the runtime. Remember the smallest value and abort the loop once that value stabilizes (see the sketch after this list).
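A minimal sketch of this idea, assuming the code under test is passed as a Runnable; the warm-up count and the stability window are arbitrary assumptions, not recommendations:

    // Report the best observed time; stop once the minimum has not
    // improved for STABLE_ROUNDS consecutive measurements.
    static long measureMin(Runnable task) {
        for (int i = 0; i < 20_000; i++) {
            task.run();                        // warm-up to trigger the JIT
        }
        final int STABLE_ROUNDS = 100;         // assumed stability window
        long best = Long.MAX_VALUE;
        int unchanged = 0;
        while (unchanged < STABLE_ROUNDS) {
            long start = System.nanoTime();
            task.run();
            long t = System.nanoTime() - start;
            if (t < best) { best = t; unchanged = 0; } else { unchanged++; }
        }
        return best;                           // lower bound, in nanoseconds
    }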

My reasoning is that I usually want to know how fast some code can be (a lower bound). Any code can be made arbitrarily slow by external events (mouse movements, interrupts from the graphics card because you have an analog clock on your desktop, swapping, network packets, ...), but most of the time I just want to know how fast my code can be under ideal conditions.

It would also make the performance measurement much faster, since I would not have to run the code for seconds or minutes (to average out unwanted effects).

Can someone confirm / debunk this?

+4
3 answers

I think what you propose is fairly reasonable, with some tweaks:

1) I would report a median or a set of percentiles rather than the minimum. If your code stresses the garbage collector, the minimum can easily fail to capture that (all it takes is for one iteration to fit between two consecutive GC pauses).

2) In many cases it makes sense to measure CPU time rather than wall-clock time. This takes care of some of the effects of other code running on the same machine.
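One way to do this on the JVM is the ThreadMXBean API; a sketch, with the code under test elided:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class CpuTimer {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            if (!threads.isCurrentThreadCpuTimeSupported()) return;
            long cpuStart  = threads.getCurrentThreadCpuTime(); // CPU time of this thread, ns
            long wallStart = System.nanoTime();                 // wall-clock time, ns
            // ... code under test ...
            long cpuNs  = threads.getCurrentThreadCpuTime() - cpuStart;
            long wallNs = System.nanoTime() - wallStart;
            System.out.println("cpu=" + cpuNs + " ns, wall=" + wallNs + " ns");
        }
    }

Note that getCurrentThreadCpuTime() only covers the current thread, so this works best for single-threaded benchmarks.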

3) Some benchmarking tools use two levels of loops: the inner loop repeatedly performs the operation, and the outer loop reads the clock before and after the inner loop. The observations are then aggregated across the iterations of the outer loop.
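A sketch of that scheme; OUTER and INNER are assumed values, and the result must be consumed somewhere so the JIT cannot eliminate the loop body:

    final int OUTER = 50, INNER = 10_000;
    long[] nsPerOp = new long[OUTER];
    long sink = 0;                            // consume results to keep the JIT honest
    for (int o = 0; o < OUTER; o++) {
        long start = System.nanoTime();       // clock read amortized over INNER ops
        for (int i = 0; i < INNER; i++) {
            sink += i;                        // stand-in for the operation under test
        }
        nsPerOp[o] = (System.nanoTime() - start) / INNER;
    }
    java.util.Arrays.sort(nsPerOp);
    System.out.println("median = " + nsPerOp[OUTER / 2] + " ns/op (sink=" + sink + ")");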

Finally, the following question gives a very good overview of the JVM-specific issues you need to be aware of: How do I write a correct micro-benchmark in Java?

+3

You can use the -XX:CompileThreshold JVM option to specify after how many invocations the JIT kicks in. You can then "warm up" your test by running the loop more than CompileThreshold times before starting the timed loop.
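A sketch of that warm-up pattern; doSomething() and the constants are assumptions (the HotSpot server VM's default CompileThreshold is 10,000 invocations):

    // Assumed: doSomething() is the code under test.
    static void benchmark() {
        final int WARMUP = 15_000;            // safely above the assumed threshold
        final int N = 1_000_000;
        for (int i = 0; i < WARMUP; i++) {
            doSomething();                    // warm-up: trigger JIT compilation
        }
        long start = System.nanoTime();
        for (int i = 0; i < N; i++) {
            doSomething();                    // timed loop now runs compiled code
        }
        System.out.println((System.nanoTime() - start) / N + " ns/call");
    }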

+2

I would run the SOME_VERY_LARGE_NUMBER loop 50 times and compute the average of the best runs. This is what is commonly done in other benchmarks as well, not just micro-benchmarks.

I would also argue that performance problems caused by GC are often part of the code under test. You probably should not take the GC out of the equation, because a routine that allocates a lot of memory has to pay a price for that. The proposed approach accounts for the average GC cost per call, provided you make SOME_VERY_LARGE_NUMBER large enough.

Regarding your proposal: all timers have limited resolution, so it is quite possible that a short routine completes in zero measured time. That means your algorithm would conclude the routine takes zero time, which is clearly wrong.
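A quick probe of this effect; what the delta comes out to is platform-dependent, and on coarse timers back-to-back reads can yield 0:

    public class TimerProbe {
        public static void main(String[] args) {
            // Two consecutive reads show the smallest observable timer step.
            long t1 = System.nanoTime();
            long t2 = System.nanoTime();
            System.out.println("delta = " + (t2 - t1) + " ns"); // can be 0 on coarse timers
        }
    }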

+1

Source: https://habr.com/ru/post/1411943/

