Java - repeated function call reduces execution time

I have the following code:

    import java.util.stream.IntStream;

    public class BenchMark {

        public static void main(String[] args) {
            doLinear();
            doLinear();
            doLinear();
            doLinear();
        }

        private static void doParallel() {
            IntStream range = IntStream.range(1, 6).parallel();
            long startTime = System.nanoTime();
            int reduce = range.reduce((a, item) -> a * item).getAsInt();
            long endTime = System.nanoTime();
            System.out.println("parallel: " + reduce + " -- Time: " + (endTime - startTime));
        }

        private static void doLinear() {
            IntStream range = IntStream.range(1, 6);
            long startTime = System.nanoTime();
            int reduce = range.reduce((a, item) -> a * item).getAsInt();
            long endTime = System.nanoTime();
            System.out.println("linear: " + reduce + " -- Time: " + (endTime - startTime));
        }
    }

I was trying to compare parallel and sequential streams, but I noticed that the runtime kept decreasing when I called the same function again and again.

Output:

    linear: 120 -- Time: 57008226
    linear: 120 -- Time: 23202
    linear: 120 -- Time: 17192
    linear: 120 -- Time: 17802

    Process finished with exit code 0

There is a huge difference between the first and second runtimes.

I'm sure the JVM does some tricks behind the scenes, but can someone help me understand what is really going on there?

Is there any way to avoid this optimization so that I can measure the true runtime?

+7
performance java-8 jvm java-stream
4 answers

I'm sure the JVM does some tricks behind the scenes, but can someone help me understand what is really going on there?

  • The massive latency of the first call is caused by the initialization of the whole lambda runtime subsystem. You pay this only once for the whole application.

  • The first time your code reaches any given lambda expression, you pay for linking that lambda (initializing its invokedynamic call site).

  • After several iterations you see additional speedup due to the JIT compiler optimizing your reduction code.
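The one-time cost of bootstrapping the lambda runtime (the first bullet above) can be observed directly. Here is a minimal sketch (the class name and structure are my own, not from the question): evaluating the very first lambda in a program pays for initializing the lambda machinery, while a second, distinct lambda is linked far more cheaply because the subsystem is already up.

```java
import java.util.function.IntBinaryOperator;

public class LambdaLinkCost {
    public static void main(String[] args) {
        // First lambda in the program: triggers invokedynamic linkage
        // plus one-time initialization of the lambda runtime subsystem.
        long t0 = System.nanoTime();
        IntBinaryOperator first = (a, b) -> a * b;
        long t1 = System.nanoTime();

        // A different lambda expression: still needs its own call site
        // linked, but the runtime subsystem is already initialized.
        long t2 = System.nanoTime();
        IntBinaryOperator second = (a, b) -> a + b;
        long t3 = System.nanoTime();

        System.out.println("first lambda:  " + (t1 - t0) + " ns -> " + first.applyAsInt(3, 4));
        System.out.println("second lambda: " + (t3 - t2) + " ns -> " + second.applyAsInt(3, 4));
    }
}
```

On a typical JVM the first measurement is orders of magnitude larger than the second, though exact numbers vary by machine and JVM version.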

Is there any way to avoid this optimization so that I can measure the true runtime?

You are asking for a contradiction here: the "true" runtime is the one you get after warm-up, when all the optimizations have been applied. This is the runtime an actual application will experience. The latency of the first few runs is irrelevant to the bigger picture unless you are specifically interested in single-shot performance.

For research purposes you can see how your code behaves with JIT compilation disabled: pass -Xint to the java command. There are many more flags that can disable various aspects of optimization.

+7

UPDATE: see @Marko's answer for an explanation of the initial delay due to lambda linkage.


The higher execution time for the first call is probably the result of the JIT effect. In short, JIT compilation of the bytecode into native machine code occurs the first time your method is called. The JVM then keeps optimizing by identifying frequently called (hot) methods and regenerating their native code to improve performance.

Is there any way to avoid this optimization so that I can measure the true runtime?

You can probably account for the initial JVM warm-up by excluding the first few results. Then increase the number of repeated calls to your method in a loop to tens of thousands of iterations and average the results.
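The warm-up-then-measure approach described above could be sketched like this (class and method names are illustrative, not from the original post). The warm-up loop gives the JIT compiler a chance to compile the hot path before measurement begins, and the measured loop averages over many iterations to smooth out noise:

```java
import java.util.stream.IntStream;

public class WarmedBenchmark {

    // Same computation as in the question: product of 1..5.
    static int doLinear() {
        return IntStream.range(1, 6).reduce((a, b) -> a * b).getAsInt();
    }

    public static void main(String[] args) {
        // Warm-up phase: discard these runs so the JIT can do its work.
        for (int i = 0; i < 10_000; i++) {
            doLinear();
        }

        // Measurement phase: time many iterations and report the average.
        int iterations = 10_000;
        int result = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            result = doLinear(); // use the result so it is not optimized away
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("result: " + result + " -- avg ns/call: " + (elapsed / iterations));
    }
}
```

Note that hand-rolled loops like this are still vulnerable to dead-code elimination and other JIT effects; a harness such as JMH handles these pitfalls properly.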

There are a few more options you might want to add to your benchmark to reduce noise, as discussed in this post. There are also helpful tips in this post.

+3

true runtime

There is no such thing as a "true runtime". If you need to solve this problem only once, the true runtime would be the time of the first run (together with the time to start the JVM itself). In general, the time it takes to execute a piece of code depends on many things:

  • Whether this piece of code is interpreted or JIT-compiled with the C1 or C2 compiler. Note that these are more than three distinct options: if you call one method from another, one of them may be interpreted while the other is C2-compiled.

  • For the C2 compiler: how this code was executed earlier, i.e. the branch and type profiles it collected. A polluted profile can drastically reduce performance.

  • The state of the garbage collector: whether it interrupts the execution or not.

  • The compilation queue: whether the JIT compiler is compiling other code at the same time (which can slow down the execution of the current code).

  • The memory layout: how objects are located in memory, and how many cache lines must be loaded to access all the necessary data.

  • The state of the CPU branch predictor, which depends on previously executed code and can increase or decrease the number of branch mispredictions.

And so on and so forth. Therefore, even if you measure something in an isolated benchmark, this does not mean that the speed of the same code in production will be the same. It may differ by an order of magnitude. So before you measure something, you must ask yourself why you want to measure it. Usually you don't care how long an individual part of your program takes; what matters is the latency and throughput of the whole program. So profile the whole program and optimize its slowest parts. Perhaps the thing you are measuring is not the slowest part at all.

+3

The JVM loads a class into memory the first time the class is used. So the difference between the first and second runs can also be caused by class loading.
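This lazy class-loading cost can be demonstrated with a small sketch (class and field names are illustrative). A static field that is not a compile-time constant forces the class to be loaded and initialized on first access, so the first read is noticeably slower than subsequent ones:

```java
public class ClassLoadCost {

    static class Lazy {
        // Not a compile-time constant, so reading it forces
        // loading and initialization of the Lazy class.
        static final long LOADED_AT = System.nanoTime();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long first = Lazy.LOADED_AT;   // triggers class loading + initialization
        long t1 = System.nanoTime();
        long second = Lazy.LOADED_AT;  // class is already initialized
        long t2 = System.nanoTime();

        System.out.println("first access:  " + (t1 - t0) + " ns");
        System.out.println("second access: " + (t2 - t1) + " ns");
        // first == second, but the first read cost much more time
    }
}
```

The same one-time cost applies to every class your benchmark touches, including the stream classes used in the question.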

+1
