How can JVM jitter be caused by a for loop with no object allocations?

I was micro-benchmarking the following code and noticed something interesting that I hope someone can shed some light on. It leads to a situation where it looks like a for loop that keeps running fast can block other threads in the JVM. If that is so, then I would like to understand why; if it is not, then any insight into what I may be missing would be welcome.

To set the scene, let me walk you through the benchmark that I run and its results.

The code is pretty simple: it iterates over each element of an array, summing its contents, and repeats that "targetCount" times.

    public class UncontendedByteArrayReadBM extends Benchmark {
        private int arraySize;
        private byte[] array;

        public UncontendedByteArrayReadBM( int arraySize ) {
            super( "array reads" );

            this.arraySize = arraySize;
        }

        @Override
        public void setUp() {
            super.setUp();

            array = new byte[arraySize];
        }

        @Override
        public void tearDown() {
            array = null;
        }

        @Override
        public BenchmarkResult invoke( int targetCount ) {
            long sum = 0;

            for ( int i=0; i<targetCount; i++ ) {
                for ( int j=0; j<arraySize; j++ ) {
                    sum += array[j];
                }
            }

            return new BenchmarkResult( ((long)targetCount)*arraySize, "uncontended byte array reads", sum );
        }
    }

My laptop is an Intel Sandy Bridge i7 (4 hardware threads over 2 hyper-threaded cores) running Java 6 (Oracle JVM) on OS X. After repeating the run several times, the results settle at:

2.626852686364034 uncontended byte array reads/ns [totalTestRun=3806.837ms]

(I have skipped the repeated runs used to warm up the JVM.)

This result seems reasonable to me.

Where it got interesting was when I started measuring JVM jitter. To do that, I start a background daemon thread that sleeps for 1 ms and then measures how much longer than 1 ms it actually slept. I also modified the report to print the maximum jitter seen during each repeated test run.

2.6109858273078306 uncontended byte array reads/ns [maxJitter=0.411ms totalTestRun=3829.971ms]

To get an idea of the "normal" jitter for my environment, I also measure the jitter before starting the actual test runs, while no work is being done; typical readings are shown below (all in ms). A 0.411 ms jitter is therefore normal and not very interesting.

    getMaxJitterMillis() = 0.599
    getMaxJitterMillis() = 0.37
    getMaxJitterMillis() = 0.352

I have included the code for how I measure the jitter at the end of this question.

Now for the interesting part. Yes, this happens during "JVM warm-up" and so is arguably not "normal", but I would still like to understand it in more detail:

 2.4519521584902644 uncontended byte array reads/ns [maxJitter=2561.222ms totalTestRun=4078.383ms] 

Notice that the jitter is over 2.5 seconds. Normally I would put that down to GC. However, I call System.gc() before the test run, AND -XX:+PrintGCDetails shows no GC at that point. In fact there is no GC during any of the test runs, which is not surprising given that this test sums a pre-allocated byte array and allocates no objects. It also happens on every fresh run, so I do not suspect random interference from some other process.

My curiosity really took off when I noticed that while the jitter was that high, the total duration of the run, and indeed the number of array elements read per nanosecond, stayed more or less unchanged. So here is a situation where another thread on a four-core machine is being delayed badly, while the worker thread itself is not delayed and no GC is occurring.

Digging further, I looked at what the HotSpot compiler was doing and found the following via -XX:+PrintCompilation:

    2632  2%  com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ 14 (65 bytes)
    6709  2%  made not entrant  com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ -2 (65 bytes)

The delay between those two lines being printed was about 2.5 seconds. That is the point at which the method containing the big loop had its optimised code marked as "not entrant".

My understanding is that HotSpot compiles on a background thread, and when it is ready to swap in the new version of the code it waits for the code that is already running to reach a safepoint and then swaps it over. In the case of a big for loop, that safepoint would be at the end of each pass of the loop body (which may have been unrolled somewhat). I would not have expected the swap to take 2.5 s unless it involves a stop-the-world pause across the JVM. Does the JVM do that when de-optimising previously compiled code?

So my first question, to the JVM internals experts out there: am I on the right track here? Could the 2.5 s delay be down to the method being marked "made not entrant", and if so, why does it have such a severe effect on the other threads? If that is unlikely to be the cause, any ideas on what else to investigate would be great.

(for completeness, here is the code I use to measure jitter)

    private static class MeasureJitter extends Thread {
        private AtomicLong maxJitterWitnessedNS = new AtomicLong(0);

        public MeasureJitter() {
            setDaemon( true );
        }

        public void reset() {
            maxJitterWitnessedNS.set( 0 );
        }

        public double getMaxJitterMillis() {
            return maxJitterWitnessedNS.get()/1000000.0;
        }

        public void printMaxJitterMillis() {
            System.out.println( "getMaxJitterMillis() = " + getMaxJitterMillis() );
        }

        @Override
        public void run() {
            super.run();

            long preSleepNS = System.nanoTime();

            while( true ) {
                try {
                    Thread.sleep( 1 );
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }

                long wakeupNS = System.nanoTime();
                long jitterNS = Math.max(0, wakeupNS - (preSleepNS+1000000));

                long max = Math.max( maxJitterWitnessedNS.get(), jitterNS );
                maxJitterWitnessedNS.lazySet( max );

                preSleepNS = wakeupNS;
            }
        }
    }
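To show how it is wired in, here is a rough usage sketch (the harness names below are illustrative, not my actual benchmark code; the snippet would live inside the enclosing benchmark class alongside MeasureJitter):

    // Illustrative usage only; the real harness differs.
    MeasureJitter jitterThread = new MeasureJitter();
    jitterThread.start();      // daemon thread, runs for the life of the JVM

    jitterThread.reset();      // discard any jitter seen during warm-up
    BenchmarkResult result = benchmark.invoke( targetCount );

    System.out.println( result + " [maxJitter=" + jitterThread.getMaxJitterMillis() + "ms]" );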
2 answers

It took some digging to find the smoking gun, but the lessons learned along the way were valuable, especially about how to prove and isolate the cause, so I thought it was worth documenting them here.

The JVM was indeed pausing for a Stop The World event while it waited for threads to reach a safepoint. Alexey Ragozin has a very good blog post on this topic at http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html , and it was what put me on the right track. He points out that safepoints sit at JNI method boundaries and at Java method calls. Thus the tight for loop that I have here contains no safepoints.

To report on stop-the-world pauses in the JVM, use the following JVM flags: -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
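These go on the java command line used to launch the benchmark; a hypothetical invocation (the main class name here is just a placeholder) would look like:

    java -XX:+PrintGCApplicationStoppedTime \
         -XX:+PrintSafepointStatistics \
         -XX:PrintSafepointStatisticsCount=1 \
         com.example.BenchmarkRunner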

The first flag prints the total duration of each stop-the-world pause, and it is not limited to GC pauses. In my case it printed:

 Total time for which application threads were stopped: 2.5880809 seconds 

This proved that I had a problem with threads waiting to reach a safepoint. The other two flags print out why the JVM requested that all threads reach a global safepoint.

             vmop                  [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop]  page_trap_count
    4.144: EnableBiasedLocking     [     10         1                1            ]  [ 2678     0 2678       0    0     ]  0

    Total time for which application threads were stopped: 2.6788891 seconds

So this tells us that the JVM spent 2678 ms waiting while it tried to enable biased locking. Why is that a stop-the-world event? Fortunately Martin Thompson has run into this problem in the past, and he has documented it. It turns out that the Oracle JVM sees quite a bit of lock contention on its own threads during startup, and biased locking is expensive while that is going on, so it delays enabling the optimisation until four seconds after startup. What happened here is that my micro-benchmark ran for more than four seconds and its loop contained no safepoints. So when the JVM tried to enable biased locking, it had to wait for the benchmark thread to reach a safepoint, which it could not do while it was inside that loop.

The candidate solutions, all of which worked for me, were as follows:

  • -XX:-UseBiasedLocking (disable biased locking altogether)
  • -XX:BiasedLockingStartupDelay=0 (enable biased locking immediately rather than after the startup delay)
  • Modify the loop so that it contains a safepoint, for example by calling out to a method that does not get optimised away or inlined (see the sketch below).
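To illustrate the third option, here is a rough sketch (not the benchmark's actual code; it reuses the array/arraySize fields and BenchmarkResult type shown in the question) of how invoke() could be reshaped so that the hot loop makes a method call on each pass. If that method is kept out of inlining (for example via a CompileCommand dontinline directive), every call gives the thread an opportunity to reach a safepoint between chunks:

    // Sketch only: assumes the fields of the UncontendedByteArrayReadBM class in the question.
    @Override
    public BenchmarkResult invoke( int targetCount ) {
        long sum = 0;

        for ( int i=0; i<targetCount; i++ ) {
            // a non-inlined call site lets the thread stop at a safepoint on every pass
            sum += sumChunk( array );
        }

        return new BenchmarkResult( ((long)targetCount)*arraySize, "uncontended byte array reads", sum );
    }

    private static long sumChunk( byte[] chunk ) {
        long sum = 0;
        for ( int j=0; j<chunk.length; j++ ) {
            sum += chunk[j];
        }
        return sum;
    }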

There are many causes of jitter

  • Thread.sleep() is not very reliable at millisecond granularity
  • context switches
  • interrupts
  • cache misses caused by other running programs

Even if you busy-wait, bind the thread to a CPU that has been isolated (e.g. with isolcpus), and move every interrupt you can off that CPU, you will still see a small amount of jitter. All you can do is reduce it.
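As a sketch of what busy-waiting looks like (this is not the questioner's code, just an illustration), the following measures jitter by spinning until each 1 ms deadline, which takes sleep() wake-up accuracy out of the picture and leaves only scheduler and interrupt delays:

    // Busy-wait jitter sketch: spin until each 1 ms deadline instead of sleeping.
    // Runs roughly 10 seconds (10000 samples x 1 ms) and burns one core while doing so.
    public class BusyWaitJitter {
        public static void main( String[] args ) {
            long intervalNS  = 1000000L;   // 1 ms between samples
            long maxJitterNS = 0;
            long deadlineNS  = System.nanoTime() + intervalNS;

            for ( int i = 0; i < 10000; i++ ) {
                while ( System.nanoTime() < deadlineNS ) {
                    // busy wait rather than relying on sleep() wake-up accuracy
                }

                long nowNS  = System.nanoTime();
                maxJitterNS = Math.max( maxJitterNS, nowNS - deadlineNS );
                deadlineNS  = nowNS + intervalNS;
            }

            System.out.println( "max jitter = " + (maxJitterNS/1000000.0) + "ms" );
        }
    }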

BTW: jHiccup does exactly what you are doing here to measure the jitter of your system.

