I was micro-benchmarking the following code and noticed something interesting that I hope someone can shed some light on. It leads to a situation where a tight for loop appears to keep running at full speed while other threads in the JVM are blocked. If that is really what is happening then I would like to understand why; if it is not, then any insight into what I am missing would be appreciated.
To set the scene, let me walk you through the benchmark that I run and its results.
The code is pretty simple: iterate over every element of a byte array, summing up its contents, and repeat that "targetCount" times.
public class UncontendedByteArrayReadBM extends Benchmark {
    private int    arraySize;
    private byte[] array;

    public UncontendedByteArrayReadBM( int arraySize ) {
        super( "array reads" );
        this.arraySize = arraySize;
    }

    @Override
    public void setUp() {
        super.setUp();
        array = new byte[arraySize];
    }

    @Override
    public void tearDown() {
        array = null;
    }

    @Override
    public BenchmarkResult invoke( int targetCount ) {
        long sum = 0;

        for ( int i=0; i<targetCount; i++ ) {
            for ( int j=0; j<arraySize; j++ ) {
                sum += array[j];
            }
        }

        return new BenchmarkResult( ((long)targetCount)*arraySize, "uncontended byte array reads", sum );
    }
}
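For context, Benchmark and BenchmarkResult are classes from my own small harness and are not shown here. The sketch below is only illustrative (the driver class name, warm-up strategy and iteration counts are made up), but it shows roughly where setUp(), invoke(), tearDown() and the System.gc() call mentioned further down fit into a run:

public class UncontendedReadDriver {
    public static void main( String[] args ) {
        UncontendedByteArrayReadBM bm = new UncontendedByteArrayReadBM( 10000 );

        bm.setUp();

        for ( int run=0; run<20; run++ ) {     // repeated runs; the early ones warm up the JVM
            System.gc();                       // force a collection before each timed run

            long            startNS    = System.nanoTime();
            BenchmarkResult result     = bm.invoke( 100000 );
            double          durationMS = (System.nanoTime() - startNS) / 1000000.0;

            System.out.println( result + " [totalTestRun = " + durationMS + "ms]" );
        }

        bm.tearDown();
    }
}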
My laptop is an Intel Sandy Bridge i7 (2 cores, 4 hardware threads) running Java 6 (Oracle JVM) on OS X. After several runs this code settles down to results like the following:
2.626852686364034 uncontended byte array reads / ns [totalTestRun = 3806.837ms]
(I have omitted the repeated runs that are used to warm up the JVM.)
This result seems reasonable to me.
Where it got interesting was when I started measuring JVM jitter. To do this I start up a background daemon thread that sleeps for 1 ms and then measures how much longer than 1 ms it actually slept for. I also modified the report to print out the maximum jitter seen during each repeated test run.
2.6109858273078306 uncontended byte array reads / ns [maxJitter = 0.411ms totalTestRun = 3829.971ms]
To get an idea of the "normal" jitter for my environment, I also track the jitter before the real test runs start, while no work is going on, and see figures like the following (all in ms). So a jitter of 0.411 ms is normal and not very interesting.
getMaxJitterMillis() = 0.599
getMaxJitterMillis() = 0.37
getMaxJitterMillis() = 0.352
I have included the code that I use to measure the jitter at the end of this question.
Now for the interesting part. Yes, this happens during "JVM warm up" and so is arguably not "normal", but it is results like the following that I would like to understand in more detail:
2.4519521584902644 uncontended byte array reads/ns [maxJitter=2561.222ms totalTestRun=4078.383ms]
Notice that the jitter here is over 2.5 seconds. Normally I would put that down to GC. However, I run System.gc() before each test run, AND -XX:+PrintGCDetails shows no GC anywhere near this point. In fact there is no GC during any of the test runs, which makes sense given that there is no object allocation in this test of summing up a pre-allocated byte array. It also happens on every fresh run of the benchmark, so I do not suspect interference from some other process that just happens to kick in at random.
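As an extra sanity check (this is not part of the original benchmark, just one way to back up the "no GC" claim programmatically), the collection counters exposed via java.lang.management can be sampled before and after a timed run and compared:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public final class GcCounter {
    // Sums the collection counts of all garbage collectors; if the value is
    // unchanged across a timed run, no GC cycle completed during that run.
    public static long totalGcCount() {
        long count = 0;

        for ( GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans() ) {
            count += Math.max( 0, gc.getCollectionCount() );   // -1 means 'count not available'
        }

        return count;
    }
}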
What really piqued my curiosity was noticing that when the jitter was this large, the total run time, and indeed the number of array elements read per nanosecond, stayed more or less unchanged. So here is a situation, on a multi-core machine, where one thread falls a long way behind while the worker thread itself is not falling behind at all, and no GC is happening.
Digging further, I looked at what the HotSpot compiler was doing and found the following via -XX:+PrintCompilation:
2632   2%   com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ 14 (65 bytes)
6709   2%   made not entrant   com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ -2 (65 bytes)
The gap between those two lines being printed was about 2.5 seconds, and it is precisely the method containing the big loop that has had its optimised code marked as not entrant.
My understanding is that HotSpot compiles on a background thread, and when it is ready to swap in the new version of the code it waits for the already-running code to reach a safepoint and then swaps it over. In the case of a big for loop, that safepoint is at the end of each iteration of the loop body (which may have been unrolled somewhat). I would not have expected a 2.5 s delay from that swap unless it involved a stop-the-world pause in the JVM. Does the JVM do that when it de-optimises previously compiled code?
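To make that concrete, here is the hot loop again with a comment marking where, on my reading of it (which I have not verified), the running code gets a chance to reach a safepoint:

long sum = 0;

for ( int i=0; i<targetCount; i++ ) {
    for ( int j=0; j<arraySize; j++ ) {
        sum += array[j];
    }
    // my assumption: the compiled code can poll for a safepoint around the
    // back edge of the loop, i.e. roughly once per pass over the loop body
    // (possibly less often if the loop has been unrolled)
}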
So my first question, to the JVM internals experts out there: am I on the right track here? Could the 2.5 s delay be down to the method being marked "made not entrant", and if so why does it have such a severe effect on the other threads? If that is unlikely to be the cause, then any ideas on what else to investigate would be great.
(for completeness, here is the code I use to measure jitter)
private static class MeasureJitter extends Thread {
    private AtomicLong maxJitterWitnessedNS = new AtomicLong(0);

    public MeasureJitter() {
        setDaemon( true );
    }

    public void reset() {
        maxJitterWitnessedNS.set( 0 );
    }

    public double getMaxJitterMillis() {
        return maxJitterWitnessedNS.get()/1000000.0;
    }

    public void printMaxJitterMillis() {
        System.out.println( "getMaxJitterMillis() = " + getMaxJitterMillis() );
    }

    @Override
    public void run() {
        super.run();

        long preSleepNS = System.nanoTime();

        while( true ) {
            try {
                Thread.sleep( 1 );
            } catch (InterruptedException e) {
                e.printStackTrace();
            }

            long wakeupNS = System.nanoTime();
            long jitterNS = Math.max( 0, wakeupNS - (preSleepNS+1000000) );
            long max      = Math.max( maxJitterWitnessedNS.get(), jitterNS );

            maxJitterWitnessedNS.lazySet( max );

            preSleepNS = wakeupNS;
        }
    }
}
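It gets wired into a run roughly like this (a sketch only; the real harness code differs slightly, and the class above also needs an import of java.util.concurrent.atomic.AtomicLong):

MeasureJitter jitterMonitor = new MeasureJitter();
jitterMonitor.start();

// ... let it run for a while with no benchmark work, then report the baseline ...
jitterMonitor.printMaxJitterMillis();

jitterMonitor.reset();     // reset just before a timed benchmark run
// ... run the benchmark here ...
System.out.println( "maxJitter = " + jitterMonitor.getMaxJitterMillis() + "ms" );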