Glad to see you are using jHiccup, and that it seems to show reality-based hiccups.
jHiccup observes "hiccups" that would also be seen by application threads running on the JVM. It does not determine the reason; it simply reports the fact. The reason can be anything that keeps a process from running code that is perfectly ready to run: GC pauses are a common cause, but a temporary ^Z at the keyboard or one of those "live migration" moves between virtualized hosts would be observed just as well. There are many possible reasons, including scheduling pressure at the OS or hypervisor level (if one exists), power management craziness, swapping, and many others. I have seen Linux file system pressure and Transparent Huge Page defragmentation cause multi-second hiccups, as well as ...
A good first step in isolating the cause of the pause is to use the -c option in jHiccup: it launches a separate control process (with an otherwise idle workload). If both your application and the control process show hiccups that are roughly correlated in size and time, you will know you are looking for a system-level (as opposed to process-local) reason. If they do not correlate, you will know to suspect the insides of your JVM, which most likely means your JVM paused for something big: either GC or something else, such as a lock debiasing or a deoptimization driven by class loading. These can take a really long [and often unreported in logs] time on some JVMs if time-to-safepoint is long for some reason (and on most JVMs, there are many possible causes for a long time-to-safepoint).
jHiccup's measurement is so dirt-simple that it is hard to get wrong. The entire thing is less than 650 lines of Java code, so you can look at the logic for yourself. jHiccup's HiccupRecorder thread repeatedly goes to sleep for 1 ms, and when it wakes up it records any difference in time (from before the sleep) that is greater than 1 ms as a hiccup. The simple assumption is that if one ready-to-run thread (the HiccupRecorder) did not get to run for 5 seconds, other threads in the same process also saw a hiccup of similar size.
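To make that logic concrete, here is a minimal Java sketch of the sleep-and-measure loop described above. It is not jHiccup's actual source: the class name HiccupSketch and the methods record and hiccupNanos are hypothetical names chosen for illustration, and the plain list used to hold results is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; names and structure are assumptions, not jHiccup's code.
class HiccupSketch {

    // Time by which a sleep overshot the requested resolution (0 if it did not).
    static long hiccupNanos(long elapsedNanos, long resolutionNanos) {
        return Math.max(0L, elapsedNanos - resolutionNanos);
    }

    // Sleeps for resolutionNanos, iterations times, and returns every observed overshoot.
    static List<Long> record(long resolutionNanos, int iterations) {
        List<Long> hiccups = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long before = System.nanoTime();
            try {
                TimeUnit.NANOSECONDS.sleep(resolutionNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop
                break;
            }
            long hiccup = hiccupNanos(System.nanoTime() - before, resolutionNanos);
            if (hiccup > 0) {
                hiccups.add(hiccup);
            }
        }
        return hiccups;
    }

    public static void main(String[] args) {
        // Roughly one second of sampling at a 1 ms resolution.
        List<Long> hiccups = record(TimeUnit.MILLISECONDS.toNanos(1), 1000);
        long worst = 0;
        for (long h : hiccups) {
            worst = Math.max(worst, h);
        }
        System.out.println("samples over 1 ms: " + hiccups.size()
                + ", worst excess (ns): " + worst);
    }
}
```

Expect most samples to show a small overshoot from ordinary timer and scheduler granularity; the interesting entries are the rare large ones. A 5-second stall of the whole process would show up here as a single excess of about 5 seconds, whatever its cause.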
As you note above, jHiccup's observations seem to be corroborated by your independent network logs, where you saw 5-second response times. Note that not all hiccups would have been observed in the network logs, since only requests actually made during a hiccup would be caught by a network logger. In contrast, no hiccup larger than ~1 ms can hide from jHiccup, since it attempts a wakeup 1,000 times per second even with no other activity.
This may not be GC, but before you rule out GC, I would suggest you look into the GC logging a bit more. To start with, a JVM hint to limit pauses to 200 ms is useless on all known JVMs. A pause hint is the equivalent of saying "please." In addition, do not believe your GC logs unless you include -XX:+PrintGCApplicationStoppedTime in the options (and suspect them even then). There are pauses and parts of pauses that can be very long and go unreported unless you include this flag. For example, I have seen pauses caused by an occasional long-running counted loop taking 15 seconds to reach a safepoint, where the GC reported only the .08-second part of the pause in which it actually did some work. There are also plenty of pauses whose causes are not considered part of "GC" and can therefore go unreported by the GC logging flags.
- Gil. [jHiccup author]