How to diagnose a JVM that gets stuck in endless GC cycles?

We have a Java EE application with a heap of several gigabytes running on our production servers. From time to time one of the servers stops responding to requests altogether.

  • When the problem occurs, the GC log shows the server spending most of its time in GCs that take 8 to 10 seconds each (normally they take less than 1 second).
  • We never get OutOfMemoryErrors.
  • The problem is not tied to the heap reaching a particular size - it occurs at varying heap sizes, none of which are anywhere near the configured maximum.
  • The problem is not tied to a particular interval, time of day, user load, or specific server node. It appears completely random.
  • Heap dumps, including ones taken while a server was exhibiting the problem, showed nothing obviously wrong.
  • Restarting the production servers every day appears to reduce the likelihood of the problem, but does not eliminate it.
  • If we do not restart the servers every day, the problem is very likely to hit one of our 8 production servers within one to three days.

How would you begin to diagnose this?

Configuration

Our JAVA_OPTS are as follows:

 -Xms8096m -Xmx8096m -XX:MaxPermSize=512M -Dsun.rmi.dgc.client.gcInterval=1800000 -Dsun.rmi.dgc.server.gcInterval=1800000 -XX:NewSize=150M -XX:+UseParNewGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc.log

 $ java -version
 java version "1.6.0_12"
 Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
 $ uname -a
 Linux myhostname 2.6.18-274.3.1.el5 #1 SMP Tue Sep 6 20:13:52 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
 $ cat /proc/version
 Linux version 2.6.18-274.3.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Tue Sep 6 20:13:52 EDT 2011
 $ cat /etc/issue
 CentOS release 5.7 (Final)
 Kernel \r on an \m
 $ cat /proc/meminfo | grep "MemTotal"
 MemTotal: 16279356 kB

GC log

This is an example of a GC log fragment when a problem occurred:

 111036.554: [GC 111036.555: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111036.555: [Tenured: 3629252K->3647971K(5526912K), 8.7565190 secs] 5840068K->3647971K(8014016K), 8.7567840 secs]
 111055.691: [GC 111055.691: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111055.691: [Tenured: 3647971K->3667529K(5526912K), 8.7876340 secs] 5858787K->3667529K(8014016K), 8.7878690 secs]
 111071.037: [GC 111071.037: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111071.037: [Tenured: 3667529K->3692057K(5526912K), 8.7581830 secs] 5878345K->3692057K(8014016K), 8.7584210 secs]
 111088.407: [GC 111088.407: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111088.407: [Tenured: 3692057K->3638194K(5526912K), 10.7072790 secs] 5902873K->3638194K(8014016K), 10.7074960 secs]
 111110.238: [GC 111110.238: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111110.238: [Tenured: 3638194K->3654614K(5526912K), 8.8021440 secs] 5849010K->3654614K(8014016K), 8.8023860 secs]
 111128.115: [GC 111128.115: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111128.115: [Tenured: 3654614K->3668670K(5526912K), 8.8451510 secs] 5865430K->3668670K(8014016K), 8.8453600 secs]
 111161.684: [GC 111161.684: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111161.684: [Tenured: 3668670K->3684080K(5526912K), 8.8156740 secs] 5879486K->3684080K(8014016K), 8.8159260 secs]
 111186.669: [GC 111186.669: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111186.669: [Tenured: 3684080K->3639333K(5526912K), 10.6025350 secs] 5894896K->3639333K(8014016K), 10.6030040 secs]
 111208.692: [GC 111208.692: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111208.692: [Tenured: 3639333K->3657993K(5526912K), 8.7967920 secs] 5850149K->3657993K(8014016K), 8.7970090 secs]
 111235.486: [GC 111235.487: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111235.487: [Tenured: 3657993K->3676521K(5526912K), 8.8212340 secs] 5868809K->3676521K(8014016K), 8.8214930 secs]
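
A quick way to pull just the Tenured (full GC) pause times out of a log in this format, assuming GNU grep and awk (the exact layout of PrintGCDetails output can differ between JVM versions, so treat this as a rough sketch):

 # longest full-GC pauses first
 $ grep -o 'Tenured:[^]]*' /path/to/gc.log | awk -F', ' '{print $2}' | sort -rn | head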
+4
3 answers

Since you are running a very old version of Java (almost four years old) and appear to be hitting a bug, the first thing I would try is a newer release, e.g. updating to Java 6 update 35. I suspect update 12 does not enable compressed oops by default, which is an option that should save you some memory (and therefore some GC overhead).
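
If you do upgrade, you can quickly check whether compressed oops is active and turn it on explicitly if not; a rough sketch (-XX:+PrintFlagsFinal only exists in later Java 6 updates, and UseCompressedOops became a regular flag around 6u14):

 # ask the VM for the effective value of the flag (newer 6u builds only)
 $ java -XX:+PrintFlagsFinal -version | grep UseCompressedOops
 # enable it explicitly; an 8 GB heap is well under the ~32 GB limit for compressed oops
 JAVA_OPTS="$JAVA_OPTS -XX:+UseCompressedOops"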

+2

First of all: take a thread dump! Run kill -3 on the process and check the output in the logs.

You can confirm that GC is the culprit by looking at the GC threads in that dump.

You are giving the JVM 8 GB. How many gigabytes of RAM does the machine have? (Hopefully at least 12.)

The dump will also show you the sizes of Eden, the From/To survivor spaces and PermGen, which can help pin down which memory space is having problems.
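
Concretely, something along these lines should work on a JDK 6 box (<pid> stands for the JVM process id; kill -3 writes the dump to the process's stdout/console log, not to the GC log):

 # thread dump via signal, or non-intrusively with jstack
 $ kill -3 <pid>
 $ jstack <pid> > threads.txt
 # heap space breakdown: Eden, From/To survivor spaces, tenured and PermGen
 $ jmap -heap <pid>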

+1

It usually takes a bit of experimenting, but my guess is that you see this problem roughly every 30 minutes, right? Full GCs like these, where the Tenured lines do not appear to be triggered by a full heap, are most likely triggered by System.gc(). Normally such collections are marked as (System) in the log, but since your VM is old I am not sure; the GC log format changes with every version. To rule this out, I strongly recommend adding -XX:+DisableExplicitGC. That flag also suppresses the calls made by the RMI distributed GC (DGC).
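
As a sketch, that would mean extending the JAVA_OPTS from the question roughly like this (the larger DGC interval is only an illustration, in case you want to test the theory before disabling explicit GC outright):

 # stop System.gc() from forcing full collections (also covers the RMI DGC calls)
 JAVA_OPTS="$JAVA_OPTS -XX:+DisableExplicitGC"
 # alternative experiment: push the RMI-triggered collections much further apart
 JAVA_OPTS="$JAVA_OPTS -Dsun.rmi.dgc.client.gcInterval=7200000 -Dsun.rmi.dgc.server.gcInterval=7200000"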

The second possibility is that you have a memory leak or usage problem that you did not spot in the dump. The log lines show that the young generation is always completely full at 2210816K, and Tenured still ends up around 3676521K. If all of those 2210816K are actually live (which is what it looks like), they cannot be promoted, because the combined 5887337K would not fit into the 5526912K Tenured space; if that situation persists, you may eventually get a "GC overhead limit exceeded" error. But in that case you should also see those roughly 6 GB of live objects in a heap dump taken at that moment.
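
To check that theory, a live-object histogram or a heap dump taken during an episode would show whether those objects really are reachable (standard JDK 6 tools; note that -histo:live itself forces a full GC):

 # top classes among live (reachable) objects only
 $ jmap -histo:live <pid> | head -n 30
 # full heap dump for offline analysis, e.g. in Eclipse MAT
 $ jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>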

0
