This is a long-running server application that creates a lot of short-lived garbage and promotes almost nothing to the old generation after startup. In most cases a young collection is fast, even with a 10 GB young generation, since almost everything is garbage, but we observe occasional extreme outliers. As a potential hint, we run a similarly configured server with a lower load on a machine with more CPU resources, but 12 GB less memory. We do not see this pattern there.
Here is a snippet from the GC logs. Is it possible to tell whether it spends the whole time actually doing GC work (rather than waiting for threads to reach safepoints, for example), given that the sys time is also large?
2013-12-13T14:04:50.388+0000: 56748.603: [GC [PSYoungGen: 10363808K->1248K(10421824K)] 10569323K->207003K(20907584K), 0.0035110 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
2013-12-13T14:17:32.245+0000: 57510.460: [GC [PSYoungGen: 10363808K->1600K(10435072K)] 10569563K->207563K(20920832K), 0.0038460 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2013-12-13T14:29:08.572+0000: 58206.787: [GC [PSYoungGen: 10381824K->1408K(10430912K)] 10587787K->207875K(20916672K), 0.0045710 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2013-12-13T14:36:25.176+0000: 58643.391: [GC [PSYoungGen: 10381632K->1600K(10442432K)] 10588099K->208371K(20928192K), 0.0040800 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
2013-12-13T14:44:04.409+0000: 59102.624: [GC [PSYoungGen: 10397120K->1600K(10438848K)] 10603891K->208715K(20924608K), 14.7387710 secs] [Times: user=0.00 sys=143.05, real=14.74 secs]
2013-12-13T14:52:50.287+0000: 59628.502: [GC [PSYoungGen: 10397120K->11488K(10446720K)] 10604235K->218819K(20932480K), 0.0127280 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
2013-12-13T14:59:34.934+0000: 60033.149: [GC [PSYoungGen: 10418208K->1536K(10445760K)] 10625539K->218931K(20931520K), 211.1968100 secs] [Times: user=0.00 sys=2067.08, real=211.16 secs]
2013-12-13T15:12:04.478+0000: 60782.693: [GC [PSYoungGen: 10408256K->15136K(10448064K)] 10625651K->232723K(20933824K), 0.0147670 secs] [Times: user=0.00 sys=0.12, real=0.01 secs]
2013-12-13T15:23:02.123+0000: 61440.338: [GC [PSYoungGen: 10426912K->12672K(10448768K)] 10644499K->232635K(20934528K), 0.0141040 secs] [Times: user=0.00 sys=0.12, real=0.02 secs]
2013-12-13T15:31:09.119+0000: 61927.334: [GC [PSYoungGen: 10424448K->5280K(10452160K)] 10644411K->232907K(20937920K), 0.8134440 secs] [Times: user=1.22 sys=6.30, real=0.82 secs]
2013-12-13T15:39:25.083+0000: 62423.298: [GC [PSYoungGen: 10421024K->6208K(10449344K)] 10648651K->235547K(20935104K), 0.0087890 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2013-12-13T15:47:52.781+0000: 62930.996: [GC [PSYoungGen: 10421952K->7200K(10456960K)] 10651291K->239123K(20942720K), 0.0094460 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
2013-12-13T15:58:18.594+0000: 63556.809: [GC [PSYoungGen: 10433248K->6752K(10454848K)] 10665171K->242331K(20940608K), 0.0091570 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
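To pick the outliers out of a longer log, here is a minimal Python sketch. It assumes each line follows the usual HotSpot `-XX:+PrintGCDetails` layout with date stamps (timestamp, pause in `secs`, then a `[Times: user=... sys=..., real=... secs]` block). The function names and the 0.5 second threshold are illustrative choices, not part of any existing tool.

```python
import re

# Pulls the date stamp, the pause length and the [Times: ...] fields
# out of one PrintGCDetails line (assumed layout, see the text above).
GC_LINE = re.compile(
    r"^(?P<stamp>\S+): [\d.]+: \[GC .*?, (?P<pause>[\d.]+) secs\] "
    r"\[Times: user=(?P<user>[\d.]+) sys=(?P<sys>[\d.]+), "
    r"real=(?P<real>[\d.]+) secs\]"
)


def parse_gc_line(line):
    """Return the timing fields as a dict, or None if the line does not match."""
    m = GC_LINE.match(line.strip())
    if m is None:
        return None
    rec = {"stamp": m.group("stamp")}
    for key in ("pause", "user", "sys", "real"):
        rec[key] = float(m.group(key))
    return rec


def sys_dominated(rec, min_real=0.5):
    """True for a long pause in which kernel time exceeds user time.

    min_real is an arbitrary cut-off in seconds (hypothetical default).
    """
    return rec["real"] >= min_real and rec["sys"] > rec["user"]


def kernel_parallelism(rec):
    """CPU seconds spent in the kernel per second of wall-clock pause."""
    return rec["sys"] / rec["real"] if rec["real"] > 0 else 0.0


if __name__ == "__main__":
    import sys

    # Usage: python gc_outliers.py < gc.log
    for raw in sys.stdin:
        rec = parse_gc_line(raw)
        if rec is not None and sys_dominated(rec):
            print(rec["stamp"], rec["real"], round(kernel_parallelism(rec), 1))
```

For the 14.74 second pause in the snippet, sys divided by real is about 9.7, meaning that on average almost ten threads were running kernel code for the whole pause while user time stayed at 0.00.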
The server above has two quad-core X5570s with 48 GB of RAM. htop shows 15 GB in use, with the rest cached, and the application uses just over 10 GB at any time (low resident size). That makes me think it is not a swap problem.
The other server I am referring to has two 8-core E5-2690s with 32 GB of RAM. Both run JRE 1.7.0_25 on Fedora, although the server above is on a very old version.