Connection pool goes empty with Hibernate 4, but cannot find the culprit

I monitor the SQL Server database for connections every 5 minutes. For days at a time it hovers around 5 connections (my idle baseline), and then it suddenly jumps to 50. This is clearly a recurring problem, and I do not understand why it would jump from 5 to 50 within 5 minutes with zero traffic.

I use Hibernate 4 and Tomcat. I know about a problem in Hibernate that was fixed in 4.3.2, but I'm on 4.3.5.

Update: The pool-empty event occurs every day at almost exactly 7:13:20 PM ... which seems too regular to be a coincidence. I use Quartz and it runs a job every 1 minute, but I don't see how the two could be connected.

My properties:

jmxEnabled = true
initialSize = 5
maxActive = 50
minIdle = 5
maxIdle = 25
maxWait = 10000
maxAge = 10 * 60000
timeBetweenEvictionRunsMillis = 5000
minEvictableIdleTimeMillis = 60000
validationQuery = "SELECT 1"
validationQueryTimeout = 3
validationInterval = 15000
testOnBorrow = true
testWhileIdle = true
testOnReturn = false
jdbcInterceptors = "ConnectionState"
defaultTransactionIsolation = java.sql.Connection.TRANSACTION_READ_COMMITTED
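
These property names match the Tomcat JDBC connection pool (org.apache.tomcat.jdbc.pool). For reference, a minimal sketch of the same settings applied programmatically via PoolProperties; the URL, driver class, and the wiring into the application are placeholders, not values from the question:

    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    public class PoolConfig {
        public static DataSource buildDataSource() {
            PoolProperties p = new PoolProperties();
            p.setUrl("jdbc:sqlserver://localhost:1433;databaseName=mydb"); // placeholder
            p.setDriverClassName("com.microsoft.sqlserver.jdbc.SQLServerDriver"); // placeholder
            p.setJmxEnabled(true);
            p.setInitialSize(5);
            p.setMaxActive(50);
            p.setMinIdle(5);
            p.setMaxIdle(25);
            p.setMaxWait(10000);
            p.setMaxAge(10 * 60000);
            p.setTimeBetweenEvictionRunsMillis(5000);
            p.setMinEvictableIdleTimeMillis(60000);
            p.setValidationQuery("SELECT 1");
            p.setValidationQueryTimeout(3);
            p.setValidationInterval(15000);
            p.setTestOnBorrow(true);
            p.setTestWhileIdle(true);
            p.setTestOnReturn(false);
            p.setJdbcInterceptors("ConnectionState");
            p.setDefaultTransactionIsolation(java.sql.Connection.TRANSACTION_READ_COMMITTED);

            DataSource ds = new DataSource();
            ds.setPoolProperties(p);
            return ds;
        }
    }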

Environment:

  • Tomcat 7.0.59
  • Java 1.7.0_76
  • SQL Server 2012

Additional information: I reduced the frequency of the Quartz job to every 5 minutes. The event still occurred when I loaded a page/view in the application, at about 7:14 PM. I'm on the verge of dropping back to Hibernate 3.

Update: Today I reloaded the application in Tomcat Manager at 18:50, but the event still occurred. Thread dump

+7
tomcat sql-server-2012 hibernate connection-pooling transactions
4 answers

I want to thank everyone for your answers. As @JensSchauder suggested, I worked on isolating the problem and on understanding why I did not have the problem in QA but did have it in production.

Although I continued to work with the Network Operations team, nothing pointed to the culprit until I finally got the logs I needed.

We use a product called Alert Logic to scan for and identify security vulnerabilities, but unfortunately it was not identified as the culprit until I was able to trace the IP in the Apache access logs. whois showed the IP belonged to the Alert Logic software running on a Rackspace host.

The application server was new and was built from a new architecture image. It turns out that Alert Logic was hitting a vulnerability, and that is what led to the empty connection pool (exhaustion?).

Until the middle of last week I had no idea that Alert Logic was even part of the equation. In fact, I am now working with Network Operations to monitor the product better, since it had lapsed.

Later this week I will post the results of reproducing this vulnerability in QA (since patching the product was the priority).

+2

Bugs like this are fun. Obviously we cannot give you the exact cause (unless someone digs up a bug in the libs you mentioned), so let's look at how you can debug this, roughly ordered from simple to complex, although the details depend on your environment.

  • You have one very useful piece of information: the problem always occurs at the same time. This points to two options: either one of the jobs you run with Quartz eats connections, or something (possibly external) happens at that time that causes your code to eat connections. You should therefore check the configuration of your Quartz jobs, and also any cron jobs or jobs configured inside the database or similar, for potential culprits. Note that such a job may start long before and only reach the critical state later, so a job might begin, for example, 2 hours before the event you observe.

  • Check your application logs, system logs, and database logs for anything that happens at that time or some time earlier.

  • Double-check that everything that obtains a connection also returns it in every case, especially when exceptions are thrown. One classic way to get this wrong is a construct like the following (Java as pseudo-code; a corrected version using try-with-resources is sketched after this list):

     Connection con = null;
     Statement stmnt = null;
     try {
         con = getConnection();
         stmnt = con.createStatement();
         // ...
     } finally {
         if (stmnt != null) stmnt.close();
         if (con != null) con.close(); // this never happens if stmnt.close() throws an exception
     }
  • Set up logging so you can see exactly when a connection is not returned. Everything that triggers work in your application should go through some kind of wrapper (an AOP around-aspect, a servlet filter, or similar). This wrapper should create a unique identifier for the action (a UUID) and put it in the MDC of your logging framework; at the end of the action the identifier is removed again, and all other log entries include it. Also wrap your connection pool: record when something requests a connection, including a timestamp, the identifier, and possibly the stack trace (by creating and storing an exception), and log it. Each time a connection is returned, log how long it was in use. In addition, each time a connection is requested, check whether any connection has been in use for longer than a certain threshold. A minimal sketch of this idea is shown after this list.

  • Isolate things. Set up a second server running the application. Does it have the same problem? Run some parts on only one of the two servers; do they still have the problem? Keep excluding candidates until only one remains.
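
For the resource-handling bullet above, a corrected sketch of the same code using try-with-resources, which closes the statement and the connection in reverse order even if one of the close() calls throws (the DataSource here stands for however your code obtains connections):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.sql.DataSource;

    public class SafeStatementExample {
        // Both resources are closed automatically, and a failure closing the
        // statement does not prevent the connection from being closed.
        static void runQuery(DataSource ds) throws SQLException {
            try (Connection con = ds.getConnection();
                 Statement stmnt = con.createStatement()) {
                stmnt.execute("SELECT 1"); // placeholder query
            }
        }
    }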
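
And for the logging bullet, a minimal sketch assuming SLF4J and the Servlet API. RequestIdFilter and TrackedConnections are illustrative names, not part of the original setup; the point is only to put a UUID into the MDC for each request and to log every borrow and return of a connection:

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.util.UUID;
    import javax.servlet.*;
    import javax.sql.DataSource;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    // Tags every request with a UUID so all log lines of one action can be correlated.
    public class RequestIdFilter implements Filter {
        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            MDC.put("requestId", UUID.randomUUID().toString());
            try {
                chain.doFilter(req, res);
            } finally {
                MDC.remove("requestId"); // always clean up, even when an exception is thrown
            }
        }
        @Override public void init(FilterConfig cfg) {}
        @Override public void destroy() {}
    }

    // Illustrative helper that logs each borrow and return so leaked connections show up.
    final class TrackedConnections {
        private static final Logger log = LoggerFactory.getLogger(TrackedConnections.class);

        static Connection borrow(DataSource ds) throws SQLException {
            Connection con = ds.getConnection();
            // The Exception is created only to capture the stack trace of the borrow site.
            log.info("borrowed {}", con, new Exception("borrow site"));
            return con;
        }

        static void giveBack(Connection con, long borrowedAtMillis) throws SQLException {
            log.info("returning {} after {} ms", con, System.currentTimeMillis() - borrowedAtMillis);
            con.close();
        }
    }

The filter would be registered in web.xml (or via @WebFilter), and calls that obtain a connection would go through TrackedConnections.borrow so that the borrow site and duration end up in the logs together with the request id.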

+2

If I were facing such a problem, I would try to get a thread dump at the moment the number of active connections reaches 50. You could also try increasing the maximum limit to check whether the application simply has a higher peak demand.

I would also configure Tomcat to use a connection pool provider such as c3p0, if it is not already in use. Then I would create my own hook class as described in the following section: http://www.mchange.com/projects/c3p0/#connection_customizers

With this custom class you can keep track of how many connections are checked out and checked in. When that number is close to or at the limit, trigger a thread dump programmatically. This can be done as described on the following page: http://crunchify.com/how-to-generate-java-thread-dump-programmatically/ Analyze that thread dump to find where the connections are coming from.
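
A rough sketch of that idea, assuming c3p0's ConnectionCustomizer hook and the JDK's ThreadMXBean; the class name and the threshold of 45 are made up for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.sql.Connection;
    import java.util.concurrent.atomic.AtomicInteger;
    import com.mchange.v2.c3p0.AbstractConnectionCustomizer;

    // Counts checked-out connections and dumps all threads when usage nears the pool limit.
    public class PoolWatchCustomizer extends AbstractConnectionCustomizer {
        private static final AtomicInteger checkedOut = new AtomicInteger();
        private static final int DUMP_THRESHOLD = 45; // illustrative; just below maxActive = 50

        @Override
        public void onCheckOut(Connection c, String parentDataSourceIdentityToken) {
            if (checkedOut.incrementAndGet() >= DUMP_THRESHOLD) {
                dumpThreads();
            }
        }

        @Override
        public void onCheckIn(Connection c, String parentDataSourceIdentityToken) {
            checkedOut.decrementAndGet();
        }

        private static void dumpThreads() {
            StringBuilder sb = new StringBuilder("Connection pool near limit, thread dump:\n");
            for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
                sb.append(info.toString());
            }
            System.err.println(sb); // or hand it to your logging framework
        }
    }

The customizer would be registered through c3p0's connectionCustomizerClassName property.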

This information will be useful not only for your current problem but also for troubleshooting future performance issues.

+2

I built a connection pool monitoring tool called FlexyPool, and it can help you figure out the culprit. It also supports the Tomcat JDBC connection pool, and you can correlate its metrics with the other logs you use.

The connection lease time histogram should tell you how long connections are being held, which can reveal slow queries.

The concurrent connections histogram reports how many connections are in use at once; if the pool runs empty while this stays below 50, you probably have a connection leak.

+2
