Oracle updates/inserts stuck, DB CPU at 100%, high concurrency, "SQL*Net message from client" waits

We have a JavaEE application running on WebLogic against an Oracle 11g database, using the thin JDBC driver. We recently had a series of production incidents in which updates and inserts to one specific table got stuck or took much longer than usual, for no apparent reason. This made the application use more and more database connections (which normally sit idle in the connection pool), database CPU and concurrency shot up (as seen in OEM), and the entire database ground to a halt. During these incidents the DBAs could not find any reason for the inserts and updates to be stuck (there were no DB locks). What they did see was a lot of "SQL*Net message from client" wait events.

Their theory is that the application (the JDBC client) somehow got stuck during the insert/update statements for a reason unrelated to the database, without acknowledging the database's responses to those statements. The application then kept issuing more and more of these statements, tying up more and more connections, and that is what drove CPU and concurrency up and made the database unresponsive.

I am not convinced: if all the sessions were busy waiting for clients, why was the CPU so high? We have not been able to reproduce these incidents consistently, so we really are in the dark here.

Has anyone seen anything like this, or does anyone have ideas or suggestions about what could trigger it?

thanks

2 answers

What you are describing is a "connection storm." A poorly configured connection pool reacts to slow-responding connections by opening new connections to service the waiting requests. These additional connections put even more load on a server that is already stressed (if it were not stressed, the initial connections would not be lagging). This starts a vicious feedback loop that spawns more and more connections and ultimately kills the server.

You can avoid a connection storm by setting the maximum capacity of the data source to something reasonable. The definition of "reasonable" will vary depending on the capabilities of your servers, but it is probably lower than you think. The best advice is to set the maximum capacity to the same value as the initial capacity.
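To make the effect of a hard cap concrete, here is a minimal sketch in plain Java. It is not how WebLogic implements its pool; the class name BoundedPoolSketch and its methods are made up for illustration, and the "connections" are just semaphore permits. The point it shows is that once the cap is reached, further callers wait briefly and then fail fast instead of opening yet another database connection.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical fixed-capacity gate in front of a connection pool.
 * Illustration only: permits stand in for real JDBC connections.
 */
class BoundedPoolSketch {
    private final Semaphore permits;

    BoundedPoolSketch(int maxCapacity) {
        // Fair semaphore: waiting callers are served in arrival order.
        this.permits = new Semaphore(maxCapacity, true);
    }

    /**
     * Tries to reserve a connection slot, waiting at most timeoutMillis.
     * Returns false when the pool is exhausted, so the caller can fail
     * fast instead of adding load to an already struggling database.
     */
    boolean tryBorrow(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status and report "no slot obtained".
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a slot; call exactly once per successful tryBorrow. */
    void giveBack() {
        permits.release();
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```

With a capacity of 2, two calls to tryBorrow succeed, a third returns false after its timeout, and it succeeds again only after giveBack has been called. In a real deployment the equivalent behavior comes from the data source's own maximum capacity setting, not from application code like this.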

Once you have prevented the connection storm, you can focus on the database process(es) that cause the initial slowdown.


A large number of "SQL*Net message from client" wait events indicates that the client is doing something other than talking to the database. This is why your database administrators think the problem is in the application.


I ran into a similar problem, which I documented here: Unexpected Oracle session waiting on the "SQL*Net message from client" event. In my case, the problem was caused by a CLOB bind variable that was bound in a place where a CLOB seems to cause serious problems in Oracle. The following statement produces the same behavior you observed:

 CREATE TABLE t (
   v INT,
   s VARCHAR2(400 CHAR)
 );

 var v_s varchar2(50)
 exec :v_s := 'abc'

 MERGE INTO t
 USING (
   SELECT 1 v, CAST(:v_s AS CLOB) s
   FROM DUAL
 ) s
 ON (t.s = s.s) -- Using a CLOB here causes the bug.
 WHEN MATCHED THEN UPDATE
   SET t.v = s.v
 WHEN NOT MATCHED THEN INSERT (v, s)
   VALUES (s.v, s.s);

There are probably other cases, with statements other than MERGE, that trigger the same behavior and leave zombie sessions behind, since Oracle seems to enter some kind of infinite loop that produces the observable CPU load.
