OutOfMemoryError as a result of multiple searches

I have a classic Java EE system: a web tier with JSF, EJB 3 for the business logic, and Hibernate 3 accessing a DB2 database. I am struggling with the following scenario: a user starts a process that retrieves a large set of data from the database. The extraction takes some time, so the user does not receive an immediate response, gets impatient, opens a new browser window and starts the search again, sometimes several times. The EJB container is obviously unaware that the earlier searches are no longer relevant, and when the database returns a result set, Hibernate begins to populate a set of POJOs that takes up huge amounts of memory, eventually raising an OutOfMemoryError.

The potential solution I considered was Hibernate's Session.cancelQuery() method. However, cancelQuery() only has an effect up until the database returns the result set. Once the database has returned the result set and Hibernate starts populating the POJOs, cancelQuery() no longer does anything. In our case the database queries return quite quickly, and most of the overhead seems to be in the POJO population, by which point we can no longer usefully call cancelQuery().
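For reference, cancelQuery() has to be invoked from a second thread while the first thread is still executing the statement. A minimal sketch of that pattern (the class and field names here are illustrative, not code from the actual system):

    import org.hibernate.Session;

    public class QueryCanceller implements Runnable {

        // The Session currently executing the long-running query on another thread.
        private final Session session;

        public QueryCanceller(Session session) {
            this.session = session;
        }

        public void run() {
            // Asks the JDBC driver to cancel the statement the session is executing.
            // As described above, once Hibernate has moved on to building POJOs from
            // the ResultSet, this call no longer helps.
            session.cancelQuery();
        }
    }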

java java-ee db2 out-of-memory hibernate
3 answers

The implemented solution is as follows:

The general idea was to keep a map from the HTTP session of the user who initiated a query to the Hibernate sessions currently running that query, so that when the user closes the browser we can kill the running queries.

There were two main problems. The first was propagating the HTTP session identifier from the web tier to the EJB tier without threading it through every method call along the way, i.e. without disturbing the existing code in the system. The second was figuring out how to cancel a query once the database has already begun returning results and Hibernate is populating objects from them.

The first problem was solved by relying on the fact that, in our deployment, all methods along the call stack are executed by the same thread. This holds because our application lives in a single container and makes no remote calls. We therefore created a servlet filter that intercepts every call into the application and sets a ThreadLocal variable to the current HTTP session identifier. This way the HTTP session identifier is available to every method call further down the line.
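A minimal sketch of such a filter (class and field names are illustrative, not the original implementation):

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;

    public class SessionIdFilter implements Filter {

        // Holds the HTTP session id for the duration of the current request,
        // so code further down the (same-thread) call stack can read it.
        public static final ThreadLocal<String> CURRENT_SESSION_ID = new ThreadLocal<String>();

        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            try {
                CURRENT_SESSION_ID.set(((HttpServletRequest) request).getSession().getId());
                chain.doFilter(request, response);
            } finally {
                // Always clear the ThreadLocal so pooled request threads do not leak ids.
                CURRENT_SESSION_ID.remove();
            }
        }

        public void init(FilterConfig config) {
        }

        public void destroy() {
        }
    }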

The second problem was a bit stickier. We found that the Hibernate method responsible for executing the query and then populating the POJOs is called doQuery and lives in the org.hibernate.loader.Loader class. (We use Hibernate 3.5.3, but the same holds for newer Hibernate versions.):
    private List doQuery(
            final SessionImplementor session,
            final QueryParameters queryParameters,
            final boolean returnProxies) throws SQLException, HibernateException {

        final RowSelection selection = queryParameters.getRowSelection();
        final int maxRows = hasMaxRows( selection ) ?
                selection.getMaxRows().intValue() :
                Integer.MAX_VALUE;

        final int entitySpan = getEntityPersisters().length;

        final ArrayList hydratedObjects = entitySpan == 0 ? null : new ArrayList( entitySpan * 10 );
        final PreparedStatement st = prepareQueryStatement( queryParameters, false, session );
        final ResultSet rs = getResultSet( st, queryParameters.hasAutoDiscoverScalarTypes(), queryParameters.isCallable(), selection, session );

        final EntityKey optionalObjectKey = getOptionalObjectKey( queryParameters, session );
        final LockMode[] lockModesArray = getLockModes( queryParameters.getLockOptions() );
        final boolean createSubselects = isSubselectLoadingEnabled();
        final List subselectResultKeys = createSubselects ? new ArrayList() : null;
        final List results = new ArrayList();

        try {
            handleEmptyCollections( queryParameters.getCollectionKeys(), rs, session );

            EntityKey[] keys = new EntityKey[entitySpan]; //we can reuse it for each row

            if ( log.isTraceEnabled() ) log.trace( "processing result set" );

            int count;
            for ( count = 0; count < maxRows && rs.next(); count++ ) {

                if ( log.isTraceEnabled() ) log.debug("result set row: " + count);

                Object result = getRowFromResultSet(
                        rs,
                        session,
                        queryParameters,
                        lockModesArray,
                        optionalObjectKey,
                        hydratedObjects,
                        keys,
                        returnProxies
                );
                results.add( result );

                if ( createSubselects ) {
                    subselectResultKeys.add(keys);
                    keys = new EntityKey[entitySpan]; //can't reuse in this case
                }
            }

            if ( log.isTraceEnabled() ) {
                log.trace( "done processing result set (" + count + " rows)" );
            }
        }
        finally {
            session.getBatcher().closeQueryStatement( st, rs );
        }

        initializeEntitiesAndCollections( hydratedObjects, rs, session, queryParameters.isReadOnly( session ) );

        if ( createSubselects ) createSubselects( subselectResultKeys, queryParameters, session );

        return results; //getResultList(results);
    }

In this method you can see that the results are first fetched from the database as a plain old java.sql.ResultSet, after which the method loops over each row and builds an object from it. Some additional initialization is done in the initializeEntitiesAndCollections() method, which is called after the loop. After a little debugging we found that the bulk of the overhead was in these parts of the method, not in the part that fetches the java.sql.ResultSet from the database, while cancelQuery was only effective in that first part. So the solution was to add an extra condition to the for loop that checks whether the thread has been interrupted, like this:

    for ( count = 0; count < maxRows && rs.next() && !Thread.currentThread().isInterrupted(); count++ ) {
        // ...
    }

and also perform the same check before calling the initializeEntitiesAndCollections() method:

    if ( !Thread.interrupted() ) {
        initializeEntitiesAndCollections( hydratedObjects, rs, session, queryParameters.isReadOnly( session ) );
        if ( createSubselects ) {
            createSubselects( subselectResultKeys, queryParameters, session );
        }
    }

Note that calling Thread.interrupted() in the second check also clears the interrupted flag, so it does not affect the further functioning of the program. Now, when a query needs to be cancelled, the cancellation method looks up the Hibernate session and the thread stored in the map under the HTTP session identifier, calls cancelQuery() on the session, and calls interrupt() on the thread.
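A sketch of that registry and cancellation step (all names are illustrative, assuming the map described above keyed by HTTP session id):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.hibernate.Session;

    public class RunningQueryRegistry {

        private static class RunningQuery {
            final Session session;  // Hibernate session executing the query
            final Thread thread;    // thread running doQuery()

            RunningQuery(Session session, Thread thread) {
                this.session = session;
                this.thread = thread;
            }
        }

        private final Map<String, RunningQuery> byHttpSessionId =
                new ConcurrentHashMap<String, RunningQuery>();

        // Called when a query starts, e.g. with the id taken from the ThreadLocal set by the filter.
        public void register(String httpSessionId, Session session) {
            byHttpSessionId.put(httpSessionId, new RunningQuery(session, Thread.currentThread()));
        }

        // Called when the user's HTTP session ends (browser closed, logout, timeout).
        public void cancel(String httpSessionId) {
            RunningQuery running = byHttpSessionId.remove(httpSessionId);
            if (running != null) {
                running.session.cancelQuery();  // cancels the JDBC statement if it is still executing
                running.thread.interrupt();     // makes the patched doQuery() loop stop early
            }
        }
    }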


I had a similar problem in a completely different environment. What I did was this: before adding a new task to the queue, I first checked whether the "same task" was already queued by that user. If so, I did not accept the second task and informed the user about it. A minimal sketch of this guard is shown below.
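This is only an illustration of the idea, not the original code; the task key (for example a user id combined with a hash of the query parameters) is an assumption:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class DuplicateTaskGuard {

        // Keys of tasks currently queued or running, e.g. "userId:queryParamsHash".
        private final ConcurrentMap<String, Boolean> pendingTasks =
                new ConcurrentHashMap<String, Boolean>();

        // Returns true if the task was accepted, false if an identical task is already pending.
        public boolean tryAccept(String taskKey) {
            return pendingTasks.putIfAbsent(taskKey, Boolean.TRUE) == null;
        }

        // Must be called when the task finishes (or fails) so the user can run it again later.
        public void finished(String taskKey) {
            pendingTasks.remove(taskKey);
        }
    }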

This does not answer the question of how to protect the user from an OutOfMemoryError if the data is genuinely too large to fit in the available RAM, but it is a good trick to protect your server from useless work.


Too complicated for my taste :-) I would rather create a separate service for "heavy" requests and store in it the query parameters and, possibly, the results, which would remain valid for a limited time. If a query takes too long, the user receives a message that completing the task will take considerable time and can either wait or cancel it. This scenario works well for analytic queries. It also gives you easy access to the task running on the server so that you can kill it. A rough sketch of such a service is below.
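This sketch is only one way to realize the idea (all names and the thread-pool size are assumptions): the heavy query runs as a Future, the user polls for the result, and cancelling interrupts the worker thread.

    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class HeavyQueryService {

        private final ExecutorService executor = Executors.newFixedThreadPool(4);
        private final ConcurrentMap<String, Future<List<?>>> tasks =
                new ConcurrentHashMap<String, Future<List<?>>>();

        // Starts the heavy query in the background and returns immediately.
        public void submit(String taskId, Callable<List<?>> query) {
            tasks.put(taskId, executor.submit(query));
        }

        // Returns the result if it is ready, otherwise null (the user keeps waiting).
        public List<?> pollResult(String taskId) throws Exception {
            Future<List<?>> future = tasks.get(taskId);
            return (future != null && future.isDone()) ? future.get() : null;
        }

        // Called when the user decides not to wait; interrupts the worker thread.
        public void cancel(String taskId) {
            Future<List<?>> future = tasks.remove(taskId);
            if (future != null) {
                future.cancel(true);
            }
        }
    }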

But if you are having this problem with Hibernate, then I assume the problem is not with analytic queries but with ordinary business queries. If they take too long to execute, can you try the second-level (L2) cache (a cold start can be slow, but hot data is returned instantly)? Or tune the Hibernate/JDBC options?
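As a hedged illustration of the L2-cache suggestion, a read-mostly entity can be marked cacheable like this (the entity name is made up, and a cache provider such as Ehcache still has to be configured separately in the Hibernate configuration):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)  // entity instances go to the second-level cache
    public class ReferenceData {

        @Id
        private Long id;

        private String value;

        // getters and setters omitted for brevity
    }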

