Update: I found that my program remains responsive if I set the ThreadPoolExecutor's core pool size to the maximum pool size (29 threads). However, if I set the core pool size to 11 and the maximum pool size to 29, the actor system only ever creates 11 threads. How can I configure the ActorSystem / ThreadPoolExecutor to keep creating threads beyond the core pool size, up to the maximum pool size? I would prefer not to set the core pool size to the maximum number of threads, since the extra threads are only needed for cancelling a job (which should be a rare event).
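For reference, this is roughly the plain java.util.concurrent equivalent of the pool I think I'm asking the dispatcher for; the class name and hard-coded numbers are purely illustrative:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PoolSketch {
        // Illustrative only: the plain-Java analogue of the dispatcher I think
        // I'm configuring - 11 core threads, up to 29 max, 60s keep-alive,
        // unbounded linked task queue, core threads allowed to time out.
        static ThreadPoolExecutor newPool() {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    11, 29, 60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>());
            pool.allowCoreThreadTimeOut(true);
            return pool;
        }
    }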
I have a batch program that works against an Oracle database, implemented in Java / Akka with TypedActors, with the following actors:
- BatchManager is responsible for talking to the REST controller. It manages a Queue of uninitialized batch jobs; when an uninitialized batch job is dequeued, it is converted into a JobManager actor and executed.
- JobManager maintains the queue of stored procedures and the pool of Workers; it seeds each Worker with a stored procedure, and when a Worker finishes it sends the procedure's result back to the JobManager, and the JobManager sends the Worker another stored procedure. The batch ends when the job queue is empty and all Workers are idle, at which point the JobManager reports its results to the BatchManager, stops its Workers (via TypedActor.context().stop()), and then stops itself. The JobManager has a Promise<Status> completion that is completed when the job finishes, or when the job is terminated by a cancellation or a fatal exception.
- Worker executes the stored procedures. It creates the OracleConnection and CallableStatement used to run each procedure, and registers an onFailure callback on JobManager.completion that aborts the connection and cancels the statement (a sketch of this registration follows below). This callback does not use the actor system's execution context; instead it uses an execution context built on a cached executor service created in the BatchManager.
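Here is a minimal sketch of that registration (Status and CancellableStatement are my own types; the class name and wiring shown here are simplified stand-ins, not my exact code):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import akka.dispatch.ExecutionContexts;
    import akka.dispatch.OnFailure;
    import scala.concurrent.ExecutionContext;
    import scala.concurrent.Promise;

    public class WorkerCancellationSketch {
        // The cached executor service created in the BatchManager, wrapped as an
        // ExecutionContext so the callback does not run on the actor system's dispatcher.
        private final ExecutorService cancelExecutor = Executors.newCachedThreadPool();
        private final ExecutionContext cancelContext =
                ExecutionContexts.fromExecutorService(cancelExecutor);

        void registerCancellation(Promise<Status> completion,
                                  final CancellableStatement statement) {
            // If the batch's completion Promise is failed (cancellation, fatal error),
            // abort the connection and cancel the statement on the cached pool.
            completion.future().onFailure(new OnFailure() {
                @Override
                public void onFailure(Throwable failure) {
                    statement.cancel();
                }
            }, cancelContext);
        }
    }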
My configuration
{"akka" : { "actor" : { "default-dispatcher" : { "type" : "Dispatcher", "executor" : "default-executor", "throughput" : "1", "default-executor" : { "fallback" : "thread-pool-executor" } "thread-pool-executor" : { "keep-alive-time" : "60s", "core-pool-size-min" : coreActorCount, "core-pool-size-max" : coreActorCount, "max-pool-size-min" : maxActorCount, "max-pool-size-max" : maxActorCount, "task-queue-size" : "-1", "task-queue-type" : "linked", "allow-core-timeout" : "on" }}}}}
The number of workers is configured elsewhere; currently workerCount = 8, coreActorCount is workerCount + 3, and maxActorCount is workerCount * 3 + 5, so with workerCount = 8 that gives a core pool size of 11 and a maximum pool size of 29. I'm testing this on a MacBook Pro 10 with two cores and 8 GB of memory; the production server is considerably larger. The database I'm talking to is behind an extremely slow VPN. I'm running all of this on the Oracle Java SE 1.8 JVM. The local server is Tomcat 7. The Oracle JDBC drivers are version 10.2 (I might be able to convince them to use a newer version). All of my methods return void or Future<?> and are supposed to be non-blocking.
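The computed pool sizes end up in the dispatcher configuration roughly like this; the string building and merge shown here are a simplified sketch of my setup, not a verbatim excerpt:

    import akka.actor.ActorSystem;
    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    public class DispatcherConfigSketch {
        // Sketch: derive the pool sizes from workerCount and merge them over the
        // default configuration before creating the ActorSystem.
        static ActorSystem createSystem(int workerCount) {
            int coreActorCount = workerCount + 3;     // 11 when workerCount = 8
            int maxActorCount = workerCount * 3 + 5;  // 29 when workerCount = 8
            Config mergedConfig = ConfigFactory.parseString(
                    "akka.actor.default-dispatcher.thread-pool-executor {\n"
                    + "  core-pool-size-min = " + coreActorCount + "\n"
                    + "  core-pool-size-max = " + coreActorCount + "\n"
                    + "  max-pool-size-min = " + maxActorCount + "\n"
                    + "  max-pool-size-max = " + maxActorCount + "\n"
                    + "}")
                    .withFallback(ConfigFactory.load());
            return ActorSystem.create("batch", mergedConfig);
        }
    }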
When a batch completes successfully there is no problem: the next batch begins immediately with a full set of Workers. However, if I terminate the current batch via JobManager#completion.tryFailure(new CancellationException("Batch cancelled")), then the onFailure callbacks registered by the Workers fire, and then the system stops responding. Debug printlns show that the new batch starts with only three of its eight Workers, and the BatchManager becomes completely unresponsive (I added a Future<String> ping command that simply returns Futures.successful("ping"), and that also times out). The onFailure callbacks run on a separate thread pool, and even if they were on the actor system's thread pool, I would expect max-pool-size to be high enough to accommodate the original JobManager, its Workers, its onFailure callbacks, plus the second JobManager and its Workers. Instead, I appear to have enough threads for the original JobManager and its Workers, the new JobManager and fewer than half of its Workers, and nothing left over for the BatchManager. The machine I'm running this on is not exactly powerful, but it seems like it should be able to run more than a dozen threads.
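For concreteness, here is the shape of the two calls I'm describing; the class name, method names, and direct access to the JobManager's completion field are illustrative stand-ins, not my exact code:

    import java.util.concurrent.CancellationException;

    import akka.dispatch.Futures;
    import scala.concurrent.Future;

    public class BatchManagerSketch {
        // Health probe: completes immediately, so if even this times out the
        // dispatcher has no free thread to schedule the BatchManager on.
        public Future<String> ping() {
            return Futures.successful("ping");
        }

        // Cancellation entry point: failing the JobManager's completion Promise
        // fires every Worker's registered onFailure callback.
        public void cancelCurrentBatch(JobManager jobManager) {
            jobManager.completion.tryFailure(new CancellationException("Batch cancelled"));
        }
    }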
Is this a configuration problem? Is it related to a JVM limit and/or a limit imposed by Tomcat? Is it a problem with how I'm handling blocking I/O? There may be other things I'm doing wrong; these are just the ones that came to mind.
Gist of CancellableStatement, where the CallableStatement and OracleConnection are cancelled
Gist of the code where the CancellableStatements are created
Gist of the JobManager's cleanup code
Config dump obtained using System.out.println(mergedConfig.toString());
Edit: I believe I've narrowed the problem down to the actor system (either its configuration, or its interaction with the blocking database calls). I removed the Worker actors and moved their workload onto Runnables that run on a fixed-size ThreadPoolExecutor, where each JobManager creates its own ThreadPoolExecutor and shuts it down when the batch completes (shutdown() on normal termination, shutdownNow() on exceptional termination). Cancellation runs on the cached thread pool created in the BatchManager. The actor system's dispatcher is still a ThreadPoolExecutor, but with only half a dozen threads. With this alternative configuration, cancellation behaves as expected: the workers are interrupted when their database connections are aborted, and the new JobManager starts immediately with a full set of workers. This tells me it is not a hardware / JVM / Tomcat problem.
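A minimal sketch of that alternative arrangement, with illustrative names (the class and method names here are stand-ins for my actual code):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class JobManagerPoolSketch {
        private final ExecutorService workerPool;

        JobManagerPoolSketch(int workerCount) {
            // One fixed-size pool per JobManager, sized to the number of workers.
            workerPool = Executors.newFixedThreadPool(workerCount);
        }

        void submitProcedure(Runnable storedProcedureCall) {
            workerPool.submit(storedProcedureCall);
        }

        void onBatchComplete(boolean failed) {
            if (failed) {
                workerPool.shutdownNow();  // exceptional termination: interrupt the workers
            } else {
                workerPool.shutdown();     // normal termination: let queued work drain
            }
        }
    }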
Update: I took a thread dump with the Eclipse Memory Analyzer. I found that the cancellation threads were hanging in CallableStatement.close(), so I reordered the cancellation so that OracleConnection.abort() precedes CallableStatement.cancel(), and that fixed this part of the problem - the cancellations all (apparently) complete correctly. The worker threads still hang on their statement executions, though (see the stack below); I suspect my VPN may be partially or completely to blame for that.
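The reordered cancellation now looks roughly like this (a sketch that assumes CancellableStatement is a thin wrapper over the connection and statement; the error handling shown is simplified):

    import java.sql.CallableStatement;
    import java.sql.SQLException;

    import oracle.jdbc.OracleConnection;

    public class CancellableStatementSketch {
        private final OracleConnection connection;
        private final CallableStatement statement;

        CancellableStatementSketch(OracleConnection connection, CallableStatement statement) {
            this.connection = connection;
            this.statement = statement;
        }

        void cancel() {
            try {
                // Abort the connection first, then cancel and close the statement,
                // so the cancellation thread no longer hangs in close().
                connection.abort();
                statement.cancel();
                statement.close();
            } catch (SQLException e) {
                // Log and swallow; the batch is already being torn down.
            }
        }
    }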
    PerformanceAsync-akka.actor.default-dispatcher-19
      at java.net.SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I (Native Method)
      at java.net.SocketInputStream.read([BIII)I (SocketInputStream.java:150)
      at java.net.SocketInputStream.read([BII)I (SocketInputStream.java:121)
      at oracle.net.ns.Packet.receive()V (Unknown Source)
      at oracle.net.ns.DataPacket.receive()V (Unknown Source)
      at oracle.net.ns.NetInputStream.getNextPacket()V (Unknown Source)
      at oracle.net.ns.NetInputStream.read([BII)I (Unknown Source)
      at oracle.net.ns.NetInputStream.read([B)I (Unknown Source)
      at oracle.net.ns.NetInputStream.read()I (Unknown Source)
      at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1()S (T4CMAREngine.java:1109)
      at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1()B (T4CMAREngine.java:1080)
      at oracle.jdbc.driver.T4C8Oall.receive()V (T4C8Oall.java:485)
      at oracle.jdbc.driver.T4CCallableStatement.doOall8(ZZZZ)V (T4CCallableStatement.java:218)
      at oracle.jdbc.driver.T4CCallableStatement.executeForRows(Z)V (T4CCallableStatement.java:971)
      at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout()V (OracleStatement.java:1192)
      at oracle.jdbc.driver.OraclePreparedStatement.executeInternal()I (OraclePreparedStatement.java:3415)
      at oracle.jdbc.driver.OraclePreparedStatement.execute()Z (OraclePreparedStatement.java:3521)
      at oracle.jdbc.driver.OracleCallableStatement.execute()Z (OracleCallableStatement.java:4612)
      at com.util.CPProcExecutor.execute(Loracle/jdbc/OracleConnection;Ljava/sql/CallableStatement;Lcom/controller/BaseJobRequest;)V (CPProcExecutor.java:57)
However, even after correcting the cancellation order, I still have the problem of the actor system not creating enough threads: I still get only three of the eight workers in the new batch, and new workers are only added as the cancelled workers' network connections time out. In total that gives me 11 threads, which is my core pool size, out of the 29 threads of my maximum pool size. Apparently the actor system is ignoring my maximum pool size setting, or I'm not setting the maximum pool size correctly.
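For what it's worth, a crude probe along these lines is how I could confirm the live dispatcher thread count (a sketch, not something from my actual code; the name filter just matches the dispatcher thread names seen in the dump above):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DispatcherThreadCount {
        // Count live threads whose names look like default-dispatcher threads,
        // to compare the actual pool size against core-pool-size / max-pool-size.
        static long countDispatcherThreads() {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            long count = 0;
            for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
                if (info != null
                        && info.getThreadName().contains("akka.actor.default-dispatcher")) {
                    count++;
                }
            }
            return count;
        }
    }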