Design Considerations for the Adaptive Thread Pool in Java

I would like to implement a thread pool in Java that can dynamically resize itself based on the mix of computational and I/O tasks submitted to it.

In practice, I want to achieve the same behavior as the new thread pool implementation in C# 4.0.

Is there an existing implementation, or can I achieve this behavior using mostly the existing concurrency utilities (e.g. a CachedThreadPool)?

The C# version performs self-diagnostics to achieve optimal utilization. What comparable tools are available natively in Java, and what are their real performance implications?

Is it possible to implement a cooperative approach, where a task signals its intent (for example, that it is entering an I/O-intensive phase, or a CPU-intensive working phase)?

Any suggestions are welcome.

Edit, based on the comments:

Target scenarios may be:

  • Local file scanning and processing
  • Web crawling
  • Accessing and aggregating multiple web services

The problem with CachedThreadPool is that it starts new threads whenever all existing threads are blocked; you need to set explicit bounds on it, but that is all the control you get.

For example, say I have 100 web services to access in a row. If I submit 100 tasks to a CachedThreadPool, it will start 100 threads to carry them out, and that many simultaneous I/O and data-transfer requests will surely step on each other's toes. For a static test case I could experiment and find the optimal pool size, but I want it to be determined adaptively and applied automatically.
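For reference, a minimal sketch of what "explicit bounds" look like with the existing utilities: a ThreadPoolExecutor configured to behave like a cached pool (idle threads time out) but capped at a fixed maximum. The cap of 16 and the sleep that stands in for blocking I/O are arbitrary placeholders, not adaptively derived values.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolDemo {
    // Submits n simulated blocking tasks to a pool capped at 16 threads
    // and returns how many completed.
    static int runTasks(int n) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                16, 16, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // idle threads die off, like a cached pool
        final AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    // stand-in for a blocking I/O call
                    try { Thread.sleep(10); } catch (InterruptedException e) { }
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(100) + " tasks completed");
    }
}
```

Note that the core and maximum pool sizes must be equal here: with an unbounded LinkedBlockingQueue, a ThreadPoolExecutor never grows past its core size, so the "cached but bounded" behavior comes from allowCoreThreadTimeOut instead.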

4 answers

Let me suggest an alternative approach. Having a single thread pool is a fine abstraction, but it is not very efficient, especially when the jobs are heavily I/O-bound: there is no good way to tune it, because pushing the pool size high enough to maximize I/O throughput makes you suffer from too much thread switching, etc.

Instead, I would suggest looking at the architecture of the Apache MINA framework for inspiration ( http://mina.apache.org/ ). It is a high-performance network framework; they describe it as a server framework, but I think the architecture works just as well for client-side scenarios such as spidering and multi-server clients. (In fact, you may even be able to use it out of the box for your project.)

They use the Java NIO (non-blocking I/O) libraries for all I/O operations and split the work between two thread pools: a small, fast pool of socket threads and a larger, slower pool of business-logic threads. The layers look like this:

  • At the network end, a large set of NIO channels, each with a message buffer
  • A small socket thread pool that cycles through the channel list round-robin. Its only job is to check each socket and copy any incoming data into the message buffer; when a message is complete, it hands it off to the job queue. These threads are fast because they just push bits around and skip any sockets that are blocked on I/O.
  • A single job queue that serializes all messages
  • A large pool of processing threads that pull messages off the queue, parse them, and do the actual processing

This gives very good performance: I/O is isolated in its own layer, so you can tune the socket thread pool to maximize I/O throughput and, separately, tune the processing thread pool to control CPU/resource usage.
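A structural sketch of that two-pool split, using plain executors and a BlockingQueue rather than real NIO selectors. The pool sizes, the poll timeout, and the use of strings as "messages" are all placeholders for illustration.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoPoolSketch {
    // Returns the number of messages that made it through both layers.
    static int process(final String[] messages) throws InterruptedException {
        final BlockingQueue<String> jobQueue = new LinkedBlockingQueue<String>();
        final AtomicInteger processed = new AtomicInteger();

        // Small, fast "socket" pool: only moves completed messages onto the queue.
        ExecutorService socketPool = Executors.newFixedThreadPool(2);
        // Larger processing pool: stands in for parsing and business logic.
        ExecutorService workerPool = Executors.newFixedThreadPool(8);

        for (final String m : messages) {
            socketPool.submit(new Runnable() {
                public void run() { jobQueue.add(m); }
            });
        }
        for (int i = 0; i < 8; i++) {
            workerPool.submit(new Runnable() {
                public void run() {
                    try {
                        String m;
                        // Drain until the queue stays empty for half a second.
                        while ((m = jobQueue.poll(500, TimeUnit.MILLISECONDS)) != null) {
                            processed.incrementAndGet(); // stand-in for real parsing
                        }
                    } catch (InterruptedException ignored) { }
                }
            });
        }
        socketPool.shutdown();
        workerPool.shutdown();
        workerPool.awaitTermination(5, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(new String[] {"a", "b", "c"}) + " messages processed");
    }
}
```

The point of the shape, as in MINA, is that each layer can be sized independently: the "socket" pool stays tiny because its tasks never block on processing, while the worker pool can be grown to soak up CPU.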

---

Consider building a map keyed by the bottleneck resource.

Each task submitted to the pool would declare the resource it bottlenecks on, i.e. "CPU", "Network", "C:\", etc.

You could start by allowing only one thread per resource and then slowly increase the count until throughput stops improving. Resources like the CPU could be capped at the number of cores.
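A rough sketch of that idea, using one Semaphore per bottleneck resource. The resource names and permit counts below are invented for illustration; a real version would adjust the counts at runtime as the answer suggests.

```java
import java.util.Map;
import java.util.concurrent.*;

public class ResourceLimiter {
    // Map from bottleneck resource to its concurrency limit.
    private final Map<String, Semaphore> limits = new ConcurrentHashMap<String, Semaphore>();

    ResourceLimiter() {
        // Hypothetical starting limits; "Network" and "C:\" counts are guesses.
        limits.put("CPU", new Semaphore(Runtime.getRuntime().availableProcessors()));
        limits.put("Network", new Semaphore(4));
        limits.put("C:\\", new Semaphore(2));
    }

    // Runs the task once a slot for its declared bottleneck is free.
    <T> T run(String resource, Callable<T> task) throws Exception {
        Semaphore s = limits.get(resource);
        s.acquire();
        try {
            return task.call();
        } finally {
            s.release();
        }
    }

    public static void main(String[] args) throws Exception {
        ResourceLimiter limiter = new ResourceLimiter();
        Integer r = limiter.run("Network", new Callable<Integer>() {
            public Integer call() { return 1; } // stand-in for a web-service call
        });
        System.out.println("ran with result " + r);
    }
}
```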

---

Given this example:

Result[] a = new Result[N];
for (int i = 0; i < N; i++) {
    a[i] = compute(i);
}

In Java, you can parallelize this across every free core and distribute the workload dynamically, so it doesn't matter if one task takes longer than another.

// defined earlier
int procs = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(procs);

// main loop
@SuppressWarnings("unchecked")
Future<Result>[] f = new Future[N]; // generic array creation needs a raw array
for (int i = 0; i < N; i++) {
    final int i2 = i;
    f[i] = service.submit(new Callable<Result>() {
        public Result call() {
            return compute(i2);
        }
    });
}
Result[] a = new Result[N];
for (int i = 0; i < N; i++)
    a[i] = f[i].get(); // throws InterruptedException / ExecutionException

This hasn't changed much in the last 5 years, so it's not as cool as when it first became available. What Java really lacks is closures; you could use Groovy instead if that is a real problem for you.

Additionally: if you care about performance rather than parallelism as an exercise, you wouldn't calculate Fibonacci numbers in parallel, because that is a good example of a function that is faster computed single-threaded.

One difference is that this thread pool has only one queue, so there is no work stealing. That potentially means more overhead per task. However, if your tasks typically take more than about 10 microseconds, it won't matter.

---

I think you need to monitor CPU load in a platform-specific way. Find out how many CPUs/cores you have and watch the load. When the load is low and you still have pending work, create new threads, but no more than x times num-cpus (say, x = 2).

If you really want to account for I/O threads as well, try to find out what state each thread is in when your pool is exhausted, and subtract all blocked threads from the total. One risk is running out of memory by letting too many tasks in.
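A sketch of that sizing rule. The growth policy below is one guess at what the answer describes, with the decision factored into a pure function so it can be reasoned about separately from the load measurement; the JDK's OperatingSystemMXBean supplies the actual load reading.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAwareSizer {
    // Pure sizing rule: grow while CPUs look idle and work is queued,
    // capped at 2 * cores (the x = 2 from the answer). Thresholds are guesses.
    static int nextPoolSize(int current, int cores, double loadAvg, int queued) {
        if (queued > 0 && loadAvg >= 0 && loadAvg < cores && current < 2 * cores) {
            return current + 1; // CPUs are underused but work is waiting: add a thread
        }
        return current;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cores = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage(); // -1 where unsupported (e.g. some JVMs on Windows)
        System.out.println("next size: " + nextPoolSize(cores, cores, load, 10));
    }
}
```

Note that getSystemLoadAverage() returns -1 on platforms where it is unavailable, which is why the rule treats a negative reading as "don't grow".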

