Java thread pool: is it better to have many small tasks or fewer tasks with large batches?

We are currently trying to improve performance with multithreading in our Java application. We have a long-running serial task that we would like to split across multiple processor cores.

Basically, we have a list of, say, 100,000 items / things that need to be done.

Now my question is: is it better to do this:

Option 1 (pseudo-code):

for (int i = 0; i < 100000; i++) { threadpool.submit(new MyCallable("1 thing to do")); }

This would add 100,000 Runnables/Callables to the thread pool's queue (currently a LinkedBlockingQueue).
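
For context, here is a minimal sketch of roughly how such a pool can be set up; the pool size of 4 is just an assumption matching our 4 cores, not necessarily our exact configuration.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PoolSetup {
        public static void main(String[] args) {
            // Fixed-size pool fed by an unbounded LinkedBlockingQueue;
            // the pool size of 4 is an assumption matching our 4 cores.
            ExecutorService threadpool = new ThreadPoolExecutor(
                    4, 4,                       // core and maximum pool size
                    0L, TimeUnit.MILLISECONDS,  // keep-alive for idle threads
                    new LinkedBlockingQueue<Runnable>());
            threadpool.shutdown();
        }
    }

(Executors.newFixedThreadPool(4) builds exactly this combination of a fixed-size pool and an unbounded LinkedBlockingQueue.)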

or is it better to do this? Option 2 (pseudo-code):

for (int i = 0; i < 4; i++) { threadpool.submit(new MyCallable("25000 things to do")); }

We already tried option 1 and did not notice any performance improvement, even though we can clearly see several threads and all 4 processor cores working hard. My feeling is that option 1 carries some overhead because of the sheer number of tasks. We have not tried option 2 yet, but I suspect it could speed things up because the overhead is lower: we would basically break the list into 4 large chunks instead of 100,000 individual items.

Any thoughts on this?

thanks

+4
5 answers

The important thing is to minimize context switching and maximize the fraction of each task's time that is spent on actual computation. In practical terms, if your tasks are compute-bound, running more threads than you have physical processors will not help. If your tasks actually do a lot of I/O, you want plenty of them, so that there is always a batch of "ready" tasks available whenever some of them block.

If you really have 25,000 things to do and it is all computation, I would probably configure something like 32 threads (a few more than your processor count, but not a lot of extra overhead) and dole out 10-50 units of work to each task, assuming the individual units are relatively small.
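
For illustration, a rough sketch of that batching approach; the batch size of 50, the pool size of 32, and the item/processing placeholders are assumptions, not a tuned configuration.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BatchSubmit {
        static final int BATCH_SIZE = 50; // assumed batch size; tune by measurement

        public static void main(String[] args) {
            List<String> items = new ArrayList<>();
            for (int i = 0; i < 100_000; i++) {
                items.add("item-" + i);
            }

            // A few more threads than cores, as suggested above (assumed value).
            ExecutorService pool = Executors.newFixedThreadPool(32);

            // Submit one task per batch instead of one task per item.
            for (int start = 0; start < items.size(); start += BATCH_SIZE) {
                List<String> batch =
                        items.subList(start, Math.min(start + BATCH_SIZE, items.size()));
                pool.submit(() -> {
                    for (String item : batch) {
                        process(item); // placeholder for the real per-item work
                    }
                });
            }
            pool.shutdown();
        }

        static void process(String item) {
            // stand-in for the actual computation
        }
    }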

+3

Your analysis is correct: batching the items up means less overhead (memory, context switching, and total instruction count), generally speaking at least.

This becomes less and less relevant as the individual tasks get larger, though: if you are already spending 99 percent of your time on the work itself rather than on thread pool or object-creation overhead, there is only 1 percent left to optimize away.

+3

Well, that depends on your use case.

Performance-wise, I think big chunks of work with fewer tasks are better. There will be less context switching, which saves processor cycles and RAM.

When the number of tasks is small this may not matter much, but if you end up with something like 10,000 threads, it certainly does.

+1

Your machine has N cores. You want to use all of them, but with minimal overhead. So the minimum number of tasks is likely to be N if the tasks are all equal in size. If they are not equal, M * N tasks can be better, because it helps keep all the cores roughly equally busy even though some tasks are relatively short; for example, one core works through one long task while another works through three short ones. I use an M of 2-4 for most of my use cases.

If you can, sort the tasks so that the longest-running ones are started first to get a better balance; that is, sort the tasks from longest to shortest before adding them.

E.g. if you have 8 cores, you may find that 8 tasks are optimal for CPU-bound processing. For IO-bound processing, or for tasks whose running times vary a lot, 2*8 to 4*8 tasks can be optimal.
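
A rough sketch of the M * N idea combined with longest-first ordering; the chunk multiplier, the estimatedCost method, and the item/processing placeholders are assumptions for illustration.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ChunkedSubmit {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors(); // N
            int m = 3;                                              // M, typically 2-4
            int chunkCount = m * cores;

            List<String> items = new ArrayList<>();
            for (int i = 0; i < 100_000; i++) {
                items.add("item-" + i);
            }

            // Schedule the (estimated) longest-running work first.
            items.sort(Comparator.comparingLong(ChunkedSubmit::estimatedCost).reversed());

            ExecutorService pool = Executors.newFixedThreadPool(cores);
            int chunkSize = (items.size() + chunkCount - 1) / chunkCount;
            for (int start = 0; start < items.size(); start += chunkSize) {
                List<String> chunk =
                        items.subList(start, Math.min(start + chunkSize, items.size()));
                pool.submit(() -> chunk.forEach(ChunkedSubmit::process));
            }
            pool.shutdown();
        }

        // Hypothetical per-item cost estimate, used only for ordering.
        static long estimatedCost(String item) {
            return item.length();
        }

        static void process(String item) {
            // stand-in for the actual computation
        }
    }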

+1

The problem with 4 big batches may be that if one of them finishes in 10 minutes and the other three take 20 minutes, one core will sit idle for 10 minutes while the other three threads keep processing items on their cores. You are right about the overhead, though. The only way to know is to measure it, because a lot depends on your data.
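
Since measuring is the only way to know, here is a crude sketch of how the two options could be compared; the work() placeholder and the 4-thread pool are assumptions, and a real measurement should use warm-up runs or a proper harness such as JMH.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class CrudeComparison {
        static final int ITEMS = 100_000;
        static volatile long sink; // keeps the dummy work from being optimized away

        public static void main(String[] args) throws InterruptedException {
            System.out.println("per-item  : " + runPerItem() + " ms");
            System.out.println("per-chunk : " + runPerChunk(4) + " ms");
        }

        static long runPerItem() throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            long start = System.nanoTime();
            for (int i = 0; i < ITEMS; i++) {
                pool.submit(CrudeComparison::work); // one task per item
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            return (System.nanoTime() - start) / 1_000_000;
        }

        static long runPerChunk(int chunks) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            long start = System.nanoTime();
            int perChunk = ITEMS / chunks;
            for (int c = 0; c < chunks; c++) {
                pool.submit(() -> {
                    for (int i = 0; i < perChunk; i++) {
                        work(); // one task per large chunk of items
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            return (System.nanoTime() - start) / 1_000_000;
        }

        static void work() {
            // stand-in for the real per-item computation
            sink += System.nanoTime() % 7;
        }
    }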

+1

Source: https://habr.com/ru/post/1415082/

