What do I need to know when supporting a Java application with lots of threads?

Background Information

I have a distributed processing application that performs data analysis. It is designed to process many data sets in parallel as they are updated in real time. As part of the design, the analysis is broken down into analysis nodes. Each node takes raw data and processes it to produce other data that can then be consumed by other nodes. About 200 nodes are required to complete our full analysis on a single data set.

In the current design, each node runs on its own thread. Most of the time these threads are asleep. Each time the data is updated, the threads wake up one after another, like a waterfall, and then go back to sleep. Currently, the application runs on 40 data sets, each requiring 200 nodes, for a total of 8,000 threads. When no data arrives, there is no load on the server. When data arrives at the busiest times, the server reaches 25% CPU utilization. All of this is within the design and production parameters of the project.

For the next step, we are scaling from 40 data sets to 200. Each set requires 200 nodes, for a total of 40,000 nodes and therefore 40,000 threads. This exceeds our server's maximum PID limit, so I asked our server administrators to raise the cap. They did, and the application works, but they gave me some pushback on the number of threads. I do not deny that the number of threads is unusual, but at this stage of our project it is expected and warranted.

I am planning a small design change to decouple the thread from the node. This will let us configure a single thread to run multiple nodes, reducing the thread count. For data sets that are updated infrequently, having one thread perform the updates for every node will have very little effect on performance. For data sets that are updated hundreds of times per second, we can configure each node to run on its own thread. I have no doubt this design change will happen; it is only a matter of when. In the meantime, I would like as much information as possible about the implications of the current design.

Question

What is the cost of running more than 40,000 threads on one machine? How much performance do I lose by having the JVM / Linux OS manage this many threads? Keep in mind that all of them are properly set up to sleep when there is no work. So I am only asking about the extra overhead and the problems caused by the sheer number of threads.

Please note: I know that I can reduce the number of threads, and I know that changing the design is a good idea. I will do it as soon as I can, but it has to be balanced against other work and design considerations. I am asking this question to gather information so that I can make the right decision. Thoughts and comments of this nature are greatly appreciated.

+8
java performance optimization multithreading
2 answers

What is the cost of running more than 40,000 threads on one machine? How much performance do I lose by having the JVM / Linux OS manage this many threads? Keep in mind that all of them are properly set up to sleep when there is no work. So I am only asking about the extra overhead and the problems caused by the sheer number of threads.

Within the JVM, each thread needs a thread stack (256 KB by default in this estimate; the actual default depends on the JVM and platform), plus a Thread object and related objects. The default stack size can be changed with the -Xss option, but I believe 64 KB is the lower limit. (40,000 x 256 KB is roughly 10 GB ...)
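The stack arithmetic above can be checked with a few lines of Java. This is only a rough sketch: the class and method names (StackBudget, reservedStackBytes, newThreadWithStack) are made up for illustration, and the result is the address space reserved for stacks, not necessarily the physical memory in use. The Thread constructor that takes a stackSize argument is part of the standard API, but the JVM is free to round the value or ignore it.

```java
// Hypothetical helper for estimating thread-stack reservation.
final class StackBudget {

    /** Total bytes reserved for stacks if every thread gets stackBytesPerThread. */
    static long reservedStackBytes(int threads, long stackBytesPerThread) {
        return (long) threads * stackBytesPerThread;
    }

    /**
     * Creates (but does not start) a thread with a requested stack size.
     * The stack size is only a hint; the JVM may adjust or ignore it.
     */
    static Thread newThreadWithStack(Runnable task, String name, long stackBytes) {
        return new Thread(null, task, name, stackBytes);
    }

    public static void main(String[] args) throws InterruptedException {
        long kb = 1024L;
        double gib = 1024.0 * 1024.0 * 1024.0;

        // 40,000 threads at the 256 KB figure used in the answer.
        long at256 = reservedStackBytes(40_000, 256 * kb);
        // The same thread count at the 64 KB figure mentioned as a lower limit.
        long at64 = reservedStackBytes(40_000, 64 * kb);

        System.out.println("256 KB stacks: " + at256 / gib + " GiB");
        System.out.println(" 64 KB stacks: " + at64 / gib + " GiB");

        // Per-thread alternative to the global -Xss option.
        Thread t = newThreadWithStack(() -> { }, "small-stack", 128 * kb);
        t.start();
        t.join();
    }
}
```

Running the JVM with an option such as -Xss128k changes the default for all threads, while the constructor shown here requests a size for one thread only.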

On Linux, each thread also requires an OS-level thread descriptor, which holds the thread's register context when the thread is not running ... among other things. These descriptors are preallocated, and I believe they cannot be swapped out. This is the resource your administrators had to increase.

These resources are consumed whether the thread is awake or asleep.

Another issue is that you need to be a little careful when synchronizing with wait / notifyAll. If many threads are waiting on the same object, a notifyAll will trigger a burst of activity as every one of those threads wakes up. (But you can avoid this by not having the threads wait on the same object.)
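One way to avoid waiting on a shared object is to give each node its own monitor, so that an update wakes only the thread that actually has work. The sketch below assumes exactly one waiting thread per node; NodeSignal and its methods are hypothetical names, not something from the question.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-node signal: one lock object per node, one waiter per lock.
final class NodeSignal {
    private final Object lock = new Object();
    private boolean pending; // guarded by lock
    private boolean closed;  // guarded by lock

    /** Marks new data for this node and wakes its single waiter. */
    void signal() {
        synchronized (lock) {
            pending = true;
            lock.notify(); // safe here because only one thread waits on this lock
        }
    }

    /** Tells the waiter to stop once any pending signal has been handled. */
    void close() {
        synchronized (lock) {
            closed = true;
            lock.notify();
        }
    }

    /** Blocks until signalled; returns false once closed with nothing pending. */
    boolean await() throws InterruptedException {
        synchronized (lock) {
            while (!pending && !closed) {
                lock.wait(); // loop guards against spurious wakeups
            }
            if (pending) {
                pending = false;
                return true;
            }
            return false;
        }
    }

    /** Starts one worker per node, signals a single node, returns work counts. */
    static int[] demo(int nodes, int signalled) {
        NodeSignal[] signals = new NodeSignal[nodes];
        AtomicInteger[] handled = new AtomicInteger[nodes];
        Thread[] workers = new Thread[nodes];
        for (int i = 0; i < nodes; i++) {
            NodeSignal s = signals[i] = new NodeSignal();
            AtomicInteger count = handled[i] = new AtomicInteger();
            workers[i] = new Thread(() -> {
                try {
                    while (s.await()) {
                        count.incrementAndGet(); // stand-in for the node's analysis
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "node-" + i);
            workers[i].start();
        }
        signals[signalled].signal();
        for (NodeSignal s : signals) {
            s.close();
        }
        int[] result = new int[nodes];
        try {
            for (int i = 0; i < nodes; i++) {
                workers[i].join();
                result[i] = handled[i].get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo(3, 1))); // prints [0, 1, 0]
    }
}
```

Only the signalled node does any work; the other threads stay parked on their own locks until they are closed.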

See the Oracle Java Threading page for more details on the implications of using a huge number of threads.


I feel that 40,000 threads is excessive. The ideal number of threads is proportional to the number of physical processors / cores you have. Although the sheer number of threads is unlikely to degrade performance directly, you will be tying up a lot of resources, and that can cause indirect performance problems, e.g. longer GC times and potential virtual memory thrashing.

A better architecture for your application would be to implement a thread pool and work queues, and farm the work out to a much smaller number of active threads.
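As a minimal sketch of that idea, the code below runs 200 node tasks on a small fixed pool using the standard java.util.concurrent executors. It assumes the node tasks are independent, which is a simplification of the waterfall described in the question; NodePool and runNodes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: many node tasks, few threads.
final class NodePool {

    /**
     * Runs nodeCount placeholder node tasks on a pool of poolSize threads.
     * Returns {tasks completed, distinct threads that ran at least one task}.
     */
    static int[] runNodes(int nodeCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<Thread> threadsUsed = ConcurrentHashMap.newKeySet();

        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            final int nodeId = i;
            tasks.add(() -> {
                threadsUsed.add(Thread.currentThread());
                return nodeId; // stand-in for the node's real analysis
            });
        }

        int completed = 0;
        try {
            // invokeAll queues every task and blocks until all have finished.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get();
                completed++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new int[] { completed, threadsUsed.size() };
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int[] r = runNodes(200, cores);
        System.out.println(r[0] + " node tasks ran on " + r[1] + " thread(s)");
    }
}
```

Sizing the pool from availableProcessors() follows the suggestion that the thread count should be proportional to the number of cores; a real deployment would likely tune this per data set.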

+9

You said that the threads sleep when there is no work. How often will there be work? How many units of work are executed simultaneously? If that number is larger than the number of processors, and the work is, as described, mostly CPU-bound, you will actually see an overall degradation in performance.

But let's say that the amount of work being done at any given time does not exceed the number of processors. In that case, the number one problem I see is the amount of context switching that will occur. A context switch in Java typically costs about 100 instructions. If all of your threads are switched in (woken up) within a short period of time to do some of their work, then we are talking about more than 4,000,000 additional instructions (40,000 threads x 100 instructions).

Here is a bit more information about the cost of a context switch, since it is likely to affect your program more than anything else. An excerpt from this document explains the cost of the processor cache misses that follow a switch:

When a new thread is switched in, the data it needs is unlikely to be in the local processor cache, so a context switch causes a flurry of cache misses, and thus threads run a little more slowly when they are first scheduled. This is one of the reasons that schedulers give each runnable thread a certain minimum time quantum even when many other threads are waiting.

In addition, there is the extra stack space that must be allocated, plus the heap for 40,000 Thread objects (which is only about 7 MB of shallow heap for the threads themselves).

+2
