Here is an example implementation by the Netty authors (link). They mostly track memory usage and throttle directly based on those statistics.
Another, rougher way to do this is to limit concurrency with a fixed thread pool and a bounded queue. The usual trick is to make the caller run the task itself (or block on queue.put()) once the queue fills up. That way, the load is pushed back toward the client until new requests are produced more slowly, and the application's behavior becomes more "graceful."
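For illustration, here is a minimal, self-contained sketch of that idea in Java, using ThreadPoolExecutor with a bounded ArrayBlockingQueue and CallerRunsPolicy (the pool size, queue size, and task body are arbitrary choices for the demo, not values from the answer above):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BackpressureSketch {
    public static void main(String[] args) throws InterruptedException {
        // Fixed pool of 2 workers; at most 10 tasks may wait in the queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2,                       // fixed size: core == max
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10),
                // When the queue is full, the submitting thread runs the
                // task itself, which naturally slows down new submissions.
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.execute(() -> {
                // Simulated work; replace with the real task.
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("task " + taskId + " on "
                        + Thread.currentThread().getName());
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

With a 2-thread pool and a 10-slot queue, the submission loop cannot outrun the workers: overflow tasks print the main thread's name, showing the caller itself being used as the throttle.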
In practice, I almost exclusively use the "crude" method described above, and it works very well: a fixed thread pool, a bounded queue, and a caller-runs rejection policy. I keep the parameters (queue size, thread pool size) configurable and tune them once the design has settled. Sometimes it turns out that the thread pool should be split per service, and so on; for such cases it is very convenient that the ThreadPoolExecutor class wraps all of this (fixed pool, bounded queue, caller-runs policy) in one place. A configurable factory sketch follows below.
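A sketch of how that might be packaged, with the pool and queue sizes left as configuration knobs as described above; the class and method names (BoundedExecutors, newBoundedPool) are made up for this example:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class; not from the answer above.
public final class BoundedExecutors {
    private BoundedExecutors() {}

    // poolSize and queueCapacity are meant to be read from configuration
    // and tuned per service once the design has settled.
    public static ThreadPoolExecutor newBoundedPool(int poolSize, int queueCapacity) {
        return new ThreadPoolExecutor(
                poolSize, poolSize,          // fixed-size pool
                0L, TimeUnit.MILLISECONDS,   // no keep-alive needed when core == max
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

Splitting pools per service then reduces to calling the factory once per service with its own tuned parameters, so a slow service saturates only its own queue.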