The first question is: what does the server do? Will each client request be CPU-bound or IO-bound?
If it is CPU-bound, then it makes no sense to try to process all requests in parallel, since you have no more concurrency than the number of cores on the server. In that case, simply create as many threads as there are cores, and process the requests one at a time as fast as you can.
If the server process is IO-bound, you need to determine how long each thread spends blocked while handling a client request; that will give you an idea of how many threads it makes sense to create. This is the classic approach, but as others have pointed out, a more modern approach would be to use an asynchronous programming library. For C++ on Windows, that would be PPL.
UPDATE
You seem very interested in staying low-level, so to get at the essence of your original question: how do you calculate how many threads a core can support?
First, create wrapper functions for every blocking call the threads make, and have them record how long each thread spends blocked. From those measurements you can determine the average thread occupancy, and once you know that, an approximate calculation of the optimal number of threads is quite simple:
    thread_occupancy = (thread_run_time - thread_blocked_time) / thread_run_time
    optimal_thread_count = num_cores / thread_occupancy
You will probably want to add at least 0.1 (10%) to thread_occupancy to cover context-switching overhead.
But, as others have said, this classic multi-threaded approach only works up to a few dozen threads. Once the OS is scheduling a hundred threads or so, the scheduling overhead grows to the point where adding more threads brings no benefit. Exactly where that happens is highly dependent on the system and the software, so you will need to run some tests to find the sweet spot.
If your operations are so IO-bound that you want to handle hundreds or thousands of requests at the same time, then you have no choice but to process multiple requests per thread using asynchronous processing, which usually means using an asynchronous library. In that case you will typically have one thread per core, each kept fully occupied under the control of the async library. The library may manage this itself, or you may need to configure it manually, but either way you no longer control the number of threads; even if you could measure thread occupancy the way you would in the purely multi-threaded approach, there would be little you could do with that information.