You cannot actively process more requests than you have processor cores. Fast, scalable designs use thread pools in which the number of active (i.e. not blocked on I/O) threads equals the number of CPU cores. Creating 100 threads just because you want to serve 100 requests simultaneously is therefore not a good design.
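For illustration, here is a minimal sketch of a pool sized to the hardware rather than to the request count, using standard C++; the worker body is a placeholder and the request queue is assumed to exist elsewhere:

```cpp
#include <thread>
#include <vector>

int main() {
    // Size the pool to the number of hardware threads, not to the number of
    // concurrent requests; a thread blocked on I/O should not occupy a slot.
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;  // hardware_concurrency() may report 0

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i)
        pool.emplace_back([] {
            // pull requests from a shared queue and process them (omitted)
        });
    for (auto& t : pool) t.join();
}
```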
Windows has a thread pooling mechanism called I/O Completion Ports.
Using I/O completion ports pushes the design toward a single process, since in a multi-process design each process would have its own I/O thread pool, managed independently, and you could end up with many more threads competing for the CPU cores.
The "core" of the idea of I / O I / O is that its kernel mode queue - you can manually send events to the queue or receive asynchronous I / O completions sent to it automatically, linking files (files, sockets, channels) with by the port.
On the other hand, the I/O completion port mechanism automatically dequeues events onto waiting worker threads, but it will NOT dequeue work if it detects that the number of currently "active" threads in the pool is >= the number of CPU cores.
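A sketch of the worker side, assuming the port was created as above; the throttling is controlled by the concurrency limit passed as the fourth argument of CreateIoCompletionPort (0 meaning "number of CPUs"), and the shutdown sentinel key is invented for this example:

```cpp
#include <windows.h>

// Several of these threads block on the same port, but the port only releases
// up to its concurrency limit of them to run at any one time.
DWORD WINAPI Worker(LPVOID param) {
    HANDLE port = static_cast<HANDLE>(param);
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* ov = NULL;
        if (!GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE)) {
            if (ov == NULL) break;   // port closed (or wait failed): stop
            continue;                // a specific I/O operation failed
        }
        if (key == 0)                // illustrative shutdown sentinel
            break;
        // ... handle the completed request here ...
    }
    return 0;
}
```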
Using I/O completion ports can potentially increase the scalability of the service, but the gain is often much smaller than expected, because other factors quickly come into play once all the processor cores are competing for shared resources.
If your services are written in C++, you may find that serialized heap access is a significant performance penalty, although Windows 6.1 (Windows 7 / Server 2008 R2) appears to provide a lower-contention heap, so this may be less of a problem.
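One concrete, documented knob in this area is the Low Fragmentation Heap, which also reduces lock contention for small allocations and can be enabled explicitly via HeapSetInformation (it is the default on recent Windows versions); a minimal sketch:

```cpp
#include <windows.h>

int main() {
    // Opt the process heap into the Low Fragmentation Heap.
    // The documented value 2 selects the LFH for this heap.
    ULONG lfh = 2;
    HeapSetInformation(GetProcessHeap(), HeapCompatibilityInformation,
                       &lfh, sizeof(lfh));
    return 0;
}
```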
To summarize: in theory, your biggest performance gains come from the design, i.e. thread pools managed within a single process. But you depend heavily on the libraries you use not serializing access to critical resources, and such serialization can quickly erase all of the theoretical gains. If library code serializes your otherwise well-parallelized service (as when C++ object creation and destruction are serialized by heap contention), you need to change how you use that library, switch to a lower-contention version of it, or simply scale out to multiple processes instead.
The only way to find out is to write test cases that stress the server in different ways and measure the results.