Windows, multiple processes and multiple threads

We need to make our system highly scalable. It was developed for the Windows platform using VC++. Say that, initially, we would like to process 100 requests (from MSMQ) simultaneously. What would be the best approach: a single process with 100 threads, or 2 processes with 50 threads each? Apart from per-process memory, what is gained by the second approach? On Windows, is CPU time first allocated to a process and then split between that process's threads, or does the OS count the threads of each process and allocate CPU time per thread rather than per process? We noticed that in the first case CPU utilization is 15-25%, and we want to consume more CPU. Keep in mind that we are after optimal performance, so the 100 requests are just an example. We also noticed that if we increase the number of threads in the process above 120, performance degrades due to context switches.

One more point: our product already supports clustering, but we want to use more CPU on a single node.

Any suggestions would be highly appreciated.

+4
2 answers

The standard approach on Windows is multiple threads. That is not to say it is always your best solution, but there is a price to pay for each thread or process, and on Windows a process is more expensive. As for the scheduler, I'm not sure, but you can set the priority of both the process and its threads. The real benefit of threads is their shared address space and the ability to communicate without IPC; however, synchronization must be carefully maintained.

If your system is already developed, which it appears to be, a multi-process solution is likely to be easier to implement, especially if there is a chance that more than one machine will be used later. The IPC between 2 processes on one machine can, in the general case, scale to multiple machines. Most attempts at massive parallelization fail because the system as a whole is not evaluated for bottlenecks. For example, if you implement 100 threads that all write to the same database, you may gain little in actual performance and just end up waiting on your database.

Just my $0.02.

+3

You cannot process more requests simultaneously than you have CPU cores. Fast, scalable solutions involve setting up thread pools in which the number of active threads (those not blocked on I/O) equals the number of CPU cores. So creating 100 threads because you want to service 100 MSMQ requests is not good design.
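As a rough illustration of that sizing rule, here is a minimal portable sketch using standard C++ threads rather than any Windows API. `handle_request` and `process_requests` are hypothetical names, and the "request" is a placeholder computation, not a real MSMQ message.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for handling one queued request.
inline int handle_request(int id) { return id * 2; }

// Processes `request_count` requests with a fixed pool of workers.
// With workers == 0 the pool is sized to the number of hardware threads.
std::vector<int> process_requests(int request_count, unsigned workers = 0) {
    assert(request_count >= 0);
    if (workers == 0) {
        // hardware_concurrency() may return 0 when the count is unknown.
        workers = std::max(1u, std::thread::hardware_concurrency());
    }
    std::vector<int> results(request_count);
    std::atomic<int> next{0};  // index of the next unclaimed request

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            // Each worker keeps claiming requests until none are left.
            for (int i = next.fetch_add(1); i < request_count;
                 i = next.fetch_add(1)) {
                results[i] = handle_request(i);
            }
        });
    }
    for (auto& t : pool) t.join();
    return results;
}
```

Calling `process_requests(100)` serves all 100 requests, but never with more worker threads than the machine reports hardware threads. If the handlers block on I/O, a pool this small would leave cores idle, which is the gap the completion-port mechanism described next is meant to close.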

Windows has a thread pooling mechanism called I/O Completion Ports.

Using I/O completion ports does push the design towards a single process, because in a multi-process design each process would have its own I/O completion port thread pool that it manages independently, and you could therefore end up with many more threads contending for CPU cores.

The core idea of an I/O completion port is that it is a kernel-mode queue: you can manually post events to the queue, or have asynchronous I/O completions posted to it automatically by associating handles (files, sockets, pipes) with the port.

On the other side, the I/O completion port mechanism automatically dequeues events onto waiting worker threads, but it does NOT dequeue jobs if it detects that the number of currently "active" threads in the thread pool is >= the number of CPU cores.

Using I/O completion ports can potentially increase the scalability of a service considerably. Usually, however, the gain is much smaller than expected, because other factors quickly come into play once all the CPU cores are contending for the service's other resources.

If your services are developed in C++, you may find that serialized access to the heap is a big performance penalty, although Windows version 6.1 (Windows 7) seems to have implemented a low-contention heap, so this may be less of a problem.

To summarize: theoretically, your biggest performance gains would come from a design that uses thread pools managed in a single process. But you depend heavily on the libraries you use not serializing access to critical resources, which can quickly lose you all of the theoretical performance gains. If you do have library code serializing your nicely thread-pooled service (as in the case of C++ object creation and destruction being serialized because of heap contention), then you need to change how you use the library, switch to a low-contention version of the library, or just scale out to multiple processes.
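The effect of a serialized resource can be sketched with a deliberately simple analogy. This is not the heap itself: a shared counter behind one mutex stands in for any resource a library serializes, and both function names are hypothetical. The first version makes every thread take the same lock for every operation; the second gives each thread private state and merges once at the end.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Every increment takes the same lock, so threads mostly wait on each other.
std::uint64_t count_with_shared_lock(unsigned threads, std::uint64_t per_thread) {
    assert(threads > 0);
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// Each thread works on a local counter and publishes its result once.
std::uint64_t count_with_local_state(unsigned threads, std::uint64_t per_thread) {
    assert(threads > 0);
    std::vector<std::uint64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&partial, t, per_thread] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < per_thread; ++i) ++local;
            partial[t] = local;  // one write per thread, no lock needed
        });
    }
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}
```

Both functions return the same total. The difference is only in how much time the threads spend waiting for one another, and that has to be measured on the target machine rather than assumed.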

The only way to know is to write test cases that stress the server in various ways and measure the results.
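A measurement along those lines could start from something like the following sketch. The names `RunResult`, `measure`, and `sweep` are made up for illustration, and `work` is whatever callable approximates one request in your service; a real stress test would drive the actual server rather than an in-process function.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

struct RunResult {
    unsigned threads;
    double seconds;
    double requests_per_second;
};

// Runs `work` `requests_per_thread` times on each of `threads` threads
// and times the whole batch with a monotonic clock.
RunResult measure(unsigned threads, int requests_per_thread,
                  const std::function<void()>& work) {
    assert(threads > 0 && requests_per_thread > 0);
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < requests_per_thread; ++i) work();
        });
    }
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    double total = static_cast<double>(threads) * requests_per_thread;
    double secs = elapsed.count();
    return {threads, secs, secs > 0 ? total / secs : 0.0};
}

// Measures throughput for each candidate thread count in turn.
std::vector<RunResult> sweep(const std::vector<unsigned>& counts,
                             int requests_per_thread,
                             const std::function<void()>& work) {
    std::vector<RunResult> out;
    for (unsigned n : counts) out.push_back(measure(n, requests_per_thread, work));
    return out;
}
```

Sweeping over thread counts such as 1, 2, 4, 8, and upwards, and comparing `requests_per_second` across the runs, is one way to locate the point where extra threads stop helping, similar to the degradation above 120 threads described in the question.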

+3
