Multiple instances of the program on a multi-core machine

For the following questions, assume a machine with two processors, each of them dual-core (2 cores per processor), for a total of 4 "cores". Some natural questions arise:

  • Suppose I wrote a simple sequential program and built it, say, in Visual Studio, and ran the same executable twice, say, with separate input for each run. Will both runs work on one processor? Or on separate processors? How much RAM will be assigned to each? Will each get the RAM of one processor (2 cores), or is the RAM shared? I believe the two instances will run on different processors, and each will get the RAM of one processor (2 cores); but I'm not 100% sure. Would the behavior be different on Linux?

  • Now suppose my program was written with a distributed-memory parallel interface such as MPI, and that I ran it once with -np 2 (say). Will the program use both processors (and, in effect, all 4 cores)? Is that the optimal value for the -np argument? In other words, if I did the same with -np 3 or -np 4, is it right to assume there will be no additional advantage? Again, I think so, but I'm not 100% sure. I also assume I can go above 4 (-np 5, -np 6, etc.). In such cases, how do the processes compete for memory when np > 4? Will performance be worse for np > 4? I think so, and perhaps this partly depends on the problem size, but again I'm not 100% sure.

    Next, suppose I run two instances of my MPI-based parallel program, both with -np 2 but with different inputs. First, is this even possible? I assume it is, and that each of them runs across both processors? How are the two jobs coordinated, and how do they individually compete for memory over time? Presumably this should depend, at least in part, on the order in which the programs were launched?

  • Finally, suppose my program was written with a shared-memory parallel interface such as OpenMP, and I ran it once. How many "threads" can I run to fully exploit shared-memory parallelism: is it 2 or 4? (Since I have 2 processors with 2 cores each.) I guess it's 4, since all 4 cores are part of a single shared-memory block. Is that right? If the answer is 4, does it make sense to run more than 4 threads? I'm not sure that even works (unlike MPI, where I believe we can do -np 5, -np 6, etc.).

Lastly, suppose I run 2 instances of my shared-memory parallel program, each with a different input. I assume this is possible, and that the individual processes somehow compete for memory, presumably in the order in which the programs were launched?

+4
3 answers

Which processor they run on is entirely up to the OS and depends on many factors, including whatever else is happening on the same machine. The common case, though, is that they will sit on one core each, occasionally migrating to different cores ("occasionally" can mean several times per second or even more often).
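One quick way to watch this migration yourself on Linux is a minimal sketch along these lines, using glibc's sched_getcpu():

```c
/* Minimal sketch: print which CPU the process is currently running on.
 * sched_getcpu() is glibc-specific and needs _GNU_SOURCE. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    for (int i = 0; i < 10; i++) {
        printf("iteration %d: running on CPU %d\n", i, sched_getcpu());
        sleep(1);  /* give the scheduler a chance to migrate us */
    }
    return 0;
}
```

Run two copies side by side and you will typically see each one parked on its own core, occasionally hopping.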

Cores do not have their own RAM on conventional PC hardware, and the processes will be provided with as much RAM as they ask for.

For MPI processes, yes, your parallelism should match the core count (assuming the workload is CPU-bound). If two MPI jobs each run with -np 2, they will simply consume all four cores between them. Increase beyond that and they will start to fight over cores. As explained above, RAM has nothing to do with this, although the caches will suffer if there is contention.
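To see how many processes a given -np actually launches, a minimal sketch along these lines works:

```c
/* Minimal MPI sketch: each process reports its rank.
 * Launch with e.g. "mpirun -np 4 ./a.out" to occupy all four cores. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

With -np 5 or -np 6 on a 4-core box you will see five or six ranks launched anyway; the OS just time-slices them across the cores, which is where the performance loss comes from.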

This "question" is too long, so I will stop now.

+5

@Marcelo is absolutely right, and I would like to expand on his answer a bit.

The OS determines where and when the threads that make up your application execute, depending on what else is going on in the system and the resources available. Each application runs in its own process, and that process may contain anywhere from a single thread to thousands of threads. The OS (Windows, Linux, Mac, whatever) switches the execution context across the processing cores to ensure that all applications and services get their slice of the pie.

As for I/O access to things such as RAM: that is physically managed by the Northbridge controller on your motherboard. Each process (not each processor!) gets an allocated amount of RAM that it can use, and this can grow or shrink over the lifetime of the application. This is, of course, limited by the resources available in the system, and it's also worth noting that the OS will take care of swapping RAM requests that exceed physical availability out to disk (that is, to virtual memory). Within your own application, on the other hand, you will need to coordinate memory access using critical sections and other thread-synchronization mechanisms.

OpenMP is a library that helps you write multi-threaded parallel applications and simplifies the syntax of thread synchronization... I would comment more, but it has been quite a while since I used it, and I'm sure someone can give a better explanation.
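From what I remember, a minimal OpenMP sketch along these lines answers the 2-vs-4 thread question from the post empirically; the default team size usually matches the number of logical cores:

```c
/* Minimal OpenMP sketch: query the processor count and let each thread
 * announce itself. On the 2x2-core machine from the question this
 * typically prints 4 threads. Compile with e.g. "gcc -fopenmp demo.c". */
#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("available processors: %d\n", omp_get_num_procs());
    #pragma omp parallel
    {
        /* omp_get_num_threads() is the team size, which defaults
           to the number of logical cores the OS reports */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```

You can also run more threads than cores (e.g. via OMP_NUM_THREADS); OpenMP allows it, but past the core count the extra threads just get time-sliced, much like the MPI oversubscription case.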

+2

I see that you are using Windows, so I will summarize by saying that you can set the affinity of a process (which core or cores the process may run on) in Task Manager. There is also a WinAPI call, but the name eludes me.

a) For single-threaded programs, the two instances will not run on the same core (provided they are CPU-bound). You can guarantee this by changing the affinity; on Linux there is the sched_setaffinity system call, and in user space the taskset utility (see the sketch after this list).

b) Depends on the MPI library; how the processes are placed is up to the library.

c) It depends on the specific application and its data-access pattern. For small data but a lot of messaging, you may find that restricting the job to 1 CPU is the most efficient arrangement.
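As a rough illustration of the affinity approach from (a), a sketch along these lines pins the current process to core 0 on Linux; the taskset equivalent would be something like "taskset -c 0 ./myprog":

```c
/* Minimal sketch: pin the calling process to core 0 with
 * sched_setaffinity, then report where we ended up. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);  /* allow core 0 only */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU %d\n", sched_getcpu());
    return 0;
}
```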

+2
