Which takes longer: switching between user and kernel modes, or switching between two processes?

What takes more time?

Switching between user and kernel modes, or switching between two processes?

Please explain the reason.

EDIT: I know that on a context switch the dispatcher needs some time to save the state of the previous process in its PCB (process control block) and then to reload the state of the next process from its own PCB. And to switch between user and kernel modes, I know that the mode bit has to be changed. Is that all there is to it, or is there more?

1 answer

Switching between processes takes longer (assuming that you are actually switching, rather than running them in parallel), by an order of oh-my-god.

The trap from user space to kernel space used to be done with a processor interrupt. Around 2005 (I don't remember the kernel version), after a discussion on the mailing list where someone found that the trap was slower (in absolute measurements!) on a high-end Xeon processor than on an earlier Pentium II or III (again, this is from memory), it was reimplemented with the newer sysenter CPU instruction (which had actually existed since the Pentium Pro, I think). This is done in the virtual dynamic shared object (vdso) page in each process (cat /proc/<pid>/maps to find it). IIRC.
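If you would rather look for that vdso mapping from a script than with cat, here is a minimal sketch. It assumes Linux and the usual /proc/<pid>/maps line layout (address range, permissions, offset, device, inode, optional path); the function name find_mappings is made up for this example.

```python
def find_mappings(maps_text, name="[vdso]"):
    """Return (start, end) address pairs of mappings whose path field equals name."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split()
        # Assumed field order: address perms offset dev inode [pathname]
        if len(fields) >= 6 and fields[5] == name:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            found.append((start, end))
    return found


if __name__ == "__main__":
    try:
        # /proc/self/maps describes the process that is reading it.
        with open("/proc/self/maps") as maps:
            for start, end in find_mappings(maps.read()):
                print(f"vdso at {start:#x}-{end:#x} ({end - start} bytes)")
    except FileNotFoundError:
        print("no /proc/self/maps here (probably not Linux)")
```

The parsing is kept separate from the file read so it can be tried on any saved copy of a maps file.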

So nowadays a kernel trap is just a couple of CPU instructions, which means a few cycles, compared with tens or hundreds of thousands when using an interrupt (which is very slow on modern CPUs).

Context switching between processes is heavy. It means saving the entire CPU state (registers, etc.) to RAM (in fact, to a memory area in the user process space, guess where!), in practice polluting all of the CPU's cached memory, and then reading in the state of the new process. That state has (most likely) not stayed in the CPU cache since the process last ran, so every memory read misses the cache and has to be fetched from RAM. This is pretty slow.

When I was at university I "invented" (well, I came up with the idea, knowing that a CPU has plenty of die area but cannot be cooled well enough if all of it runs constantly) a cache that was infinite in size but powered down when not in use (that is, it was used only for context switches), and implemented it in Simics. I then added support for this magic cache, which I called CARD (Context-switch Active, Run-time Drowsy), to Linux and benchmarked it quite heavily. I found that it could speed up a Linux machine with many heavy processes sharing a single core by about 5%. That was with relatively short (low-latency) process time slices, though.

Anyway. The context switch is still pretty heavy, and the kernel trap is mostly free.
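To get a rough feel for the difference on your own machine, something like the sketch below can help. It is only an illustration under several assumptions: a Unix-like system (it uses os.fork), os.getppid() standing in for "one trip into the kernel and back", and a one-byte ping-pong over two pipes standing in for "switching between two processes". The function names are invented for this example. Python's interpreter overhead is included in both numbers, and on a multi-core machine the two processes may run on different cores, so treat the results as a comparison, not as exact costs.

```python
import os
import time


def time_syscall(iterations=100_000):
    """Average nanoseconds per os.getppid() call (user -> kernel -> user)."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.getppid()
    return (time.perf_counter_ns() - start) / iterations


def time_pingpong(round_trips=10_000):
    """Average nanoseconds per one-byte round trip between two processes."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: echo each byte back, then exit without running cleanup handlers.
        try:
            for _ in range(round_trips):
                os.write(to_parent_w, os.read(to_child_r, 1))
        finally:
            os._exit(0)
    # Parent: each iteration blocks until the child has run and replied.
    start = time.perf_counter_ns()
    for _ in range(round_trips):
        os.write(to_child_w, b"x")
        os.read(to_parent_r, 1)
    elapsed = time.perf_counter_ns() - start
    os.waitpid(pid, 0)
    for fd in (to_child_r, to_child_w, to_parent_r, to_parent_w):
        os.close(fd)
    return elapsed / round_trips


if __name__ == "__main__":
    print(f"system call:      {time_syscall():.0f} ns per call")
    print(f"process pingpong: {time_pingpong():.0f} ns per round trip")
```

Each round trip contains four system calls of its own and, when both processes share a core, at least two process switches, so the gap between the two numbers is what hints at the cost of the switch itself.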

In reply to the question of where in user space the state is stored for each process:

At address zero. Yes, the null pointer! You can't read that whole page from user space anyway :) That was in 2005, but it is probably still the same, unless the CPU state information has grown larger than the page size, in which case they may have changed the implementation.
