Switching between processes (assuming you are actually switching, rather than running them in parallel) costs on the order of oh-my-god.
The trap from user space to kernel space used to be done with a processor interrupt. Around 2005 (I don't remember the kernel version), after a discussion on the mailing list where someone found that trapping was slower (in absolute measurements!) on a high-end Xeon processor than on an earlier Pentium II or III (again, from memory), they reimplemented it with the newer sysenter CPU instruction (which had actually existed since the Pentium Pro, I think). This is done in the virtual dynamic shared object (vdso) page mapped into each process (cat /proc/<pid>/maps to find it), IIRC.
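If you want to see the vdso mapping without eyeballing the cat output, here is a minimal sketch that pulls it out of the maps text. The helper name find_mappings is mine, not anything standard, and it assumes the usual Linux layout of a maps line (address range, perms, offset, dev, inode, optional pathname).

```python
import os

def find_mappings(maps_text, name="[vdso]"):
    """Return (start, end) address pairs for mappings whose pathname equals name."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split()
        # Assumed line layout: address perms offset dev inode [pathname]
        if len(fields) >= 6 and fields[5] == name:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            found.append((start, end))
    return found

if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # /proc/self/maps describes the process that is reading it (Linux only).
    with open("/proc/self/maps") as maps_file:
        for start, end in find_mappings(maps_file.read()):
            print(f"[vdso] {start:#x}-{end:#x} ({end - start} bytes)")
```

On a Linux box this should print a single small mapping. The address normally differs from run to run because of address space randomization.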
So nowadays a kernel trap is basically just a couple of CPU instructions, taking a handful of cycles all in all, compared to tens or hundreds of thousands when using an interrupt (which is really slow on modern CPUs).
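A rough way to get a feel for the trap cost on your own machine is to time a trivial system call against a call that never leaves user space. This is only a sketch: time_per_call is a made-up helper, os.getppid is used on the assumption that it enters the kernel on every call (as far as I know it is not cached), and in Python the interpreter overhead will be a large part of both numbers.

```python
import os
import time

def time_per_call(fn, iterations=200_000):
    """Return the average wall-clock nanoseconds spent per call of fn()."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations

if __name__ == "__main__":
    # Baseline: a call that stays entirely in user space.
    baseline = time_per_call(lambda: None)
    # A call that (by assumption) traps into the kernel each time.
    trap = time_per_call(os.getppid)
    print(f"no-op call : {baseline:8.1f} ns")
    print(f"os.getppid : {trap:8.1f} ns")
```

The difference between the two lines is a crude upper bound on the round trip into the kernel and back. A C version would give tighter numbers.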
A context switch between processes is heavy. It means storing the entire processor state (registers, etc.) to RAM (at a fixed location in the user process space, actually; guess where!), in practice dirtying all the memory cached in the CPU, and then reading in the state of the new process. That state will (most likely) not still be in the CPU cache from the last time the process ran, so each memory read will be a cache miss and has to go to RAM. This is rather slow.
When I was at university, I "invented" a cache that was infinite in size but unpowered when unused, i.e. used only on context switches. (Well, I did come up with the idea, knowing that there is plenty of die area in a CPU, but not enough cooling if all of it is powered constantly.) I implemented this in Simics, added Linux support for this magic cache, which I called CARD (Context-switch Active, Run-time Drowsy), and benchmarked it rather heavily. I found that it could speed up a Linux machine with lots of heavy processes sharing a single core by about 5%. That was with relatively short (low-latency) process time slices, though.
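To get a ballpark figure for the switch cost yourself, the classic trick is to bounce a single byte between two processes over a pair of pipes, so that each round trip forces the kernel to put one process to sleep and wake the other. A minimal sketch, assuming a Unix-like system with os.fork; ping_pong_ns is a made-up name, and the result includes pipe and system call overhead, not just the switch itself.

```python
import os
import time

def ping_pong_ns(round_trips=20_000):
    """Average nanoseconds per one-byte round trip between two processes."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: echo each byte back until the parent closes its write end.
        os.close(to_child_w)
        os.close(to_parent_r)
        while os.read(to_child_r, 1):
            os.write(to_parent_w, b"x")
        os._exit(0)
    # Parent: close the pipe ends that only the child uses.
    os.close(to_child_r)
    os.close(to_parent_w)
    start = time.perf_counter_ns()
    for _ in range(round_trips):
        os.write(to_child_w, b"x")  # wake the child
        os.read(to_parent_r, 1)     # block until the child answers
    elapsed = time.perf_counter_ns() - start
    os.close(to_child_w)            # EOF ends the child's loop
    os.waitpid(pid, 0)
    os.close(to_parent_r)
    return elapsed / round_trips

if __name__ == "__main__":
    print(f"{ping_pong_ns():.0f} ns per round trip (about two switches)")
```

For the numbers to reflect real switching rather than two cores talking to each other, run it pinned to a single CPU, for example under taskset -c 0 on Linux.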
Anyway. A context switch is still pretty heavy, while a kernel trap is basically free.
The answer to where in each process's user-space memory the state is stored:
At address zero. Yep, the null pointer! You can't read anything in that page from user space anyway :) This was back in 2005, but it is probably still the same, unless the processor state information has grown larger than a page, in which case they might have changed the implementation.