When a process needs to fetch data from a disk, it effectively stops running on the CPU to let other processes run, because the operation can take a long time: a disk seek typically takes at least 5 ms, and 5 ms is about 10 million CPU cycles, an eternity from the program's point of view!
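The "10 million cycles" figure depends on the clock rate, which the answer does not state. Assuming a 2 GHz CPU (an illustrative assumption), the arithmetic is:

```latex
5\,\text{ms} \times 2 \times 10^{9}\,\tfrac{\text{cycles}}{\text{s}}
  = 5 \times 10^{-3}\,\text{s} \times 2 \times 10^{9}\,\tfrac{\text{cycles}}{\text{s}}
  = 10^{7}\ \text{cycles}
```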
From the programmer's point of view (also called "in user space"), this is a blocking system call. If you call write(2) (which is a thin libc wrapper around the system call of the same name), your process does not exactly stop at that boundary; it keeps running the system call code, in the kernel. Most of the time it goes all the way down to a specific disk controller driver (filename → filesystem/VFS → block device → device driver), where a command to fetch a block on disk is submitted to the proper hardware, which is a very fast operation most of the time.
Then the process is put to sleep (in kernel space, blocking is called sleeping; nothing is ever "blocked" from the kernel's point of view). It will be woken up once the hardware has finally fetched the requested data; the process is then marked as runnable and queued for scheduling. Eventually, the scheduler will run the process.
Finally, back in user space, the blocking system call returns with the proper status and data, and the program flow goes on.
Most I/O system calls can be invoked in non-blocking mode (see O_NONBLOCK in open(2) and fcntl(2)). In this case, the system calls return immediately and only report that the operation was submitted. The programmer then has to check explicitly, at a later time, whether the operation completed, successfully or not, and fetch its result (for example, with select(2)). This is called asynchronous or event-based programming.
Most answers here that mention the D state (which is called TASK_UNINTERRUPTIBLE in the Linux state names) are incorrect. The D state is a special sleep mode that is only entered in a kernel-space code path, when that code path cannot be interrupted (because handling interruption there would be too complex to program), with the expectation that it will block only for a very short time. I believe most D states are actually invisible; they are very short-lived and cannot be observed by sampling tools such as "top".
You can run into unkillable processes in the D state in a few situations. NFS is famous for this, and I have run into it many times. I think there is a semantic clash between some VFS code paths, which assume they always reach local disks with fast error detection (on SATA, an error timeout is around a few hundred ms), and NFS, which actually fetches data from the network, where error handling is more resilient and recovery is slow (a TCP timeout of 300 seconds is common). Read this article for the solution introduced in Linux 2.6.25: the TASK_KILLABLE state. Before that era there was a hack where you could actually send signals to NFS client processes by sending SIGKILL to the rpciod kernel thread, but forget about that ugly trick...