Simulating a process stuck in a blocking system call

I am trying to test behavior that is difficult to reproduce in a controlled environment.

Use case: Linux systems, usually Red Hat Enterprise Linux 5 or 6 (we are just starting with RHEL 7 and systemd, so that is out of scope for now).

There are situations where I need to restart a service. The script we use to stop the service usually works well: it sends SIGTERM to the process, which is designed to handle it; if the process has not exited after a timeout (usually a couple of minutes), the script sends SIGKILL and then waits another couple of minutes.
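A minimal sketch of such a stop sequence, assuming a plain POSIX shell. The function names, argument order, and return codes are hypothetical and not taken from the actual script; the point is only that the caller can tell "exited" apart from "still present after SIGKILL".

```shell
#!/bin/sh
# Illustrative sketch only: names, timeouts, and return codes are assumptions.

# wait_gone PID SECONDS: return 0 once PID has disappeared, 1 on timeout.
wait_gone() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# stop_service PID TERM_TIMEOUT KILL_TIMEOUT
# Returns 0 if the process exited, 2 if it is still present after SIGKILL.
stop_service() {
    pid=$1
    kill -TERM "$pid" 2>/dev/null || true
    wait_gone "$pid" "$2" && return 0
    kill -KILL "$pid" 2>/dev/null || true
    wait_gone "$pid" "$3" && return 0
    echo "pid $pid still present after SIGKILL" >&2
    return 2
}
```

A caller would refuse to start a new instance whenever `stop_service` returns 2. Note that `kill -0` also succeeds for a zombie, so an unreaped process counts as "still present" here.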

Problem: in some (rare) situations the process does not exit after SIGKILL. This usually happens when it is badly stuck in a system call, possibly because of a kernel-level problem (a corrupted file system, a broken NFS mount, or something equally bad that requires manual intervention).

A bug occurred when the script did not notice that the "old" process had not actually exited and started a new process while the old one was still running. We are fixing this with a stronger locking scheme (so that at least the new process does not start while the old one is still running), but I find it difficult to test all this because I have not found a way to simulate a hard-stuck process.

So the question is:

How can I manually simulate a process that does not exit when SIGKILL is sent to it, even by a privileged user?

+4
6 answers

If your process is stuck in I/O, you can simulate the situation as follows:

lvcreate -n lvtest -L 2G vgtest
mkfs.ext3 -m0 /dev/vgtest/lvtest
mount /dev/vgtest/lvtest /mnt
dmsetup suspend /dev/vgtest/lvtest && dd if=/dev/zero of=/mnt/file.img bs=1M count=2048 &

This way the dd process will be stuck waiting for I/O and will ignore every signal. Note that in recent kernels signals are not ignored when a process is waiting for I/O on an NFS file system.
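To check that the process really is in uninterruptible sleep, its state letter can be read from /proc. The helper below is a small sketch; `proc_state` is a hypothetical name, and it assumes a Linux /proc file system.

```shell
#!/bin/sh
# proc_state PID: print the one-letter state of PID as listed in
# /proc/PID/status (R running, S sleeping, D uninterruptible sleep, Z zombie).
# Hypothetical helper for illustration; Linux-only.
proc_state() {
    [ -r "/proc/$1/status" ] || return 1
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Possible use, as root, once dd from the commands above is blocked:
#   proc_state "$(pgrep -x dd)"          # check whether it reports D
#   dmsetup resume /dev/vgtest/lvtest    # release the suspended device again
```

Resuming the device with `dmsetup resume` should let the pending I/O complete, so the test volume does not stay blocked after the experiment.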

+6

Well... how about just not sending SIGKILL? That way your environment behaves as if the signal had been sent but the process did not stop.
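One way to try this idea, sketched under the assumption that a stand-in process is acceptable for the test: start a dummy "service" that ignores SIGTERM and simply never send it SIGKILL. From the stop script's point of view this looks the same as a SIGKILL that had no effect. `start_stubborn` and `survives_term` are hypothetical helper names.

```shell
#!/bin/sh
# Stand-in for the real service: a shell loop that ignores SIGTERM.
# (Hypothetical test double; the real service is not involved.)
start_stubborn() {
    sh -c 'trap "" TERM; while :; do sleep 1; done' &
    stubborn_pid=$!
    sleep 1   # give the child time to install its trap
}

# survives_term PID WAIT: send SIGTERM, wait WAIT seconds, and succeed
# if PID still exists afterwards.
survives_term() {
    kill -TERM "$1" 2>/dev/null
    sleep "$2"
    kill -0 "$1" 2>/dev/null
}
```

For example: `start_stubborn; survives_term "$stubborn_pid" 2 && echo "still there"`, followed by `kill -KILL "$stubborn_pid"` to clean up once the test is done.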

+1

"D" ( TASK_UNINTERRUPTIBLE) , , , - ,

- , , , NIC , -. , , .

, , syslog sar , D-. , kernel.bugzilla.org Linux.

+1

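Use a pid file, for example /var/run/yourserver.pid. The script can then check whether the process with that pid still exists, for instance with kill -0, or like this: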

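yourserver_pid=$(cat /var/run/yourserver.pid)
# /proc/<pid>/exe is present only while the process exists; also make sure
# the pid still belongs to the server and has not been reused
if [ -f "/proc/$yourserver_pid/exe" ] &&
   [ "$(readlink "/proc/$yourserver_pid/exe")" = /usr/bin/yourserver ]; then
    echo "yourserver (pid $yourserver_pid) is still running" >&2
fi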

BTW, , , SIGKILL ( , , - D, NFS-) , , syslog (, logger script).

SIGTERM, , SIGQUIT, , , SIGKILL ,

0

, script , "" , ;

/, script. , OS , SIGKILL. , - script . ?

0

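If you attach gdb to a process, SIGKILL does not remove it from the process table; it stays there as a zombie until gdb exits.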

void@tahr:~$ ping 8.8.8.8 > /tmp/ping.log &
[1] 3770
void@tahr:~$ ps 3770
PID TTY      STAT   TIME COMMAND
3770 pts/13   S      0:00 ping 8.8.8.8

void@tahr:~$ sudo gdb -p 3770
...
(gdb)

void@tahr:~$ ps 3770 
PID TTY      STAT   TIME COMMAND
3770 pts/13   t      0:00 ping 8.8.8.8

void@tahr:~$ sudo kill -9 3770
...
void@tahr:~$ ps 3770
PID TTY      STAT   TIME COMMAND
3770 pts/13   Z      0:00 [ping] <defunct>

(gdb) quit
0
