How is it possible that kill -9 has no effect on a process on Linux?

I am writing a plugin that automatically highlights text strings when visiting a website. It is similar to search-result highlighting, but automatic and for many words; for example, people with allergies could use it to make certain words really stand out when they browse a food site.

But I have a problem. When I try to close an empty, freshly opened FF window, it somehow blocks the whole process. When I kill the process, all the windows disappear, but the Firefox process stays alive (its parent PID is 1, it does not respond to any signals, it still has many resources open and keeps consuming CPU, but it will not budge).

So, two questions:

  • How is it possible for a process to ignore kill -9, whether it is sent as a user or as root?

  • Is there anything I can do but reboot?

[EDIT] This is the offending process:

    USER       PID %CPU %MEM    VSZ    RSS TTY    STAT START   TIME COMMAND
    digulla  16688  4.3  4.2 784476 345464 pts/14 D    Mar28  75:02 /opt/firefox-3.0/firefox-bin

The same process as shown by ps -ef | grep firefox:

    UID        PID  PPID  C STIME TTY          TIME CMD
    digulla  16688     1  4 Mar28 pts/14   01:15:02 /opt/firefox-3.0/firefox-bin

This is the only process. As you can see, it is not a zombie; it is running! It ignores kill -9, whether I kill it by PID or by name. If I try to attach strace to it, strace also freezes and cannot be killed. There seems to be no way out. I assume that FF hangs somewhere in the kernel, but where?

[EDIT2] Based on sigjuice's comments:

 ps axopid,comm,wchan 

shows the kernel function in which each process is waiting. In my case, the culprit was the Beagle indexer plugin (openSUSE 11.1). After disabling the plugin, FF became a fast and happy fox again.

+64
linux process kill sysadmin
Mar 29 '09 at 14:39
7 answers

As noted in the comments on the OP, a process status (STAT) of D indicates that the process is in the "uninterruptible sleep" state. In practice, this usually means that it is waiting on I/O and cannot (or will not) do anything, including dying, until that I/O operation completes.

Processes in state D are usually only present for a split second before the operation completes and they return to R/S. In my experience, if a process is stuck in D, it is most often trying to communicate with an unreachable NFS or other remote file system, trying to access a failing hard drive, or using some piece of hardware through a flaky device driver. In such cases, the only way to recover and let the process die is either to bring the fs/drive/hardware back up so the I/O can complete, or to give up and reboot. In the specific case of NFS, the mount may also eventually time out and return from the I/O operation (with an error code), but this depends on the mount options, and it is very common for NFS mounts to be set to wait forever.

This is different from a zombie process, which has the status Z.

+121
Mar 31 '09 at 14:07

Double-check that the parent ID is indeed 1. If it is not, and the parent is firefox, first try sudo killall -9 firefox-bin. After that, try killing the specific process IDs individually with sudo kill -9 [process-id].

How is it possible for a process not to listen to kill -9 (neither as user nor as root)?

If the process has become <defunct>, i.e. a zombie, with parent 1, you can't kill it manually; only init can (by reaping it). Zombie processes are already dead and gone: they can no longer be killed, because they are no longer processes, only a process table entry plus the associated exit code waiting to be collected. You would need to kill the parent, and you cannot kill init for obvious reasons.

But see here for more general information. Rebooting will naturally clear everything.

+8
Mar 29 '09 at 14:44

Is it possible that the process is being restarted (for example, by init) as soon as it is killed?

You can check this easily: if after kill -9 PID the PID is unchanged, the process was not killed; if it changed, the process was restarted.

+1
Mar 29 '09 at 15:12

Recently I fell into the double-fork trap and landed on this page before I finally found my answer. The symptoms are identical, even though the problem is not:

  • WYKINWYT: What you are killing is not what you thought.

Here is minimal test code, using the SNMP daemon as an example:

    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(int argc, char* argv[])
    {
        // We omit the -f option (do not fork) to reproduce the problem
        char *options[] = {"/usr/local/sbin/snmpd", /*"-f",*/ "-d", "--master=agentx",
                           "-Dagentx", "--agentXSocket=tcp:localhost:1706",
                           "udp:10161", (char*) NULL};
        pid_t pid = fork();
        if (0 > pid) return -1;
        switch (pid) {
        case 0:
            // Child launches the SNMP daemon
            execv(options[0], options);
            exit(-2);               // only reached if execv fails
        default:
            sleep(10);              // Simulate "long" activity
            kill(pid, SIGTERM);     // Kill what should be the child,
                                    // i.e. the SNMP daemon, I assume
            printf("Signal sent to %d\n", (int)pid);
            sleep(10);              // Simulate "long" operation before closing
            waitpid(pid, NULL, 0);  // Reap the child
            printf("SNMP should be now down\n");
            getchar();              // Blocking (for observation only)
            break;
        }
        printf("Bye!\n");
        return 0;
    }

At the first stage, the main process (7699) starts the SNMP daemon (7700), but we see that 7700 is already defunct (a zombie). In addition, we see another process (7702) running with the options we passed:

    [nils@localhost ~]$ ps -ef | tail
    root      7439     2  0 23:00 ?        00:00:00 [kworker/1:0]
    root      7494     2  0 23:03 ?        00:00:00 [kworker/0:1]
    root      7544     2  0 23:08 ?        00:00:00 [kworker/0:2]
    root      7605     2  0 23:10 ?        00:00:00 [kworker/1:2]
    root      7698   729  0 23:11 ?        00:00:00 sleep 60
    nils      7699  2832  0 23:11 pts/0    00:00:00 ./main
    nils      7700  7699  0 23:11 pts/0    00:00:00 [snmpd] <defunct>
    nils      7702     1  0 23:11 ?        00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
    nils      7727  3706  0 23:11 pts/1    00:00:00 ps -ef
    nils      7728  3706  0 23:11 pts/1    00:00:00 tail

After the simulated 10 seconds of activity, we try to kill the only process we know about (7700), and we finally reap it with waitpid(). But process 7702 is still there:

    [nils@localhost ~]$ ps -ef | tail
    root      7431     2  0 23:00 ?        00:00:00 [kworker/u256:1]
    root      7439     2  0 23:00 ?        00:00:00 [kworker/1:0]
    root      7494     2  0 23:03 ?        00:00:00 [kworker/0:1]
    root      7544     2  0 23:08 ?        00:00:00 [kworker/0:2]
    root      7605     2  0 23:10 ?        00:00:00 [kworker/1:2]
    root      7698   729  0 23:11 ?        00:00:00 sleep 60
    nils      7699  2832  0 23:11 pts/0    00:00:00 ./main
    nils      7702     1  0 23:11 ?        00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
    nils      7751  3706  0 23:12 pts/1    00:00:00 ps -ef
    nils      7752  3706  0 23:12 pts/1    00:00:00 tail

After we type a character for getchar(), our main process ends, but the SNMP daemon with PID 7702 is still there:

    [nils@localhost ~]$ ps -ef | tail
    postfix   7399  1511  0 22:58 ?        00:00:00 pickup -l -t unix -u
    root      7431     2  0 23:00 ?        00:00:00 [kworker/u256:1]
    root      7439     2  0 23:00 ?        00:00:00 [kworker/1:0]
    root      7494     2  0 23:03 ?        00:00:00 [kworker/0:1]
    root      7544     2  0 23:08 ?        00:00:00 [kworker/0:2]
    root      7605     2  0 23:10 ?        00:00:00 [kworker/1:2]
    root      7698   729  0 23:11 ?        00:00:00 sleep 60
    nils      7702     1  0 23:11 ?        00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
    nils      7765  3706  0 23:12 pts/1    00:00:00 ps -ef
    nils      7766  3706  0 23:12 pts/1    00:00:00 tail

Conclusion

Because we ignored the double-fork mechanism, we thought the kill had no effect. In fact, we had simply killed the wrong process.

With the -f (do not (double) fork) option added, everything works as expected.

+1
Nov 14 '17 at 23:15
 sudo killall -9 firefox 

Should work.

EDIT: [PID] changed to firefox

0
Mar 29 '09 at 14:42

Run ps -ef | grep firefox; you may see 3 processes. Kill them all.

0
Mar 29 '09 at 14:44

You can also run pstree and kill the parent. This ensures that you get the entire process tree, not just the leaf.

0
Mar 29 '09 at 15:09


