I have a Docker container that runs bash as PID 1, which in turn runs a long-running (complex) service that sometimes produces zombie processes parented to the bash at PID 1. These zombies never seem to be reaped.
I am trying to reproduce this problem in a minimal container so that I can test mitigations, for example using a proper init as PID 1 instead of bash.
However, I could not reproduce the zombie processes. The bash at PID 1 seems to reap its children, even those it inherited from another process.
Here is what I tried:
docker run -d ubuntu:14.04 bash -c \ 'bash -c "start-stop-daemon
My expectation was that start-stop-daemon would double-fork to create a process parented to the bash at PID 1, then exec into sleep 30, and that once the sleep exited the process would remain a zombie. The sleep 300 simulates the long-running service.
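For reference, the kind of zombie I am expecting can be sketched without Docker. This is not the start-stop-daemon command above but a simpler stand-in, assuming a Linux host with a `ps` that supports the `stat` column: the inner shell backgrounds a short sleep and then execs into a longer one, so the parent never calls wait and the exited child should stay defunct until the parent goes away.

```shell
#!/bin/sh
# Sketch: a child whose parent never calls wait() stays a zombie.
# The inner shell backgrounds `sleep 1`, then execs into `sleep 5`.
# The exec'ed sleep keeps the same PID, still owns the child, and never reaps it.
sh -c 'sleep 1 & exec sleep 5' &
parent=$!

sleep 2   # give the short-lived child time to exit

# Look up the state of the process whose parent is $parent.
# A STAT value starting with Z marks a zombie (defunct) process.
zombie_stat=$(ps -e -o ppid= -o stat= | awk -v p="$parent" '$1 == p { print $2 }')
echo "child state: $zombie_stat"

# Clean up: once the parent dies, the zombie is handed to init and reaped.
kill "$parent" 2>/dev/null || true
wait "$parent" 2>/dev/null || true
```

With bash as PID 1 in the container, the equivalent orphan does not stay in this state, which is what the strace output shows.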
However, bash reaps this process, which I can observe by running strace on the bash process (from the host running docker):
$ sudo strace -p 2051
strace: Process 2051 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9
wait4(-1,
I am running Docker 1.11.1-rc1, although I see the same behavior with Docker 1.9.
$ docker --version
Docker version 1.11.1-rc1, build c90c70c
$ uname -r
4.4.8-boot2docker
Given that strace shows bash reaping (orphaned) children, is bash a suitable PID 1 in a Docker container? What else could cause the zombies that I see in the more complex container? How can I reproduce them?
Edit:
I managed to attach strace to the bash at PID 1 in one of the live containers that exhibits the problem.
Process 20381 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11185
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11191
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11203
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11155
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11151
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11152
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11154
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11332
...
I am not sure what all those processes are, but none of the PIDs match those of the few defunct zombie processes listed by docker exec $id ps aux | grep defunct.
Perhaps the trick is to catch it in action and see whether wait4() ever returns for a process that remains a zombie ...
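To help catch it in action, one thing I can do is list each zombie together with its parent PID, to confirm which process is actually failing to reap it. The sketch below is only an assumption about how to do that: `filter_zombies` is a hypothetical helper name, and it expects `ps` output with the columns pid, ppid, stat and comm, in that order and without headers.

```shell
#!/bin/sh
# Hypothetical helper: keep only zombie rows from `ps` output whose columns
# are pid, ppid, stat, comm (no header line).
filter_zombies() {
    awk '$3 ~ /^Z/ { print "zombie pid=" $1 " ppid=" $2 " comm=" $4 }'
}

# Run it against the local process table.
ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies

# Inside the container it would be used like this (container ID in $id):
#   docker exec "$id" ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies
```

If the reported ppid is 1, the zombie really is a child of the bash at PID 1. Any other ppid would point at an intermediate process that is still alive and not waiting on its children.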