How can I reproduce zombie processes with bash as PID 1 in Docker?

I have a Docker container that runs bash as PID 1, which in turn runs a long-running (complex) service that sometimes produces zombie processes parented to the bash at PID 1. These zombies are seemingly never reaped.

I am trying to reproduce this problem in a minimal container so that I can test mitigations, such as using a proper init as PID 1 instead of bash.

However, I have not been able to reproduce the zombie processes. The bash at PID 1 seems to reap its children, even those it inherited from another process.

Here is what I tried:

 docker run -d ubuntu:14.04 bash -c \
     'bash -c "start-stop-daemon --background --start --pidfile /tmp/sleep.pid --exec /bin/sleep -- 30; sleep 300"'

My expectation was that start-stop-daemon would double-fork to create a process parented to the bash at PID 1 and then exec sleep 30. When that sleep exited, I expected the process to remain a zombie. The sleep 300 simulates a long-running service.

However, bash reaps this process, which I can observe by running strace on the bash process (from the host running docker):

 $ sudo strace -p 2051
 strace: Process 2051 attached
 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9
 wait4(-1,

I am running docker 1.11.1-rc1, although I have the same experience with docker 1.9.

 $ docker --version
 Docker version 1.11.1-rc1, build c90c70c
 $ uname -r
 4.4.8-boot2docker

Given that strace shows bash reaping (orphaned) children, is bash a suitable PID 1 in a docker container? What else could be causing the zombies I see in the more complex container? How can I reproduce them?

Edit:

I managed to attach strace to the bash PID 1 on one of the live containers that exhibits the problem.

 Process 20381 attached
 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11185
 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11191
 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11203
 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11155
 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11151
 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11152
 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11154
 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11332
 ...

I am not sure what all those exiting processes are, but none of the PIDs match those of the few defunct zombie processes listed by docker exec $id ps aux | grep defunct.

Perhaps the trick is to catch it in action and see that wait4() returns a process that remains a zombie ...

1 answer

I also wanted to check whether my Jenkins slave containers could generate zombies.

Since my images run the scl binary, which in turn runs the JNLP Java client, I ran the following in the Jenkins Groovy script console:

 def process = new ProcessBuilder("bash", '-c', 'sleep 10 </dev/null &>/dev/null & disown').redirectErrorStream(true).start()
 println process.inputStream.text
 println "ps -ef".execute().text

This created zombies. That is, with scl ending up as PID 1.
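The same effect can be imitated locally without Docker or scl, since any parent that never calls wait() leaves its exited children behind as zombies. The following is only a minimal sketch, assuming Linux with /proc mounted. A plain sleep stands in for the non-reaping process, which is an assumption made for illustration and not a statement about how scl behaves internally.

```shell
# bash backgrounds a short sleep, then replaces itself with a long sleep via
# exec. The long sleep inherits the child but never calls wait() on it.
bash -c 'sleep 0.2 & exec sleep 3' &
parent=$!
sleep 1   # give the short sleep time to exit

# Scan /proc for processes in state Z whose parent is $parent.
zombie_count=0
for stat in /proc/[0-9]*/stat; do
    # The process may vanish while we iterate; skip it in that case.
    read -r line 2>/dev/null < "$stat" || continue
    rest=${line##*) }   # drop "pid (comm) ", leaving "state ppid ..."
    set -- $rest
    if [ "$1" = "Z" ] && [ "$2" = "$parent" ]; then
        zombie_count=$((zombie_count + 1))
    fi
done
echo "zombies under PID $parent: $zombie_count"

kill "$parent" 2>/dev/null   # clean up the stand-in parent
```

Dropping the exec, so that bash itself remains the parent, would be the variant to compare against.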

Then I looked at your question and decided to try bash. My first attempt was to modify ENTRYPOINT:

 bash -c "/usr/bin/scl enable rh-ror42 -- /usr/local/bin/run-jnlp-client $1 $2" -- 

Then, looking at the ps output, I realized that PID 1 was not bash; it was still the scl binary. Finally, I changed the command to:

 bash -c "/usr/bin/scl enable rh-ror42 -- /usr/local/bin/run-jnlp-client $1 $2; ls" --

This adds an arbitrary second command after the scl command. And voila: bash became PID 1, and zombies were no longer generated.

Looking at your example, I see that you run bash -c with several commands. So your test setup behaves like my last command, where bash stays around as PID 1. In your production containers, however, you probably run bash -c with just one command, and it seems that bash has become smart enough to effectively exec that command instead of forking it. So in the production containers that generate zombies, bash is probably not PID 1 at all, contrary to what you expect.
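The difference between the two forms can be observed directly. This is a rough sketch, assuming Linux with /proc mounted and a bash that applies the exec optimization to a single simple command. Each variant asks which program is running under the PID that bash itself started with.

```shell
# Single simple command: bash is expected to exec it directly, so the process
# whose PID is $$ is already cat by the time it reads its own name.
single=$(bash -c 'cat /proc/$$/comm')

# Two commands: bash has to stay around to run the second one, so it forks
# cat and $$ still refers to bash.
multi=$(bash -c 'cat /proc/$$/comm; true')

echo "single command: PID \$\$ is $single"
echo "two commands:   PID \$\$ is $multi"
```

If the first line reports cat rather than bash, the shell has replaced itself with the command, which is what would leave a program like scl sitting at PID 1 inside a container.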

Perhaps you can run ps -ef inside the existing production containers and verify whether my assumption is correct.
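To make that check easier to read, the ps output can be narrowed down to the zombies and their parent PIDs. A zombie is reaped by its current parent, so a parent PID other than 1 points at some other process that is failing to wait. The helper below is a small sketch. The name list_zombies is hypothetical, and it assumes the procps column layout produced by ps -eo pid,ppid,stat,comm.

```shell
# Read `ps -eo pid,ppid,stat,comm` output on stdin and print
# "PID PPID COMMAND" for every process whose state starts with Z.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}
```

It could be used as docker exec "$id" ps -eo pid,ppid,stat,comm | list_zombies, and docker exec "$id" cat /proc/1/comm shows which program is actually running as PID 1.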
