What causes this deadlock in the Ruby `trap` block?

I am reading an excellent book by Jesse Storymer, Working with Unix Processes . In the section on capturing signals from child processes that came out, he gives a code sample.

I modified this code a bit (see below) to get more noticeable information about what is happening:

  • That the parent resumes its own execution between the signals (which I see with puts ),
  • wait for multiple children in the same trap expression (sometimes I get a "Received CHLD signal", followed by several "child pid").

Expected Result

Typically, the output from the code below looks something like this:

 parent is working hard Received a CHLD signal child pid 73408 exited parent is working hard parent is working hard parent is working hard Received a CHLD signal child pid 73410 exited child pid 73409 exited All children exited - parent exiting too. 

Periodic error

But from time to time I get this error:

 trapping_signals.rb:17:in `write': deadlock; recursive locking (ThreadError) from trapping_signals.rb:17:in `puts' from trapping_signals.rb:17:in `puts' from trapping_signals.rb:17:in `block in <main>' from trapping_signals.rb:17:in `call' from trapping_signals.rb:17:in `write' from trapping_signals.rb:17:in `puts' from trapping_signals.rb:17:in `puts' from trapping_signals.rb:17:in `block in <main>' from trapping_signals.rb:40:in `call' from trapping_signals.rb:40:in `sleep' from trapping_signals.rb:40:in `block in <main>' from trapping_signals.rb:38:in `loop' from trapping_signals.rb:38:in `<main> 

Can someone explain to me what is going wrong here?

The code

 child_processes = 3 dead_processes = 0 # We fork 3 child processes. child_processes.times do fork do # Each sleeps between 0 and 5 seconds sleep rand(5) end end # Our parent process will be busy doing some work. # But still wants to know when one of its children exits. # By trapping the :CHLD signal our process will be notified by the kernel # when one of its children exits. trap(:CHLD) do puts "Received a CHLD signal" # Since Process.wait queues up any data that it has for us we can ask for it # here, since we know that one of our child processes has exited. # We loop over a non-blocking Process.wait to ensure that any dead child # processes are accounted for. # Here we wait without blocking. while pid = Process.wait(-1, Process::WNOHANG) puts "child pid #{pid} exited" dead_processes += 1 # We exit ourselves once all the child processes are accounted for. if dead_processes == child_processes puts "All children exited - parent exiting too." exit end end end # Work it. loop do puts "parent is working hard" sleep 1 end 
+8
multithreading ruby signals deadlock fork
source share
1 answer

I looked at Ruby sources to see where this particular error occurs, and it only ever rises when the current thread tries to get a lock, but the same lock is already taken by the current thread. This means that the lock is not repeated:

 m = Mutex.new m.lock m.lock #=> same error as yours 

Now, at least, we know what is happening, but not yet why and where. An error message indicates that this happens during a puts call. When it is called, it finally ends with io_binwrite . stdout not synchronized, but buffered, so this is if the condition is met on the first call, and the buffer plus write lock for this buffer will be configured. Write locking is important to ensure atomicity of writing to stdout, it should not happen that two streams simultaneously writing to stdout mix output together. To demonstrate what I mean:

 t1 = Thread.new { 100.times { print "aaaaa" } } t2 = Thread.new { 100.times { print "bbbbb" } } t1.join t2.join 

Although both streams are written to stdout in turn, there will never be a single record — you will always have the full 5 a or b in sequence. This is what blocks the record for .

Now, what’s wrong in your case is the race condition on this record lock. The parent process is looped and written to stdout every second ("parent works"). But the same thread also ultimately executes the trap block and again tries to write to stdout ("Received CHLD signal"). You can verify that this is indeed the same thread by adding #{Thread.current} to your puts statements. If these two events happen close enough, then you will have the same situation as in the first example: the same thread is trying to get the same lock twice, and this ultimately causes an error.

+13
source share

All Articles