Can an operating system restart a process that gets stuck in an infinite loop?

The other day, while testing on a Linux server, we noticed that under certain conditions one process would die and then start again. After checking the code, we found that the cause was an infinite loop.

This made me curious: how did the process die and then start again? Does this mean that the OS detects an abnormal process and restarts it? If so, how does it work?

2 answers

Suppose you cannot fix your code, and let's ignore all the crazy options, such as attaching gdb through a script.

You can either check the CPU usage (most accidental infinite loops I have written were using 100% of the CPU within an hour :)), or (a more likely option) use strace to check what the software is doing right now and implement your own trace signature (if the same 20 API calls are repeated 20 times, assume an infinite loop or so).

For instance:

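 #!/bin/bash
 # strace writes its trace to stderr, so redirect it into the pipe
 strace -p "$(cat your_app.pid)" 2>&1 | ./your_signature_evaluator
 # Or, with a known PID:
 strace -p 12345 2>&1 | ./your_signature_evaluator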
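
The evaluator itself is left to the reader above, so here is a minimal sketch of what your_signature_evaluator might contain. Everything in it is an assumption for illustration: the function name detect_loop, the idea of comparing only syscall names (the text before the first parenthesis on each strace line), and the two parameters (pattern length and number of repetitions). It also assumes strace is run without -f, so lines are not prefixed with a PID. Note that a loop that never makes a system call produces no strace output at all, so this approach cannot see it.

```shell
#!/bin/bash
# Hypothetical signature evaluator (not part of strace): reads strace output
# on stdin and reports when the same sequence of PERIOD syscall names repeats
# REPEATS times in a row (REPEATS should be at least 2).
# Usage: detect_loop PERIOD REPEATS
detect_loop() {
    local period=$1 repeats=$2
    local re='^([a-z_0-9]+)\('      # "name(" at the start of a trace line
    local -a history=()             # ring buffer of the last PERIOD names
    local streak=0 count=0 line name
    while IFS= read -r line; do
        # Skip anything that is not a syscall line (e.g. "strace: Process ... attached").
        [[ $line =~ $re ]] || continue
        name=${BASH_REMATCH[1]}
        # Does this call match the one seen exactly PERIOD calls earlier?
        if (( count >= period )) && [[ ${history[count % period]} == "$name" ]]; then
            streak=$((streak + 1))
        else
            streak=0
        fi
        history[count % period]=$name
        count=$((count + 1))
        # PERIOD * (REPEATS - 1) consecutive matches means REPEATS full repetitions.
        if (( streak >= period * (repeats - 1) )); then
            echo "possible infinite loop: ${period}-call pattern repeated ${repeats} times"
            return 0
        fi
    done
    return 1
}

# Possible use, with the numbers from the answer (20 calls repeated 20 times):
#   strace -p 12345 2>&1 | detect_loop 20 20
```

A fixed pattern length is a simplification; a real loop may repeat with a different period, so several lengths might need to be tried.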

As for automatic detection by the system: it is normal for a program to crash after calling things uncontrollably in a loop (for example, malloc() until it runs out of memory, or opening files ...), but I have never seen the system (the kernel) restart an application (correct me in the comments if I am mistaken). I think one of the following applies:

  • there is logic inside the program (signal handling, whatever) that helps it recover
  • you have a watchdog timer (check every 20 seconds that <pid> is running and, if not, start a new instance)
  • you run a distribution that provides service / program configuration that restarts the program if it stops
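
The watchdog option from the list above could look roughly like the following sketch. The function names is_running and check_and_restart, the PID file name, and ./your_app are all hypothetical. It relies on kill -0, which sends no signal and only tests whether the process exists; that test also fails if the process belongs to another user, so the sketch assumes the watchdog and the application run as the same user.

```shell
#!/bin/bash
# is_running PIDFILE: succeeds if the PID stored in PIDFILE names a live process.
is_running() {
    local pid
    pid=$(cat "$1" 2>/dev/null) || return 1
    # kill -0 delivers no signal; it only checks that the process can be signalled.
    [[ -n $pid ]] && kill -0 "$pid" 2>/dev/null
}

# check_and_restart PIDFILE COMMAND...: if the recorded process is gone,
# start COMMAND in the background and record its new PID.
check_and_restart() {
    local pidfile=$1
    shift
    if ! is_running "$pidfile"; then
        "$@" &
        echo $! > "$pidfile"
    fi
}

# Possible use, checking every 20 seconds as in the list above:
#   while true; do check_and_restart your_app.pid ./your_app; sleep 20; done
```

This only notices a process that has exited. A process that is alive but spinning in a loop still passes the check, which is why the question's process must have died before something restarted it.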

But I really doubt that Linux itself would be that kind to your application.


Only if the person who wrote that kernel had solved the halting problem.

PS: Vytor, web servers run in an infinite loop and do not use 100% CPU.
