Determining the cause of a stalled process on Linux

I am trying to determine the cause of a stalled process on Linux. This is a telecommunications application running under a rather heavy load, with a separate process for each of the 8 T1 spans. Every so often one of the processes becomes unresponsive for perhaps 50 seconds before the stall is noted in the log.

Most likely some system resource is in short supply. The obvious one, CPU usage, looks fine.

Which Linux utilities are best suited for finding and analyzing this kind of problem, as unobtrusively as possible, given that this is a heavily loaded system? It seems the investigation needs to be per-process rather than system-wide. Maybe continuous monitoring of /proc/<pid>/XX? top does not seem very useful here.

+5
3 answers

If you can catch the process during this “moment of unresponsiveness”, you can attach strace to the process in question at that time and try to find out where it is “sleeping”:

strace -f -o LOG -p <pid>
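
If the stall turns out to be a single long-blocking system call, timestamps make it easy to spot. A variant of the same command (the flag choice is my suggestion, not part of the original answer): -tt prefixes each line with an absolute timestamp and -T appends the time spent inside each syscall, so a 50-second gap stands out immediately:

    strace -tt -T -f -o LOG -p <pid>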

A lighter but less reliable method:

  • When a process goes unresponsive, check it with top/ps/gdb/strace/ltrace to see what state it is in (for example, whether it is spinning at 100% CPU or blocked in the kernel)

  • Alternatively, start strace ahead of time with its output filtered down so the tracing overhead stays low. For example, trace only file-related system calls:

    strace -e file -f -o LOG ....
    

Besides strace, a few other things worth watching:

  • "vmstat 1 > /some/log" records overall system state once per second (memory, swap, run queue), so you can look back at what was happening around a stall (a logging sketch follows this list)

  • watch IO with vmstat/iotop - perhaps the process is blocked waiting on disk I/O

  • watch /proc/interrupts - could the stalls be related to interrupt activity from the T1 hardware?
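
Along those lines, a minimal background-logging sketch (the log paths and the one-second interval are my own assumptions, not from the answer). It keeps vmstat output and timestamped snapshots of /proc/interrupts on disk, so the records around a stall can be examined after the fact:

    # log system-wide stats once per second; nohup keeps it running after logout
    nohup vmstat 1 >> /var/log/stall-vmstat.log 2>&1 &

    # snapshot /proc/interrupts every second with a timestamp header
    while true; do
        echo "=== $(date +%H:%M:%S) ===" >> /var/log/stall-interrupts.log
        cat /proc/interrupts >> /var/log/stall-interrupts.log
        sleep 1
    done &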

+8


Thanks, strace sounds useful. Catching the process at the right time will be part of the fun. I came up with a scheme of periodically recording a timestamp in shared memory and then watching it from a separate monitor process. Sending SIGSTOP would let me at least examine the application stack with gdb. I don't know whether strace on a stopped process would tell me much, but I could then attach strace and see what it says, or attach strace and hit the process with SIGCONT.
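
A rough shell sketch of that watchdog idea (my illustration, not the poster's code: it substitutes a heartbeat file for the shared-memory timestamp, and the paths, PID file, and 10-second staleness threshold are all assumptions). The application touches the heartbeat file periodically; when the file goes stale, the monitor stops the process, captures a stack trace with gdb, attaches strace, and resumes it:

    #!/bin/sh
    # Hypothetical paths: the application must update $HEARTBEAT regularly.
    HEARTBEAT=/var/run/t1app.heartbeat
    PID=$(cat /var/run/t1app.pid)

    while sleep 1; do
        # seconds since the heartbeat file was last modified
        age=$(( $(date +%s) - $(stat -c %Y "$HEARTBEAT") ))
        if [ "$age" -gt 10 ]; then
            kill -STOP "$PID"        # freeze the stalled process
            gdb -batch -ex 'thread apply all bt' -p "$PID" > /tmp/stall-bt.log 2>&1
            strace -tt -T -f -o /tmp/stall-strace.log -p "$PID" &
            kill -CONT "$PID"        # resume execution under strace
            break
        fi
    done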

0
