Program stalls during long runs

Fixed

Well, this is a little embarrassing. It turns out that top was not displaying correctly, and the program was actually still running. Perhaps the CPU time had become too long to display? In any case, the program seems to be working fine, so the whole question was moot.
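For anyone who wants to double-check a long run without relying on top's display, the CPU counters can be read straight from /proc. This is only a minimal sketch, assuming Linux; the function names (parse_cpu_ticks, cpu_seconds, is_making_progress) are made up for illustration and are not from any tool mentioned here.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces, so split on the LAST ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_seconds(pid: int) -> float:
    """Total CPU time consumed so far by `pid`, in seconds (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_cpu_ticks(f.read())
    return ticks / os.sysconf("SC_CLK_TCK")


def is_making_progress(pid: int, interval: float = 5.0) -> bool:
    """True if the process accumulated any CPU time during `interval` seconds."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return cpu_seconds(pid) > before
```

Calling is_making_progress(pid) on the simulation's PID returns True as long as the counters keep advancing, whatever top happens to show.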

Thanks (and sorry for the stupid question).

Original Q:

I am running a simulation on a computer running Ubuntu Server 10.04.3. Short runs (<24 hours) work fine, but long runs eventually stall. By stall, I mean that the program no longer gets any CPU time, but it still holds all its data in memory. To run these simulations, I SSH in, start the program with nohup, and redirect any output to a file.

Additional Information:

The system is definitely not running out of RAM. The program does not need to read or write to the hard drive until completion; the calculation is performed completely in memory. The program has not been killed, since it still has a PID after it has stalled. I am using OpenMP, but I have increased the maximum number of processes, and the maximum time is unlimited. I am finding the largest eigenvalues of a matrix using the ARPACK Fortran library.

Any thoughts on what causes this behavior, or how to resume my current program?

Thanks

+5
3 answers

I assume from your tags that this is an OpenMP program, although I may be wrong about that. Is ARPACK thread-safe?

, ( MPI, OpenMP, ). , , , , , , . , gdb .

+4

If the program appears "stalled", attach GDB to it and run thread apply all where.

  • , .
  • - (, ), , .

, UNIX , . /, .
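The thread apply all where suggestion can also be run non-interactively, which is handy over SSH. The following is a rough sketch, assuming gdb is installed and you are allowed to attach to the process (same user or root); the helper names gdb_backtrace_command and dump_all_backtraces are hypothetical.

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb invocation that attaches to `pid`, prints every thread's
    stack, and then exits (batch mode detaches from the process on exit)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all where"]


def dump_all_backtraces(pid: int, timeout: float = 60.0) -> str:
    """Run gdb in batch mode and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Taking two dumps a few seconds apart and comparing them gives a quick idea of whether the threads are moving or sitting in the same frames.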

+2

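If the program is still running (i.e. it still has a PID), you can attach a debugger to it with gdb program *pid* (this works best if the program was compiled with -g), or trace it with strace, e.g. strace -p *pid*. strace is a utility that shows the system calls a process makes (it is built on the ptrace system call), so it can show you what, if anything, the program is asking the kernel to do.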

There is also a tool called ltrace that intercepts calls to functions in dynamic libraries.

To get a feel for it, try for example strace ls

Of course, strace will not help you if the running program does not make any system calls.
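To see how much system-call activity a running process produces, strace can be attached for a short window and its output counted. This is only a sketch, assuming strace is installed and attaching is permitted; strace_command and count_trace_lines are illustrative names, and the line count is a rough proxy rather than an exact number of calls.

```python
import signal
import subprocess
import tempfile
import time
from typing import List


def strace_command(pid: int, out_path: str) -> List[str]:
    """Build an strace invocation that follows all threads of `pid`
    and writes the trace to `out_path`."""
    return ["strace", "-f", "-p", str(pid), "-o", out_path]


def count_trace_lines(pid: int, seconds: float = 10.0) -> int:
    """Attach strace for `seconds`, detach, and count the lines it logged."""
    with tempfile.NamedTemporaryFile(suffix=".strace") as tmp:
        proc = subprocess.Popen(
            strace_command(pid, tmp.name), stderr=subprocess.DEVNULL
        )
        time.sleep(seconds)
        # SIGINT makes strace detach and leaves the traced process running.
        proc.send_signal(signal.SIGINT)
        proc.wait()
        with open(tmp.name) as f:
            return sum(1 for _ in f)
```

A count of zero does not prove a hang on its own: a calculation that runs entirely in memory may go a long time without making any system calls.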

Regards. Basile Starynkevitch

+1