Fixed
Well, that seems a little silly. It turns out that top was not displaying correctly, and the programs are actually still running. Perhaps the CPU time has become too long to display? In any case, the program seems to be working fine, and the whole issue was moot.
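For anyone who wants to double-check a job like this without relying on top, here is a minimal sketch that reads the accumulated CPU time straight from `/proc/<pid>/stat` on Linux and compares two samples. The function names (`parse_stat`, `sample`, `cpu_progress`) and the 5-second interval are my own choices for illustration, not part of any tool. If the CPU time grows between samples, the process is still being scheduled.

```python
import os
import time


def parse_stat(stat_line, ticks_per_second):
    """Return (state, cpu_seconds) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    state = rest[0]        # field 3: process state (R, S, D, T, Z, ...)
    utime = int(rest[11])  # field 14: user-mode CPU time in clock ticks
    stime = int(rest[12])  # field 15: kernel-mode CPU time in clock ticks
    return state, (utime + stime) / float(ticks_per_second)


def sample(pid):
    """Read the current state and total CPU seconds of a process."""
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    with open("/proc/%d/stat" % pid) as f:
        return parse_stat(f.read(), ticks)


def cpu_progress(pid, interval=5.0):
    """Return (state, CPU seconds consumed during the interval)."""
    _, before = sample(pid)
    time.sleep(interval)
    state, after = sample(pid)
    return state, after - before


if __name__ == "__main__":
    import sys

    state, used = cpu_progress(int(sys.argv[1]))
    print("state=%s, CPU seconds used in interval: %.2f" % (state, used))
```

Run it with the PID of the simulation as the only argument. For a multithreaded (OpenMP) program the values in `/proc/<pid>/stat` are totals over all threads, so the CPU seconds used can exceed the length of the interval.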
Thanks (and sorry for the stupid question).
Original Q:
I run simulations on a computer running Ubuntu 10.04.3 Server. Short runs (<24 hours) work fine, but long runs eventually stall. By stall, I mean that the program no longer receives any processor time, but it still holds all its information in memory. To run these simulations, I SSH in, start the program with nohup, and redirect any output to a file.
Additional Information:
The system is definitely not running out of RAM. The program does not need to read from or write to the hard drive until completion; the calculation is performed entirely in memory. The program has not been killed, since it still has a PID after it has stalled. I use OpenMP, but I have increased the maximum number of processes, and the maximum time is unlimited. I find the largest eigenvalues of a matrix using the ARPACK Fortran library.
Any thoughts on what causes this behavior, or how to resume my current program?
Thanks.