Making crash debugging of a C program on Linux more useful

We have an embedded version of the Linux kernel running on a MIPS core. The program we wrote runs a particular suite of tests. During one of the stress tests (it runs for about 12 hours) we get a segfault. This, in turn, generates a core dump.

Unfortunately, the core dump is not very helpful. The crash occurs in some dynamically linked system library (probably pthread or glibc). The backtrace from the core dump does not help, because it shows only the crash point and no callers (our user-space application is built with -g -O0, but there is still no backtrace information):

Cannot access memory at address 0x2aab1004
(gdb) bt
#0  0x2ab05d18 in ?? ()
warning: GDB can't find the start of the function at 0x2ab05d18.

    GDB is unable to find the start of the function at 0x2ab05d18
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0x2ab05d18 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.

Another problem is that we cannot usefully run the test under gdb / gdbserver: it keeps stopping on __nptl_create_event. Since the test creates and destroys threads and timers every 5 seconds, it is practically impossible to sit there for hours typing continue each time.

EDIT: Another note: backtrace and backtrace_symbols are not supported by our toolchain.

Consequently:

  • Is there a way to trap the segfault and generate more backtrace data (stack pointers, call stack, etc.)?

  • Is there a way to get more data out of a core dump when the crash happened inside a .so file?

Thanks.

+7
2 answers

If all else fails, run the command under the debugger!

Just put "gdb" in front of your normal start command and enter "c" (continue) to get the process running. When the task segfaults, it will drop back to the interactive gdb prompt instead of dumping core. You should then be able to get more meaningful stack traces, etc.

Another option is to use "truss" if it is available (on Linux the usual equivalent is strace). This will tell you which system calls were being made at the time of the crash.

+1

"GDB can't find the start of the function at 0x2ab05d18"

What is at that address at the time of the crash?

Run info shared in GDB and check whether any loaded library contains that address.
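
When GDB cannot answer this from the core, a similar lookup can be done from inside a live run of the program. The following is a minimal sketch under stated assumptions: a Linux target with /proc mounted, and find_mapping as a hypothetical helper name. It scans /proc/self/maps for the mapping that contains a given address, so it describes the running process rather than the core file.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: find which mapping of the current process contains
 * addr by scanning /proc/self/maps.
 * Returns 1 on a hit and fills in the range and the backing path
 * (empty string for anonymous mappings); returns 0 otherwise. */
static int find_mapping(uintptr_t addr, uintptr_t *start, uintptr_t *end,
                        char *path, size_t path_len)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (maps == NULL)
        return 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long lo, hi;
        char name[256] = "";

        /* Line format: start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s", &lo, &hi, name) < 2)
            continue;
        if (addr >= lo && addr < hi) {
            *start = lo;
            *end = hi;
            snprintf(path, path_len, "%s", name);
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Logging the result for a suspect address such as 0x2ab05d18 early in the test run is only meaningful if the libraries load at the same addresses on every run, which is an assumption that depends on the target's address-space randomisation settings.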

The most likely cause of your problems: did you run strip on libpthread.so.0 before loading it onto your target? Don't do that: GDB requires that libpthread.so.0 not be stripped. If your toolchain's libpthread.so.0 contains debugging symbols (and is therefore too large for the target), run strip -g on it, not a full strip.

Update:

info shared
Cannot access memory at address 0x2ab05d18

This means that GDB cannot read the list of shared libraries (which also explains the missing stack trace). The most common cause: the binary that actually produced the core does not match the binary you gave to GDB. A less common cause: your core dump was truncated (possibly because ulimit -c is set too low).
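
To rule out the truncation case, the program can raise its own core-size limit at startup rather than rely on the shell that launched it. This is a minimal sketch, assuming the hard limit on the target is high enough for a full dump; raise_core_limit is a hypothetical name.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-size limit to the hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
static int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max; /* the hard limit may itself be finite, or 0 */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

If the hard limit is itself too small, this changes nothing useful; in that case the limit has to be raised by whatever starts the process. A full filesystem on the target can truncate the core just as well, and no limit setting helps with that.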

+1