Debugging core files generated on a customer's machine

We get core files from running our software on a customer's machine. Unfortunately, because we always compiled with -O2 and without debugging symbols, we ended up in situations where we could not work out why it crashed. We have changed the builds so that they now use -g and -O2 together. We then advise the customer to run the -g binary so that it is easier to debug.

I have a few questions:

  • What happens when the core file is generated on a Linux distribution other than the one we run in Dev? Is the stack trace even meaningful?
  • Are there any good books on debugging on Linux or Solaris? Something example-oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something at an intermediate to advanced level would be good, as I have been doing this for a while. Some assembly would be good as well.

Here is an example of a crash that requires us to tell the customer to get a -g version of the binary:

 Program terminated with signal 11, Segmentation fault.
 #0  0xffffe410 in __kernel_vsyscall ()
 (gdb) where
 #0  0xffffe410 in __kernel_vsyscall ()
 #1  0x00454ff1 in select () from /lib/libc.so.6
 ... <omitted frames>

Ideally, I would like to determine why the application crashed. I suspect memory corruption, but I am not 100% sure.

Remote debugging is strictly prohibited.

thanks

+6
4 answers

What happens when the core file is generated on a Linux distribution other than the one we run in Dev? Is the stack trace even meaningful?

If the executable is dynamically linked, as yours is, the stack trace GDB produces will (most likely) not be meaningful.

Reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it does not know what code was at that address. So it looks in your copy of libc.so.6, finds that the address falls inside select, and prints that.

But the chances that 0x00454ff1 is also inside select in your customer's copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.

You can use disas select and observe that 0x00454ff1 is either in the middle of an instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.

However, you can help yourself: you just need to get a copy of all the libraries listed in (gdb) info shared from the customer's system. Have the customer tar them up with, for example:

 cd /
 tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...

Then on your system:

 mkdir /tmp/from-customer
 tar xzf to-you.tar.gz -C /tmp/from-customer
 gdb /path/to/binary
 (gdb) set solib-absolute-prefix /tmp/from-customer
 (gdb) core core  # Note: very important to set solib-... before loading core
 (gdb) where      # Get meaningful stack trace!
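The customer-side packaging step can be scripted rather than typed by hand. The sketch below is only an illustration: pack_libs is a hypothetical helper, and it assumes you have saved the output of (gdb) info shared to a text file in which the last column of each row is the library's absolute path. It also passes -h to tar so that symlinks such as lib/libc.so.6 are stored as the files they point to, which is an addition to the command shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GDB or tar):
#   pack_libs INFO_SHARED_FILE ROOT_DIR OUT_TARBALL
# INFO_SHARED_FILE is assumed to hold saved "(gdb) info shared" output.
pack_libs() {
    info_file=$1
    root=$2
    out=$3
    list=$(mktemp) || return 1
    # Keep only rows whose last column is an absolute path, and drop the
    # leading "/" so the archive extracts cleanly under any prefix.
    awk '$NF ~ /^\// { print substr($NF, 2) }' "$info_file" | sort -u > "$list"
    # -h follows symlinks so the real library contents are archived.
    tar -czhf "$out" -C "$root" -T "$list"
    status=$?
    rm -f "$list"
    return $status
}

# Example (on the customer's system):
#   pack_libs info-shared.txt / to-you.tar.gz
```

The resulting archive has the same layout as the hand-written tar command, so the extraction and set solib-absolute-prefix steps stay unchanged.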

We then advise the customer to run the -g binary so that it is easier to debug.

A much better approach:

  • build with -g -O2 -o myexe.dbg
  • strip -g myexe.dbg -o myexe
  • distribute myexe to clients
  • when a customer gets a core, use myexe.dbg to debug it

You will have full symbolic information (file/line, local variables), without having to ship a special binary to the customer and without revealing too many details about your sources.
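As a rough sketch, the build-and-strip steps could be wrapped like this. build_split is a hypothetical helper name, the compiler is taken from $CC (falling back to cc), and strip is assumed to be the GNU binutils version, which accepts -g and -o.

```shell
#!/bin/sh
# Hypothetical helper: build_split SOURCE.c OUTPUT_NAME
# Produces OUTPUT_NAME.dbg (kept in-house) and OUTPUT_NAME (shipped).
build_split() {
    src=$1
    exe=$2
    # Optimized build with full debug info.
    ${CC:-cc} -g -O2 -o "$exe.dbg" "$src" || return 1
    # Copy of the same binary with the debugging sections removed.
    strip -g "$exe.dbg" -o "$exe"
}

# Example:
#   build_split main.c myexe
#   gdb myexe.dbg core     # later, with a core produced by the shipped myexe
```

Because both files come from the same compile, the code addresses in the customer's core line up with the symbols in myexe.dbg.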

+17

You can indeed get useful information from a crash dump, even one from an optimized build (although it is what is technically called "a major pain in the ass"). Compiling with -g is indeed better, and yes, you can do this even when the machine on which the dump happened runs a different distribution. Basically, with one caveat, all the important information is contained in the executable and ends up in the dump.

When you match the core file with the executable, the debugger will be able to tell you where the crash occurred and show you the stack. That in itself should help a lot. You should also find out as much as you can about the situation in which it happens: can they reproduce it reliably? If so, can you reproduce it?

Now, here is the caveat: the place where the notion of "everything is there" breaks down is shared object files, .so files. If the crash is due to a problem in those, you will not have the symbol tables you need; you may only be able to see which .so library it happens in.

There are a number of books about debugging, but I cannot think of one that I would recommend.

+2

As far as I remember, you do not need to ask your customer to run a binary built with the -g option. What you need is to have a build made with -g yourself. With it you can load the core file and it will show the whole stack trace. I remember that a few weeks ago I created core files with a -g build and with a build without -g, and the core size was the same.

0

Have you checked the values of the local variables you see when walking the stack, especially around the select() call? Do this on the customer's machine: just load the dump and walk the stack ...

Also check the FD_SETSIZE value on your DEV and PROD platforms!
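One possible way to read the value on each platform is to ask the C preprocessor, as in the sketch below. fd_setsize is a hypothetical helper name, and the sketch assumes a compiler reachable as $CC or cc that accepts -x c, -E and -P (GCC and Clang do). The check is relevant because passing select() a descriptor numbered at or above FD_SETSIZE writes outside the fd_set, which would fit the memory-corruption suspicion.

```shell
#!/bin/sh
# Hypothetical helper: print what FD_SETSIZE expands to with the local headers.
fd_setsize() {
    printf '#include <sys/select.h>\nFD_SETSIZE\n' |
        ${CC:-cc} -x c -E -P - |
        grep -v '^[[:space:]]*$' |   # drop blank lines left by the preprocessor
        tail -n 1                    # the last line is the expanded macro
}

# Example: run on both DEV and PROD and compare the two values.
#   fd_setsize
```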

0

Source: https://habr.com/ru/post/1412755/

