How to get callers for libc6 symbols (e.g. _int_malloc) using Linux perf?

I am profiling a C++ application using Linux perf and get a nice control flow graph using Gprof2Dot. However, some symbols from the C library (libc6-2.13.so) take up a significant part of the total time and yet have no in-edges.

For example:

  • _int_malloc takes 8% of the time, but has no callers.
  • __strcmp_sse42 and __cxxabiv1::__si_class_type_info::__do_dyncast together take about 10% of the time and have a caller whose name is 0, which in turn has callers 2d6935c, 2cc748c and 6, which themselves have no callers.

As a result, I can't find out which routines are responsible for all this mallocing and dynamic casting using only perf. However, it seems that other symbols (e.g. malloc, but not _int_malloc) do have callers.

Why are there no callers for _int_malloc? Why can't I find the ultimate callers of __do_dyncast? And is there a way to change my setup so that I can get this information? I am on x86-64, so I wonder whether I need a (non-standard) libc6 built with frame pointers.

+7
2 answers

Update: As of kernel 3.7.0, you can determine the callers of symbols in system libraries using perf record -gdwarf <command>.

With -gdwarf, there is no need to compile with -fno-omit-frame-pointer.

Original answer: Yes, you probably need a libc6 compiled with frame pointers (-fno-omit-frame-pointer) on x86_64 at the moment (May 24, 2012).

However, developers are currently working on enabling the perf tools to use DWARF unwind information. This means that frame pointers will no longer be needed to get backtrace information on x86_64. Linus, however, does not want a DWARF unwinder in the kernel. Thus, the perf tools will save the registers while the system is running and perform the DWARF unwinding in the perf userspace tool using the libunwind library.

This method has been tested and successfully identifies the callers of (e.g.) malloc and dynamic_cast. However, the patch set is not yet integrated into the Linux kernel and needs further work before it is ready.
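
Once call chains are being recorded, finding out who calls a libc symbol comes down to looking at the frame directly above it in each sample. Here is a minimal sketch of that aggregation. It assumes the samples have already been parsed into lists of frames with the leaf frame first; the sample data and the application function names (Parser::parse, Cache::insert, Cache::lookup) are made up for illustration and do not come from the question.

```python
from collections import Counter

# Hypothetical call chains, leaf frame first. Made-up data, not real perf output.
SAMPLES = [
    ["_int_malloc", "malloc", "operator new", "Parser::parse", "main"],
    ["_int_malloc", "malloc", "operator new", "Cache::insert", "main"],
    ["_int_malloc", "malloc", "operator new", "Parser::parse", "main"],
    ["__strcmp_sse42", "Cache::lookup", "main"],
]


def callers_of(samples, symbol, skip=()):
    """Count, per sample, the nearest frame above `symbol` that is not in `skip`."""
    counts = Counter()
    for chain in samples:
        if symbol not in chain:
            continue
        idx = chain.index(symbol)
        # Walk outward from the symbol towards main, ignoring skipped wrappers.
        for frame in chain[idx + 1:]:
            if frame not in skip:
                counts[frame] += 1
                break
    return counts


if __name__ == "__main__":
    # Immediate callers of _int_malloc.
    print(callers_of(SAMPLES, "_int_malloc"))
    # Skip allocator wrappers to reach the application code responsible.
    print(callers_of(SAMPLES, "_int_malloc", skip={"malloc", "operator new"}))
```

The skip set is there because the immediate caller of _int_malloc is usually just malloc, which says little about which application routine is doing the allocating.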

+5

_int_malloc and __do_dyncast are being called from routines that the profiler cannot identify because it does not have symbol table information for them.

What's more, it looks like you are only looking at self (exclusive) time. That is only useful for finding hot spots in routines that a) have a lot of self time, and b) you can fix.

There is a reason profilers that go beyond the original Unix profil were created. Real software consists of functions that spend almost all of their time calling other functions, and you need to find the code that is on the stack most of the time, not the code that has the program counter most of the time.

So, you need to configure perf to take stack samples and tell you the percentage of time that each of your routines is on the stack. It is even better if it reports not just routines, but lines of code, as in Zoom. It is best to take samples on wall-clock time, so you are not blind to I/O.

There is more to say about all this.
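
To make the difference between self time and time on the stack concrete, here is a small sketch that computes both from a handful of stack samples. It assumes each sample is a list of frames with the leaf first; the samples and the names build_index and lookup are invented for the example.

```python
from collections import Counter

# Hypothetical stack samples, leaf frame first. Made-up data for illustration.
STACKS = [
    ["_int_malloc", "malloc", "build_index", "main"],
    ["_int_malloc", "malloc", "build_index", "main"],
    ["__strcmp_sse42", "lookup", "main"],
    ["build_index", "main"],
]


def exclusive_and_inclusive(stacks):
    """Return (exclusive, inclusive) fractions of samples per function."""
    exclusive = Counter()
    inclusive = Counter()
    for chain in stacks:
        if not chain:
            continue
        # Exclusive (self) time: the function holding the program counter.
        exclusive[chain[0]] += 1
        # Inclusive time: every function on the stack, counted once per sample
        # so that recursion does not inflate the figure.
        for frame in set(chain):
            inclusive[frame] += 1
    total = len(stacks)
    return (
        {name: n / total for name, n in exclusive.items()},
        {name: n / total for name, n in inclusive.items()},
    )


if __name__ == "__main__":
    excl, incl = exclusive_and_inclusive(STACKS)
    for name in sorted(incl, key=incl.get, reverse=True):
        print(f"{name:16s} inclusive={incl[name]:.2f} exclusive={excl.get(name, 0.0):.2f}")
```

In this made-up data build_index has the program counter in only a quarter of the samples but is on the stack in three quarters of them, which is the kind of routine a self-time-only view hides.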

+1
