How to find CPU usage in Google Profiler

I am using the Google CPU profiling tool (google-perftools).

http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html

The documentation says:

Text output analysis

Text mode has output lines that look like this:

14 2.1% 17.2% 58 8.7% std::_Rb_tree::find 

Here's how to interpret the columns:

  • Number of profiling samples in this function
  • Percentage of profiling samples in this function
  • Percentage of profiling samples in the functions printed so far
  • Number of profiling samples in this function and its callees
  • Percentage of profiling samples in this function and its callees
  • Function name

But I can’t understand which column tells me the exact or percentage CPU usage of a function.

How do I get a function's CPU usage from the google-perftools profile?

1 answer

Text mode will have many such lines, one per function. For example, to collect a profile:

 $ CPUPROFILE=a.pprof LD_PRELOAD=./libprofiler.so ./a.out 

The a.out program is the same one as in this other question: "Is Kcachegrind / callgrind inaccurate for dispatcher functions?"

Then analyze it with the pprof top command:

 $ pprof ./a.out a.pprof
 Using local file ./a.out.
 Using local file a.pprof.
 Welcome to pprof!  For help, type 'help'.
 (pprof) top
 Total: 185 samples
       76  41.1%  41.1%       76  41.1% do_4
       51  27.6%  68.6%       51  27.6% do_3
       37  20.0%  88.6%       37  20.0% do_2
       21  11.4% 100.0%       21  11.4% do_1
        0   0.0% 100.0%      185 100.0% __libc_start_main
        0   0.0% 100.0%      185 100.0% dispatcher
        0   0.0% 100.0%       34  18.4% first2
        0   0.0% 100.0%       42  22.7% inner2
        0   0.0% 100.0%       68  36.8% last2
        0   0.0% 100.0%      185 100.0% main

So here the total number of samples is 185, and the sampling frequency is the default (1 sample every 10 ms, i.e. 100 samples per second). The total run time is therefore ~1.85 seconds.

The first column is the number of samples taken while a.out was executing inside this function. Dividing it by the frequency gives a rough estimate of the time spent in that function, e.g. do_4 ran for ~0.76 s.

The second column is the number of samples in the given function divided by the total number of samples, i.e. the percentage of the program's total execution time spent in this function. So do_4 is the slowest function (41% of the total program time) and do_1 takes only 11% of the execution time. I think this is the column you are interested in.

The third column is the running sum of the second column over the current and all previous rows; so we can see that the two slowest functions, do_4 and do_3, together account for ~68% of the total execution time (41.1% + 27.6%).

The 4th and 5th columns are like the 1st and 2nd, but they count not only the samples in the given function itself, but also the samples in all functions it calls, directly or indirectly. You can see that main and everything called from it takes 100% of the total execution time (because main is the root of the program's call tree), and last2 together with its children takes 36.8% of the execution time. In my program, last2's children are half of the do_4 calls and half of the do_3 calls: (41.1% + 27.6%) / 2 ≈ 34%, plus some time in last2 itself.

PS: pprof has other useful output modes, such as callgrind (a format KCachegrind can load) and gv, which shows a graphical call tree annotated with the profiling information.
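For example (a sketch assuming gperftools' pprof and KCachegrind are installed; the flag spellings are from the gperftools pprof documentation):

```shell
# Show the annotated call graph in a ghostview/gv window
$ pprof --gv ./a.out a.pprof

# Convert the profile to callgrind format and browse it in KCachegrind
$ pprof --callgrind ./a.out a.pprof > a.callgrind
$ kcachegrind a.callgrind
```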
