How do I improve profiling?

I need to profile my program to find out whether any performance work is needed. I suspect it is, but first I need to measure. This isn't my real program, but it illustrates the problem I am facing:

    #include <stdio.h>

    int main (int argc, char** argv)
    {
        FILE* fp = fopen ("trivial.c", "r");
        if (fp) {
            char line[80];
            while (fgets (line, 80, fp))
                printf (line);
            fclose (fp);
        }
        return 0;
    }

Here is what I did with it:

    % gcc trivial.c -pg -o trivial
    % ./trivial
    ...
    % gprof trivial gmon.out

Of course this is a trivial program, but I would have thought it would make some kind of splash on the profiling radar. Apparently not:

                                      called/total       parents
    index  %time    self descendents  called+self    name           index
                                      called/total       children

                    0.00        0.00       1/1           __start [1704]
    [105]    0.0    0.00        0.00       1         _main [105]
    -----------------------------------------------

      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
      0.0       0.00      0.00        1     0.00     0.00  _main [105]

    Index by function name

    [105] _main

Can anyone enlighten me here? I would expect the output to reflect that it called fgets and printf at least 14 times each, and that it actually went to disk - that must have taken some amount of time.

When I run the same commands on a real program, I get more functions listed, but even then it is not a complete list - just a sampling.

Maybe gprof isn't the right tool for the job. What is?

This is on OS X Leopard.

Edit: I ran the real program under time and got the following:

    % time real_program

    real    4m24.107s
    user    2m34.630s
    sys     0m38.716s
+4
9 answers

There are some commonly held beliefs in this area that I would suggest you examine critically.

One is that the best (if not the only) way to find performance problems is to measure how long each routine takes and how many times it is called.

That is top-down thinking. It comes from the belief that the forest is more important than the trees. It rests on the myths of "code speed" and "bottlenecks", and it is not very scientific.

A performance problem is more like a bug than a quantitative thing. What the program is doing wrong is wasting time, and that needs to be fixed. Finding it rests on a simple observation:

Slowness consists of time being spent for poor reasons.

To find it, sample the program's state at random wall-clock times and investigate the reason for each sample.

If something is causing slowness, that very fact exposes it to your samples, so if you take enough samples you will see it. You will also learn roughly how much it is costing you, from the fraction of samples that show it.

A good way to tell whether time is being well spent is to look critically at the call stack. Every function call on the stack has an implicit reason, and if any of those reasons is poor, then the reason for the whole sample is poor.

Some profilers can tell you this at the statement level, i.e. how much each statement is costing you.

Personally, I just randomly pause the program in a debugger several times and look at the stacks, as sketched below. Any call site that shows up on more than one sample is a prime suspect. It never fails.
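For example, here is a minimal sketch of the manual version using gdb (any debugger that can interrupt the program and print a backtrace will do; on OS X the bundled gdb works fine for this):

    % gdb ./real_program
    (gdb) run
    ^C                  # interrupt at a random moment while it is being slow
    (gdb) bt            # look at the whole stack and ask why each call is there
    (gdb) continue
    ^C                  # ... repeat a handful of times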

You can say, "This is not accurate." This is very accurate. It pinpoints the instructions causing the problem. It does not give you 3 decimal places of precision of time. That is, it is disgusting to measure, but excellent for diagnosis.

You can say, "What about recursion?" Well, what about this?

You can say: "I think this can only work on toy programs." That would be just a wish. In fact, large programs tend to have more performance problems because they have deeper stacks, thus more opportunities for calls with bad reasons, and the selection finds them just fine, thanks.

Sorry to rant. I just hate seeing myths in what should be a scientifically sound field.


0

I think you could try the various Valgrind tools, especially callgrind (which gives you the number of calls and the inclusive cost of each call that occurs in your program).

There are various visualization tools for Valgrind output; I don't know of any for OS X, though. A basic command-line run is sketched below.
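A minimal sketch of a callgrind run, using the trivial program from the question (assuming Valgrind is installed and usable on your platform; note the caveat elsewhere on this page about Valgrind on OS X):

    % valgrind --tool=callgrind ./trivial      # writes callgrind.out.<pid>
    % callgrind_annotate callgrind.out.<pid>   # per-function call counts and costs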

+5

By default, gprof shows limited data, and that is fine. Look at your output - it only mentions main. Now look at the calls column - that is what you want. To get data for other functions, try:

 gprof -e main -f printf -f fgets trivial > gprof.output 

The gprof documentation lists the available options and explains how to interpret the data; alternatively, try man gprof on your system. A few other invocations are sketched below.
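These are standard GNU gprof options; Apple's gprof may not support every one of them, so treat this as a sketch (output file names are just examples):

    % gprof -b trivial gmon.out > gprof.brief   # -b: omit the long explanatory blurbs
    % gprof -p trivial gmon.out > gprof.flat    # -p: flat profile only
    % gprof -q trivial gmon.out > gprof.graph   # -q: call graph only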

Also, check out ltrace, strace and ptrace (if available - I don't remember whether they all exist on OS X) - they are fun!

+5

Shark is a profiler included in Apple's developer tools.

+3

Before profiling the code, you need to see where your program spends its time. Run it under time(1) to see how the user, system and wall-clock times compare. Profiling the code only makes sense when the user time is close to the wall-clock time. If the user and system times are very small compared to the wall-clock time, your program is I/O-bound; if the system time is close to the wall-clock time, your program is spending its time in the kernel. In either case, run your program under strace -c or a suitable dtrace script to determine the time taken by each system call; an example is sketched below.
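For example (strace is Linux-only; on OS X, dtruss plays a similar role, if I remember its flags correctly - the program name is just the one from the question):

    % strace -c ./real_program        # Linux: per-syscall totals of time, calls and errors
    % sudo dtruss -c ./real_program   # OS X: DTrace-based near-equivalent, prints syscall counts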

+3

Profiling does not show disk access, only which functions were called, and the disk access won't show up anyway because of VM caching.

Valgrind does not work well on OS X.

With Leopard you have the DTrace utility; I have not used it much, but it can get the information you are looking for - a sketch is below.
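As a sketch, a standard DTrace one-liner that counts the system calls a program makes (run as root; I have not verified this on Leopard specifically):

    % sudo dtrace -n 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }' -c ./trivial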

+2

The absence of certain functions usually means that those functions were not compiled for profiling. In particular, to profile code that uses standard library functions like printf (which is almost always, I think), you need a version of the C library that is compiled with profiling support. I am not familiar with OS X, but on Linux I needed to install the libc6-prof package, which provides the libc_p library; a rough sketch of using it is below.
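Roughly, on a Linux box with libc6-prof installed it looked something like the following; how libc_p actually gets linked in varies with the toolchain, so treat this as an assumption rather than a recipe:

    % gcc -pg -static trivial.c -o trivial_p   # with libc6-prof installed, -pg can pull in libc_p.a
    % ./trivial_p
    % gprof trivial_p gmon.out                 # library functions such as fgets/printf should now appear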

Btw, I believe OS X (or maybe Xcode?) comes with a sampling profiler. It is not as fine-grained as the gprof method, because it uses sampling, but you can run it on any program without special compilation - see the sketch below.
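One such tool is the command-line sample utility that ships with OS X (Shark, mentioned in another answer, is the graphical equivalent); the process name and duration below are just examples:

    % ./real_program &
    % sample real_program 10    # sample the process for 10 seconds and write a report of the hottest stacks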

+1

Looking at your program: since it only does file I/O, its behavior also depends on what is already in the filesystem cache. So be careful - your profiling results may vary depending on the state of the cache. One way to control for this is sketched below.
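For example, a rough sketch of comparing cold-cache and warm-cache runs (the purge command flushes the filesystem cache; on Leopard it comes with the developer tools, if I remember right):

    % purge                  # drop the filesystem cache for a cold-cache run
    % time ./real_program
    % time ./real_program    # warm-cache run; compare the two timings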

+1

For the example code you posted above, if you sample the call stack a number of times, you will basically see these stacks in some proportion:

    -------------------------------------
    ...
    main.c: 4   call _fopen
    ...
    call _main
    -------------------------------------
    ...
    main.c: 8   call _fgets
    ...
    call _main
    -------------------------------------
    ...
    main.c: 9   call _printf
    ...
    call _main
    -------------------------------------
    ...
    main.c: 11  call _fclose
    ...
    call _main

and the proportions tell you approximately how much time is spent in each call. You are unlikely to see much else, because the "exclusive" (self) time is essentially zero compared to the time spent in the I/O library calls. What the stack samples tell you is exactly which statements are costing you the time, and it works the same way no matter how big the program is.

0
