C++ code profiler, Very Sleepy

I am new to profiling. I would like to optimize my code to meet time constraints. I use Visual C++ 2008 Express and therefore had to get a standalone profiler; for me that is Very Sleepy. I did a search but did not find a decent tutorial for Sleepy, so here is my question: how do I use it properly? I understood the general idea of profiling, so I sorted by "% exclusive" to find my bottlenecks.

First, at the top of that list I get ZwWaitForSingleObject, RtlEnterCriticalSection, operator new, RtlLeaveCriticalSection, printf, some iterators... and only after they have taken about 60% does my first own function appear, in first position by child calls. Can someone explain why those entries end up at the top, what they mean, and how I can optimize my code if I have no access to that critical 60% (their "source file" is listed as unknown)? Also, for my own function I would expect to get a time for every line, but that is not the case; for example, some arithmetic and some function calls show no time at all (and they are not inside unused "if" branches). And lastly: how can I tell when a line is extremely fast on its own but is called thousands of times, making it the actual bottleneck?

Finally, is Sleepy any good? Or can you recommend a free alternative for my platform?

Any help is really appreciated! Cheers!

UPDATE:

I found another version of the profiler, called plain Sleepy. It shows how many times each piece of code was called, plus line numbers (I guess that points to the critical lines). So in my case... KiFastSystemCallRet takes 50%! Does that mean my code is mostly just waiting for some data? How can I improve on this — is there a decent approach to tracking down what causes all these calls, so I can eventually remove or change it?

2 answers

"I would like to optimize my code to meet time constraints"

You are facing a perpetual problem in this line of work. You want to find ways to make your code take less time, and you (like many people) assume, and have been taught, that the only way to do that is to take various kinds of measurements.

There is a minority view, and the only thing it has to recommend it is actual significant results (plus the solid theory behind it).

If you have a bottleneck (and you probably have several), it is taking some fraction of the time, say 30%.
Just treat it as a bug that you can find.

Randomly pause the program with the pause button and take a careful look at what the program is doing and why it is doing it. Ask yourself whether it is something you could get rid of. Do this about 10 times. On average you will see the problem on 3 of the pauses. Any activity you see on more than one pause, if it is not truly necessary, is a speed bug. This does not tell you precisely how big the problem is, but it does tell you precisely what the problem is, and that it is worth fixing. You will see things this way that no profiler can find, because profilers are only programs and cannot take a broad view of what counts as an opportunity.
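
To make that concrete, here is a minimal, made-up C++ sketch of the kind of removable activity those pauses expose; the function and the wasted work are purely hypothetical, not taken from the question:

    // Hypothetical sketch (not the asker's code): the kind of removable activity
    // that keeps showing up across random pauses. Nearly every pause lands in
    // string-formatting / allocation frames called from this loop, even though
    // the formatted string is never needed for the result.
    #include <iostream>
    #include <string>
    #include <vector>

    double slowSum(const std::vector<int>& values)
    {
        double total = 0.0;
        for (int v : values) {
            std::string label = "value=" + std::to_string(v); // heap + formatting work every pass
            total += v;                                        // ...but label is never used
        }
        return total;
    }

    int main()
    {
        std::vector<int> data(1000000, 7);
        std::cout << slowSum(data) << '\n'; // pausing during this call keeps landing in
        return 0;                           // frames called from slowSum(): operator new, etc.
    }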

Some people are risk-averse and think this might not give enough of a speedup to be worth it. Sure, there is a small chance of a low payoff, but it is like investing: the theory says that on average it will be worthwhile, and there is also a small chance of a high payoff. In any case, if you are worried about the risk, a few more samples will settle the question.

After you fix a problem, the remaining bottlenecks take up a larger percentage of the time, because they did not get any smaller but the overall program did. That makes them easier to find when you repeat the whole process.
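
Here is a quick sketch of the arithmetic behind that point, with assumed numbers rather than anything measured from the question's program:

    // Back-of-the-envelope arithmetic (numbers assumed): fixing one bottleneck makes
    // the rest a larger share of the now-shorter run, and the speedups compound.
    #include <cstdio>

    int main()
    {
        double total = 100.0;                              // seconds for the whole run
        const double bottlenecks[] = { 30.0, 21.0, 14.0 }; // three separate problems

        for (double b : bottlenecks) {
            double before = total;
            total -= b;                                    // this bottleneck is removed
            std::printf("removed %.0fs: %.0fs -> %.0fs; the next problem is now a bigger slice\n",
                        b, before, total);
        }
        std::printf("overall speedup: %.2fx\n", 100.0 / total); // roughly 2.9x from three fixes
        return 0;
    }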

There is a lot of literature about profiling, but very little of it actually says how much speedup is achieved in practice. Here is a concrete example with an almost 3x speedup.


I have used GlowCode (a commercial product similar to Sleepy) for profiling native C++ code. You run an instrumentation step, then execute your program, and then look at the data the tool produces. The instrumentation step injects a small trace function at the entry and exit points of each method and simply measures how long each function takes to run.
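
Conceptually, those injected hooks behave like the scope-based timer below. This is only an illustrative sketch in modern C++ (the names are invented, GlowCode's real mechanism is its own, and std::chrono is not available in VC++ 2008):

    // Conceptual sketch only: a scope-based timer standing in for the profiler's
    // injected entry/exit hooks.
    #include <chrono>
    #include <cstdio>
    #include <map>
    #include <string>

    struct FunctionStats { long long calls = 0; double totalMs = 0.0; };
    static std::map<std::string, FunctionStats> g_stats;

    class ScopedTrace {
    public:
        explicit ScopedTrace(const char* name)
            : name_(name), start_(std::chrono::steady_clock::now()) {}
        ~ScopedTrace() {                                   // the "exit point" hook
            const auto end = std::chrono::steady_clock::now();
            FunctionStats& s = g_stats[name_];
            ++s.calls;
            s.totalMs += std::chrono::duration<double, std::milli>(end - start_).count();
        }
    private:
        std::string name_;
        std::chrono::steady_clock::time_point start_;
    };

    static void parseRecord()
    {
        ScopedTrace trace("parseRecord");                  // the "entry point" hook
        volatile double work = 0.0;
        for (int i = 0; i < 100000; ++i) work = work + i * 0.5; // stand-in for real work
    }

    int main()
    {
        for (int i = 0; i < 200; ++i) parseRecord();
        for (const auto& entry : g_stats)
            std::printf("%s: %lld calls, %.2f ms total\n",
                        entry.first.c_str(), entry.second.calls, entry.second.totalMs);
        return 0;
    }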

Using the call-graph profiling tool, I could list the methods sorted from most time used to least time used, and the tool also showed the number of calls. Simply drilling down into the highest-percentage routines showed me which methods were consuming the most time. Some methods looked very slow, but drilling into them revealed they were merely waiting for user input, or for a response from a service. Others took a long time because they called some internal routine thousands of times per call. In one case we found that someone had made a coding error and was walking a large linked list over again for each item in the list, when they really only needed to walk it once.
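
The offending code is not shown here, but the mistake described usually looks something like the following sketch (the duplicate-counting task and the names are invented for illustration):

    // Illustrative reconstruction: a helper that walks the whole linked list is
    // called once per element, so the total work is O(n^2); building a lookup
    // table in one pass keeps it linear.
    #include <cstdio>
    #include <list>
    #include <unordered_map>

    // Walks the entire list on every call.
    static bool isDuplicate(const std::list<int>& ids, int id)
    {
        int seen = 0;
        for (int other : ids)
            if (other == id) ++seen;
        return seen > 1;
    }

    // Slow: the O(n) helper runs for each of the n elements -> O(n^2).
    static int countDuplicatesSlow(const std::list<int>& ids)
    {
        int count = 0;
        for (int id : ids)
            if (isDuplicate(ids, id)) ++count;
        return count;
    }

    // Fixed: count occurrences in one walk, then answer from the table -> O(n).
    static int countDuplicatesFast(const std::list<int>& ids)
    {
        std::unordered_map<int, int> freq;
        for (int id : ids) ++freq[id];
        int count = 0;
        for (int id : ids)
            if (freq[id] > 1) ++count;
        return count;
    }

    int main()
    {
        const std::list<int> ids = { 1, 2, 3, 2, 4, 1 };
        std::printf("%d %d\n", countDuplicatesSlow(ids), countDuplicatesFast(ids)); // both print 4
        return 0;
    }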

If you sort from "most called" to "least called", you may spot some of the tiny functions that get called from everywhere (iterator methods like next(), etc.). Something worth checking is that the functions called most often are really lean. Saving a millisecond in a routine called 500 times to draw a screen makes that screen half a second faster. This helps you decide which hot spots are worth your effort.
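
As a hypothetical illustration of that multiplication effect (the widget-measuring code below is invented, not taken from the answer):

    // Hypothetical sketch: a small per-call cost inside a routine that runs
    // ~500 times per redraw adds up to something the user can feel.
    #include <cstdio>
    #include <string>
    #include <vector>

    struct Label { std::string text; int x; int y; };

    // Hot helper, heavier than it needs to be: taking the string by value copies
    // (and possibly heap-allocates) it on every call.
    static std::size_t measureTextSlow(std::string text) { return text.size() * 7; }

    // Leaner: a const reference means the ~500 calls per redraw copy nothing.
    static std::size_t measureTextFast(const std::string& text) { return text.size() * 7; }

    static std::size_t redrawWidth(const std::vector<Label>& labels)
    {
        std::size_t width = 0;
        for (const Label& l : labels)           // ~500 labels per screen
            width += measureTextFast(l.text);   // 1 ms saved here is ~0.5 s per redraw
        return width;
    }

    int main()
    {
        const std::vector<Label> labels(500, Label{ "a fairly long caption string", 0, 0 });
        std::printf("total width: %zu\n", redrawWidth(labels));
        return 0;
    }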

I have seen two general approaches to profiling. One is to do "general" profiling: run a set of "normal" operations and discover which methods slow the application down the most. The other is to do specific profiling: focus on specific user complaints about performance and exercise those features to identify the problem.

One thing I would caution about is to limit your changes to those that will noticeably affect the user's experience or the system's throughput. Shaving a millisecond off a mouse click will make no difference to the average user, because human reaction time simply is not that fast. Even racing drivers and elite gamers need on the order of one to two hundred milliseconds to react, and ordinary users, such as bank tellers, are slower still, so the benefit would be negligible.

Making twenty 1-millisecond improvements, or one 20-millisecond improvement, will make the system noticeably more responsive. It is cheaper and better if you can make one big improvement rather than many small ones.

Similarly, shaving a millisecond off a service that handles 100 requests per second can matter: if each request takes 10 ms, saving 1 ms is a 10% improvement, which means the service can now handle about 110 requests per second.

The reason for the caution is that code changes made strictly for performance often hurt your code's structure by adding complexity. Say you decide to speed up a database call by caching its results: how do you know when the cache becomes invalid? Do you add a cache-invalidation mechanism? Or consider a financial transaction where looping over all the line items to compute the total is slow, so you decide to keep a cached running total. Now you have to update that runTotal in all kinds of situations: line-item voids, reversals, deletions, modifications, quantity changes, and so on. That makes the code more complex and error-prone.
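
A sketch of that running-total trade-off, with invented names, might look like this; total() becomes cheap, but correctness now depends on every mutating member remembering to update the cache:

    // Sketch (names invented): the cached runTotal_ makes total() fast, but every
    // mutating operation now has to keep it in sync, which is where the added
    // complexity and the new bugs come from.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct LineItem { double price; int quantity; bool voided; };

    class Transaction {
    public:
        void addItem(const LineItem& item) {
            items_.push_back(item);
            if (!item.voided)
                runTotal_ += item.price * item.quantity;                 // must not forget this
        }

        void voidItem(std::size_t index) {
            LineItem& item = items_[index];
            if (!item.voided) {
                item.voided = true;
                runTotal_ -= item.price * item.quantity;                 // ...or this
            }
        }

        void changeQuantity(std::size_t index, int newQuantity) {
            LineItem& item = items_[index];
            if (!item.voided)
                runTotal_ += item.price * (newQuantity - item.quantity); // ...or this
            item.quantity = newQuantity;
        }

        // Fast: no loop over the items, but only correct if every mutation above is right.
        double total() const { return runTotal_; }

    private:
        std::vector<LineItem> items_;
        double runTotal_ = 0.0;
    };

    int main()
    {
        Transaction t;
        t.addItem(LineItem{ 2.50, 4, false });      // running total becomes 10.00
        t.changeQuantity(0, 2);                     // running total becomes 5.00
        std::printf("total: %.2f\n", t.total());
        return 0;
    }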

