An opinion? Yes. While you're deciding which profiler to buy (or not), try this.
ADDED: @Max: here are step-by-step instructions. The IDE has a pause button. Run the application in the IDE, and while it is being subjectively slow — that is, while you are waiting for it — hit the pause button. Then take a snapshot of the call stack.
To take a snapshot of the call stack, I display it (it's one of the debug windows). In the IDE options you can find settings for what the stack view displays. I turn off the option to show function arguments, because that makes the lines too long. What I want is the line number where each call is made and the name of the function being called. Then, in the call stack view, you can do "Select All", then "Copy", and paste it into Notepad. It's a little clumsy, I know, but that's how I record them manually.
I take several samples this way. Then I look through them for lines that appear in more than one sample, because that is where the time goes. Some of them are simply necessary, like "call the main dispatch routine", but some of them are not. Those are the gold nuggets. If I don't find any, I keep taking samples, up to about 20. If I still don't find any (which would be very unusual), the program is pretty well optimized. (The key point is that every time you do this, the program gets faster, and in the process the remaining performance problems become relatively larger and easier to find. That is, the program doesn't just speed up by some ratio R; the remaining problems also grow, as a percentage of the time, by that same ratio.) *
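The "lines that appear in more than one sample" test can be done by eye, but it can also be sketched mechanically. A minimal illustration (the helper name, input format, and sample data are all hypothetical, not from the answer): each pasted snapshot is a list of stack lines, and we rank lines by the fraction of samples that contain them.

```python
from collections import Counter

def rank_stack_lines(samples):
    """samples: list of stack snapshots, each a list of 'file:line func' strings.
    Returns (line, fraction of samples containing it), most frequent first."""
    counts = Counter()
    for sample in samples:
        for line in set(sample):   # count each line at most once per sample
            counts[line] += 1
    n = len(samples)
    return [(line, c / n) for line, c in counts.most_common()]

# Three hand-recorded samples (made-up stack lines, for illustration):
samples = [
    ["main.c:10 main", "parse.c:55 parse", "str.c:7 strdup"],
    ["main.c:10 main", "parse.c:55 parse", "str.c:9 malloc"],
    ["main.c:10 main", "render.c:30 draw"],
]
for line, frac in rank_stack_lines(samples):
    print(f"{frac:.0%}  {line}")
```

A line that shows up in, say, 2 of 3 samples is probably costing a large fraction of the time, which is exactly the signal being hunted for.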
Another thing I do in this process is ask myself what the program is doing, and why, in that particular sample. The "why" is the important part, because that is how you tell whether a line is really necessary or whether it could be replaced with something less expensive. If I'm not sure why it's there, I single-step a little, maybe look at the data, or maybe step out a few levels (Shift-F11) until I understand what it's doing. That's all there is to it.
Existing profilers could help with this process if they actually sampled the call stack, retained the samples, ranked lines by the percentage of samples containing them, and then let you study individual samples in detail. Maybe one day they will, but for now they don't. They get hung up on issues like efficiency and measurement.
* Suppose your code spends 90% of its time doing X and 9% doing Y, and both are unnecessary. Take a small number of samples and you will see X, but probably not Y. Fix X and you get a 10x speedup. Sample again (you may need to wrap an outer loop around the program so you can keep taking samples). Now you see Y with certainty, because it now takes 9% × 10 = 90% of the time. Fixing it gives you another 10x, for a total speedup of 100x.
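The footnote's arithmetic can be checked mechanically. A small sketch, using only the footnote's own numbers: removing work that took a fraction f of the total time multiplies speed by 1/(1 - f), and successive fixes compound.

```python
def speedup(fraction_removed):
    """Speed multiplier from removing work that took this fraction of total time."""
    return 1 / (1 - fraction_removed)

s1 = speedup(0.90)                 # fix X (90% of the time): ~10x
# After fixing X, Y's 9% of the original time is 90% of what remains.
s2 = speedup(0.09 / (1 - 0.90))    # fix Y: another ~10x
print(s1, s2, s1 * s2)             # roughly 10, 10, and 100
```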