Performance on a modern processor is far from trivial. Here are a few things that complicate it:
- Computers are fast. Your processor can execute up to 6 billion instructions per second. That means even the slowest instruction can be executed millions of times per second, so its cost only really matters if you use it very often.
- A modern processor has hundreds of instructions in flight at any time. They are pipelined: while one instruction is being fetched, another is reading its registers, a third is executing, and a fourth is writing its result back. A modern pipeline has 15-20 such stages, the processor can handle 3-4 instructions in each stage at the same time, and it can reorder them: if the multiplication unit is busy with another instruction, it may be able to find an independent add to execute instead. So even if you use a few slow instructions, their cost can usually be hidden very well by the other instructions that execute while the slow one finishes.
- Memory is hundreds of times slower than the processor. The instructions being executed matter little if their cost is dwarfed by fetching data from memory. And even that is hard to predict, because the processor uses its built-in caches to try to hide this cost.
So, the short answer is: "Don't try to outsmart the compiler." If you can choose between two equivalent expressions, the compiler can probably do the same and pick the more efficient one. The cost of an instruction varies with all of the above factors: which other instructions are executing, what data is in the CPU cache, which exact processor model the code runs on, and so on. Code that is super-efficient in one situation can be very inefficient in another. The compiler will try to select the most efficient instructions and schedule them as well as possible. Unless you know more about all this than the compiler does, you are unlikely to do better.
Do not attempt such micro-optimizations unless you really know what you are doing. As shown above, low-level performance is a ridiculously complex subject, and it is very easy to write "optimizations" that produce much slower code, or that just sacrifice readability for something that makes no difference.
Besides, most of your code simply has no measurable effect on performance. People like to quote (or misquote) Knuth on this subject:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
People often interpret this as "don't try to optimize your code." But if you read the full quote, some more interesting implications become clear:
Most of the time we should forget about micro-optimizations. Most code executes so rarely that optimizing it does not matter. Given the number of instructions a processor can execute per second, a block of code has to run very often for an optimization in it to have any effect at all. So in roughly 97% of cases, your optimizations will be a waste of time.

But he also says that sometimes (3% of the time), your optimizations will make a difference. And obviously, finding that 3% is a bit like finding a needle in a haystack. If you just decide to "optimize your code" in general, you will spend your time on the first 97%. Instead, you first need to find the 3% that actually needs optimizing. In other words, run your code through a profiler and let it tell you which code takes the most processor time. Then you know where to optimize. And then your optimizations are no longer premature.
jalf