Why can forced inline functions lead to poor performance?

If I inline a function, its body is copied in place of the call. Why can this lead to poor performance?

Edit: What about cache misses caused by large functions? Why is there a rule "only inline functions of at most 3 lines"?

+5
3 answers

There may be an edge case where inlining a function increases the size of the program or moves parts of the code around, so that cache misses occur where there were none before. This would not be common, since caches are designed to handle the most common situations and are large compared to most hot spots.
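To make the size argument concrete, here is a back-of-the-envelope model. It assumes a function body of `body_bytes` bytes called from `call_sites` places, with a call sequence of `call_bytes` bytes at each site; these names and the numbers below are hypothetical, not measurements. Inlining keeps one copy of the body per call site, while an out-of-line function keeps one shared copy plus a call at each site. This is one way to read the "only very short functions" rule of thumb from the question.

```cpp
#include <cstddef>

// Rough code-size model (an illustration, not a measurement).
// Inlined: every call site carries its own copy of the body.
constexpr std::size_t inlined_size(std::size_t call_sites, std::size_t body_bytes) {
    return call_sites * body_bytes;
}

// Out of line: one shared copy of the body plus a call sequence at each site.
constexpr std::size_t outlined_size(std::size_t call_sites, std::size_t body_bytes,
                                    std::size_t call_bytes) {
    return body_bytes + call_sites * call_bytes;
}

// A tiny body (e.g. a getter) can make the code smaller when inlined...
static_assert(inlined_size(10, 4) < outlined_size(10, 4, 8), "small body: inlining shrinks code");
// ...while a large body copied into many call sites grows it considerably.
static_assert(inlined_size(10, 400) > outlined_size(10, 400, 8), "large body: inlining bloats code");
```

The model ignores effects such as the optimizations inlining enables after the copy is made, so it only indicates the direction of the size change.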

+5

There is no standard way to force inlining in modern C++ compilers, so this is a fairly moot point. However, assuming you use compiler-specific extensions to force inlining (and the compiler does not ignore them), this will not lead to poor performance, but it will increase the size of the executable because it contains more copies of the same code.
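For reference, the usual vendor extensions are `__forceinline` on MSVC and `__attribute__((always_inline))` on GCC and Clang; the standard `inline` keyword is only a hint as far as inlining goes. The sketch below wraps them in a macro. `FORCE_INLINE` and `clamp_to_byte` are hypothetical names chosen for illustration, and even with these extensions a compiler can still decline in some cases (MSVC, for example, can warn that a `__forceinline` function was not inlined).

```cpp
// FORCE_INLINE is a hypothetical project macro, not a standard feature.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline  // fallback: only a hint, the compiler decides
#endif

// If the request is honored, every call site gets its own copy of this body.
FORCE_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```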

Edit: As pointed out in the comments below, there is a very unlikely edge case where your code executes different copies of the same inline function in close succession, which reduces the effectiveness of the instruction cache. The chance that this noticeably affects performance is extremely small, but in some cases it may.
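One common way to limit this, offered here as an illustration rather than something the answer prescribes, is to keep large or rarely executed code out of line so that all callers share a single cached copy, while the small hot path stays inlinable. `NOINLINE`, `describe_div_error` and `checked_div` are hypothetical names; the attributes used are `__declspec(noinline)` on MSVC and `__attribute__((noinline))` on GCC and Clang.

```cpp
#include <string>

// NOINLINE is a hypothetical project macro wrapping the vendor extensions.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

// Cold path kept out of line: every caller shares this one copy.
// The string formatting stands in for a larger body we do not want duplicated.
NOINLINE std::string describe_div_error(int numerator) {
    return "cannot divide " + std::to_string(numerator) + " by zero";
}

// Small hot function: cheap to inline; it only calls the cold path on failure.
inline int checked_div(int a, int b, std::string* err) {
    if (b == 0) {
        *err = describe_div_error(a);
        return 0;
    }
    return a / b;
}
```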

+4

We have to take a step back and explain how processors work. They usually have separate caches: one for code, holding the instructions the CPU is about to execute, and one for data, holding the values those instructions operate on.

Data cache misses are "easy" to fix: use the smallest data structures you can, and pack the members you access most often tightly together...
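A minimal sketch of that advice is a hot/cold split: the fields a tight loop touches are grouped into a compact struct, and rarely used fields move to a parallel array. The particle types and `step` function below are hypothetical, chosen only to illustrate the layout.

```cpp
#include <string>
#include <vector>

// "Fat" layout: rarely used fields sit between the hot ones, so each cache
// line loaded by an update loop also carries bytes the loop never reads.
struct ParticleFat {
    float x, y, z;
    std::string debug_name;  // cold: only used for logging
    float vx, vy, vz;
    double created_at;       // cold: only used for statistics
};

// Hot/cold split: the update loop touches only this compact struct.
struct ParticleHot {
    float x, y, z;
    float vx, vy, vz;
};

// Cold data lives in a parallel array, indexed the same way as the hot one.
struct ParticleCold {
    std::string debug_name;
    double created_at;
};

// Advance all particles by dt; reads and writes only the hot array.
void step(std::vector<ParticleHot>& hot, float dt) {
    for (ParticleHot& p : hot) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}
```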

Instruction cache misses are harder to understand and to fix, and they are also part of the reason why C++ polymorphic behavior is usually considered slower than regular function calls. Basically, the CPU prefetches the instructions stored close to the point it is currently executing; if everything is inlined, there is simply more code, the CPU cannot prefetch all of it, and that leads to cache misses. Note that this is just a simplified picture. In my experience, I have had problems with template instantiations that generated a lot of code, which performed worse than plain virtual calls with a not-too-deep object hierarchy.
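The trade-off in that last point can be sketched as follows: a function template is compiled once per type it is used with, while a function taking a base-class pointer is compiled once and dispatches at run time. The shape types and both `total_area_*` functions are hypothetical examples, not code from the answer.

```cpp
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect final : Shape {
    double w, h;
    Rect(double width, double height) : w(width), h(height) {}
    double area() const override { return w * h; }
};

// Dynamic polymorphism: one compiled copy serves every shape type,
// at the cost of an indirect call per element.
double total_area_virtual(const std::vector<const Shape*>& shapes) {
    double sum = 0.0;
    for (const Shape* s : shapes) sum += s->area();  // dispatched at run time
    return sum;
}

// Static polymorphism: the compiler emits a separate instantiation of this
// function (plus whatever gets inlined into it) for every T it is used with.
template <typename T>
double total_area_template(const std::vector<T>& shapes) {
    double sum = 0.0;
    for (const T& s : shapes) sum += s.area();  // T is final, so the call can be resolved at compile time
    return sum;
}
```

With a handful of types the extra instantiations cost little; with many types and large bodies they add up to the kind of code growth described above.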

As Alexandrescu always points out, you should always measure (profile) your code.
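Whatever the rules of thumb say, the effect of inlining on a particular program is best checked by timing it. Below is a minimal timing helper built on `std::chrono::steady_clock`; `average_ns` is a hypothetical name, and a real profiler or benchmarking library will give more reliable numbers.

```cpp
#include <chrono>

// Run f() `iterations` times and return the average wall-clock time per call
// in nanoseconds. Returns 0 for a non-positive iteration count.
template <typename F>
double average_ns(F&& f, int iterations) {
    if (iterations <= 0) return 0.0;
    using clock = std::chrono::steady_clock;  // monotonic, suitable for intervals
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) f();
    const auto elapsed = clock::now() - start;
    return std::chrono::duration<double, std::nano>(elapsed).count() / iterations;
}
```

When using it, make the measured work produce an observable result (for example by writing to a `volatile` variable), otherwise the optimizer may remove it, and compare builds with and without forced inlining on the same machine.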

Source: What Every Programmer Should Know About Memory

+2

Source: https://habr.com/ru/post/1211101/

