Let's take a step back and look at how processors work. They usually have separate caches: one for code, holding the instructions the CPU is about to execute, and one for the data those instructions operate on.
Data cache misses are "easy" to fix: use the smallest data structures you can, and keep the members you access most often tightly packed together ...
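To make the data side concrete, here is a minimal sketch of splitting frequently used ("hot") members from rarely used ("cold") ones. The `Particle*` types, `Particles`, and `integrate` are hypothetical names invented for this illustration, and whether the split pays off depends on your access pattern and hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "mixed" layout: the fields touched every frame share
// cache lines with fields that are almost never read.
struct ParticleMixed {
    float x, y, z;            // hot: read and written every step
    std::string debugName;    // cold: only used for logging
    float vx, vy, vz;         // hot
    std::uint64_t createdAt;  // cold
};

// Hot part only: six floats (24 bytes with 4-byte floats), so several
// particles fit into one typical 64-byte cache line.
struct ParticleHot {
    float x, y, z;
    float vx, vy, vz;
};

// Cold part, stored separately so it does not pollute the hot loop.
struct ParticleCold {
    std::string debugName;
    std::uint64_t createdAt;
};

// Parallel arrays: hot[i] and cold[i] describe the same particle.
struct Particles {
    std::vector<ParticleHot> hot;
    std::vector<ParticleCold> cold;

    void add(const ParticleHot& h, ParticleCold c) {
        hot.push_back(h);
        cold.push_back(std::move(c));
        assert(hot.size() == cold.size());  // invariant of the parallel arrays
    }
};

// The tight loop walks only the densely packed hot data.
void integrate(std::vector<ParticleHot>& ps, float dt) {
    for (auto& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}
```

Calling `integrate(particles.hot, dt)` then streams through contiguous 24-byte records instead of dragging the name and timestamp through the cache on every iteration.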
Instruction cache misses are harder to understand and fix, and they are also the reason it is generally accepted that C++ polymorphic behavior is slower than regular function calls. Basically, the CPU prefetches the instructions stored near the point it is currently executing. If everything is inlined, there is simply more code, and the CPU cannot prefetch all of it, which leads to cache misses. Note that this is just a simplification. In my experience, I have had problems with template instantiations that generated so much code that performance ended up lower than with simple virtual calls and a not-too-deep object hierarchy.
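As a rough sketch of that trade-off (the `Shape`, `Square`, `Rect`, `totalAreaVirtual`, and `totalAreaStatic` names are made up for this example): the virtual version exists once in the binary and pays for an indirect call per element, while the template version is stamped out once per type it is used with, and each copy may have the call inlined. With many types and larger function bodies, those copies add up and compete for instruction cache. Which one wins is something only measurement can tell you.

```cpp
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// `final` lets the compiler resolve area() statically when the
// concrete type is known, as in the template version below.
struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect final : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

// One copy of this loop in the binary, no matter how many shape
// types exist. Each iteration makes an indirect (vtable) call.
double totalAreaVirtual(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}

// One instantiation per T that is actually used. Each instantiation
// is a separate function in the binary, typically with area() inlined.
template <typename T>
double totalAreaStatic(const std::vector<T>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s.area();
    return sum;
}
```

Using `totalAreaStatic` with both `Square` and `Rect` yields two separate loops in the generated code, whereas `totalAreaVirtual` stays a single one.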
As Alexandrescu always points out, you should always profile your code.
Source: What Every Programmer Should Know About Memory