A few questions about C++ inline functions

The training materials from the class I took seem to make two conflicting statements.

On the one hand:

"Using built-in functions usually leads to faster execution."

On the other hand:

"Using built-in features may slow performance due to more frequent swapping."

Question 1: Are both statements true?

Question 2: What is meant by "swapping" here?

Please take a look at this snippet:

int powA(int a, int b) { return (a + b) * (a + b); }

inline int powB(int a, int b) { return (a + b) * (a + b); }

int main()
{
    Timer *t = new Timer;   // Timer is my own timing class
    for (int a = 0; a < 9000; ++a) {
        for (int b = 0; b < 9000; ++b) {
            int i = (a + b) * (a + b);   // 322 ms <-----
            // int i = powA(a, b);       // not inline : 450 ms
            // int i = powB(a, b);       // inline     : 469 ms
        }
    }
    double d = t->ms();
    cout << "--> " << d << endl;
    return 0;
}

Question 3: Why is the performance so similar between powA and powB? I would expect powB to come in at around 322 ms, since it is, after all, inline.

+4
6 answers

Question 1

Yes, both statements can be true, under the right circumstances. Obviously, they will not both be true at the same time for the same piece of code.

Question 2

โ€œSwapโ€ most likely refers to the OS swap behavior where pages are uploaded to disk when memory pressure becomes high.

In practice, if your inlined functions are small, you usually see a performance improvement because the overhead of calling the function and returning is eliminated. However, in rare cases you can make the code grow so much that it no longer fits entirely in the CPU cache (inside a performance-critical loop), and you may run into poor performance. That said, if you are optimizing at this level, you should probably be coding directly in assembler.

Question 3

The inline modifier is a hint to the compiler that you would like the function inlined. The compiler is not required to follow your instruction, and the result may also depend on the compiler options you specify. You can always look at the generated assembly code to see what it actually did.

Your test may not even measure what you want, because your compiler may be smart enough to see that you never use the result you assign to i, so it may not bother to call (or even evaluate) your function at all. Again, look at the generated assembly code.
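
A minimal sketch of how the loop could be rewritten so the result is actually consumed (the accumulator and the final print are my additions, not part of the original code; timing is omitted for brevity):

    #include <iostream>

    int powA(int a, int b) { return (a + b) * (a + b); }
    inline int powB(int a, int b) { return (a + b) * (a + b); }

    int main()
    {
        long long sum = 0;              // accumulator keeps every result "live"
        for (int a = 0; a < 9000; ++a) {
            for (int b = 0; b < 9000; ++b) {
                sum += powB(a, b);      // swap in powA or the raw expression to compare
            }
        }
        std::cout << sum << std::endl;  // printing the sum prevents dead-code elimination
        return 0;
    }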

+5

Inlining inserts the function's code at the call site, saving the cost of creating a stack frame, saving/restoring registers, and the call (branch) itself. In other words, using inline (when it works) is similar to writing the body of the function at the call site instead of calling it.
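
A rough illustration of what the compiler conceptually does (a sketch, not literal generated code; sq, callerA, and callerB are made-up names):

    inline int sq(int x) { return x * x; }

    // What you write:
    int callerA(int a) { return sq(a + 1); }

    // What the compiler effectively produces after inlining sq
    // (no call instruction, no stack frame for sq):
    int callerB(int a) { int tmp = a + 1; return tmp * tmp; }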

However, inline is not guaranteed to do anything; it is up to the compiler. The compiler will sometimes inline functions that are not declared inline (well, it is probably the linker that does this when link-time optimization is turned on, but it is easy to imagine situations where it can be done at the compiler level - for example, when the function is static).

If you want to force MSVC to inline a function, use __forceinline and check the assembly. There should be no call instructions - your code should compile down to a simple sequence of instructions executed linearly.
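
For example (a sketch; __forceinline is MSVC-specific, and GCC/Clang use the always_inline attribute instead):

    // MSVC: ask the compiler to inline unconditionally
    __forceinline int powB(int a, int b) { return (a + b) * (a + b); }

    // GCC/Clang equivalent:
    // __attribute__((always_inline)) inline int powB(int a, int b) { return (a + b) * (a + b); }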

Regarding speed: you really can make your code faster by inlining small functions. However, when you inline large functions ("large" is hard to pin down; you need to run tests to determine what is large and what is not), your code gets bigger. This is because the body of the inlined function is repeated at every call site. After all, the whole point of calling a function is to save code size by reusing the same routine from several places in the program.

As the code becomes larger, the instruction cache may be overwhelmed, resulting in slower execution.

Another point to consider: modern out-of-order processors (most desktop processors - for example, Intel Core Duo or i7) have mechanisms (instruction tracing and prefetching) that follow branches ahead of time, effectively "inlining" at the hardware level. So aggressive inlining does not always make sense.

In your example, you need to look at the assembly your compiler generates. It may be the same for the inline and non-inline versions. If the function is not being inlined, try __forceinline if you are using MSVC. If the timing is the same in both cases, that means your processor does a good job of prefetching and the bottleneck is somewhere else.

+4

Swapping is an OS term that refers to moving memory pages of a running process in and out of physical memory. Swapping takes time, and the larger your application, the more of it may need to be swapped.

When you inline a function, instead of jumping to a single shared routine, a copy of the entire function body is emitted at each call site. This makes your program larger, and could therefore, in theory, lead to more swapping.

Usually, for very small functions (for example, your powA and powB), inlining is fine and leads to faster execution; the swapping concern is really only theoretical - you probably have bigger fish to fry when it comes to squeezing the last drop of performance out of your code.

+1

The statements in your course materials are both correct. In other words, done well, inlining can improve performance; done badly, it can decrease performance.

It is best to inline only small functions. This removes the extra call instructions and jumps through memory, which is how performance improves.

If you inline big functions, the code can grow beyond the size of the cache, which leads to extra paging and swapping of memory. That hurts performance.

+1

Both statements are true, sort of. Declaring a function inline is a hint to the compiler to inline it if possible. The compiler (usually) uses its own judgment about whether the function is actually inlined, but in C++ declaring it inline changes code generation anyway, at least with respect to symbol generation.
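
One concrete consequence of that symbol-generation point (a sketch; the file and function names are made up): declaring a function inline lets its definition live in a header included by several translation units without causing multiple-definition errors at link time.

    // helpers.h (hypothetical header)
    #ifndef HELPERS_H
    #define HELPERS_H

    // Without 'inline', including this header from two .cpp files
    // would produce a duplicate-symbol error when linking.
    inline int square(int x) { return x * x; }

    #endif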

โ€œSwapโ€ in this context refers to swapping an executable image to disk. Since the executable file is larger, it can affect performance on systems with limited memory.

As for your third question, the compiler chose the same behaviour for both functions (my guess is that neither was inlined).

+1

When a regular function is compiled, its machine code is generated once and placed in one location, separate from the functions that call it. When the code runs, the processor must jump to the place where that code is stored, and the jump instruction costs extra time to load the function from memory. Sometimes calling a function requires several jumps (or several loads and a jump), for example with virtual functions. There is also time spent saving and restoring registers and setting up a stack frame, none of which is really needed for fairly small inlined functions.

When an inlined function is compiled, all of its machine code is inserted directly at the place where it is called, so the cost of the jump instruction is eliminated. The compiler can also optimize the inlined code based on its surroundings (for example, register allocation can take into account both the variables used outside the function and those used inside it, minimizing the number of registers that need to be saved). However, the inlined function's code can appear in several places within the calling code (if it is called several times), so overall it makes your code larger. This can make your code big enough that it no longer fits in the processor's instruction cache, in which case the processor must go out to main memory to fetch your code, and that takes longer than getting everything from the cache. In some cases this can offset the savings from eliminating the jump instruction and make your code slower than if you had not inlined it.
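
A small illustration of that "optimize based on its surroundings" point (a sketch; addSq and demo are made-up names): once a call with constant arguments is inlined, the compiler can often fold the whole thing down to a constant.

    inline int addSq(int a, int b) { return (a + b) * (a + b); }

    int demo()
    {
        // After inlining, most optimizing compilers reduce this to "return 25;"
        // - no call and no arithmetic at run time.
        return addSq(2, 3);
    }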

โ€œSwappingโ€ usually refers to virtual memory behavior, which has the same trade-offs as the CPU cache, but the time taken to load the code from disk is much longer, and the amount of memory that your program has to fill in for this to enter the game is much more. You are unlikely to ever see that built-in functions affect virtual memory performance.

Obviously, neither effect occurs every time, and it is hard to tell which one will apply in any given case.

+1
