Is variable declaration expensive?

When coding in C, I came across the situation below.

    int function ()
    {
        if (!somecondition)
            return false;

        internalStructure *str1;
        internalStructure *str2;
        char *dataPointer;
        float xyz;

        /* do something here with the above local variables */
    }

Given that the if in the above code can return from a function, I can declare variables in two places.

  • Before the if.
  • After the if.

As a programmer, I would be inclined to keep the variable declarations after the if statement.

Does the placement of a declaration cost anything? Or is there some other reason to prefer one way over the other?

+71
c
Jan 01 '15 at 10:24
12 answers

In C99 and later (or with a common corresponding extension to C89) you can mix statements and declarations.

Just as in earlier versions (only more so as compilers became smarter and more aggressive), the compiler decides how to allocate registers and stack space, and may perform any number of other optimizations that comply with the as-if rule.
This means that, in terms of performance, no difference should be expected.

In any case, this was not the reason why it was allowed:

It is intended to limit the scope and, therefore, reduce the context that a person should keep in mind when interpreting and verifying code.
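As a minimal sketch of what mixing statements and declarations buys you in C99, consider the function below. The name sum_values and its behaviour are hypothetical and not taken from the question; the point is only that total and i do not exist, as far as the reader is concerned, until the early exit has been passed.

```c
#include <stddef.h>

/* Hypothetical helper: sums the first n values, or returns -1 on a NULL input. */
int sum_values(const int *values, size_t n)
{
    if (values == NULL)
        return -1;                     /* early exit: no locals in scope yet */

    int total = 0;                     /* declared where it is first needed (C99) */
    for (size_t i = 0; i < n; ++i) {   /* i is visible only inside the loop */
        total += values[i];
    }
    return total;
}
```

Calling sum_values with an array holding 1, 2 and 3 returns 6, and passing NULL returns -1. A compiler is free to generate the same code as if both variables had been declared at the top; the benefit is for the reader.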

+93
Jan 01 '15 at 10:29

Do whatever makes sense, but the current coding style recommends putting variable declarations as close as possible to their use.

In reality, variable declarations are free on virtually every compiler after the first one. This is because virtually all processors manage their stack with a stack pointer (and possibly a frame pointer). For example, consider two functions:

    int foo()
    {
        int x;
        return 5; // aren't we a silly little function now
    }

    int bar()
    {
        int x;
        int y;
        return 5; // still wasting our time...
    }

If I were to compile these with a modern compiler (and tell it not to be smart and optimize away my unused local variables), I would see something like this (an x86-style assembly example; x64 and other targets look similar):

    foo:
        push ebp
        mov  ebp, esp
        sub  esp, 8     ; 1. this is the first line which is different between the two
        mov  eax, 5     ; this is how we return the value
        add  esp, 8     ; 2. this is the second line which is different between the two
        pop  ebp        ; restore the caller's frame pointer
        ret

    bar:
        push ebp
        mov  ebp, esp
        sub  esp, 16    ; 1. this is the first line which is different between the two
        mov  eax, 5     ; this is how we return the value
        add  esp, 16    ; 2. this is the second line which is different between the two
        pop  ebp        ; restore the caller's frame pointer
        ret

Note: both functions have the same number of opcodes!

This is because virtually all compilers allocate all the space they need up front (barring fancy things like alloca, which are handled separately). In fact, on x64 they are required to do so in this efficient manner.

(Edit: As Forss pointed out, the compiler may optimize some local variables into registers. More technically, I should say that the first variable that "spills over" onto the stack costs 2 opcodes, and the rest are free.)

For the same reason, compilers collect all the local variable declarations and allocate space for them right up front. C89 requires declarations to come first in their block because it was designed to be handled by a one-pass compiler. For a C89 compiler to know how much space to allocate, it needed to know all the variables before emitting the rest of the code. In modern languages, such as C99 and C++, compilers are expected to be much smarter than they were in 1972, so this restriction is relaxed for developer convenience.

Modern coding practice suggests putting variables close to their use

This has nothing to do with compilers (which obviously could not care one way or the other). It has been found that most human programmers read code better if the variables are placed close to where they are used. This is just a style guide, so feel free to disagree with it, but there is a remarkable consensus among developers that this is the "right way".

Now for a few corner cases:

  • If you use C++ with constructors, the compiler will allocate the space up front (since it is faster to do so and does not hurt). However, the variable will not be constructed in that space until the correct location in the flow of the code. In some cases, this means that putting variables close to their use can even be faster than putting them up front: flow control might direct us around the variable declaration, in which case the constructor never needs to be called.
  • alloca is handled at a layer above this. For those who are curious, alloca implementations tend to have the effect of moving the stack pointer down by some arbitrary amount. Functions using alloca are required to keep track of this space one way or another, and to make sure the stack pointer is adjusted back up before leaving.
  • There may be a case where you usually need 16 bytes of stack space, but under one condition you need to allocate a local array of 50 kB. No matter where you put your variables in the code, virtually all compilers will allocate 50 kB + 16 B of stack space every time the function is called. This rarely matters, but in heavily recursive code it could overflow the stack. You either need to move the code that works with the 50 kB array into its own function, or use alloca.
  • Some platforms (for example, Windows) need a special function call in the prolog if you allocate more than a page's worth of stack space. This should not change the analysis much (in implementation, it is a very fast leaf function that just touches 1 word per page).
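The 50 kB corner case from the list above could be handled roughly as in the sketch below. All names (BIG_BUFFER_SIZE, slow_path_length, count_down) are hypothetical and chosen only for illustration. The idea is that the large array lives in a helper, so the recursive function's own frame stays small. Note the assumption that the compiler does not inline the helper; an optimizer is allowed to do so, which would bring the large frame back.

```c
#include <stddef.h>
#include <string.h>

#define BIG_BUFFER_SIZE (50 * 1024)   /* hypothetical size, matching the 50 kB case */

/* Rare path: only calls that reach this function pay for the 50 kB frame. */
static size_t slow_path_length(const char *text)
{
    char scratch[BIG_BUFFER_SIZE];

    /* Copy with truncation and guarantee NUL termination. */
    strncpy(scratch, text, sizeof scratch - 1);
    scratch[sizeof scratch - 1] = '\0';
    return strlen(scratch);
}

/* Common path: recursive, and its own frame does not contain the big array. */
size_t count_down(unsigned depth, const char *text)
{
    if (depth == 0)
        return slow_path_length(text);   /* the big array exists only down here */
    return count_down(depth - 1, text);
}
```

With this layout, count_down(10, "hello") recurses ten times with small frames and reserves the large buffer only once, at the bottom of the recursion.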
+42
Jan 01 '15 at 21:07

In C, I believe all variable declarations are applied as if they were at the top of the function; if you declare them in a block, I think it is just a matter of scope (I don't think it is the same in C++). The compiler will perform all optimizations on the variables, and some may even effectively disappear from the machine code at higher optimization levels. The compiler then decides how much space the variables need, and at run time that space is reserved in an area known as the stack, where the variables live.

When a function is called, all the variables used by your function are put on the stack, along with information about the call itself (i.e. return address, parameters, etc.). It does not matter where a variable was declared, only that it was declared, and it will be allocated on the stack regardless.

Declaring variables is not "expensive" per se; if a variable is simple enough that it does not need to exist as a variable, the compiler will probably remove it.

See also: Wikipedia on call stacks, and some other material on the stack.

Of course, it all depends on the implementation and depends on the system.

+22
Jan 01 '15 at 10:51

Yes, it can cost clarity. If there is a case where the function must do nothing at all under some condition (as when the global flag is found to be false, in your case), then placing the check at the top, where you show it above, is surely easier to understand, which is essential when debugging and/or documenting.

+11
Jan 01 '15 at 10:31

This ultimately depends on the compiler, but usually all locals are allocated at the beginning of the function.

However, the cost of allocating local variables is very small, because they are placed on the stack (or put in a register after optimization).

+11
Jan 01 '15 at 10:32

Keep each declaration as close as possible to where it is used, ideally inside nested blocks. So in this case, it makes no sense to declare the variables above the if.

+6
Jan 01 '15 at 10:28

The best practice is to adopt a lazy approach, i.e. declare variables only when you really need them ;) (and not before). This has the following advantage:

The code is more readable if these variables are declared as close as possible to the place of use.

+5
Jan 01 '15 at 10:30

If you have this

    int function ()
    {
        {
            sometype foo;
            bool somecondition;
            /* do something with foo and compute somecondition */
            if (!somecondition)
                return false;
        }

        internalStructure *str1;
        internalStructure *str2;
        char *dataPointer;
        float xyz;

        /* do something here with the above local variables */
    }

then the stack space reserved for foo and somecondition can obviously be reused for str1 etc., so by declaring after the if, you may save stack space. Depending on the compiler's optimization capabilities, the saving may also take place if you flatten the function by removing the inner pair of braces, or if you declare str1 etc. before the if; however, this requires the compiler/optimizer to notice that the scopes do not "really" overlap. By placing the declarations after the if, you facilitate this behaviour even without optimization, not to mention the improved code readability.
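A compilable variant of the same idea is sketched below, with made-up names (scaled_if_positive, doubled, offset, scale) standing in for the placeholder types above. The inner block ends the lifetime of its locals at the closing brace, so a compiler may reuse their stack slots for the variables declared afterwards; whether it actually does so depends on the compiler and optimization level.

```c
#include <stdbool.h>

/* Hypothetical example: returns 0 for non-positive input, else (input + 10) * 3. */
int scaled_if_positive(int input)
{
    {
        int doubled = input * 2;        /* lives only inside this block */
        bool positive = doubled > 0;
        if (!positive)
            return 0;                   /* early exit before the later locals */
    }   /* doubled and positive go out of scope here */

    int offset = 10;                    /* may reuse the slots freed above */
    int scale = 3;
    return (input + offset) * scale;
}
```

For example, scaled_if_positive(2) yields 36, while scaled_if_positive(-1) returns 0 without ever touching offset or scale.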

+5
Jan 01 '15 at 15:07

I prefer to keep the "early exit" condition at the top of the function, in addition to documenting why we do it. If we put it after a bunch of variable declarations, someone unfamiliar with the code could easily miss it, unless they know they have to look for it.

Documenting the "early exit" condition alone is not always sufficient; it is better to make it clear in the code as well. Putting the early-exit condition at the top also makes it easier to keep the documentation in sync with the code, for example if we later decide to remove the early-exit condition or to add more such conditions.

+4
Jan 01 '15 at 10:33

If it really mattered, the only way to avoid allocating the variables would probably be this:

    int function_unchecked();

    int function ()
    {
        if (!someGlobalValue)
            return false;
        return function_unchecked();
    }

    int function_unchecked()
    {
        internalStructure *str1;
        internalStructure *str2;
        char *dataPointer;
        float xyz;

        /* do something here with the above local variables */
    }

But in practice, I think you will find no performance benefit. If anything, there is a minor overhead.

Of course, if you were coding in C++ and some of those local variables had non-trivial constructors, you would probably need to place them after the check. But even then, I don't think it would help to split the function.

+4
Jan 01 '15 at 10:49

Whenever you allocate local variables in a C scope (such as a function), they have no default initialization code (such as C++ constructors). And since they are not dynamically allocated (they are just uninitialized pointers), no additional (and potentially expensive) functions (e.g. malloc) need to be called to prepare or allocate them.

Because of the way the stack works, allocating a stack variable simply means decrementing the stack pointer (i.e. increasing the stack size, because it grows downwards on most architectures) to make room for it. From the CPU's point of view, this means executing a simple SUB instruction: SUB rsp, 4 (in case your variable is 4 bytes in size, such as a 32-bit integer).

Moreover, when you declare several variables, your compiler is smart enough to group them together into one big SUB rsp, XX instruction, where XX is the total size of the scope's local variables. In theory. In practice, something else happens.

In such situations, I find GCC explorer an invaluable tool when it comes to figuring out (with great ease) what happens “under the hood” of the compiler.

So, let's see what happens when you actually write a function like this: GCC Explorer Link .

C code

    int function(int a, int b)
    {
        int x, y, z, t;

        if(a == 2) { return 15; }

        x = 1;
        y = 2;
        z = 3;
        t = 4;
        return x + y + z + t + a + b;
    }

Resulting assembly

    function(int, int):
        push rbp
        mov rbp, rsp
        mov DWORD PTR [rbp-20], edi
        mov DWORD PTR [rbp-24], esi
        cmp DWORD PTR [rbp-20], 2
        jne .L2
        mov eax, 15
        jmp .L3
    .L2:
        -- snip --
    .L3:
        pop rbp
        ret

As it turns out, GCC is even smarter than that. It does not execute a SUB instruction at all to allocate the local variables. It simply (internally) assumes the space is "occupied", but does not add any instruction to update the stack pointer (e.g. SUB rsp, XX). This means that the stack pointer is not kept up to date, but since in this case no more PUSH instructions (and no rsp-relative lookups) are performed after the stack space is used, there is no problem.

Here's an example where no additional variables are declared: http://goo.gl/3TV4hE

C code

    int function(int a, int b)
    {
        if(a == 2) { return 15; }

        return a + b;
    }

Resulting assembly

    function(int, int):
        push rbp
        mov rbp, rsp
        mov DWORD PTR [rbp-4], edi
        mov DWORD PTR [rbp-8], esi
        cmp DWORD PTR [rbp-4], 2
        jne .L2
        mov eax, 15
        jmp .L3
    .L2:
        mov edx, DWORD PTR [rbp-4]
        mov eax, DWORD PTR [rbp-8]
        add eax, edx
    .L3:
        pop rbp
        ret

If you look at the code before the premature return (jmp .L3, which jumps to the cleanup and return code), no additional instructions are needed to "prepare" the stack variables. The only difference is that the function parameters a and b, which arrive in the edi and esi registers, are stored on the stack at higher addresses than in the first example ([rbp-4] and [rbp-8]). This is because no additional space was "allocated" for local variables, as it was in the first example. So, as you can see, the only "overhead" of adding those local variables is a change in the subtracted offset (i.e. not even an additional subtraction operation).

Thus, in your case, there is practically no cost to simply declaring stack variables.

+4
Jan 27 '15 at 9:11

If you declare the variables after the if statement and return from the function immediately, the compiler does not commit memory for them on the stack.

+1
Jan 08 '15 at 10:53