Does the order of parameters in a C calling convention ever have a performance impact?

For example, will these two functions:

 void foo(float *, float *, int, float);
 void foo(float *, float, float *, int);

have the same or different overhead?

Edit: I am not asking about how the compiler will optimize things. I am specifically asking, assuming the cdecl calling convention, how the overhead differs across ABIs.


Traditional calling conventions almost always allocate parameter space on the stack, and there is always overhead associated with copying the arguments into that space.

Assuming a fully volatile (uncached) environment, the only additional overhead that could exist would be due to memory-alignment requirements. In your example the parameters occupy contiguous memory, so no padding is needed for alignment.

In the case of parameters of different sizes, the parameters in this declaration:

 int func(int a, char c, int b)

will have padding inserted between them, while those in this declaration:

 int func(int a, int b, char c)

will not.

The stack frame for the first may look like this:

 | local vars... |    low memory
 +---------------+ <- frame pointer
 | a | a | a | a |
 | c | X | X | X |
 | b | b | b | b |
 +---------------+    high memory

And for the second:

 | local vars... |    low memory
 +---------------+ <- frame pointer
 | a | a | a | a |
 | b | b | b | b |
 | c | X | X | X |
 +---------------+    high memory

When the function is called, the arguments are written to the stack in the order they appear, so for the first declaration you write the 4-byte int a, then the 1-byte char c, and then you must skip 3 padding bytes before writing the 4-byte int b.

In the second case, you write to adjacent memory locations and never have to account for padding gaps.

In a volatile (uncached) environment, we are talking about a performance difference on the order of a few nanoseconds per call. The slowdown is detectable, but practically negligible.

(By the way, how the padding is handled is completely architecture-dependent... but I would guess it is simply a larger offset added to reach the next properly aligned address. I am not quite sure how it could be done differently on different architectures.)

Of course, in a non-volatile environment, where the CPU cache comes into play, the performance hit drops to fractions of a nanosecond. At that point we would be measuring noise, so for practical purposes the difference does not exist.

Padding is really just a space cost. When you work on embedded systems, you want to order your parameters (and struct members) from largest to smallest to reduce, and sometimes eliminate, padding.

So, as far as I can tell (without more information, such as the exact memory-transfer speeds of a particular machine or architecture), there should be no meaningful performance difference between different parameter orders.


Of course, this detail is platform/ABI-dependent.

For example, there shouldn't be any difference on x86-64, since these few parameters will be passed entirely in registers, and register usage is nearly symmetrical (so it does not matter which particular register ends up holding which argument).

With a larger number of parameters, some of them will spill onto the stack, and in that case the order may matter depending on how the spilled parameters are used in the function body.

For example, if a spilled parameter serves as a loop counter, it can often be used directly from its stack slot; if it is a pointer, it must first be loaded into a register before the pointed-to value can be dereferenced.

Note that what exactly happens is, of course, up to the compiler (even with only two parameters)... so there may well be differences in a particular case; what is impossible is to pick a specific order that gives a better result in general (i.e., regardless of the compiler).

