Traditional calling conventions almost always allocate the parameter space on the stack, and there is always some overhead associated with copying arguments into this space. Assuming a worst-case (cold-cache) environment, the only additional overhead that could exist would come from memory-alignment padding. When the parameters are all the same size they sit in contiguous memory, so no padding is needed.
In the case of parameters of types with different sizes, the parameters in the following declaration:
int func (int a, char c, int b)
will have padding inserted between them, while those in this declaration:
int func (int a, int b, char c)
will not.
The stack frame for the first may look like this:
| local vars... |   <- low memory
+---------------+   <- frame pointer
| a | a | a | a |
| c | X | X | X |
| b | b | b | b |
+---------------+   <- high memory
And for the last:
| local vars... |   <- low memory
+---------------+   <- frame pointer
| a | a | a | a |
| b | b | b | b |
| c | X | X | X |
+---------------+   <- high memory
When the function is called, the arguments are written to the stack in the order they appear. So for the first declaration you write the 4 bytes of int a, the 1 byte of char c, and then you must skip 3 padding bytes before writing the 4 bytes of int b.
In the second case you write to adjacent memory cells, and no gaps due to padding need to be accounted for.
In a cold-cache environment we are talking about a performance difference on the order of a few nanoseconds per call. A decrease in performance might be detectable, but it is almost insignificant.
(By the way, how padding is handled is completely architecture dependent... but I would say it is simply a larger offset when addressing the next argument. I'm not quite sure how it could be done very differently across architectures.)
Of course, with warm CPU caches the performance hit drops to fractions of a nanosecond. At that point any measurement would be lost in noise, so the difference effectively does not exist.
Padding is really just a space cost. When you work on embedded systems, you want to order your parameters (and struct members) from largest to smallest in order to reduce, and sometimes eliminate, the padding.
So, as far as I can tell (without additional information, such as the exact memory-transfer speeds of a particular machine or architecture), there should be no measurable performance difference between different parameter orders.