In CDECL, arguments are pushed onto the stack in reverse order, the caller clears the stack, and the result is returned through a processor register (I will call it "register A" later). STDCALL has one difference: it is not the caller that clears the stack, but the callee.
You ask which one is faster. Neither. You should stick to the compiler's native calling convention as long as possible. Change the convention only if there is no way out, e.g. when using external libraries that require a specific convention.
In addition, there are other conventions that the compiler can use by default, e.g. the Visual C++ compiler can be switched to FASTCALL (the /Gr option; without it, 32-bit x86 builds default to CDECL), which is theoretically faster because it passes more arguments in processor registers.
Usually you have to give the proper calling convention to callback functions passed to an external library, e.g. a qsort callback from the C library must be CDECL (if the compiler uses a different convention by default, we must mark the callback as CDECL), and the various WinAPI callbacks must be STDCALL (nearly all of WinAPI is STDCALL).
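A minimal sketch of the qsort case, assuming a Microsoft-style compiler where the `__cdecl` keyword exists. `CALLBACK_CDECL`, `compare_ints` and `sort_ints` are names I made up for illustration; on other compilers the macro expands to nothing, because there the default convention already matches what the C library expects.

```c
#include <stdlib.h>

/* Hypothetical helper macro: spell out CDECL only where the keyword exists. */
#if defined(_MSC_VER)
#define CALLBACK_CDECL __cdecl
#else
#define CALLBACK_CDECL
#endif

/* The callback is marked explicitly, so it stays CDECL even if the
   project is compiled with a different default convention. */
static int CALLBACK_CDECL compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b); /* negative, zero or positive; no overflow */
}

/* Sorts 'count' ints in ascending order using the C library's qsort. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

With the explicit marker the code keeps working if someone later changes the project-wide default, which is the situation described above.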
Another common case is storing pointers to external functions, e.g. to create a pointer to a WinAPI function, its type definition must be marked as STDCALL.
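Roughly, such a pointer type could look like the sketch below. Everything named here (`DEMO_STDCALL`, `AddThreeFn`, `add_three`, `call_through_pointer`) is hypothetical; `add_three` only stands in for a function that would really come from an external library. In actual WinAPI code you would normally use the `WINAPI` macro from `windows.h` instead of a home-made macro.

```c
/* Hypothetical helper macro: spell out STDCALL only where the keyword exists. */
#if defined(_MSC_VER)
#define DEMO_STDCALL __stdcall
#else
#define DEMO_STDCALL
#endif

/* The calling convention is part of the pointer type. */
typedef int (DEMO_STDCALL *AddThreeFn)(int, int, int);

/* Stand-in for an external STDCALL function. */
int DEMO_STDCALL add_three(int a, int b, int c)
{
    return a + b + c;
}

/* Calls the function through the pointer; the compiler generates the
   call sequence according to the convention stored in the pointer type. */
int call_through_pointer(int x, int y, int z)
{
    AddThreeFn fn = add_three;
    return fn(x, y, z);
}
```

If the convention in the typedef does not match the real function, a 32-bit x86 build may clean the stack twice or not at all, so the marker matters even though the source looks the same.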
And below is an example showing how the compiler does it:
    i = Function(x, y, z);

    int Function(int a, int b, int c) { return a + b + c; }
CDECL:
Caller:
    push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
    call (jump to the function body; when the function finishes it jumps back here, the return address is kept on the stack)
    move contents of register A to the 'i' variable
    pop everything we pushed from the stack (the copies of x, y and z)
Function body:
    copy 'a' (from stack) to register A
    copy 'b' (from stack) to register B
    add A and B, store result in A
    copy 'c' (from stack) to register B
    add A and B, store result in A
    jump back to caller code (a, b and c are still on the stack, the result is in register A)
STDCALL:
Caller:
    push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
    call
    move contents of register A to the 'i' variable
Function body:
    pop 'a' from stack to register A
    pop 'b' from stack to register B
    add A and B, store result in A
    pop 'c' from stack to register B
    add A and B, store result in A
    jump back to caller code (a, b and c are no longer on the stack, the result is in register A)
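To make the "who cleans the stack" difference concrete, here is a toy model in plain C. It is only an illustration, not what a compiler emits: `ToyStack` and all function names are made up, the return address is ignored, and the "stack" is just an array of ints.

```c
#include <stddef.h>

/* Toy stack: models only the pushed argument copies. */
typedef struct {
    int slots[16];
    size_t depth; /* number of values currently on the stack */
} ToyStack;

static void push(ToyStack *s, int value) { s->slots[s->depth++] = value; }
static void drop(ToyStack *s, size_t count) { s->depth -= count; }

/* peek(s, 0) is the top of the stack, peek(s, 1) the value below it, ... */
static int peek(const ToyStack *s, size_t index)
{
    return s->slots[s->depth - 1 - index];
}

/* CDECL-style body: reads a, b, c and leaves them on the stack. */
static int callee_cdecl(ToyStack *s)
{
    return peek(s, 0) + peek(s, 1) + peek(s, 2);
}

/* STDCALL-style body: reads a, b, c and removes them before returning. */
static int callee_stdcall(ToyStack *s)
{
    int result = peek(s, 0) + peek(s, 1) + peek(s, 2);
    drop(s, 3); /* the callee cleans up */
    return result;
}

int call_cdecl(ToyStack *s, int x, int y, int z)
{
    int i;
    push(s, z); push(s, y); push(s, x); /* reverse order */
    i = callee_cdecl(s);
    drop(s, 3); /* the caller cleans up */
    return i;
}

int call_stdcall(ToyStack *s, int x, int y, int z)
{
    push(s, z); push(s, y); push(s, x); /* reverse order */
    return callee_stdcall(s); /* nothing left for the caller to clean */
}
```

In both variants the stack ends up at the same depth as before the call; the only difference is which side executes the cleanup.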