I think the function should still make one copy (see the end for what I consider the most optimal valid version). The rest are (more or less understandable) missed optimizations.
The SysV x86-64 ABI does not guarantee that a function will leave its stack args unmodified. It says nothing about const. Anything it does not guarantee cannot be assumed. It simply says that large objects passed by value go on the stack; it says nothing about their state when the called function returns. The callee "owns" its arguments, even if they are declared const. See also the x86 tag wiki, but the ABI doc itself is the only wiki link that is really relevant.
Similarly, narrow integer types can be in registers with garbage in the high bits, as arguments or return values. The ABI does not explicitly say anything either way, so there is no guarantee that the high bits are zeroed. This is in fact what gcc does: it assumes there is high garbage when receiving values, and it leaves high garbage when passing values. The same goes for float / double in xmm regs. I confirmed this with one of the ABI maintainers recently, while investigating some unsafe code generated by clang. So I am sure the correct interpretation is that you should not assume anything the ABI does not explicitly guarantee.
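To make the narrow-integer point concrete, here is a minimal C sketch. The names `table` and `lookup` are hypothetical and not from the question. At the C level only the low 8 bits of `idx` exist; the point is that, under the reading of the ABI described above, a callee that cannot rely on clean upper register bits has to zero-extend the value (for example with `movzx`) before using it as an index.

```c
#include <stdint.h>

/* Hypothetical lookup table, only here to give the index something to do. */
static const int table[256] = { [0] = 100, [255] = 200 };

/* idx is an 8-bit value in C. How the upper bits of the register that
 * carries it look on entry is an ABI question; a compiler that does not
 * trust them must zero-extend before forming the address. */
int lookup(uint8_t idx)
{
    return table[idx];
}
```

Calling `lookup((uint8_t)0x123456FFu)` converts the argument to `0xFF` at the C level, so the function reads `table[255]` no matter what the wider source value was.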
gcc does not do this, but I believe it would be legal not to make a copy for a function called like this:
void modifyconstarg(const struct s x) {
Instead, just store into the arg in place and jmp pFconstval .
My guess is that these are missed optimizations, not gcc and clang being conservative in their interpretation of the standard.
It seems that gcc and clang do a poor job of optimizing copies for objects too large to fit in registers. Source code that does not copy them in the first place (for example, passing by const * or by C++ const-reference) would do even better than the best job the compiler could legally do, since I do not think your proposed optimization is legal.
gcc and clang do much worse than the best legal optimization, though: see the output on godbolt .
Strange: with -march=haswell (or any other Intel processor) gcc emits a function call to memcpy instead of inline rep movsq . I do not understand that. It does this even with -ffreestanding / -nostdlib .
IDK if anyone else was thinking that rdi was a pointer to memory, i.e. that the struct was being passed by invisible reference. It took me a long time to fully understand that the call-by-value functions take no parameters in registers at all. I kept thinking it was strange that rep movsq left rdi pointing above the high copy.
You do not need function pointers to reproduce this; normal prototyped functions (with more descriptive names) still demonstrate it.
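As an illustration of source code that does not copy in the first place, here is a minimal sketch that assumes the same 50-int struct used in this answer. The function names `sum_by_value` and `sum_by_ref` are hypothetical. Both compute the same result, but only the by-value version requires the caller to build a 200-byte stack argument.

```c
#include <stddef.h>

/* Same shape as the 50-int struct discussed in this answer (200 bytes
 * when int is 4 bytes). */
struct s { int _[50]; };

/* By value: the caller has to materialize a full copy of the struct as a
 * stack argument, whether or not the parameter is declared const. */
long sum_by_value(const struct s x)
{
    long total = 0;
    for (size_t i = 0; i < sizeof x._ / sizeof x._[0]; i++)
        total += x._[i];
    return total;
}

/* By const pointer: only an address is passed, so there is nothing for
 * the compiler to copy (or fail to optimize away). */
long sum_by_ref(const struct s *x)
{
    long total = 0;
    for (size_t i = 0; i < sizeof x->_ / sizeof x->_[0]; i++)
        total += x->_[i];
    return total;
}
```

The trade-off is the usual one: with the pointer version the callee sees the caller's object, so the const qualifier is what keeps it from writing through the pointer.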
struct s { int _[50]; };
The gcc output for modifyarg is fun:
    lea     rdi, [rsp+8]
    mov     DWORD PTR [rsp+48], 10
    mov     ecx, 25
    mov     rsi, rdi          ; src=dest
    rep movsq                 ; in-place "copy"
    jmp     pFconstval
It does the copy even if you do not modify x . Clang makes an actual copy elsewhere before the tail-call jmp .
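For reference, a minimal C sketch of the kind of source being discussed. The names `modifyarg` and `pFconstval` appear above, but the body given to `pFconstval` and the `last_seen` variable are hypothetical, added only so the sketch runs standalone. The store of 10 into element 10 matches the `mov DWORD PTR [rsp+48], 10` in the listing (the arg starts at `rsp+8`, and element 10 is 40 bytes in).

```c
/* The 200-byte struct from this answer. */
struct s { int _[50]; };

/* Hypothetical: records what the callee observed, so the effect is visible. */
static int last_seen;

/* Hypothetical body. To look at the copy in compiler output, leave this
 * as a prototype only; with a visible definition the compiler will
 * likely inline the call. */
void pFconstval(const struct s x)
{
    last_seen = x._[10];
}

/* Modifies its own by-value copy of x, then passes that on by value. */
void modifyarg(struct s x)
{
    x._[10] = 10;
    pFconstval(x);
}
```

At the C level the caller's object is never affected: after `modifyarg(v)`, `v._[10]` still holds whatever it held before, while the callee chain saw 10.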
The best legal version of your function
as I understand the ABI:
    sub     rsp, 416
    mov     rdi, rsp
    call    pFref             ; or call [pF2] if using function pointers. Is your disassembly in MASM syntax?
    lea     rdi, [rsp+208]    ; aligned by 16 for hopefully better rep movsq perf
                              ; and so the stack is aligned by 16 at each location
    mov     rsi, rsp
    mov     ecx, 25
    rep movsq
    call    pFconstval        ; clobbering the low copy
    add     rsp, 208
    call    pFval             ; clobbering the remaining high copy
    add     rsp, 208
    ret
BTW, gcc using rbx is silly. It saves four bytes of code:
push / pop : 2 bytes. mov rbx, rsp : 3B. 2x mov rsi, rbx : 2x 3B. Total = 12B.
Replacing all of this with 2x lea rsi, [rsp+208] : 2x 8B. Total = 16B.
It does not avoid extra stack-sync uops, since it still does mov rdi, rsp . Saving 4B of code is not worth spending 3 extra uops. In my version, which only copies once (and needs only one LEA), using rbx would also be a loss in code bytes.