When I compile C code with a recent compiler on an amd64 or x86 system, functions are aligned to a multiple of 16 bytes. How important is this alignment on modern processors? Is there a large performance penalty for calling an unaligned function?
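One quick way to see the alignment a compiler gave a function is to look at its address. The sketch below assumes GCC or Clang on x86-64. `aligned_fn` and `entry_offset` are hypothetical names. `__attribute__((aligned(N)))` is a compiler extension that sets a minimum alignment for a function's first instruction. Both compilers also accept `-falign-functions=N` to change the default padding.

```c
#include <stdint.h>

/* GCC/Clang extension (assumption: not ISO C): request that this
   function's entry point be aligned to at least 32 bytes. noinline keeps
   a real out-of-line copy whose address we can inspect. */
__attribute__((aligned(32), noinline))
int aligned_fn(int x)
{
    return x + 1;
}

/* Offset of a function's entry point inside a block of `block` bytes;
   0 means the function starts exactly on a block boundary.
   Converting a function pointer to uintptr_t is implementation-defined
   in ISO C, but GCC and Clang on x86-64 yield the code address. */
unsigned entry_offset(void (*fn)(void), unsigned block)
{
    return (unsigned)((uintptr_t)fn % block);
}
```

For example, `entry_offset((void (*)(void))aligned_fn, 32)` should return 0. Applying the same helper to an ordinary function shows the default alignment your compiler and optimization level chose.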
Benchmark
I ran the following microbenchmark (`call.S`):
```
// benchmarking performance penalty of function alignment.
#include <sys/syscall.h>
```
with the following shell script:
```sh
#!/bin/sh

for i in `seq 0 15` ; do
    echo SKIP=$i
    cc -c -DSKIP=$i call.S
    ld -o call call.o
    time -p ./call
done
```
On a processor that identifies itself as `Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz` according to `/proc/cpuinfo`, the offset made no difference: the benchmark took 1.9 seconds for every value of `SKIP`.
On another system, whose processor identifies itself as an Intel(R) Core i7 at 6.13 GHz, the benchmark takes 6.3 seconds, except at offsets 14 and 15, where it takes 7.2 seconds. I suspect this is because the function then starts to span multiple cache lines.
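One way to check this hypothesis for a given layout is to test whether the bytes that must be fetched cross a block boundary. The helper below is a hypothetical sketch. Several inputs are assumptions you would have to read off the disassembly and the processor's documentation: the length to pass (the called function alone, or the function plus the surrounding call site), and the block size (a 64-byte cache line, or a smaller instruction-fetch block).

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the byte range [start, start + len) touches more than
   one block-aligned chunk of `block` bytes, e.g. more than one 64-byte
   cache line. `start` is an offset from any block-aligned base address. */
bool spans_blocks(size_t start, size_t len, size_t block)
{
    if (len == 0)
        return false;                 /* an empty range touches nothing */
    /* Compare the block index of the first and the last byte. */
    return start / block != (start + len - 1) / block;
}
```

For instance, `spans_blocks(14, 3, 16)` is true while `spans_blocks(13, 3, 16)` is false. Whether such a model matches the timings above depends on the real code layout.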
performance assembly x86-64 alignment
fuz