Does the restrict keyword provide significant benefits in gcc / g++?
It can reduce the number of instructions, as shown in the example below, so use it whenever possible.
GCC 4.8 Linux x86-64 example
Input:
    void f(int *a, int *b, int *x) {
        *a += *x;
        *b += *x;
    }

    void fr(int *restrict a, int *restrict b, int *restrict x) {
        *a += *x;
        *b += *x;
    }
Compile and disassemble:
    gcc -g -std=c99 -O0 -c main.c
    objdump -S main.o
With -O0, the two functions compile to the same code.
With -O3:
    void f(int *a, int *b, int *x) {
        *a += *x;
       0:   8b 02                   mov    (%rdx),%eax
       2:   01 07                   add    %eax,(%rdi)
        *b += *x;
       4:   8b 02                   mov    (%rdx),%eax
       6:   01 06                   add    %eax,(%rsi)

    void fr(int *restrict a, int *restrict b, int *restrict x) {
        *a += *x;
      10:   8b 02                   mov    (%rdx),%eax
      12:   01 07                   add    %eax,(%rdi)
        *b += *x;
      14:   01 06                   add    %eax,(%rsi)
For the uninitiated, the calling convention is:
    rdi = first parameter
    rsi = second parameter
    rdx = third parameter
Conclusion: 3 instructions instead of 4.
Of course, instructions can have different latencies, but this gives a good idea.
Why was GCC able to optimize this?
The code above was taken from the Wikipedia example, which is very illuminating.
Pseudo assembly for f :
    load R1 ← *x    ; Load the value of x pointer
    load R2 ← *a    ; Load the value of a pointer
    add R2 += R1    ; Perform Addition
    set R2 → *a     ; Update the value of a pointer
    ; Similarly for b, note that x is loaded twice,
    ; because a may be equal to x.
    load R1 ← *x
    load R2 ← *b
    add R2 += R1
    set R2 → *b
For fr :
    load R1 ← *x
    load R2 ← *a
    add R2 += R1
    set R2 → *a
    ; Note that x is not reloaded,
    ; because the compiler knows it is unchanged
    ; load R1 ← *x
    load R2 ← *b
    add R2 += R1
    set R2 → *b
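To see why the reload in f is required rather than wasteful, here is a minimal C sketch. f is copied from the input above; f_cached is a hypothetical helper (not in the original code) that reads *x only once, which is the shortcut the compiler is allowed to take for fr. It deliberately does not use restrict, so calling it with aliasing pointers stays well defined.

```c
/* Same as f in the input above: *x is read again after the store to *a. */
void f(int *a, int *b, int *x) {
    *a += *x;
    *b += *x;
}

/* Hypothetical helper, not part of the original code: reads *x once and
 * reuses the value, mirroring the single load emitted for fr at -O3. */
void f_cached(int *a, int *b, int *x) {
    int v = *x; /* single load of *x */
    *a += v;
    *b += v;
}
```

If a and x point to the same int holding 1 and b points to an int holding 10, f leaves b at 12 (the store through a changed *x before the second read), while f_cached leaves b at 11. The two versions only agree when a does not alias x, which is exactly the promise restrict makes. Passing aliasing pointers to fr itself would be undefined behavior.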
Is this really faster?
Ermmm ... not for this simple test:
    .text
        .global _start
        _start:
            mov $0x10000000, %rbx
            mov $x, %rdx
            mov $x, %rdi
            mov $x, %rsi
        loop:
And then:
    as -o a.o a.S && ld a.o && time ./a.out
on Ubuntu 14.04 AMD64 with an Intel i5-3210M processor.
I admit that I still do not understand modern processors. Let me know if you:
- found a flaw in my method
- found an assembly test case where it gets a lot faster
- understand why there was no difference.
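One candidate for such a test case, as an untested sketch: put the access to *x inside a loop. The names add_all and add_all_r are hypothetical and not from the original code. Without restrict, the compiler has to assume that every store to dst[i] might change *x; with restrict, it may load *x once before the loop. Whether this is measurably faster on a given CPU is an assumption that would need to be benchmarked.

```c
#include <stddef.h>

/* Hypothetical example: dst and x may alias, so *x cannot be assumed
 * constant across iterations. */
void add_all(int *dst, size_t n, const int *x) {
    for (size_t i = 0; i < n; i++)
        dst[i] += *x;
}

/* Same loop, but the caller promises that dst and x do not overlap,
 * so the compiler is free to hoist the load of *x out of the loop. */
void add_all_r(int *restrict dst, size_t n, const int *restrict x) {
    for (size_t i = 0; i < n; i++)
        dst[i] += *x;
}
```

Compiling both with gcc -std=c99 -O3 -c and comparing the objdump -S output, as done above for f and fr, would show whether GCC treats the two loops differently.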