Compiling this function:
    int f(int a, int b) { return a * b; }
With gcc -O3 -march=native -m64 -fomit-frame-pointer -S gives me the following assembly:
    f:
            movl    %ecx, %eax
            imull   %edx, %eax
            ret
The first instruction (movl) loads the first argument, the second instruction (imull) loads the second argument and multiplies it with the first - then the result is returned.
The actual multiplication is done with imull, which - depending on your CPU type - will take a certain number of CPU cycles.
If you look at Agner Fog's instruction timing tables, you can see how long each instruction takes. On most x86 processors it seems to be a small constant; however, the imul instruction on the AMD K8 with a 64-bit argument and result is listed as 4-5 CPU cycles. I don't know whether that is a measurement issue or genuinely variable time.
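If you want to check such a number on your own machine rather than trust the tables, you can time a long chain of dependent multiplies. Here is a rough sketch, assuming x86-64 and gcc or clang (note that __rdtsc counts reference cycles, so frequency scaling can skew the result slightly):

    /* Estimate integer multiply latency with a dependent chain: each
       multiply needs the previous result, so cycles / N approximates the
       latency figure in Agner Fog's tables. Build with: gcc -O2 latency.c */
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtsc() */

    int main(int argc, char **argv) {
        (void)argv;
        const int N = 100000000;
        uint64_t m = (uint64_t)argc * 2 + 3;  /* odd multiplier unknown at compile time */
        uint64_t x = 1;
        uint64_t start = __rdtsc();
        for (int i = 0; i < N; i++)
            x *= m;                           /* loop-carried dependency */
        uint64_t end = __rdtsc();
        printf("~%.2f cycles per dependent multiply (x = %llu)\n",
               (double)(end - start) / N, (unsigned long long)x);
        return 0;
    }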
Also note that there are more factors involved than just the execution time. The integer has to be moved through the processor and into the right place to be multiplied. All of this, among other factors, adds latency, which is also noted in Agner Fog's tables. There are further issues, such as caching, which also make life difficult - it's not so simple to say how fast something will run without actually running it.
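Latency in particular is easy to see for yourself: a single dependent chain pays the full latency per multiply, while several independent chains overlap in the pipeline and approach the throughput figure instead. A sketch under the same assumptions as above:

    /* Dependent vs independent multiplies on an out-of-order CPU.
       Build with: gcc -O2 throughput.c */
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    int main(int argc, char **argv) {
        (void)argv;
        const int N = 100000000;
        uint64_t m = (uint64_t)argc * 2 + 3;
        uint64_t a = 1, b = 1, c = 1, d = 1;

        uint64_t t0 = __rdtsc();
        for (int i = 0; i < N; i++)
            a *= m;                           /* latency-bound: one chain */
        uint64_t t1 = __rdtsc();
        for (int i = 0; i < N; i++) {
            a *= m; b *= m; c *= m; d *= m;   /* throughput-bound: four chains */
        }
        uint64_t t2 = __rdtsc();

        printf("dependent:   ~%.2f cycles/multiply\n", (double)(t1 - t0) / N);
        printf("independent: ~%.2f cycles/multiply\n", (double)(t2 - t1) / (4.0 * N));
        printf("(sink: %llu)\n", (unsigned long long)(a ^ b ^ c ^ d)); /* keep results live */
        return 0;
    }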
x86 is not the only architecture, and it is in fact not inconceivable that there are CPUs and architectures out there with variable-time multiplication. This is especially important for cryptography, where algorithms using multiplication can be susceptible to timing attacks on those platforms.
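As a concrete (hypothetical) illustration of the cryptography point: classic square-and-multiply exponentiation performs a multiply only for the 1-bits of the secret exponent, so if the multiplication (or the branch around it) takes variable time, the total running time leaks information about the key:

    /* Naive modular exponentiation: base^exp mod m. The multiply inside the
       if-branch runs only for 1-bits of exp, so timing reveals the exponent's
       Hamming weight - and more, if the multiply itself is variable-time.
       Constant-time code uses patterns like the Montgomery ladder instead.
       Caveat: (result * base) % m overflows for large operands; a real
       implementation would use 128-bit intermediates. */
    #include <stdint.h>

    uint64_t modexp_leaky(uint64_t base, uint64_t exp, uint64_t m) {
        uint64_t result = 1 % m;
        base %= m;
        while (exp > 0) {
            if (exp & 1)
                result = (result * base) % m;  /* secret-dependent work */
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }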
orlp