Need to explain assembly instructions for the K & R fahr-to-cels example

Question

Need to explain assembly instructions for the K & R fahr-to-cels example

I am stuck in learning the basics of assembler with Fahrenheit Celsius from K & R. Here is the C code that I have in mind:

#include <stdio.h> main() { int fahr, celsius; int lower, upper, step; lower = 0; upper = 300; step = 20; fahr = lower; while (fahr <= upper) { celsius = 5 * (fahr-32) / 9; printf("%d\t%d\n", fahr, celsius); fahr = fahr + step; } }

Together with GCC 4.4.7 (GNU / Linux x86-64) I get the following disassembly:

 $ gcc -O0 -g -ansi -pedantic l1-2a.c $ gdb -q a.out (gdb) disas /m main (gdb) disas /m main Dump of assembler code for function main: 6 { 0x00000000004004c4 <+0>: push %rbp 0x00000000004004c5 <+1>: mov %rsp,%rbp 0x00000000004004c8 <+4>: sub $0x20,%rsp 7 int fahr, celsius; 8 int lower, upper, step; 9 10 lower = 0; 0x00000000004004cc <+8>: movl $0x0,-0xc(%rbp) 11 upper = 300; 0x00000000004004d3 <+15>: movl $0x12c,-0x8(%rbp) 12 step = 20; 0x00000000004004da <+22>: movl $0x14,-0x4(%rbp) 13 14 fahr = lower; 0x00000000004004e1 <+29>: mov -0xc(%rbp),%eax 0x00000000004004e4 <+32>: mov %eax,-0x14(%rbp) 15 while (fahr <= upper) { 0x00000000004004e7 <+35>: jmp 0x400532 <main+110> 0x0000000000400532 <+110>: mov -0x14(%rbp),%eax 0x0000000000400535 <+113>: cmp -0x8(%rbp),%eax 0x0000000000400538 <+116>: jle 0x4004e9 <main+37> 16 celsius = 5 * (fahr-32) / 9; 0x00000000004004e9 <+37>: mov -0x14(%rbp),%edx 0x00000000004004ec <+40>: mov %edx,%eax 0x00000000004004ee <+42>: shl $0x2,%eax 0x00000000004004f1 <+45>: add %edx,%eax 0x00000000004004f3 <+47>: lea -0xa0(%rax),%ecx 0x00000000004004f9 <+53>: mov $0x38e38e39,%edx 0x00000000004004fe <+58>: mov %ecx,%eax 0x0000000000400500 <+60>: imul %edx 0x0000000000400502 <+62>: sar %edx 0x0000000000400504 <+64>: mov %ecx,%eax 0x0000000000400506 <+66>: sar $0x1f,%eax 0x0000000000400509 <+69>: mov %edx,%ecx 0x000000000040050b <+71>: sub %eax,%ecx 0x000000000040050d <+73>: mov %ecx,%eax 0x000000000040050f <+75>: mov %eax,-0x10(%rbp) 17 printf("%d\t%d\n", fahr, celsius); 0x0000000000400512 <+78>: mov $0x400638,%eax 0x0000000000400517 <+83>: mov -0x10(%rbp),%edx 0x000000000040051a <+86>: mov -0x14(%rbp),%ecx 0x000000000040051d <+89>: mov %ecx,%esi 0x000000000040051f <+91>: mov %rax,%rdi 0x0000000000400522 <+94>: mov $0x0,%eax 0x0000000000400527 <+99>: callq 0x4003b8 < printf@plt > 18 fahr = fahr + step; 0x000000000040052c <+104>: mov -0x4(%rbp),%eax 0x000000000040052f <+107>: add %eax,-0x14(%rbp) 19 } 20 } 0x000000000040053a <+118>: leaveq 0x000000000040053b <+119>: retq End of assembler dump.

This is not clear to me:

 16 celsius = 5 * (fahr-32) / 9; 0x00000000004004e9 <+37>: mov -0x14(%rbp),%edx 0x00000000004004ec <+40>: mov %edx,%eax 0x00000000004004ee <+42>: shl $0x2,%eax 0x00000000004004f1 <+45>: add %edx,%eax 0x00000000004004f3 <+47>: lea -0xa0(%rax),%ecx 0x00000000004004f9 <+53>: mov $0x38e38e39,%edx 0x00000000004004fe <+58>: mov %ecx,%eax 0x0000000000400500 <+60>: imul %edx 0x0000000000400502 <+62>: sar %edx 0x0000000000400504 <+64>: mov %ecx,%eax 0x0000000000400506 <+66>: sar $0x1f,%eax 0x0000000000400509 <+69>: mov %edx,%ecx 0x000000000040050b <+71>: sub %eax,%ecx 0x000000000040050d <+73>: mov %ecx,%eax 0x000000000040050f <+75>: mov %eax,-0x10(%rbp)

I mean, I understand everything:

 lea -0xa0(%rax),%ecx

as it subtracts register 160 from %eax , which contains 5*fahr , like:

 5 * (fahr-32) / 9 <=> (5*fahr - 5*32) / 9 <=> (5*fahr - 160) / 9

after %ecx (as well as full %rcx ), 5*fahr - 160 is stored. However, I don’t understand how it divides by 9. It seems that some tricks such as “multiply by the inverse” to avoid division, but I don’t understand how this works.

+7

c assembly x86

Grzegorz szpetkowski Dec 02 '14 at 10:48

source share

1 answer

Mints97 · Accepted Answer · 2015-04-19T10:19:52+0000

To summarize what was said in the comments: 0x38e38e39 - 954437177 in decimal value, which is equal to (2^33 + 1) / 9 . So, the build code works this way (I replaced (5 * fahr - 160) with X for replacement):

 mov $0x38e38e39,%edx /* edx is now 0x38e38e39 == (2^33 + 1) / 9 */ mov %ecx,%eax /* eax is now X */ imul %edx /* edx:eax is now eax * edx == X * ((2^33 + 1) / 9) */

What the fun part begins there. edx:eax denotes the 1-operand of imul , which first fills its operand ( edx in this case) with 32 bits, and then puts the remaining least significant bits in eax .

Effectively we get a 64-bit result through two registers! It looks something like this:

edx - 32 least significant bits (X * ((2^33 + 1) / 9)) >> 32 .

eax is (X * ((2^33 + 1) / 9)) % 2^32 (but it will be dropped soon)

Then we get this substance in the form:

 sar %edx /* edx is now edx >> 1 == (X * ((2^33 + 1) / 9)) >> 33 */ mov %ecx,%eax /* eax is now X again */ sar $0x1f,%eax /* eax is now X >> 0x1f == X >> 31 */ mov %edx,%ecx /* ecx is now (X * ((2^33 + 1) / 9)) >> 33 */

So now ecx is the 32 least significant bits (X * ((2^33 + 1) / 9)) >> 33 and eax is X >> 31 , that is, the 32-digit bit "-s X (which is 32-bit integer) which are 0 if X non-negative and -1 if X negative.

EDIT: detailed study of the special case of negative X

Now a little about what happens to negative numbers. An important part of ecx is that it is actually the 32 most significant bits of X * ((2^33 + 1) / 9) .

As I hope you remember, in binary terms, negating a number means inverting all its bits and adding 1 to it. And when we add 1 , we invert lsb to 1 if it was 0 , otherwise we invert it and all the bits after it until we find the first 0 , and then invert it.

So, what happens when we try to deny (X * ((2^33 + 1) / 9)) (or, what is the same thing, what we get if we perform calculations with -X , considering X positive for this example )? Of course, first we invert all its bits, then add 1 to it. But for the latter (adding 1 to it), in order to affect the 32 most signifying bits of the number, the 32 least significant bits would have to be 0xFFFFFFFF . And (believe me, on this) there is no 32-bit integer which when multiplied by 0x38e38e39 gives such a result.

So efficient, while (-X * ((2^33 + 1) / 9)) == -(X * ((2^33 + 1) / 9)) it differs from the 32 most significant bits: ((-X * ((2^33 + 1) / 9)) >> 33) & 0xFFFFFFFF != -(((X * ((2^33 + 1) / 9)) >> 33) & 0xFFFFFFFF) .

Instead, the 32 most significant bits (-X * ((2^33 + 1) / 9)) are the bitwise negation of the 32 most significant bits (X * ((2^33 + 1) / 9)) : ((-X * ((2^33 + 1) / 9)) >> 33) & 0xFFFFFFFF != ~(((X * ((2^33 + 1) / 9)) >> 33) & 0xFFFFFFFF) .

Tl; dr for negative case X : the ecx value for -X will be equal to the bitwise negation of the ecx value for X We do not want this. So, to get the correct results for negative X values, we need to add 1 to ecx (or, which is the same thing, subtract -1 ):

 sub %eax,%ecx /* ecx is now X / 9 */

Then comes the last part:

 mov %ecx,%eax /* eax is now X / 9 */ mov %eax,-0x10(%rbp) /* Aaand mov the result into the variable "cels" */

I'm sorry if I messed up something, it's hard for me to keep my head around asm written in GAS syntax, but I hope you understand this idea.

Tl; dr: the trick here is multiplied by the inverse, multiplied by a large number, discarding the large number with an arithmetic shift, and then rounding the result to zero if it is negative.

Why is everyone worried?

As a result, we endowed the division into 10 cycles (given that imul only takes one cycle). Given that idiv can take almost twice as many cycles (11 to 18, as Hans Passant mentioned in this , answering a similar question), this approach can make a huge performance advantage.

Need to explain assembly instructions for the K & R fahr-to-cels example

More articles: