Some clang assembly not working in real mode (.COM, small memory model)

First of all, this is somewhat of a continuation of Custom memory allocator for DOS.COM real mode (freestanding) - how to debug?. But to make it self-contained, here is the background:

clang (and gcc, too) has an -m16 switch, so i386 instructions are emitted with the prefixes needed for 16-bit execution in real mode. This can be used to create DOS .COM executables (32-bit code running in real mode) with the GNU linker, as described in this blog post. (Of course, it is still limited to a small memory model, meaning everything has to fit into one 64 KB segment.) Wanting to play with this, I created a minimal runtime that seems to work pretty well.
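To give an idea of what such a runtime boils down to, here is a minimal sketch of a .COM entry point, purely as an illustration (this is not the actual libdos code; it assumes a linker script that puts _start at the start of the image, which DOS loads at offset 0x100, and a main() provided by the program):

 /* hypothetical entry stub for a .COM built with clang -m16 */
 extern int main(void);

 void __attribute__((noreturn)) _start(void)
 {
     int ret = main();
     /* DOS "terminate with return code": INT 21h, AH = 4Ch, AL = exit code */
     __asm__ volatile ("int $0x21" : : "a" (0x4C00 | (ret & 0xFF)));
     __builtin_unreachable();
 }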

Then I tried to build my recently created game on top of this, and, well, it crashed. The first thing I ran into was a classic Heisenbug: printing the offending wrong value made it correct. I found a workaround, only to hit the next crash. My first suspect there is my own implementation of malloc(), see the other question. But as long as nobody finds something really wrong with that, I decided to take another look at my Heisenbug. It appears in the following code snippet (note that this worked flawlessly when compiled for other platforms):

 typedef struct {
     Item it;        /* this is an enum value ...         */
     Food *f;        /* ... and this is an opaque pointer */
 } Slot;

 typedef struct board {
     Screen *screen;
     int w, h;
     Slot slots[1];  /* 1 element for C89 compatibility */
 } Board;

 [... *snip* ...]

     size = sizeof(Board) + (size_t)(w*h-1) * sizeof(Slot);
     self = malloc(size);
     memset(self, 0, size);

sizeof(Slot) is 8 (with clang and the i386 target), sizeof(Board) is 20, and w and h are the dimensions of the playing field, in the DOS case 80 and 24 (because one line is reserved for the title / status bar). To debug what is going on here, I made my malloc() print its parameter, and it was called with a value of 12 (sizeof(Board) + (-1) * sizeof(Slot)?).
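As a sanity check on the numbers (using only the sizes quoted above), the expected argument is easy to compute by hand:

     /* sizeof(Board) == 20, sizeof(Slot) == 8, w == 80, h == 24 (the DOS case):
        20 + (80*24 - 1) * 8 = 20 + 1919*8 = 15372, which fits comfortably into
        a 16-bit size_t -- so the bogus 12 cannot be a simple overflow of the
        "unsigned short" size_t. */
     size_t expected = 20u + (size_t)(80 * 24 - 1) * 8u;   /* 15372 */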

Printing w and h showed the correct values, yet malloc() still got 12. Printing size as well showed the correctly calculated size, and this time malloc() got the correct value, too. So, a classic Heisenbug.

The workaround I found is as follows:

     size = sizeof(Board);
     for (int i = 0; i < w*h-1; ++i)
         size += sizeof(Slot);

Strangely enough, the loop version works. The next logical step was to compare the generated assembly. Here I have to admit that I'm completely new to x86; my only assembly experience so far was with the good old 6502. So, in the following snippets, I add my assumptions and thoughts as comments; please correct me where I'm wrong.

First, the "broken" original version (w and h are in %esi and %edi):

   movl    %esi, %eax
   imull   %edi, %eax          # ok, calculate the product w*h
   leal    12(,%eax,8), %eax   # multiply by 8 (sizeof(Slot)) and add
                               # 12 as an offset. Looks good, because
                               # 12 = sizeof(Board) - sizeof(Slot)...
   movzwl  %ax, %ebp           # just use 16 bits, because my size_t for
                               # real mode is "unsigned short"
   movl    %ebp, (%esp)
   calll   malloc
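Just to double-check that folded constant: sizeof(Board) + (w*h - 1) * sizeof(Slot) = 20 + 8*(w*h) - 8 = 8*(w*h) + 12, so folding sizeof(Board) - sizeof(Slot) = 12 into the lea displacement is exactly right; the arithmetic the compiler chose is not the problem.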

Now it looks good to me, but my malloc() sees 12, as already mentioned. The loop workaround is compiled into the following assembly:

   movl    %edi, %ecx
   imull   %esi, %ecx              # ok, w*h again.
   leal    -1(%ecx), %edx          # edx = ecx-1? loop-end condition?
   movw    $20, %ax                # sizeof(Board)
   testl   %edx, %edx              # I guess this just sets some flags in
                                   # order to check whether (w*h-1) is <= 0?
   jle     .LBB0_5
   leal    65548(,%ecx,8), %eax    # This seems to be the loop body
                                   # condensed into a single instruction.
                                   # 65548 = 65536 (0x10000) + 12. So
                                   # there is our offset of 12 again (for
                                   # 16 bit). The rest is the same ...
.LBB0_5:
   movzwl  %ax, %ebp               # use bottom 16 bits
   movl    %ebp, (%esp)
   calll   malloc

As described above, this second variant works as expected. My question after all this long text is as simple as ... WHY? Is there anything special about real mode that I'm not aware of here?

For reference: this commit contains both versions of the code. Just type make -f libdos.mk for the workaround version (which crashes later). To build the faulty version, remove -DDOSREAL from the CFLAGS in libdos.mk first.

Update: following the comments, I tried to debug this a little deeper myself. Using the dosbox debugger is a bit cumbersome, but I finally got it to break at the position of the faulty code. So, the following assembly emitted by clang:

   movl    %esi, %eax
   imull   %edi, %eax
   leal    12(,%eax,8), %eax
   movzwl  %ax, %ebp
   movl    %ebp, (%esp)
   calll   malloc

ends up like this (note the Intel syntax used by the dosbox disassembler):

 0193:2839  6689F0        mov   eax,esi
 0193:283C  660FAFC7      imul  eax,edi
 0193:2840  668D060C00    lea   eax,[000C]      ds:[000C]=0000F000
 0193:2845  660FB7E8      movzx ebp,ax
 0193:2849  6766892C24    mov   [esp],ebp       ss:[FFB2]=00007B5C
 0193:284E  66E8401D0000  call  4594 ($+1d40)

I think this lea instruction looks suspicious, and indeed, after it, the wrong value is in ax. So, I tried feeding the same assembly source to the GNU assembler using .code16, with the following result (disassembled with objdump, which I think is not entirely correct here, because it probably misinterprets the size prefix bytes):

 00000000 <.text>:
    0:  66 89 f0        mov    %si,%ax
    3:  66 0f af c7     imul   %di,%ax
    7:  67 66 8d 04     lea    (%si),%ax
    b:  c5 0c 00        lds    (%eax,%eax,1),%ecx
    e:  00 00           add    %al,(%eax)
   10:  66 0f b7 e8     movzww %ax,%bp
   14:  67 66 89 2c     mov    %bp,(%si)

The only difference is the lea instruction. Here it starts with 67, which means "32-bit address" in 16-bit real mode. I guess this is really necessary, because lea is designed to work on addresses and is only "misused" by the optimizer to do data calculations here. Are my assumptions correct? If so, could this be a bug in clang's internal assembler for -m16? Maybe someone can explain where the 668D060C00 emitted by clang comes from and what it is supposed to mean? 66 means "32-bit data", and 8D is probably the lea opcode itself --- but what about the rest?

1 answer

Your objdump output is bogus. It looks like it disassembled assuming 32-bit address and operand sizes, not 16. So it thinks the lea ends sooner than it actually does, and decodes some of the address bytes as lds / add. Then it happens to get back into sync and sees a movzww that zero-extends from 16b to 16b ... pretty funny.
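(If I remember the binutils options right, disassembling with objdump -d -M i8086 forces 16-bit decoding and should produce output that lines up with what DOSBOX shows.)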

I'm inclined to trust your DOSBOX disassembly. It perfectly explains your observed behaviour (malloc always being called with an arg of 12). You're right that the culprit is

 lea  eax,[000C]    ; eax = 0x0C = 12.  Intel/MASM/NASM syntax
 leal 12, %eax      # or AT&T syntax

It looks like a bug in whatever assembled your DOSBOX binary (clang -m16, I think you said), since it assembled leal 12(,%eax,8), %eax into that.

 leal 12(,%eax,8), %eax    # AT&T
 lea  eax, [12 + eax*8]    ; Intel/MASM/NASM syntax

I could probably dig through some instruction encoding tables / docs and work out exactly how this lea should have been encoded as machine code. It should match the 32-bit encoding, but with 67 66 prefixes (address size and operand size, respectively). (And no, the order of those prefixes doesn't matter; 66 67 would work, too.)

Your DOSBOX and objdump dumps aren't even of the same binary, so yes, they come out differently. (objdump misinterprets the operand-size prefixes of the earlier instructions, but that doesn't change the length of the insns before the LEA.)

Your GNU as .code16 output has 67 66 8D 04 C5, then a 32-bit 0x0000000C displacement (little-endian). That is the lea with both prefixes. I assume that is the correct encoding of leal 12(,%eax,8), %eax for 16-bit mode.

The DOSBOX disassembly has only 66 8D 06, with a 16-bit absolute address of 0x0C. (The 32-bit address-size prefix is missing, and a different addressing mode is used.) I'm no expert on x86 machine encodings; I've never had a problem that disassemblers / assemblers didn't take care of before. (And I usually only look at 64-bit asm.) So I'd have to look up the encodings of the different addressing modes.

My source for x86 instruction encodings is Intel's Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z. (Linked from fooobar.com/tags/x86 / ... , BTW.)

It says: (section 2.1.1)

The operand-size override prefix allows a program to switch between 16- and 32-bit operand sizes. Either size can be the default; use of the prefix selects the non-default size.

Thus, everything is just the same as regular 32-bit protected mode, except for the 16-bit default operand size.

There is a table in the description of the lea insn that spells out what happens for the various combinations of 16-, 32- and 64-bit address size (67H prefix) and operand size (66H prefix). In every case it is just the truncation or zero-extension you would expect when the sizes don't match, but this is the Intel insn ref manual, so it has to lay out each case separately. (That is useful for instructions with more complex behaviour.)
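Putting that together, here is my own decode of the two byte sequences against the ModRM/SIB tables (treat it as a sketch, but the bytes are the ones quoted above):

 67 66 8D 04 C5 0C 00 00 00     (GNU as .code16)
   67            address-size override -> 32-bit addressing in 16-bit mode
   66            operand-size override -> 32-bit destination (eax)
   8D            lea opcode
   04            ModRM: mod=00, reg=000 (eax), r/m=100 (SIB byte follows)
   C5            SIB:   scale=*8, index=000 (eax), base=101 (no base, disp32)
   0C 00 00 00   disp32 = 12
                 => lea eax,[eax*8+12], as intended

 66 8D 06 0C 00                 (clang -m16, as seen in DOSBOX)
   66            operand-size override -> 32-bit destination (eax)
   8D            lea opcode
   06            ModRM, decoded with 16-bit addressing because the 67 prefix
                 is missing: mod=00, reg=000 (eax), r/m=110 = [disp16]
   0C 00         disp16 = 12
                 => lea eax,[000C]; the scaled-index part is simply gone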

And yes, "abusing" lea for non-address data is a common and useful optimization. It can do a non-destructive add of two registers, putting the result in a third. And it can at the same time add a constant and scale one of the inputs by 2, 4 or 8. So it can do what would otherwise take up to 4 instructions (mov / shl / add r,r / add r,i). On top of that, it doesn't affect the flags, which is a bonus if you want to keep the flags around for a later branch, or especially for cmov.
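For instance (AT&T syntax, just a made-up example):

   leal 16(%edi,%esi,4), %eax    # eax = edi + esi*4 + 16, flags untouched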
