First of all, this is somewhat a continuation of Custom memory allocator for DOS.COM real mode (freestanding) - how to debug? . But to keep this question self-contained, here is the background:
clang (and gcc, too) has an -m16 switch, so i386 instructions are prefixed to work in 16-bit real mode. This can be used to create 32-bit real-mode DOS .COM executables with the GNU linker, as described in this blog post . (Of course, it is still limited to a tiny memory model: everything must fit in a single 64 KB segment.) Wanting to play with this, I created a minimal runtime that seems to work quite well.
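For context, a build following the blog post's approach looks roughly like this (a sketch only; the exact flags, file names, and the linker script placing .text at 0x100 are from my setup and may differ):

    # compile C to i386 code with 16-bit prefixes, freestanding
    clang -m16 -march=i386 -Os -ffreestanding -fno-pie -nostdlib -c game.c -o game.o
    # link as a flat binary loaded at offset 0x100, where DOS loads .COM images
    ld -m elf_i386 -T com.ld --oformat binary -o game.com crt0.o game.o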
Then I tried to build my recently created game with it, and, well, it crashed. The first thing I came across was a classic Heisenbug: printing the value that was wrong made it correct. I found a workaround, only to hit the next crash. So the first thing I blamed was my own implementation of malloc() , see my other question. But as long as nobody spots something really wrong with it, I decided to take another look at my Heisenbug. It appears in the following code snippet (note that this worked flawlessly when compiled for other platforms):
    typedef struct { Item it; Food *f; } Slot;

    typedef struct board
    {
        Screen *screen;
        int w, h;
        Slot slots[1];
    } Board;

    [... *snip* ...]

    size = sizeof(Board) + (size_t)(w*h-1) * sizeof(Slot);
    self = malloc(size);
    memset(self, 0, size);
sizeof(Slot) is 8 (with clang on the i386 architecture), sizeof(Board) is 20, and w and h are the dimensions of the playing field, in the DOS case 80 and 24 (because one line is reserved for the title/status bar). To debug what was going on here, I made my malloc() print its size parameter, and it was called with a value of 12 ( sizeof(Board) + (-1) * sizeof(Slot) ?)
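Spelled out with these numbers, here is what I expected (plain arithmetic, using the sizes observed above and w = 80, h = 24):

    /* expected: sizeof(Board) + (size_t)(w*h - 1) * sizeof(Slot)
     *         = 20 + (80*24 - 1) * 8
     *         = 20 + 1919 * 8
     *         = 15372  (0x3C0C)
     * observed: malloc() was called with 12 instead.
     */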
Adding a printout of w and h showed the correct values, and malloc() still got 12. Adding a printout of size showed the correctly calculated size, and this time malloc() got the correct value, too. So, a classic Heisenbug.
The workaround I found is as follows:
    size = sizeof(Board);
    for (int i = 0; i < w*h-1; ++i) size += sizeof(Slot);
Strangely, it worked. The next logical step: compare the generated assembly. Here I have to admit that I'm completely new to x86; my only assembly experience so far was with the good old 6502. So, in the following snippets, I add my assumptions and thoughts as comments; please correct me where I'm wrong.
First, the "broken" original version ( w and h are in %esi and %edi , respectively):
    movl  %esi, %eax    # eax = w
    imull %edi, %eax    # eax *= h  ->  eax = w*h
This looks fine to me, but my malloc() sees 12, as already mentioned. The loop workaround compiles to the following assembly:
    movl  %edi, %ecx    # ecx = h
    imull %esi, %ecx    # ecx *= w  ->  ecx = w*h
As described above, this second variant works as expected. After all this long text, my question is as simple as ... WHY? Is there anything special about real mode that I'm missing here?
For reference: this commit contains both versions of the code. Just typing make -f libdos.mk builds the workaround version (which crashes later). To build the faulty code, remove -DDOSREAL from the CFLAGS in libdos.mk first.
Update: Following the comments, I tried to debug this myself a little deeper. Using the dosbox debugger is somewhat cumbersome, but I finally got it to break at the location of this bug. So, the following assembly code emitted by clang:
    movl   %esi, %eax           # eax = w
    imull  %edi, %eax           # eax = w*h
    leal   12(,%eax,8), %eax    # eax = eax*8 + 12, i.e. the folded form of
                                # sizeof(Board) + (w*h - 1)*sizeof(Slot) = 8*(w*h) + 12
    movzwl %ax, %ebp            # zero-extend the 16-bit result
    movl   %ebp, (%esp)         # pass size to malloc()
    calll  malloc
ends up like this (note the Intel syntax used by the dosbox disassembler):
    0193:2839 6689F0         mov   eax,esi
    0193:283C 660FAFC7       imul  eax,edi
    0193:2840 668D060C00     lea   eax,[000C]    ds:[000C]=0000F000
    0193:2845 660FB7E8       movzx ebp,ax
    0193:2849 6766892C24     mov   [esp],ebp     ss:[FFB2]=00007B5C
    0193:284E 66E8401D0000   call  4594 ($+1d40)
This lea instruction looks suspicious to me, and indeed, after it executes, ax holds the wrong value: 0x000C, i.e. 12, exactly the bogus size malloc() receives. So, I fed the same assembly source to the GNU assembler using .code16 , with the following result (disassembled with objdump ; I think this is not entirely correct, because objdump probably misinterprets the size prefix bytes):
    00000000 <.text>:
       0:   66 89 f0                mov    %si,%ax
       3:   66 0f af c7             imul   %di,%ax
       7:   67 66 8d 04             lea    (%si),%ax
       b:   c5 0c 00                lds    (%eax,%eax,1),%ecx
       e:   00 00                   add    %al,(%eax)
      10:   66 0f b7 e8             movzww %ax,%bp
      14:   67 66 89 2c             mov    %bp,(%si)
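(Side note: objdump can be told to decode as 16-bit code, which should get the prefixes right; something like the following, assuming a reasonably recent binutils, with test.o standing in for the object file:)

    objdump -d -m i386 -M i8086 test.o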
The only difference is the lea instruction. Here it starts with 67 , which means "32-bit address" in 16-bit real mode. I guess this is actually necessary, because lea is designed to work on addresses, and the optimizer just "misuses" it to do data calculations here. Read together, the bytes 67 66 8D 04 C5 0C 00 00 00 presumably form a single instruction that objdump's 32-bit decoder split apart. Are my assumptions correct? If so, could this be a bug in clang's internal assembler for -m16 ? Maybe someone can explain where the 668D060C00 emitted by clang comes from and what it means: 66 means "32-bit data", and 8D is probably the lea opcode itself, but what about the rest?
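In case someone wants to reproduce just the encoding difference, here is a minimal sketch comparing clang's integrated assembler with GNU as (the file name and flags are mine, not from the project, and I have not reduced the original bug further than this):

    $ cat lea16.s
            .code16
            leal    12(,%eax,8), %eax
    $ clang -c lea16.s -o lea-clang.o    # clang's integrated assembler
    $ as --32 lea16.s -o lea-gas.o       # GNU as
    $ objdump -d -m i386 -M i8086 lea-clang.o lea-gas.o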