Generating assembly for an x86 processor

I'm currently working through Andrew Appel's Modern Compiler Implementation in Java, and I'm right at the point where I build the low-level intermediate representation.

Initially I decided to target the JVM and ignore all the low-level machine details, but in the interest of learning about things I know little about, I have had a change of heart. This changes my IR, because targeting the JVM would have let me (more or less) wave my hands when it came to calling a method or constructing an object.

There is no detailed information on any particular machine architecture in the Appel book, so I would like to know where I can find out everything I need to know in order to move on.

The things I currently know I need to find out are:

  • Which instruction set to use. I have two laptops I could develop on; both have Core 2 Duo processors. My understanding is that x86 processors mostly share the same instruction set, but that they are not all exactly the same.

  • Whether the operating system affects the code generation stage of compilation, or whether it depends purely on the processor. For example, I know there is some difference between generating code for a 32-bit platform and for a 64-bit one.

  • How stack frames are organized, etc. When to use registers versus putting parameters on the stack, caller-save versus callee-save, all of that. I would have thought this would be described alongside the instruction set, but so far I have not seen this specific information anywhere. Maybe I am misunderstanding something?

Links to resources instead of answers are welcome.

assembly compiler-construction x86 code-generation
3 answers

Most of the x86 instruction set is common to all processors. It's a fairly safe bet that your processors share the same instruction set, except perhaps for SIMD instructions, which probably won't be much use to you when implementing a simple compiler (these instructions are usually used to speed up multimedia applications and the like). The instruction set is documented in the Intel manuals; volumes 2A and 2B in particular contain a complete list of instructions and their behavior, although the other volumes are worth a look too.

When generating user-space code, the choice of operating system matters when it comes to system calls. For example, if you want a program to print something to the terminal on 64-bit Linux, you need to make a system call:

  • Load the value 1 into the rax register to indicate that this is the write system call.
  • Load the value 1 into the rdi register to indicate that stdout should be used (1 is the file descriptor for stdout).
  • Load the start address of what you want to print into the rsi register.
  • Load the length of what you want to print into the rdx register.
  • Execute the syscall instruction once the registers (and memory) are set up.

The return value from write is stored in rax.
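As a rough illustration, the steps above could be emitted by a compiler back end along the following lines. This is only a sketch: the function name emit_write_stdout, the label msg, and the choice of NASM-style Intel syntax are assumptions made for the example, not something the steps themselves prescribe.

```python
SYS_WRITE = 1  # system call number for write on x86-64 Linux
STDOUT = 1     # file descriptor for stdout


def emit_write_stdout(label, length):
    """Return NASM-style instructions for write(stdout, label, length).

    `label` is a hypothetical assembler label marking the start of the
    bytes to print; `length` is the number of bytes.
    """
    return [
        f"mov rax, {SYS_WRITE}",    # which system call to make
        f"mov rdi, {STDOUT}",       # first argument: file descriptor
        f"lea rsi, [rel {label}]",  # second argument: start address
        f"mov rdx, {length}",       # third argument: byte count
        "syscall",                  # result comes back in rax
    ]


if __name__ == "__main__":
    print("\n".join(emit_write_stdout("msg", 14)))
```

The emitted text would still have to be assembled and linked together with a data section that defines the label.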

Another operating system may use a different system call number for write, may pass arguments differently (x86-64 Linux system calls always use rdi, rsi, rdx, r10, r8 and r9, in that order, for parameters, with the system call number in rax), and may offer a different set of system calls altogether.

The convention for ordinary function calls on Linux is similar: arguments go in rdi, rsi, rdx, rcx, r8 and r9, in that order (so the same, except that rcx is used instead of r10), with additional arguments on the stack and the return value in rax. According to this page, the registers rbp, rbx and r12 to r15 must be preserved across function calls. Of course, you can make up your own convention (as long as you are not making a system call), but that makes it harder to call, or be called from, code generated or written by others.
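To make the register order concrete, here is a small sketch of how a code generator might decide where each incoming argument lives. It assumes integer or pointer arguments only, and a callee that starts with the usual push rbp / mov rbp, rsp prologue, so that stack arguments sit above the saved rbp and the return address. The names INT_ARG_REGS and arg_locations are hypothetical.

```python
# Integer/pointer argument registers for ordinary function calls on
# x86-64 Linux, in order.
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def arg_locations(n_args):
    """Return where each of n_args integer arguments is found in the callee.

    Assumes the callee has run `push rbp; mov rbp, rsp`, so [rbp] holds
    the saved rbp, [rbp+8] the return address, and stack-passed
    arguments start at [rbp+16].
    """
    locations = []
    for i in range(n_args):
        if i < len(INT_ARG_REGS):
            locations.append(INT_ARG_REGS[i])
        else:
            offset = 16 + 8 * (i - len(INT_ARG_REGS))
            locations.append(f"[rbp+{offset}]")
    return locations


if __name__ == "__main__":
    for index, where in enumerate(arg_locations(8)):
        print(f"argument {index}: {where}")
```

Floating-point arguments and structs passed by value follow additional rules that this sketch ignores.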


How stack frames are organized, etc. When to use registers versus putting parameters on the stack, caller-save versus callee-save, all of that. I would have thought this would be described alongside the instruction set, but so far I have not seen this specific information anywhere. Maybe I am misunderstanding something?

In general, there are no correct answers to these questions. You can use any calling convention you want ... as long as you do not need to interoperate with other people's code. For interoperability, compilers standardize on Application Binary Interfaces. My understanding is that the Itanium C++ ABI has become a popular standard in recent years. Try starting there.


I can't answer all of your questions, but:

  • The core x86 instruction set is compatible across the x86 family of processors. You are not planning to use any processor-specific extensions, are you?
  • I don't think your OS or architecture matters much for code generation.
  • The default answer for anything compiler-related is the Dragon Book. Have you looked at it yet?