How does an assembler work?

I am looking for a brief description of how an assembler is used to produce machine code.

So, I know that assembly translates to machine code roughly 1:1. But I get confused about object code and linkers and how they fit into it.

I don't need a complicated answer, a simple one will do.

+7
2 answers

Both an assembler and a compiler translate source files into object files.

Object files are effectively an intermediate step before the final executable, which is the output generated by the linker.

The linker takes the specified object files and libraries (which are packages of object files) and resolves the relocation records (or "fixups").

These relocation records are emitted when the compiler / assembler does not know the address of a function or variable used in the source code, so it generates a reference to it by name, which the linker can then resolve.
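To make the idea concrete, here is a toy model of that name-based resolution in Python. It is only a sketch: the dictionary layout, the made-up opcodes, and the `link` function are hypothetical and do not correspond to any real object-file format, and it patches in absolute addresses for simplicity (the real x86-64 call shown further down uses a PC-relative value instead).

```python
# Toy model of linking: hypothetical structures, not a real object-file format.

def link(objects):
    """Lay objects out back to back, then patch each relocation by symbol name."""
    base = {}      # object name -> start address in the output image
    symbols = {}   # symbol name -> final address
    image = bytearray()

    # Pass 1: assign addresses and collect the symbols each object defines.
    for obj in objects:
        base[obj["name"]] = len(image)
        for sym, offset in obj["defines"].items():
            symbols[sym] = len(image) + offset
        image += obj["code"]

    # Pass 2: fill every 4-byte placeholder with the now-known address.
    for obj in objects:
        for offset, sym in obj["relocs"]:
            if sym not in symbols:
                raise KeyError(f"undefined reference to {sym}")
            where = base[obj["name"]] + offset
            image[where:where + 4] = symbols[sym].to_bytes(4, "little")
    return bytes(image), symbols


main_o = {
    "name": "main.o",
    "code": bytes([0xAA, 0, 0, 0, 0]),   # made-up opcode + 4-byte placeholder
    "defines": {"_start": 0},
    "relocs": [(1, "do_message")],       # "patch offset 1 with do_message"
}
message_o = {
    "name": "message.o",
    "code": bytes([0xBB, 0xCC]),         # made-up body of do_message
    "defines": {"do_message": 0},
    "relocs": [],
}

image, symbols = link([main_o, message_o])
print(symbols)       # {'_start': 0, 'do_message': 5}
print(image.hex())   # aa05000000bbcc
```

The assembler's job ends with producing something like `main_o` and `message_o`; only the link step knows where `do_message` finally lands, so only it can fill in the placeholder.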

For example, say you have a program that prints a message on the screen, split into two source files, and you want to assemble them separately and link them (for example, on x86-64 Linux, making system calls through int 0x80):

main.asm:

    bits 64
    section .text
    extern do_message        ; defined in message.asm, address unknown here
    global _start
    _start:
        call do_message
        mov rax, 1           ; int 0x80 system call 1: exit
        int 0x80

message.asm:

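    bits 64
    section .text
    global do_message
    do_message:
        mov rdi, message     ; start of the string
        mov rcx, dword -1    ; scan with no practical length limit
        xor rax, rax         ; al = 0, the byte to search for
        repnz scasb          ; advance rdi until just past the terminating 0
        sub rdi, message     ; rdi = number of bytes scanned (includes the 0)
        mov rax, 4           ; int 0x80 system call 4: write
        mov rbx, 1           ; file descriptor 1: stdout
        mov rcx, message     ; buffer
        mov rdx, rdi         ; length
        int 0x80
        ret
    section .data
    message: db "hello world",10,0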

If you assemble them and look at the disassembly of the main.asm object file (for example, objdump -d main.o), you will notice that "call do_message" has an address of 00 00 00 00, which is not valid yet.

    0000000000000000 <_start>:
       0:   e8 00 00 00 00          callq  5 <_start+0x5>
       5:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
       c:   cd 80                   int    $0x80

But a relocation record is made for the 4 bytes of the address:

    $ objdump -r main.o

    main.o:     file format elf64-x86-64

    RELOCATION RECORDS FOR [.text]:
    OFFSET           TYPE              VALUE
    0000000000000001 R_X86_64_PC32     do_message+0xfffffffffffffffc
    000000000000000d R_X86_64_32       .data

The offset is "1" and the type is "R_X86_64_PC32", which tells the linker to resolve this reference and put the resolved address at the specified offset.
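One detail that can look odd in that record is the "+0xfffffffffffffffc" after do_message. objdump prints the addend as an unsigned 64-bit number; read as two's complement it is simply -4. A small Python sketch of that conversion (the helper name `to_signed64` is just for illustration):

```python
def to_signed64(value):
    """Interpret an unsigned 64-bit value as a two's-complement signed number."""
    return value - (1 << 64) if value & (1 << 63) else value

addend = to_signed64(0xfffffffffffffffc)
print(addend)  # -4
```

The -4 is there because the displacement of a call is measured from the end of the instruction, which is 4 bytes past the start of the field being patched.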

When you link the final program with 'ld -o program main.o message.o', all the relocations are resolved, and if nothing is left unresolved, you end up with an executable.

When we "objdump -d" the executable, we can see the resolved address:

    00000000004000f0 <_start>:
      4000f0:   e8 0b 00 00 00          callq  400100 <do_message>
      4000f5:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
      4000fc:   cd 80                   int    $0x80
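You can check the patched bytes by hand. For a PC-relative relocation such as R_X86_64_PC32 the stored value is S + A - P, where S is the symbol's final address, A is the addend from the relocation record (0xfffffffffffffffc, which is -4 when read as signed), and P is the address of the field being patched. A short Python sketch using the addresses from the dumps above (the function name `resolve_pc32` is mine, not part of any tool):

```python
import struct

def resolve_pc32(symbol_addr, addend, place_addr):
    """Compute S + A - P and encode it as a signed 32-bit little-endian field."""
    return struct.pack("<i", symbol_addr + addend - place_addr)

text_start = 0x4000f0        # address of _start in the linked executable
place = text_start + 0x1     # relocation offset 1: the byte after the e8 opcode
do_message = 0x400100        # where the linker placed do_message

field = resolve_pc32(do_message, -4, place)
print(field.hex())           # 0b000000
```

That matches the "0b 00 00 00" following the e8 opcode: the call jumps 0x0b bytes forward from the next instruction at 4000f5, which lands on 400100.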

The same kind of relocation is used for both variables and functions. The same process happens when you link your program against several large libraries, such as libc: you define a "main" function, to which libc has an external reference. libc's startup code then runs before your code and calls your "main" function when you run the executable.

+11

A simple explanation:

Once the assembly language has been assembled into object code, the linker is used to turn the object code into an executable file of instructions that the computer can load and run. The resulting machine code is executed directly by the CPU.

+1