What are relocatable and absolute machine code?

While learning about assemblers, I came across these terms. The idea I got is that in relocatable code, the code does not depend on a static memory location: the assembler specifies the RAM requirements for my program, and that memory can be placed wherever the linker finds room for it.

Is that idea right? If so, how does the assembler do this?

And what are some examples of absolute machine code?

+8
assembly relocation
5 answers

Many/most instruction sets have pc-relative addressing: take the program counter, which is tied to the address of the instruction being executed, add an offset to it, and use the result to access memory, branch, or something like that. That is what you are calling relocatable. No matter where the instruction sits in the address space, the thing you want to reach is relative to it. Move the entire block of code and data to some other address and everything stays the same distance apart, so relative addressing still works. "If equal, skip the next instruction" works wherever those three instructions are (the conditional skip, the instruction being skipped, and the one after it).

Absolute code uses absolute addresses: jump to this exact address, read from this exact address. "If equal, branch to 0x1000."

The assembler doesn't do this; the compiler and/or the programmer does. Generally, compiled code eventually ends up with absolute addressing, in particular if your code consists of separate objects that are linked together. At compile time the compiler does not know where the object will end up, and it cannot know where external references are or how far away they are, so it cannot assume they will be close enough for pc-relative addressing (which usually has a limited range). Compilers therefore often generate a placeholder that the linker fills in with an absolute address. How this external-address problem is solved depends on the operation, the instruction set, and some other factors. Eventually, though, depending on the size of the project, the linker will end up with some absolute addressing. So position-independent code is usually a non-default command-line option; GCC's -fPIC, for example, may be one your compiler supports. Both the compiler and the linker then have to do extra work to make those items position independent. An assembly language programmer has to do all of this himself. The assembler generally does not get involved; it simply generates machine code for the instructions you tell it to.

novectors.s:

    .globl _start
    _start:
        b reset
    reset:
        mov sp,#0xD8000000
        bl notmain
        ldr r0,=notmain
        blx r0
    hang: b hang

    .globl dummy
    dummy:
        bx lr

hello.c

    extern void dummy ( unsigned int );
    int notmain ( void )
    {
        unsigned int ra;
        for(ra=0;ra<1000;ra++) dummy(ra);
        return(0);
    }

memmap (linker script)

    MEMORY
    {
        ram : ORIGIN = 0xD6000000, LENGTH = 0x4000
    }
    SECTIONS
    {
        .text : { *(.text*) } > ram
    }

Makefile

    ARMGNU = arm-none-eabi

    COPS = -Wall -O2 -nostdlib -nostartfiles -ffreestanding

    all : hello_world.bin

    clean :
    	rm -f *.o
    	rm -f *.bin
    	rm -f *.elf
    	rm -f *.list

    novectors.o : novectors.s
    	$(ARMGNU)-as novectors.s -o novectors.o

    hello.o : hello.c
    	$(ARMGNU)-gcc $(COPS) -c hello.c -o hello.o

    hello_world.bin : memmap novectors.o hello.o
    	$(ARMGNU)-ld novectors.o hello.o -T memmap -o hello_world.elf
    	$(ARMGNU)-objdump -D hello_world.elf > hello_world.list
    	$(ARMGNU)-objcopy hello_world.elf -O binary hello_world.bin

hello_world.list (the parts we care about)

    Disassembly of section .text:

    d6000000 <_start>:
    d6000000:   eaffffff    b       d6000004 <reset>

    d6000004 <reset>:
    d6000004:   e3a0d336    mov     sp, #-671088640 ; 0xd8000000
    d6000008:   eb000004    bl      d6000020 <notmain>
    d600000c:   e59f0008    ldr     r0, [pc, #8]    ; d600001c <dummy+0x4>
    d6000010:   e12fff30    blx     r0

    d6000014 <hang>:
    d6000014:   eafffffe    b       d6000014 <hang>

    d6000018 <dummy>:
    d6000018:   e12fff1e    bx      lr
    d600001c:   d6000020    strle   r0, [r0], -r0, lsr #32

    d6000020 <notmain>:
    d6000020:   e92d4010    push    {r4, lr}
    d6000024:   e3a04000    mov     r4, #0
    d6000028:   e1a00004    mov     r0, r4
    d600002c:   e2844001    add     r4, r4, #1
    d6000030:   ebfffff8    bl      d6000018 <dummy>
    d6000034:   e3540ffa    cmp     r4, #1000       ; 0x3e8
    d6000038:   1afffffa    bne     d6000028 <notmain+0x8>
    d600003c:   e3a00000    mov     r0, #0
    d6000040:   e8bd4010    pop     {r4, lr}
    d6000044:   e12fff1e    bx      lr

What I am showing here is a mixture of position-independent instructions and position-dependent instructions.

These two instructions, for example, use a shortcut that makes the assembler add a .word-style memory location (a literal pool entry), which the linker then has to fill in for us:

    ldr r0,=notmain
    blx r0

0xD600001C is that location (the disassembler decodes the data word stored there as a meaningless strle instruction):

    d600000c:   e59f0008    ldr     r0, [pc, #8]    ; d600001c <dummy+0x4>
    d6000010:   e12fff30    blx     r0
    ...
    d600001c:   d6000020    strle   r0, [r0], -r0, lsr #32

It is filled in with 0xD6000020, an absolute address, so for this code to work the notmain function must be at address 0xD6000020; it cannot move. But this part of the example also demonstrates some position-independent code:

    ldr r0, [pc, #8]

This is pc-relative addressing. With this instruction set, the pc at execution time is two instructions ahead. In this case, if the instruction is at 0xD600000C in memory, the pc will be 0xD6000014; add the 8 encoded in the instruction and you get 0xD600001C. Now suppose we moved this same machine-code instruction to address 0x1000 and moved all the surrounding binary there too, including the word it reads (0xD6000020). Basically, do this:

    1000:   e59f0008    ldr     r0, [pc, #8]
    1004:   e12fff30    blx     r0
    ...
    1010:   d6000020

That machine code will still work; it does not need to be reassembled or relinked. The code at 0xD6000020 still has to be at that fixed address, but the ldr and blx instructions themselves do not.

Although the disassembler shows them with 0xD6... addresses, bl and bne are also pc-relative, which you can confirm by looking at the instruction set documentation:

    d6000030:   ebfffff8    bl      d6000018 <dummy>
    d6000034:   e3540ffa    cmp     r4, #1000       ; 0x3e8
    d6000038:   1afffffa    bne     d6000028 <notmain+0x8>

The bl at 0xD6000030 sees a pc of 0xD6000038 when it executes, and 0xD6000038 - 0xD6000018 = 0x20, which is 8 instructions. Negative 8 in two's complement is 0xFFF...FFF8, and you can see that most of the machine code ebfffff8 is fffff8: that 24-bit field is sign-extended and added (in instruction units) to the program counter, in effect saying "branch 8 instructions backward". The same goes for fffffa in 1afffffa: if not equal, branch back 6 instructions. Remember that this instruction set (ARM) treats the pc as two instructions ahead, so back 6 means two forward and then back 6, effectively back 4.

If you remove

    d600000c:   e59f0008    ldr     r0, [pc, #8]    ; d600001c <dummy+0x4>
    d6000010:   e12fff30    blx     r0

then this whole program ends up position independent, by chance if you will (I happened to know it would), not because I told the tools to do that but simply because I kept everything close together and did not use any absolute addressing.

Finally, regarding "wherever the linker finds a place for them": notice that in my linker script I tell the linker to start at 0xD6000000, but I don't name any files or functions, so unless told otherwise the linker places items in the order they are given on the command line. The hello.c code comes second, so after the linker has placed the novectors.s code it puts the hello.c code immediately after it, which makes hello.c start at 0xD6000020.

An easy way to see what is position independent and what isn't, without studying each instruction, is to change the linker script to put the code at some other address:

    MEMORY
    {
        ram : ORIGIN = 0x1000, LENGTH = 0x4000
    }
    SECTIONS
    {
        .text : { *(.text*) } > ram
    }

and see which machine code changes, if any, and which does not.

    00001000 <_start>:
        1000:   eaffffff    b       1004 <reset>

    00001004 <reset>:
        1004:   e3a0d336    mov     sp, #-671088640 ; 0xd8000000
        1008:   eb000004    bl      1020 <notmain>
        100c:   e59f0008    ldr     r0, [pc, #8]    ; 101c <dummy+0x4>
        1010:   e12fff30    blx     r0

    00001014 <hang>:
        1014:   eafffffe    b       1014 <hang>

    00001018 <dummy>:
        1018:   e12fff1e    bx      lr
        101c:   00001020    andeq   r1, r0, r0, lsr #32

    00001020 <notmain>:
        1020:   e92d4010    push    {r4, lr}
        1024:   e3a04000    mov     r4, #0
        1028:   e1a00004    mov     r0, r4
        102c:   e2844001    add     r4, r4, #1
        1030:   ebfffff8    bl      1018 <dummy>
        1034:   e3540ffa    cmp     r4, #1000       ; 0x3e8
        1038:   1afffffa    bne     1028 <notmain+0x8>
        103c:   e3a00000    mov     r0, #0
        1040:   e8bd4010    pop     {r4, lr}
        1044:   e12fff1e    bx      lr
+20

Anything that actually embeds an address in the code uses absolute addressing. Programs that contain no addresses in the code (everything is done with relative addresses) can run from any address.

The assembler doesn't do this; the programmer does. I have done a bit of it in the past. For small things it is usually easy, but once you go beyond the reach of a relative jump it gets pretty painful. IIRC the only two approaches are to slip intermediate relative jumps between routines so that each hop stays in range, or to add a known offset to the current address, push it, and return. In the old days there was a third approach, calculating the address and writing it into the code, but that is no longer acceptable. It has been long enough that I won't swear there are no other approaches.

IIRC the only way to "call" something without absolute addresses is to push the address you want to return to, calculate the target address, push it, and return.

Note that in practice a hybrid approach is normally used. The assembler and linker store the information needed to make fixups when the program is loaded into memory, so that it is modified to run at whatever address it was loaded at. The actual image in memory is thus absolute, but the file on disk behaves as if it were relative, without all the headaches that would normally introduce. (Note that the same approach is used by all higher-level languages that actually generate native code.)

+5

Basically, "absolute" mode means the code and RAM variables are placed exactly where you tell the assembler to put them, while "relocatable" mode means the assembler builds chunks of code and specifies RAM requirements that can be placed wherever the linker finds room for them.

+4

I am not sure the accepted answer is necessarily correct. There is a fundamental difference between relocatable code and what is considered position-independent code.

I have been coding assembly for a long time, on many different architectures, and I have always thought of machine code as coming in three specific kinds:

  • Position-independent code
  • Relocatable code
  • Absolute code

First, position-independent code. This is code whose instructions, once assembled, are all relative to one another. Branches, for example, specify an offset from the current instruction pointer (or program counter, whatever you prefer to call it). Position-independent code consists of only one code segment and keeps its data in that same segment (or section). There are exceptions to data being embedded in the same segment, but those are usually provided to you by the operating system or loader.

This is a very useful type of code because the operating system does not have to do anything after loading it before execution can start. It will just run wherever it is loaded in memory. Of course, this type of code also has its problems: for example, you cannot separate code and data that might suit different types of memory, and there are limits on size before relative references start to go out of range. Those are just some of them.

Relocatable code is a lot like position-independent code, but with a very subtle difference. As the name implies, it is relocatable in that it can be loaded anywhere in memory, but it usually has to be relocated, or fixed up, before it is executed. In fact, some architectures that use this type of code embed things like "reloc" sections for exactly this purpose: fixing up the relocatable parts of the code. The downside of this type of code is that once it has been relocated and fixed up, it becomes almost absolute in nature and is tied to its address.

What gives relocatable code its main advantage, and the reason it is the most common kind of code, is that it makes it easy to split code into sections. Each section can be loaded anywhere in memory to suit its requirements, and then during relocation any code that refers to another section can be fixed up using a relocation table, so the sections can be tied together neatly. The code itself is usually relative (as on the x86 architecture), but it need not be, since anything that might be out of range can be assembled as a relocatable instruction, consisting of an offset that gets added to its load address. This also means the limitations imposed by relative addressing are no longer a problem.

The final type is absolute code. This is code assembled to run at one specific address, and it will only work when loaded at that address. Branch and jump instructions contain a fixed, exact (absolute) address. This is the kind of code typically found in embedded systems, where you can guarantee that a piece of code will be loaded at that specific address because it is the only thing loaded there. On a modern computer such absolute code would not work, because code has to be loaded wherever there is free memory and there is never a guarantee that a particular memory range will be available. Absolute code does have its advantages, though, mainly that it is usually the fastest to execute, but it can be platform dependent.

+1

"Relocatable" means the assembler builds code snippets and specifies RAM requirements that can be placed wherever the linker finds room for them.

0
