Why do x86-64 Linux system calls work with 6 registers set?

I am writing a standalone C program that only depends on the Linux kernel.

I studied the relevant manual pages and found that on x86-64 the Linux system call entry point receives the system call number and six arguments through seven registers: rax, rdi, rsi, rdx, r10, r8 and r9.

Does this mean that every system call takes six arguments?

I examined the source code of several libc implementations to find out how they make system calls. Interestingly, musl contains two different approaches to system calls:

  • src/internal/x86_64/syscall.s

    This assembly source file defines one __syscall function that moves the system call number and exactly six arguments to the registers defined in the ABI. The generic name of the function suggests that it can be used with any system call, despite the fact that it always passes six arguments to the kernel.

  • arch/x86_64/syscall_arch.h

    This C header file defines seven separate __syscallN functions, with N indicating their arity. This suggests that the advantage of passing only the exact number of arguments that a system call requires exceeds the cost of having and maintaining seven nearly identical functions.

So, I tried it myself:

    long system_call(long number, long _1, long _2, long _3, long _4, long _5,
                     long _6)
    {
        long value;
        /* Arguments 4 to 6 need explicit register variables because there
           are no single-letter constraints for r10, r8 and r9. */
        register long r10 __asm__ ("r10") = _4;
        register long r8  __asm__ ("r8")  = _5;
        register long r9  __asm__ ("r9")  = _6;

        __asm__ volatile ( "syscall"
                         : "=a" (value)
                         : "a" (number), "D" (_1), "S" (_2), "d" (_3),
                           "r" (r10), "r" (r8), "r" (r9)
                         : "rcx", "r11", "cc", "memory");
        return value;
    }

    int main(void)
    {
        static const char message[] = "It works!" "\n";

        /* system_call(write, standard_output, ...); */
        /* The pointer is cast to long, and the terminating NUL is excluded. */
        system_call(1, 1, (long) message, sizeof message - 1, 0, 0, 0);
        return 0;
    }

I ran this program and verified that it writes It works!\n to standard output. This left me with the following questions:

  • Why can I pass more parameters than a system call takes?
  • Is this reasonable, documented behavior?
  • What should I set the unused registers to?
    • Is 0 okay?
  • What will the kernel do with registers that it does not use?
    • Will it ignore them?
  • Is the seven-function approach faster because it executes fewer instructions?
    • What happens to the other registers in these functions?
x86-64 linux-kernel abi system-calls operating-system
1 answer

System calls take up to 6 arguments, passed in registers (almost the same registers as the SysV x64 C ABI, with r10 replacing rcx, but they are callee-preserved in the syscall case), and any "extra" arguments are simply ignored.

Some specific answers to your questions below.

src/internal/x86_64/syscall.s is just a "thunk" that shifts all the arguments into the right place. That is, it converts from a C-ABI function, which takes the system call number and 6 more arguments, to the "syscall ABI" with the same 6 arguments and the syscall number in rax. It works "just fine" for any number of arguments: the additional register moves are simply ignored by the kernel if the system call does not use those arguments.

Since in the C ABI all argument registers are considered scratch (i.e. caller-saved), clobbering them is harmless if you assume that this __syscall function is called from C. In fact, the kernel gives stronger guarantees about clobbered registers: it clobbers only rcx and r11 (and returns its result in rax), so assuming the C calling convention is safe but pessimistic. In particular, code that calls __syscall as implemented here will unnecessarily save any argument and scratch registers as the C ABI requires, despite the kernel's promise to preserve them.

The file arch/x86_64/syscall_arch.h is much the same thing, but in a C header file. Here you need all seven versions (for zero to six arguments), because modern C compilers will warn or fail if you call a function with the wrong number of arguments. Thus, there is no real option of having "one function to rule them all", as in the assembly case. It also has the advantage of doing less work for system calls that take fewer than 6 arguments.
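
To make the per-arity idea concrete, here is a minimal sketch of three such wrappers for x86-64 Linux, assuming GCC or Clang inline assembly. The names my_syscall0, my_syscall1 and my_syscall3 are hypothetical and only modeled on the __syscallN pattern described above; this is not musl's actual source. Each wrapper loads only the registers its arity needs and leaves the rest untouched.

```c
/* Sketch for x86-64 Linux with GCC or Clang.
 * The my_syscallN names are hypothetical, chosen for illustration. */

/* Zero arguments: only rax (the syscall number) is loaded. */
static inline long my_syscall0(long n)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (n)
                      : "rcx", "r11", "cc", "memory");
    return ret;
}

/* One argument: rax and rdi are loaded. */
static inline long my_syscall1(long n, long a1)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (n), "D" (a1)
                      : "rcx", "r11", "cc", "memory");
    return ret;
}

/* Three arguments: rax, rdi, rsi and rdx are loaded.
 * r10, r8 and r9 keep whatever values they already had. */
static inline long my_syscall3(long n, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (n), "D" (a1), "S" (a2), "d" (a3)
                      : "rcx", "r11", "cc", "memory");
    return ret;
}
```

With the x86-64 syscall numbers, my_syscall0(39) would invoke getpid and my_syscall3(1, 1, (long) buf, len) would invoke write on standard output. On failure the raw system call returns a negative errno value rather than setting errno.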

Your listed questions answered:

  • Why can I pass more parameters than a system call takes?

Because the calling convention is mainly register-based and caller-cleanup. You can always pass more arguments in this situation (including in the C ABI), and the extra arguments are simply ignored by the callee. Since the syscall mechanism is generic at the C and .asm level, there is no real way for the compiler to guarantee that you are passing the correct number of arguments: you need to pass the right syscall id and the right number of arguments yourself. If you pass fewer, the kernel will see garbage, and if you pass more, they will be ignored.
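
As a small illustration of extra arguments being ignored, the sketch below calls getpid, which takes no arguments, through a generic six-argument wrapper with arbitrary non-zero values in every argument register. It assumes x86-64 Linux and GCC or Clang; my_syscall6 and getpid_with_junk are hypothetical names, and 39 is the x86-64 number for getpid.

```c
/* Sketch for x86-64 Linux with GCC or Clang. my_syscall6 and
 * getpid_with_junk are hypothetical names used only for illustration. */

/* Generic wrapper: always loads all six argument registers. */
static long my_syscall6(long n, long a1, long a2, long a3,
                        long a4, long a5, long a6)
{
    long ret;
    register long r10 __asm__ ("r10") = a4;
    register long r8  __asm__ ("r8")  = a5;
    register long r9  __asm__ ("r9")  = a6;

    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (n), "D" (a1), "S" (a2), "d" (a3),
                        "r" (r10), "r" (r8), "r" (r9)
                      : "rcx", "r11", "cc", "memory");
    return ret;
}

/* getpid takes no arguments, so the six values below are never read. */
long getpid_with_junk(void)
{
    return my_syscall6(39, 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666);
}
```

The value returned by getpid_with_junk() should match what the libc getpid() wrapper reports for the same process.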

  • Is this reasonable, documented behavior?

Yes, of course, because the whole syscall mechanism is a "generic gate" into the kernel. 99% of the time you are not going to use it directly: glibc wraps the vast majority of interesting system calls in C ABI wrappers with the correct signature, so you have nothing to worry about. Those wrappers are the safe way to make system calls.

  • What should I set the unused registers to?

You don't set them to anything. If you use the C prototypes in arch/x86_64/syscall_arch.h, the compiler just takes care of it for you (it doesn't set them to anything), and if you write your own asm, you don't set them to anything either (and you should assume they are clobbered after the syscall).

  • What will the kernel do with registers that it does not use?

It is free to use all the registers it wants, but it will adhere to the kernel calling convention, which on x86-64 is that all registers except rax, rcx and r11 are preserved (which is why you see rcx and r11 in the clobber list in the C inline asm).
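
One way to observe this preservation is to load sentinel values into the six argument registers, issue a system call that ignores them, and read the registers back. The sketch below does this with getpid (number 39 on x86-64), assuming x86-64 Linux and GCC or Clang; argument_registers_preserved is a hypothetical name. The "+" constraints mark each register as both input and output, so the compiler reads the real post-syscall values instead of assuming they are unchanged.

```c
/* Sketch for x86-64 Linux with GCC or Clang.
 * argument_registers_preserved is a hypothetical helper name. */

/* Returns 1 if all six argument registers still hold their sentinel
 * values after the syscall and rax holds a plausible pid, else 0. */
int argument_registers_preserved(void)
{
    long rax = 39;  /* getpid on x86-64; replaced by the return value */
    long rdi = 0x1111, rsi = 0x2222, rdx = 0x3333;
    register long r10 __asm__ ("r10") = 0x4444;
    register long r8  __asm__ ("r8")  = 0x5555;
    register long r9  __asm__ ("r9")  = 0x6666;

    __asm__ volatile ("syscall"
                      : "+a" (rax), "+D" (rdi), "+S" (rsi), "+d" (rdx),
                        "+r" (r10), "+r" (r8), "+r" (r9)
                      :
                      : "rcx", "r11", "cc", "memory");

    return rax > 0
        && rdi == 0x1111 && rsi == 0x2222 && rdx == 0x3333
        && r10 == 0x4444 && r8 == 0x5555 && r9 == 0x6666;
}
```

This checks a single system call in a single run, so it illustrates the convention rather than proving it; rcx and r11 are deliberately not inspected because the syscall instruction itself overwrites them.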

  • Is the seven-function approach faster because it executes fewer instructions?

Yes, but the difference is very small, because reg-reg mov instructions usually have zero latency and high throughput (up to 4 per cycle) on recent Intel architectures. Therefore, moving an additional 6 registers will probably cost about 1.5 cycles for a system call that usually takes at least 50 cycles, even if it does nothing. Thus, the impact is small, but probably measurable (if you measure it very carefully!).

  • What happens to the other registers in these functions?

I'm not sure what you mean, but the other registers can be used just like all GP registers, as long as the kernel preserves their values (for example, by pushing them on the stack and popping them later).
