Why must C source code be recompiled for a different OS on the same machine?

When I compile C source code (for example, on Linux), the compiler produces a file in a "machine-readable" format.

  • Why does the same file not work on the same computer under a different operating system?
  • Is the problem in how we "execute" this file?
+5
7 answers

Sometimes this will work, depending on the executable format, the libraries you use, and so on. For example, things like allocating memory or creating a window invoke OS functions. Thus, you must compile for the target OS and link (statically or dynamically) against that OS's libraries.

However, the instructions themselves are the same. Thus, if your program does not use any OS functions (no standard library or any other library), you can run it on another OS. The second problem is the executable format. A Windows .exe is very different from, for example, an ELF file. However, a flat format that contains only instructions (e.g. .com) will work on all systems running on the same CPU.


EDIT: A fun experiment would be to compile some functions in a flat format (instructions only) on one OS (e.g. Windows). For instance:

 int add(int x, int y) { return x + y; } 

Save only the instructions to a file, without any relocation or other metadata. Then on another OS (like Linux) compile a complete program that does something like this:

 typedef int (*PFUNC)(int, int); // pointer to a function like our add one

 PFUNC p = malloc(200); // make sure you have enough space.
 FILE *f = fopen("add.com", "rb");
 fread(p, 200, 1, f); // Load the file contents into p
 fclose(f);
 int ten = p(4, 6);

For this to work, you also need to tell the OS that the allocated memory must be executable. I'm not sure exactly how to do this, but I know that it can be done.
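One way to get executable memory, as a minimal sketch assuming x86-64 Linux: map anonymous memory with mmap, copy the instructions in, then switch the pages to executable with mprotect. The names run_flat_add and ADD_CODE are made up for this illustration, and the four bytes stand in for the contents of the flat file; they encode add for the System V AMD64 calling convention, so they would not match a function compiled on Windows.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*PFUNC)(int, int);

/* Hypothetical helper: loads raw machine code for add() and calls it.
   Returns 0 and stores the result in *out on success, or -1 if the
   platform is not supported or the memory could not be made executable. */
int run_flat_add(int x, int y, int *out)
{
#if defined(__x86_64__) && defined(__linux__)
    /* lea eax,[rdi+rsi] ; ret  (stand-in for the flat file contents) */
    static const unsigned char ADD_CODE[] = { 0x8d, 0x04, 0x37, 0xc3 };
    size_t len = 4096;

    /* Ask the OS for writable memory first. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, ADD_CODE, sizeof ADD_CODE);

    /* Then ask the OS to make the same pages executable. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    PFUNC p;
    memcpy(&p, &mem, sizeof p);   /* turn the data pointer into a function pointer */
    *out = p(x, y);

    munmap(mem, len);
    return 0;
#else
    (void)x; (void)y; (void)out;
    return -1;                    /* sketch only covers x86-64 Linux */
#endif
}
```

On a system where the call succeeds, run_flat_add(4, 6, &result) should leave 10 in result. Hardened systems may refuse to make the pages executable, in which case the function returns -1.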

+4

I was asked what an ABI mismatch is. I think it is best explained with a simple example.

Consider a slightly dumb function:

 int f(int a, int b, int (*g)(int, int)) { return g(a * 2, b * 3) * 4; } 

Compile it for x64 / Windows and for x64 / Linux.

For x64 / Windows, the compiler emits something like:

 f:
     sub rsp,28h
     lea edx,[rdx+rdx*2]
     add ecx,ecx
     call r8
     shl eax,2
     add rsp,28h
     ret

For x64 / Linux, something like:

 f:
     sub $0x8,%rsp
     lea (%rsi,%rsi,2),%esi
     add %edi,%edi
     callq *%rdx
     add $0x8,%rsp
     shl $0x2,%eax
     retq

Even allowing for the different assembly notations traditionally used on Windows (Intel syntax) and Linux (AT&T syntax), it is obvious that the code differs significantly.

The Windows version expects a in ECX (the lower half of the RCX register), b in EDX (the lower half of the RDX register) and g in R8. This is dictated by the x64 Windows calling convention, which is part of the application binary interface (ABI). The code likewise prepares the arguments for g in ECX and EDX.

The Linux version expects a in EDI (the lower half of the RDI register), b in ESI (the lower half of the RSI register) and g in RDX. This is prescribed by the calling convention of the System V AMD64 ABI (used by Linux and other Unix-like x64 operating systems). The code prepares the arguments for g in EDI and ESI.

Now imagine that we are running a Windows program that somehow extracts the body of f from a Linux-oriented module and calls it:

 int g(int a, int b);

 typedef int (*G)(int, int);
 typedef int (*F)(int, int, G);

 F f = (F) load_linux_module_and_get_symbol("module.so", "f");
 int result = f(3, 4, &g);

What will happen? Since Windows functions expect their arguments in ECX , EDX and R8 , the compiler places the actual arguments in these registers:

 mov edx,4
 lea r8,[g]
 lea ecx,[rdx-1]
 call qword ptr [f1]

But the Linux-oriented version of f looks for the values elsewhere. In particular, it looks for the address of g in RDX. We just set EDX to 4 (which also clears the upper half of RDX), so RDX holds 4 rather than a valid function address. Most likely, the program crashes.

Running Windows-oriented code on a Linux system will have the same effect.

Thus, we cannot run foreign code except through a thunk. A thunk is a piece of low-level code that rearranges arguments so that calls can be made between pieces of code that follow different sets of rules. (Thunks may have to do more than that, because the effects of an ABI are not limited to the calling convention.) Normally, you cannot write a thunk in a high-level programming language.

Note that in our scenario we need to provide thunks for f ('host-to-foreign') and g ('foreign-to-host').
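For the calling-convention part only, GCC and Clang on x86-64 have a non-standard ms_abi function attribute that makes the compiler emit this kind of glue. The following is a rough sketch under that assumption; FOREIGN, foreign_f, host_g, g_thunk and call_foreign_f are names invented for the illustration, and foreign_f is compiled locally as a stand-in for code that would really come from a foreign module.

```c
/* Assumption: GCC or Clang on x86-64. Elsewhere FOREIGN expands to nothing
   and both "sides" simply share the host convention. */
#if defined(__x86_64__) && defined(__GNUC__)
#define FOREIGN __attribute__((ms_abi))   /* Windows x64 calling convention */
#else
#define FOREIGN
#endif

typedef int (FOREIGN *foreign_g_t)(int, int);

/* Stand-in for the foreign f: expects a, b and g in ECX, EDX and R8. */
static int FOREIGN foreign_f(int a, int b, foreign_g_t g)
{
    return g(a * 2, b * 3) * 4;
}

/* Host callback, compiled with the host's native convention. */
static int host_g(int a, int b)
{
    return a + b;
}

/* Foreign-to-host thunk: entered with the foreign convention,
   forwards the arguments to host_g using the host convention. */
static int FOREIGN g_thunk(int a, int b)
{
    return host_g(a, b);
}

/* Host-to-foreign direction: because foreign_f is declared FOREIGN,
   the compiler places the arguments where the foreign code expects them. */
int call_foreign_f(int a, int b)
{
    return foreign_f(a, b, g_thunk);
}
```

With these definitions, call_foreign_f(3, 4) evaluates host_g(6, 12) * 4, which is 72. This covers register assignment only; loading a foreign module and the rest of its ABI are outside the sketch.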

+3

There are two important things:

  • development environment;
  • target platform.

The compiler of the development environment generates an object file containing machine code plus references to functions and data that are not contained in the object module (not defined in the source file). Another program, the linker, combines all of your object modules and libraries into an executable file. Note:

  • The object module format is basically platform independent, although platforms have standards that make it easy to combine object modules produced by different compilers for that platform. But this need not be so; a fully integrated development environment may have its own "standard."

  • A linker can be a program from any vendor. It must know the format of the object modules, the format of the libraries and the desired output format. Only this last format is platform dependent.

  • Libraries can be in any format if there is a linker that can read them. BUT: libraries are platform dependent because the functions in the library call the operating system API.

A cross-development environment can, for example, generate object modules in a Windows-compatible format; the linker can then link them against libraries that are stored in a Windows-compatible format but target Linux (using Linux OS calls), and deliver a Linux executable. Or any other combination you like (Linux object format, Windows library format, Windows executable, ...).

To summarize, the only truly platform-dependent elements are the functions in the libraries, since they invoke the OS, and the resulting executable, since that is what the OS loads.

So, to answer the question: no, there is no need to compile the source file for different platforms. The same object module can be linked for Linux (using Linux target libraries and creating a Linux-style executable) or for Windows (using Windows target libraries and creating a Windows executable).
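A rough sketch of that split, assuming either Windows or a POSIX system. The hypothetical platform_write stands in for the OS-specific library routine (selected here with the preprocessor for brevity, rather than by the linker), while say_hello is the part that makes no OS calls of its own.

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
/* Windows flavour of the library routine: uses the Win32 API. */
static int platform_write(const char *s, size_t n)
{
    DWORD written = 0;
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    return WriteFile(out, s, (DWORD)n, &written, NULL) ? (int)written : -1;
}
#else
#include <unistd.h>
/* POSIX flavour of the library routine: uses write(). */
static int platform_write(const char *s, size_t n)
{
    return (int)write(STDOUT_FILENO, s, n);
}
#endif

/* Portable part: only calls the library routine, never the OS directly.
   Returns the number of bytes written, or -1 on error. */
int say_hello(void)
{
    static const char msg[] = "Hello world\n";
    return platform_write(msg, strlen(msg));
}
```

Only platform_write changes between the two targets; say_hello is the same in both.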

+2
  • Different operating systems use different application binary interfaces (ABIs), which cover, among other things, the code required to enter and exit a function.
  • Some language features may require direct platform support (things like thread-local storage come to mind).
  • The linker typically links against the toolchain's standard library automatically. That library has to differ between operating systems, if for no other reason than that each operating system has its own set of system calls.

Having said that, the Wine project is a good example where all these problems have been worked around in order to make Windows code run on Linux.

+2

You are right, compilation translates your source code into machine-readable code, e.g. into x86 machine code.

But there is more to it. Your program usually relies not only on the machine code compiled into your executable file, but also on operating system libraries. All modern operating systems provide various services and libraries for programs. Therefore, if your program is built to work with, for example, certain Linux libraries and is then run on an operating system that does not have those libraries, it will not work.

Another issue is the format of the executable file. Most executable files contain not only executable machine code but also metadata, e.g. icons, information on how the file is laid out, version numbers, and more. Thus, by default, if you run, for example, a Windows .exe file on Linux, the operating system will not be able to process this format correctly.
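The difference is visible in the very first bytes of a file: ELF files begin with the byte 0x7F followed by "ELF", while Windows .exe files begin with "MZ". A small sketch that tells them apart (identify_executable is a name made up for this example, and it checks only those leading bytes, nothing else about the file):

```c
#include <stddef.h>
#include <string.h>

/* Classifies a buffer holding the first bytes of a file.
   Returns "ELF", "MZ/PE" or "unknown". */
const char *identify_executable(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return "ELF";          /* Linux and most Unix-like systems */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "MZ/PE";        /* DOS and Windows executables */
    return "unknown";
}
```

A loader that only understands one of these layouts has no idea where the code, data or entry point of the other kind of file are, even though the instructions inside may be identical.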

Systems such as Wine supply the missing libraries and can handle other executable file formats, which allows you to run, e.g., a Windows .exe file on Linux as if it were running on Windows.

+1

There are some good general answers here. I will give you a very concrete example.

An x86 machine can easily run printf("Hello world") under both 32-bit Linux and DOS if the C file is compiled for each platform.

One of the many significant differences between operating systems is how a program asks the operating system for the services it provides. This is how you ask Linux to print the line:

 msg db "Hello world"   # Define a message with no terminator
 mov edx, 11            # Put the message length in the edx register
 mov ecx, msg           # Put the message address in ecx
 mov ebx, 1             # Put the file descriptor in ebx (1 meaning standard output)
 mov eax, 4             # Set the system call to 4, "write to file descriptor"
 int 80h                # Invoke interrupt 80h to give control to Linux

This is how you ask DOS to print the same line:

 msg db "Hello world$"  # Define a message terminated by a dollar sign
 mov dx, msg            # Load the message address into dx
 mov ah, 9              # Set the system call number to 9, "print string"
 int 21h                # Invoke interrupt 21h to give control to DOS

They use the same basic, machine-readable and executable instructions, but the directions they give to the OS are as different as English and Chinese.
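The Linux request can also be made from C without going through printf. This is a sketch assuming a Linux system with glibc or a compatible libc, where syscall() and SYS_write are available; print_hello is a name made up for the example, and the fallback branch for other POSIX systems is only there so the sketch compiles elsewhere.

```c
#define _DEFAULT_SOURCE
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/* Writes "Hello world" to standard output and returns the byte count
   reported by the OS (or -1 on error). */
long print_hello(void)
{
    static const char msg[] = "Hello world";   /* 11 bytes, no terminator sent */

#ifdef __linux__
    /* Same request as the Linux assembly: system call "write",
       file descriptor 1, message address, message length.
       SYS_write is 4 on 32-bit x86; other architectures use other numbers. */
    return syscall(SYS_write, 1, msg, sizeof msg - 1);
#else
    /* Assumption: some other POSIX system; go through its libc wrapper. */
    return (long)write(1, msg, sizeof msg - 1);
#endif
}
```

Even here the system call number is not fixed across Linux architectures, which is one more reason the request is normally hidden inside the C library.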

So, can you teach Linux to understand the directions intended for DOS and run the same file on both? Yes, you can, and that is what DosEmu did back in the day. This is also how Linux + Wine runs Windows software and how FreeBSD runs Linux software. However, it takes a lot of headaches and extra work, and the result still may not be fully compatible.

+1

I am posting this reply to Andrey's discussion of the ABI as an answer, because it is too long for a comment and needs formatting.

Andrey, what you show has nothing to do with Linux or Windows. It is an example of a development environment using certain conventions. All object modules and library modules must adhere to these conventions, and nothing more. It is not Linux or Windows that expects values in certain registers; it is the development environment.

The following is a more standard calling convention for C (Visual Studio 2008). In all cases, the caller pushes the parameters from right to left:

 int f(int a, int b, int (*g)(int, int)) {
     push ebp
     mov ebp,esp
   return g(a * 2, b * 3) * 4;
     mov eax,dword ptr [ebp+0Ch]
     imul eax,eax,3
     push eax
     mov ecx,dword ptr [ebp+8]
     shl ecx,1
     push ecx
     call dword ptr [ebp+10h]
     add esp,8
     shl eax,2
     mov esp,ebp
     pop ebp
     ret
 }

  • The caller pushes the parameters from right to left and calls the callee

  • The callee saves the frame pointer (usually ebp on Intel), sets it to the current stack pointer, and adjusts esp to reserve local storage (not needed here)

  • The callee accesses the parameters relative to ebp

  • The callee does its work

  • The callee restores ebp and returns

  • The caller removes the call parameters from the stack, e.g. add esp,8

Again, it is the development environment that dictates these conventions, not the OS. The OS may have its own conventions for applications to request services. Those are then implemented in OS-specific libraries.

+1
