Why does inserting characters into an executable binary result in a “break”?

Question

Why does inserting characters into an executable binary cause it to break?

And, is there a way to add characters without breaking the compiled program?

Background

I have long known that you can use a hex editor to change the code in a compiled executable file and it will still work as usual ...

Example

As an example, in the application below, Facebook can be changed to Lacebook, and the program will still run just fine:

But it breaks with new characters

I also know that if new characters are added, the program breaks: it either will not start or it crashes immediately. For example, adding My before Facebook will achieve this:

What I know

I have done some work with C and understand that code is written in a human-readable language, compiled, and linked into an executable file.
I did introductory studies of assembly language and understood the concepts of moving data, commands, and pointers.
I have written small programs for Windows, Mac and Linux.

What I do not know

I do not quite understand the relationship between the operating system and the executable. I assume that when you type the program name and press return, you are basically instructing the operating system to "execute" this file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it "Go!"
I do not understand why the presence of extra characters in a text string of the binary causes problems.

What I would like to know

Why do extra characters cause a program break?
What determines that a program is broken? The OS? Does the OS also keep the program sandboxed so that it does not crash the entire system?
Is there a way to add extra characters to the text string of a compiled program using a hex editor and not have the application break?

+6

c compiler-construction linux linker hex-editors

Steve brown Dec 31 '14 at 1:08


3 answers

When a program is compiled into machine code, it contains many references to the addresses of instructions and data in the program's memory. The compiler determines the layout of the entire program memory and places these addresses in the program. The executable file is also grouped into sections, and it begins with a table of contents giving the number of bytes in each section.

If you insert something into the program, the address of everything after it is shifted. But the parts of the program that contain references to the locations of code and data are not updated; they continue to point to the original addresses. In addition, the table containing the sizes of all sections is no longer correct, since you increased the size of whichever section you changed.
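The effect can be sketched with a toy file format. This is a minimal illustration, not real ELF or PE: the header layout and the names `HEADER`, `build_image`, and `load_sections` are invented for the example.

```python
import struct

# Hypothetical toy format, NOT real ELF or PE: a 16-byte header holding
# (text_offset, text_size, code_offset, code_size), then the two sections.
HEADER = struct.Struct("<IIII")

def build_image(text: bytes, code: bytes) -> bytes:
    text_offset = HEADER.size
    code_offset = text_offset + len(text)
    header = HEADER.pack(text_offset, len(text), code_offset, len(code))
    return header + text + code

def load_sections(image: bytes):
    # Like a loader, trust the header offsets; never search for the data.
    text_offset, text_size, code_offset, code_size = HEADER.unpack_from(image, 0)
    text = image[text_offset:text_offset + text_size]
    code = image[code_offset:code_offset + code_size]
    return text, code

original = build_image(b"Facebook", b"\x01\x02\x03\x04")

# Same-length overwrite: nothing moves, so the header is still right.
overwritten = original.replace(b"Facebook", b"Lacebook")

# Insertion: everything after the insertion point shifts by two bytes,
# but the header still holds the old offsets and sizes.
at = original.index(b"Facebook")
inserted = original[:at] + b"My" + original[at:]

print(load_sections(original))
print(load_sections(overwritten))
print(load_sections(inserted))
```

With the inserted image, the toy loader reads the text as `MyFacebo` and picks up the leftover `ok` as the first two bytes of what it believes is code.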

+5

Barmar Dec 31 '14 at 1:17


The format of a machine-language executable file is based on hard offsets, not on parsing a byte stream (as is done with, for example, the text source code of a program). When you insert a byte somewhere, the file format continues to refer to the information that follows the insertion point at its original offsets.

Offsets can occur in the file format itself, such as a header that tells the loader where things are in the file and how big they are.

Hard offsets are also found in the machine language itself, for example, in instructions that refer to program data and in branch instructions.

Suppose an instruction says “branch 200 bytes down from where we are now” and you insert a byte somewhere within those 200 bytes (because there is a character string there that you want to change). Unfortunately, the branch still spans 200 bytes, so it now lands one byte short of its intended target.
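The arithmetic can be sketched as follows. This is a simplified model that assumes the displacement is counted from the branch's own address (real instruction sets differ in the details); the function names and the addresses 100 and 150 are made up for illustration.

```python
def branch_target(branch_address: int, displacement: int) -> int:
    # Simplified: displacement is counted from the branch's own address.
    return branch_address + displacement

def address_after_insert(address: int, insert_at: int, count: int) -> int:
    # Anything at or after the insertion point moves down by `count` bytes.
    return address + count if address >= insert_at else address

branch_at, displacement = 100, 200
intended = branch_target(branch_at, displacement)        # 300 before the edit

# Insert one byte at address 150, between the branch and its target.
new_branch_at = address_after_insert(branch_at, 150, 1)  # still 100
new_intended = address_after_insert(intended, 150, 1)    # now 301
actual = branch_target(new_branch_at, displacement)      # still 300

print(new_intended, actual)
```

The intended instruction has moved to 301, but the unmodified branch still lands on 300.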

On some machines, the branch could not even be changed to 201 bytes if you tried to fix it up, because the target would be misaligned and would raise a CPU exception; you would have to insert, say, four bytes and fix the branch up to 204 (along with many other things necessary to keep the file correct).
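As a rough sketch of that rounding, assuming a machine whose instructions must sit on 4-byte boundaries (the helper name `padded_insert_length` is hypothetical):

```python
def padded_insert_length(wanted: int, alignment: int = 4) -> int:
    """Round an insertion length up to the next multiple of `alignment`."""
    return -(-wanted // alignment) * alignment

# Wanting one extra byte means inserting four, so a 200-byte branch
# would have to become a 204-byte branch, not a 201-byte one.
print(200 + padded_insert_length(1))
```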

+3

Kaz Dec 31 '14 at 1:18


David schwartz · Accepted Answer · 2013-12-31T01:17:50+0000

I do not quite understand the relationship between the operating system and the executable. I assume that when you type the program name and press return, you are basically instructing the operating system to "execute" this file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it "Go!"

Modern operating systems simply map the file to memory. They do not load pages until needed.
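From user code, the same idea is exposed through memory-mapped files. The sketch below uses Python's standard `mmap` module on a throwaway file; `peek` is a hypothetical helper, and whether untouched pages are actually left on disk is up to the operating system.

```python
import mmap
import os
import tempfile

def peek(path: str, offset: int, length: int) -> bytes:
    """Read a slice of a file through a memory mapping instead of read()."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing the mapping touches only the bytes requested here.
            return mapped[offset:offset + length]

# Demo on a temporary file: one page of zeros followed by a string.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 4096 + b"Facebook")
    demo_path = tmp.name
try:
    print(peek(demo_path, 4096, 8))
finally:
    os.remove(demo_path)
```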

Why do extra characters make the program break?

Because they put all the other information in the file in the wrong place, so the loader ends up loading the wrong things. In addition, jumps in the code land in the wrong place, perhaps in the middle of an instruction.

What determines that a program is broken? The OS? Does the OS also keep the program sandboxed so that it does not crash the entire system?

It depends on what gets broken. Perhaps you move a header, and the loader notices that some parameters in the header contain invalid data.

Is there a way to add extra characters to the text string of a compiled program through a hex editor and not have an application break?

Probably not reliably. At a minimum, you would need to reliably identify every part of the code that needs to be adjusted. This can be surprisingly difficult, especially if someone has deliberately tried to make it so.
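What does stay safe is the kind of edit described in the question: overwriting bytes without changing the file's length. A minimal sketch, assuming the string is a NUL-terminated C string and that the program does not store its length anywhere else (`patch_in_place` is a hypothetical helper, not an existing tool):

```python
def patch_in_place(image: bytes, old: bytes, new: bytes) -> bytes:
    """Overwrite the first occurrence of `old` without changing the image size."""
    if len(new) > len(old):
        raise ValueError("replacement is longer; every later offset would shift")
    start = image.find(old)
    if start < 0:
        raise ValueError("original string not found")
    # Pad a shorter replacement with NUL bytes so the total length is unchanged.
    padded = new + b"\0" * (len(old) - len(new))
    return image[:start] + padded + image[start + len(old):]

image = b"\x7fHDR" + b"Facebook\0" + b"\xc3"
patched = patch_in_place(image, b"Facebook", b"Lacebook")
print(len(patched) == len(image), patched)
```

A longer replacement such as `MyFacebook` is rejected, since nothing in this sketch attempts to fix up the offsets that would shift.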
