Help with C pointers

I have a question about pointers. Can someone help explain the following to me?

I understand how pointers work, but I'm not too sure how overwriting parts of memory through addresses changes a program's behavior.

I will explain the code below as far as I understand it. Do not hesitate to criticize and correct my misunderstandings. Here is the code:

    #include <stdio.h>
    #include <string.h>

    void f(int) ;

    int main ( int argc, char ** argv )
    {
        int a = 1234 ;
        f(a);
        printf("Back to main\n") ;
    }

    void g()
    {
        printf("Inside g\n") ;
    }

    void f (int x)
    {
        int a[100] ;
        memcpy((char*)a,(char*)g,399) ;
        x = *(&x-1) ;
        *(&x-1) = (int)(&a) ; // note the cast; no cast -> error
        // find an index for a such that a[your_index] is the same as x
        printf("About to return from f\n") ;
    }

    // This program, compiled with the same compiler as above, produces the following output:
    // About to return from f
    // Inside g
    // Back to main

Here is how I understand it.

The program starts in main(), assigns a, then calls f() with a as the argument.

Inside f ():

It declares an array a of size 100. Then it copies the memory occupied by g() into the array, so now a[] is essentially g(). x is then assigned the value stored at the address of x minus 1, which I would call the address of main(). (I'm not sure about this, correct me if I'm wrong.)

From here on, I'm not too sure how it manages to call a[] (the one overwritten with g()) or even g(). It looks like f() should just end and return to main().

Thanks to anyone who can help me with this!

Cheers!

+6
c
6 answers

Technically, this code goes far beyond what the C standard defines, so it could do anything. It makes a huge number of assumptions that it has no right to make, and those assumptions certainly do not hold everywhere. However, I can offer a very likely explanation of why you see the output you do:

You are right up to the point where the code of the function g() is copied into the memory occupied by the local array variable a .

To understand the next line, you need to know a little about how functions are usually called on common stack-based architectures. When a function is called, the parameters are pushed onto the stack, then the return address is pushed onto the stack, and execution jumps to the start of the function. Inside the function, the previous frame pointer is pushed onto the stack, then space is created for local variables. Stacks tend to grow downward in memory (from high addresses to low addresses), although this is not true of every architecture.

So, when main() calls f() , the stack initially looks like this (the frame pointer and the stack pointer are two CPU registers holding the addresses of locations on the stack):

                     | ...                     |  (higher addresses)
                     | char **argv (parameter) |
                     |-------------------------|
                     | int argc (parameter)    |
                     |-------------------------|
    FRAME POINTER -> | saved frame pointer     |
                     |-------------------------|
                     | int a                   |
                     |-------------------------|
                     | int x (parameter)       | &x
                     |-------------------------|
    STACK POINTER -> | return address          | &x - 1
                     |-------------------------|
                     | ...                     |  (lower addresses)

The function prolog then saves the frame pointer of the calling function and moves the stack pointer to create space for local variables in f() . Therefore, when the C code in f() starts execution, the stack now looks something like this:

                     | ...                     |  (higher addresses)
                     | char **argv (parameter) |
                     |-------------------------|
                     | int argc (parameter)    |
                     |-------------------------|
                     | saved frame pointer     |
                     |-------------------------|
                     | int a                   |
                     |-------------------------|
                     | int x (parameter)       | &x
                     |-------------------------|
                     | return address          | &x - 1
                     |-------------------------|
    FRAME POINTER -> | saved frame pointer     |
                     |-------------------------|
                     | a[99]                   | &a[99]
                     | a[98]                   | &a[98]
                     | ...                     | ...
    STACK POINTER -> | a[0]                    | &a[0]
                     | ...                     |  (lower addresses)

What is a frame pointer? It is used to refer to local variables and parameters inside a function. The compiler knows that while f() is executing, the address of the local variable a is always FRAME_POINTER - 100 * sizeof(int) , and the address of the parameter x is FRAME_POINTER + sizeof(FRAME_POINTER) + sizeof(RETURN_ADDRESS) . All local variables and parameters can be accessed at a fixed offset from the frame pointer, regardless of how the stack pointer moves as stack space is allocated and freed.

Anyway, back to the code. When this line is executed:

 x = *(&x-1) ; 

It copies the value stored one int-size lower in memory than x into x . If you look at my ASCII art, you will see that this is the return address. So it is actually doing this:

 x = RETURN_ADDRESS; 
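Reading the slot just below x relies on ordinary pointer arithmetic: subtracting 1 from an int * moves the address down by sizeof(int) bytes. A minimal sketch of that rule follows, using an ordinary array so that the arithmetic is well defined. The helper names step_in_bytes and value_one_below are made up for illustration; the code in the question applies the same arithmetic to &x, which steps outside any object and is undefined behavior.

```c
#include <stddef.h>

/* Hypothetical helper: byte distance between p and p - 1 for an int pointer. */
size_t step_in_bytes(void)
{
    int cells[3] = { 10, 20, 30 };
    int *p = &cells[1];
    return (size_t)((char *)p - (char *)(p - 1));
}

/* Hypothetical helper: reads the int stored one element below cells[1]. */
int value_one_below(void)
{
    int cells[3] = { 10, 20, 30 };
    int *p = &cells[1];
    return *(p - 1);   /* same shape as *(&x - 1), but inside an array */
}
```

Here step_in_bytes() returns sizeof(int) and value_one_below() returns the element stored just below cells[1], which is cells[0].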

Next line:

 *(&x-1) = (int)(&a) ; 

This sets the return address to the address of array a . It effectively says:

 RETURN_ADDRESS = &a; 

The cast is required because you are treating the return address as an int and not a pointer (so this code will only work on architectures where an int is the same size as a pointer; it will NOT work on 64-bit POSIX systems, for example!).
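Whether an int can hold a pointer can be checked directly. The sketch below assumes the platform provides uintptr_t from <stdint.h> (the type is optional in the C standard, though common); the helper names pointer_fits_in_int and round_trip are invented here for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if an int is wide enough to hold a data pointer here. */
int pointer_fits_in_int(void)
{
    return sizeof(void *) <= sizeof(int);
}

/* Hypothetical helper: round-trips a pointer through uintptr_t,
 * the integer type intended for holding pointer values. */
int *round_trip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}
```

On a typical 64-bit system pointer_fits_in_int() returns 0, which is why a cast to int would drop part of the address there, while the uintptr_t round trip gives back a pointer equal to the original.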

Now the C code in f() has finished executing, and the function epilogue deallocates the local variables (by moving the stack pointer back) and restores the caller's frame pointer. At this point, the stack looks like this:

                     | ...                     |  (higher addresses)
                     | char **argv (parameter) |
                     |-------------------------|
                     | int argc (parameter)    |
                     |-------------------------|
    FRAME POINTER -> | saved frame pointer     |
                     |-------------------------|
                     | int a                   |
                     |-------------------------|
                     | int x (parameter)       | &x
                     |-------------------------|
    STACK POINTER -> | return address          | &x - 1
                     |-------------------------|
                     | saved frame pointer     |
                     |-------------------------|
                     | a[99]                   | &a[99]
                     | a[98]                   | &a[98]
                     | ...                     | ...
                     | a[0]                    | &a[0]
                     | ...                     |  (lower addresses)

Now the function returns by jumping to the value of RETURN_ADDRESS, but we set that to &a , so instead of returning to where it was called from, it jumps to the start of array a and is now executing code from the stack. This is where you copied the code of g() , so that code (apparently) executes happily. Note that since the stack pointer has been moved back above the array at this point, any asynchronous code that runs on the same stack (for example, a UNIX signal handler that arrives at the wrong time) will overwrite the code!

So, here is what the stack looks like at the start of g() , before its prologue runs:

                     | ...                     |  (higher addresses)
                     | char **argv (parameter) |
                     |-------------------------|
                     | int argc (parameter)    |
                     |-------------------------|
    FRAME POINTER -> | saved frame pointer     |
                     |-------------------------|
                     | int a                   |
                     |-------------------------|
    STACK POINTER -> | int x (parameter)       |
                     |-------------------------|
                     | return address          |
                     |-------------------------|
                     | saved frame pointer     |
                     |-------------------------|
                     | a[99]                   |
                     | a[98]                   |
                     | ...                     |
                     | a[0]                    |
                     | ...                     |  (lower addresses)

The prologue of g() then sets up a stack frame as usual, the body executes, and the epilogue unwinds the frame, leaving the frame pointer and stack pointer as in the diagram above.

Now g() returns, so it looks for the return address at the top of the stack. But the top of the stack (where the stack pointer points) is actually the place where the parameter x of f() lives, and that is where we stashed the original return address earlier, so execution goes back to where f() was called from.

As a side note, the stack is now out of sync in main() , since main() expects the stack pointer to be where it was when it called f() (pointing to where x was stored), but it now actually points to main's local variable a . This will cause some strange effects: if you call another function from main() at this point, the contents of a will be changed!

I hope you (and others) have learned something valuable from this discussion, but it is important to remember that this is the programming equivalent of the Five Point Palm Exploding Heart Technique: NEVER use it in a real system. A new architecture, a new compiler, or even just different compiler flags can and will change the runtime environment enough to make this kind of too-clever-by-half code fall over in all kinds of exciting and entertaining ways.

+20

Well, this is only one possible explanation of what is going on, because, as noted above, when you overwrite "important" memory addresses, anything can happen.

With that said, something like this seems to be happening:

  • You call f() from main. The value of a is pushed onto the stack, followed by the return address (which points at the call to printf).
  • You copy the code of g() into a[].
  • (You change x, but that does nothing yet.)
  • You overwrite the return address on the stack with the address of a[] (which contains a copy of the code of g()).
  • f() returns into the code in a[], which runs the copy of g().

Again, these are all assumptions - it depends on the compiler, the compiler options, and the platform on which you are running this.

+3

OK, I've worked out how it gets back to main (see my comment on Tal's answer).

You need to know how the stack works, in particular, on the Intel processor.

Basically, the stack looks like this:

    stacktop: 1234     - (the a variable; locals are normally on the stack)

At the start of f(x) it looks like this:

    stacktop: 1234     - main's a
    -1        1234     - the argument to f(), pushed onto the stack
    -2        ret_addr - points to the printf in main, where f() will go when it finishes
    -3        a[99]
    -4        a[98]
    ...
    -101      a[1]
    -102      a[0]

Stacks grow from top to bottom.

In this case, the expression "(&x-1)" points to stacktop-2, since &x is the address of the parameter passed to f(), which is at stacktop-1.

After copying the function g() into the array a[], you set the passed-in x to ret_addr, so the stack looks like this:

    stacktop: 1234     - main's a
    -1        ret_addr - the modified value of x
    -2        ret_addr - points to the printf in main, where f() will go when it finishes
    -3        a[99]
    ...

Then you set *(&x-1) to the address of a[]:

    stacktop: 1234     - main a
    -1        ret_addr - the modified value of x
    -2        &a[0]    - points to the copy of g
    -3        a[99]
    ...

Then the function ends. This moves the stack pointer to stacktop-2, freeing the allocated locals (in this case a[]), and then jumps to the address found on the stack there, in this case &a[0] (at stacktop-2), and the stack shrinks.

That address points to the copy of g(). g() executes and then exits by jumping to the address at the top of the stack (stacktop-1, which now holds the pointer to the printf in main) and shrinking the stack again.

This has a lot of problems.

  • If the compiled g() is larger than the 399 bytes that are copied, the copy in a[] will be incomplete.
  • If g() contains absolute addresses of code inside g() (say, a conditional jump of more than 128 bytes), then the copy will jump into the original g().
  • If an interrupt occurs at the end of f(), between freeing the local variables and jumping to the return address, the copy of g() may be corrupted.
  • In an optimized build, the parameter passed to f() will most likely be passed in a register rather than on the stack, which changes where the return address sits relative to x, so the trick won't work.
  • The OS may prevent execution of data on the stack.
  • If you use a segmented memory system (e.g. 16-bit 8086), then returning from f() or g() will not use the correct segment, and the program will crash.
  • On some processors the stack grows upward.

As a rule, do not mess with the stack or copy code.
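If the aim is only to have g() run once f() has finished its work, C's function pointers achieve that without touching the stack. The sketch below is a well-defined alternative, not a reconstruction of the trick in the question; run and calls_to_g are names invented here for illustration.

```c
#include <stdio.h>

static int calls_to_g = 0;   /* hypothetical counter, only used to observe the call */

void g(void)
{
    calls_to_g++;
    printf("Inside g\n");
}

/* f receives the function to run next as an ordinary parameter. */
void f(void (*then)(void))
{
    printf("About to return from f\n");
    then();                  /* defined control transfer through a function pointer */
}

/* Hypothetical driver standing in for main(); returns how often g was called. */
int run(void)
{
    f(g);
    printf("Back to main\n");
    return calls_to_g;
}
```

Calling run() prints the same three lines in the same order as the program in the question, but every transfer of control goes through an ordinary call and return.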

+3

Calling a function with arguments passed "by value" does not let the function modify the caller's variables. Plain integers are passed by value when you call f(a); from main(), so f cannot change the value of a ; it only receives a copy of that value. If you want to change the original variable, you need to pass its address, i.e. f(&a); , after changing the function to accept a pointer, of course.
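A minimal sketch of the difference, using made-up function names (set_by_value, set_by_pointer and the two demo helpers are not from the question):

```c
/* Receives a copy; assigning to x does not affect the caller's variable. */
void set_by_value(int x)
{
    x = 99;
    (void)x;   /* silence "set but unused" warnings */
}

/* Receives the address; writes through it to the caller's variable. */
void set_by_pointer(int *x)
{
    *x = 99;
}

/* Hypothetical demo: a keeps its value after a by-value call. */
int demo_by_value(void)
{
    int a = 1234;
    set_by_value(a);
    return a;
}

/* Hypothetical demo: a is changed after passing its address. */
int demo_by_pointer(void)
{
    int a = 1234;
    set_by_pointer(&a);
    return a;
}
```

demo_by_value() returns 1234 because only the copy was changed, while demo_by_pointer() returns 99.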

Beyond that, it makes little sense to speculate about what to expect when you do undefined things such as overwriting memory. Also, trying to copy a function's code from its address is not very safe.

+2

You should take a look at this classic article that explains the mechanism.

This program assumes the following layout of the return address and the parameter on the stack:

[ret address] [x]

So &x-1 is the location of the return address.

+1

a now contains the whole function g. Setting x to the original return address, and then overwriting that return address with a pointer to a, should change where f returns to, but I'm not really sure whether it will work; it depends on the optimizations used.

0