How are functions stored in memory?

I'm digging into Linux and C, and I'm curious how functions are stored in memory. I have the following function:

void test(){ printf( "test\n" ); } 

Simple enough. When I run objdump on an executable that contains this function, I get the following:

    08048464 <test>:
     8048464:   55                      push   %ebp
     8048465:   89 e5                   mov    %esp,%ebp
     8048467:   83 ec 18                sub    $0x18,%esp
     804846a:   b8 20 86 04 08          mov    $0x8048620,%eax
     804846f:   89 04 24                mov    %eax,(%esp)
     8048472:   e8 11 ff ff ff          call   8048388 <printf@plt>
     8048477:   c9                      leave
     8048478:   c3                      ret

That all looks right. The interesting part is when I run the following code snippet:

    int main( void )
    {
        char data[20];
        int i;

        memset( data, 0, sizeof( data ) );
        memcpy( data, test, 20 * sizeof( char ) );
        for( i = 0; i < 20; ++i )
        {
            printf( "%x\n", data[i] );
        }
        return 0;
    }

I get the following (which is wrong):

 55 ffffff89 ffffffe5 ffffff83 ffffffec 18 ffffffc7 4 24 10 ffffff86 4 8 ffffffe8 22 ffffffff ffffffff ffffffff ffffffc9 ffffffc3 

If I leave out the memset(data, 0, sizeof(data)); line, then the rightmost byte is correct, but some of them still have the leading 1s.

Does anyone have an explanation for why

A) using memset to clear my array leads to an incorrect (edit: inaccurate) representation of the function, and

SOLUTION: the problem was caused by using memset(data, 0, sizeof(data)) rather than memset(data, 0, 20 * sizeof(unsigned char)). The memory was not fully set because sizeof looked only at the size of the pointer rather than the size of the entire array.
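A minimal sketch of the array-versus-pointer distinction behind this fix (the helper names are hypothetical). Note that for the `char data[20]` declared directly in main above, sizeof(data) is already 20; sizeof only reports the pointer size once the array has decayed to a pointer, for example inside a function that receives it as a parameter:

```c
#include <stddef.h>
#include <string.h>

/* Inside a function, an "array" parameter is really a pointer, so sizeof
 * reports the pointer size (4 on 32-bit x86, 8 on x86-64), not 20. */
size_t size_seen_by_callee(const char *buf)
{
    return sizeof(buf);
}

/* Where the array itself is declared, sizeof gives the whole array in bytes. */
size_t size_seen_by_owner(void)
{
    char data[20];
    return sizeof(data);
}

/* Robust pattern: pass the length along explicitly and clear exactly that. */
void clear_buffer(char *buf, size_t len)
{
    memset(buf, 0, len);
}
```

Passing the length alongside the pointer avoids relying on sizeof in a scope where the array size is no longer known.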

B) what are these bytes stored in memory as? Ints? Chars? I don't quite understand what is going on here. (Clarification: what type of pointer would I use to move such data around in memory?)

SOLUTION: I'm dumb. I forgot the unsigned keyword, and that's where the whole problem came from :(

Any help would be greatly appreciated; I couldn't find anything when I searched for it.

Neil

PS: My first thought was that this was the result of x86 having instructions that don't end on a byte or nibble boundary. But that doesn't make much sense and shouldn't cause any problems.

Thanks, Will, for pointing out my mistake with the char type. It should be unsigned char. I'm still interested in learning how to access individual bytes.

5 answers

Here is a simpler version of the code you were trying to write:

    #include <stdio.h>

    int main( void )
    {
        /* Function-to-object pointer cast: not ISO C, but GCC on Linux allows it. */
        unsigned char *data = (unsigned char *)test;
        int i;

        for( i = 0; i < 20; ++i )
        {
            printf( "%02x\n", data[i] );
        }
        return 0;
    }

The changes I made were to remove the extra buffer and read the function through a pointer instead, use unsigned char instead of char, and change printf to use "%02x" so that it always prints two characters. (The format change on its own will not fix the "negative" numbers coming out as ffffff89 or similar; that is fixed by using unsigned for the data pointer.)
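The same idea can be wrapped in a small hex-dump helper, sketched here under the assumption that you want the bytes as a string rather than printed one per line; `hex_bytes` is a hypothetical name, not a library function:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format `len` bytes from `p` as space-separated two-digit hex into `out`.
 * Reading through `const unsigned char *` means each element is a value in
 * 0..255, so "%02x" never shows sign-extended values like ffffff89.
 * `out` must hold at least 3 * len bytes (or 1 byte when len == 0). */
void hex_bytes(const void *p, size_t len, char *out)
{
    const unsigned char *bytes = p;
    size_t i;

    out[0] = '\0';
    for (i = 0; i < len; ++i) {
        /* Each entry is "xx " (3 chars); the last one omits the space. */
        sprintf(out + 3 * i, i + 1 < len ? "%02x " : "%02x", bytes[i]);
    }
}
```

To dump the function itself you could call something like hex_bytes((const void *)test, 20, buf) with a buffer of at least 60 bytes; as with the pointer cast above, converting a function pointer to an object pointer is not guaranteed by ISO C, but GCC on Linux allows it.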

All x86 instructions end on byte boundaries, and the compiler often inserts extra "padding instructions" so that, for efficiency, branch targets are aligned on 4-, 8-, or 16-byte boundaries.
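As a rough illustration of the alignment arithmetic, assuming a 16-byte alignment (whether GCC actually pads the next function this way depends on the compiler and its flags), the filler needed after `test` can be computed from the objdump listing in the question; `align_up` and `padding_needed` are hypothetical helpers:

```c
#include <stdint.h>

/* Round `addr` up to the next multiple of `align`, which must be a power
 * of two. The bytes between `addr` and the result are alignment filler. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Number of filler bytes needed to reach that alignment. */
uintptr_t padding_needed(uintptr_t addr, uintptr_t align)
{
    return align_up(addr, align) - addr;
}
```

With test's ret at 0x8048478, the next free address is 0x8048479, and align_up(0x8048479, 16) gives 0x8048480, so 7 bytes of filler would be needed.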


I believe your chars are being extended to the width of an int. You can get results closer to what you want by explicitly casting the value when printing it.
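Here is a sketch of what explicit casting does, assuming a 32-bit unsigned int as on x86 Linux; the two hypothetical helpers format the same byte with and without first casting it to unsigned char:

```c
#include <stdio.h>
#include <string.h>

/* Mirrors printf("%x", data[i]) with a signed char: the char is promoted
 * to int, and a negative int converted to unsigned int keeps bits 8-31 set,
 * so the byte 0x89 (-119 as a signed char) comes out as "ffffff89". */
void format_promoted(signed char c, char *out, size_t n)
{
    snprintf(out, n, "%x", (unsigned int)c);
}

/* Casting to unsigned char first keeps the value in 0..255, so only the
 * byte itself is printed: "89". */
void format_as_byte(signed char c, char *out, size_t n)
{
    snprintf(out, n, "%02x", (unsigned int)(unsigned char)c);
}
```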


The answer to B): the byte is stored in memory as a byte. Each memory location holds exactly 1 byte (i.e. one unsigned char).
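A minimal sketch of looking at an object's individual bytes through unsigned char, assuming a 32-bit value; the helper names are hypothetical. On little-endian x86 this also explains the `b8 20 86 04 08` bytes in the question's objdump listing: `b8` is the opcode for mov into %eax, and `20 86 04 08` is the immediate 0x8048620 stored lowest byte first.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes that make up a 32-bit value into `out`, in the order they
 * sit in memory. Any object can be inspected as unsigned char, one per byte. */
void bytes_of_u32(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian machine such as x86, where the least
 * significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    uint16_t one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}
```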

Hint: get a good book on computer organization (my favorite is the one by Carl Hamacher, which explains well how memory is represented internally).

In your code:

    memset( data, 0, sizeof( data ) ); // must be memset(data,0,20);
    memcpy( data, test, 20 * sizeof( char ) );
    for( i = 0; i < 20; ++i )
    {
        printf( "%x\n", data[i] ); // prints a CHARACTER up-casted to an INTEGER in HEX representation, hence the extra `0xFFFFFF`
    }

The problem is in the way your code prints the bytes.

One byte is loaded from the data array (one byte == one char).

The byte is converted to 'int', since that is what the compiler knows 'printf' wants. To do this, it sign-extends the byte to a 32-bit double word, and that is what gets printed as hexadecimal. (This means that a byte with its high bit set is converted to a 32-bit value with bits 8-31 set as well. Those are the ffffffxx values you see.)

In this case, I would convert it like this:

    printf( "%x\n", ((int)data[i] & 0xFF) );

Then it will print correctly. (If you were loading 16-bit values, you would AND with 0xffff instead.)
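A sketch of why the mask needs bitwise & rather than logical &&, assuming a 32-bit int; `low_byte` and `low_word` are hypothetical names:

```c
/* Keep only the low 8 bits of a promoted char value. Bitwise & clears the
 * bits 8-31 that sign extension set; logical && would instead yield 0 or 1. */
unsigned int low_byte(int promoted)
{
    return (unsigned int)promoted & 0xFFu;
}

/* The 16-bit analogue: keep only the low 16 bits. */
unsigned int low_word(int promoted)
{
    return (unsigned int)promoted & 0xFFFFu;
}
```

For example, a signed char holding 0x89 promotes to -119; low_byte(-119) is 0x89, while (-119 && 0xFF) is just 1.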


The printing looks weird because you are printing signed values, so they get sign-extended.

However, the function being printed is also slightly different from your objdump listing. It seems that instead of loading EAX with the address of the string and then storing EAX on the stack, it stores the address there directly.

    push  ebp
    mov   ebp,esp
    sub   esp,18h
    mov   dword ptr [esp],8048610h
    call  <printf>
    leave
    ret

As for why it changes when you make seemingly benign changes elsewhere in the code: well, that's allowed. That's why you shouldn't rely on undefined behavior.

