Linux heap structure and behavior with malloc () and free ()

I have Debian with the Linux 2.6 kernel and am trying to understand how the heap works / behaves with malloc() and free() . I tried to find the malloc() and free() algorithm and the heap structure, but I could not find anything useful. And, unfortunately, I know too little about Linux and how memory works to understand the source code of free() and malloc() .

This is a sample code:

 int main(int argc, char **argv) { char *a, *b, *c; a = malloc(32); b = malloc(32); c = malloc(32); strcpy(a, argv[1]); strcpy(b, argv[2]); strcpy(c, argv[3]); free(c); free(b); free(a); } 

With gdb and run AAAA BBBB CCCC I can check a bunch. This is the state after strcpys , but before frees :

 (gdb) x/32x 0x804c000 0x804c000: 0x00000000 0x00000029 0x41414141 0x00000000 0x804c010: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029 0x804c030: 0x42424242 0x00000000 0x00000000 0x00000000 0x804c040: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c050: 0x00000000 0x00000029 0x43434343 0x00000000 0x804c060: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89 

Char arrays can be seen very well. Then I tried to find out why 0x29 (dec 41) exists. I would expect something like 0x20 (dec 32) or 0x24 (dec 36).

  • Why does the malloc algorithm spend this space?
  • How was the decision made that it was 0x29?
  • And what does 0xf89 mean at the end?
  • How does the program track what is allocated and what is free?

In particular, I want to understand how free() works. After three releases, the heap looks like this:

 (gdb) x/32x 0x804c000 0x804c000: 0x00000000 0x00000029 0x0804c028 0x00000000 0x804c010: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029 0x804c030: 0x0804c050 0x00000000 0x00000000 0x00000000 0x804c040: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c050: 0x00000000 0x00000029 0x00000000 0x00000000 0x804c060: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89 
  • Why is the char array replaced with this specific address?
  • What is a pseudo code that does for free?

Take a look at this example:

 (gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADDDDD BBBB CCCC ... (gdb) x/32x 0x804c000 0x804c000: 0x00000000 0x00000029 0x41414141 0x41414141 0x804c010: 0x41414141 0x41414141 0x41414141 0x41414141 0x804c020: 0x41414141 0x41414141 0x44444444 0x00000044 0x804c030: 0x42424242 0x00000000 0x00000000 0x00000000 0x804c040: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c050: 0x00000000 0x00000029 0x43434343 0x00000000 0x804c060: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89 ... (gdb) c Program exited with code 021. 

I overwrote 0x29, but the program completed normally. But when I add another byte, I encounter a segmentation error:

 (gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADDDDD BBBB CCCC ... (gdb) x/32x 0x804c000 0x804c000: 0x00000000 0x00000029 0x41414141 0x41414141 0x804c010: 0x41414141 0x41414141 0x41414141 0x41414141 0x804c020: 0x41414141 0x41414141 0x44444444 0x00004444 0x804c030: 0x42424242 0x00000000 0x00000000 0x00000000 0x804c040: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c050: 0x00000000 0x00000029 0x43434343 0x00000000 0x804c060: 0x00000000 0x00000000 0x00000000 0x00000000 0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89 ... (gdb) c Program received signal SIGSEGV, Segmentation fault. 0x080498b9 in free (mem=0x804c030) at common/malloc.c:3631 

The most important question for me:

  • Why do you get a segmentation error in free() when overwriting more bytes?
  • and how does the free() algorithm work?
  • and how to make malloc and track addresses for free?

Thanks so much for reading, good wishes

+4
source share
3 answers

Most malloc() implementations work by monitoring the state of the heap inside the heap itself, right before and / or after allocated memory blocks. Exceeding the allocated block will damage this data - some of this data may include pointers or lengths, and damage to these objects will lead to an attempt to access invalid memory locations.

Information on how the malloc() implementation works depends on which system and libc you use. If you use glibc (which is quite possible if you work on Linux), there is a pretty good explanation of how it works here:

http://gee.cs.oswego.edu/dl/html/malloc.html

Assuming this is the case, the 0x29 you see is probably a bitwise OR block size (32 = 0x20 ) and some flags. This is possible because all heap allocations are rounded to the nearest 16 bytes (or more!), So the bottom eight bits of size can always be considered equal to zero.

+9
source

I definitely don’t know the details. But overall, it works as follows:

Larger malloc() processed through mmap() , so we focus on smaller ones. Somewhere there is a limit where you can set a threshold.

Smaller malloc() processed at the end of the data segment. This can be handled and resized by glibc using the brk() and sbrk() system calls.

After you malloc() block of memory, you need to hold it to find out how much free when free() is called, and the pointer to the next block must be saved to find them all and combine them.

After free() with a block of memory at the end, the data segment is reduced with sbrk() . After free() with a block that is NOT at the end, the block is added to the free list. This is a linked list of free memory blocks to reuse.

0x29 , which is 41 , is the size of the memory block that you allocated, plus a memory bit to store the specified fields (size and next pointer), which require 8 bytes. For the 9th, I do not know, but maybe this is due to alignment.

If you write more than the "promised" 32 bytes, you destroy this linked list and its associated pointer. Therefore, free() has invalid data that it trusts and tries to write in an invalid place, which results in SIGSEGV .

+2
source

Why do you get a segmentation error in free () when overwriting more bytes?

As soon as you pass the end of the space you have flashed, you technically invoke undefined behavior, and therefore anything is possible. In the end, you will be a clobber pointer or size field in the internal data structure, and this damage may or may not cause the link to be wild enough to link to a whole page that does not exist.

That is, a segmentation error is a consequence of page protection. This works well to protect one whole program from another unrelated one and is a tool used by the operating system to limit damage to one user mode address space. This mechanism is not closely synchronized with internal data structures. There is a crude correspondence between valid indexes and actual pages, but nothing more accurate.

and how does the free () algorithm work?

When the block returns malloc (), an internal data structure is created, so when this exact block is passed to free (), it will be recognized and bound to the free list. It will be combined with adjacent free blocks, if possible.

and how to make malloc and track addresses for free?

Since you are using Linux, the source code is available, and reading it will naturally lead to the most accurate answer. However, the general answer is that the directory is stored. This directory can be a separate data structure or it can be in the form of a linked list with metadata stored in front of the actual address that is returned.

Why is space missing there?

This is not entirely in vain. Some space can be used for a directory, and some can be sold for performance, while maintaining blocks aligned across cache boundaries. Create a small block whose size is equal to the size of the cache, or possibly smaller. If this block overlaps the boundary of the cache line, storing it in the cache will require twice as much space as necessary. If this happened everywhere, the cache would be virtually half as small and have a lower speed. (Well, unless you really need neighboring addresses.) Using large blocks will also reduce internal fragmentation , which can actually use less memory in general state in a stationary state, where the calls to malloc () and free () are balanced in lengthy process.

0
source

Source: https://habr.com/ru/post/1411852/


All Articles