What is the next step to improve the malloc() algorithm?

I am writing my own simple malloc() function, and I would like to make it faster and more efficient. My current version uses a linear search over the blocks and allocates them sequentially and contiguously in a static array.

What is the next step to improve this algorithm? What are the main flaws of my current version? I would be very grateful for any feedback and recommendations.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct heap_block {
        struct heap_block* next;
        size_t size;
        bool isfree;
    } header;

    #define Heap_Capacity 100000

    static char heap[Heap_Capacity];
    size_t heap_size;

    void set_data_to_block(header* block, size_t sz);
    header* to_end_data(header* block);

    void* malloc(size_t sz) {
        if (sz == 0 || sz > Heap_Capacity) {
            return NULL;
        }
        header* block = (header*)heap;
        if (heap_size == 0) {                      /* first allocation */
            set_data_to_block(block, sz);
            return (void*)(block + 1);
        }
        while (block->next != NULL) {              /* linear search for the last block */
            block = block->next;
        }
        block->next = (header*)((char*)to_end_data(block) + 8);
        header* new_block = block->next;
        set_data_to_block(new_block, sz);
        return (void*)(new_block + 1);
    }

    void set_data_to_block(header* block, size_t sz) {
        block->size = sz;
        block->isfree = false;
        block->next = NULL;
        heap_size += sz;
    }

    header* to_end_data(header* block) {
        return (header*)((size_t)(block + 1) + block->size);
    }
2 answers

Note that malloc is often built on top of lower-level memory system calls (e.g. mmap(2) on Linux). See this answer for a mention of GNU glibc and musl-libc. Look also inside tcmalloc, and study the source code of several free-software malloc implementations.

Some general ideas for your malloc:

  • obtain memory from the OS with mmap (and release it back to the kernel with munmap). You should certainly not allocate out of a fixed-size heap: on a 64-bit machine with 128 GB of RAM, you would want a malloc of a 10-billion-byte region to succeed. See the first sketch after this list.
  • serve small allocations out of larger chunks, so handle a malloc of 16 bytes differently from a malloc of several megabytes. The typical threshold between a small and a large allocation is a small multiple of the page size (often 4 kilobytes). Small allocations are carved out of pages; large allocations are rounded up to whole pages. You can even treat very small two-word allocations (as in many linked lists) as a special case.
  • round the requested size up to some convenient number (for example, a power of 2, or 3 times a power of 2).
  • manage memory zones of the same (rounded) size together.
  • for small memory zones, avoid giving the memory back too early: keep previously free-d zones of the same (small) size and reuse them in future malloc calls.
  • you can use tricks on the address itself (but your system may use ASLR), or keep a metadata word next to each memory zone describing the chunk it belongs to.
  • a significant problem is, given an address previously returned by malloc and now passed to free, determining the allocated size of that memory zone. You can play with the address bits, store the size in a word just before the zone, use a hash table, etc. The details are tricky; see the second sketch after this list.
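
For the first point, a minimal sketch (assuming a POSIX system; the function names are illustrative, not part of the original code) of taking a large chunk from the kernel with mmap and handing it back with munmap, instead of carving everything out of a fixed-size static array:

    #include <stddef.h>
    #include <sys/mman.h>

    /* Ask the kernel for a chunk of zeroed, private memory. */
    static void *os_get_chunk(size_t bytes) {
        void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return (p == MAP_FAILED) ? NULL : p;   /* mmap signals failure with MAP_FAILED */
    }

    /* Give the whole chunk back to the kernel. */
    static void os_release_chunk(void *p, size_t bytes) {
        munmap(p, bytes);
    }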
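
And a hedged sketch combining the rounding, reuse and size-lookup points: requests are rounded up to a size class, freed zones go onto a per-class free list for reuse, and the rounded size is stored in a word just before the pointer handed to the caller so that free can recover it. All names are invented for illustration, and the "get a fresh zone" step is just a stand-in:

    #include <stddef.h>
    #include <stdlib.h>

    #define NUM_CLASSES 8
    static const size_t class_size[NUM_CLASSES] = {16, 32, 48, 64, 96, 128, 192, 256};

    typedef struct free_node { struct free_node *next; } free_node;
    static free_node *free_list[NUM_CLASSES];      /* one free list per size class */

    static int size_to_class(size_t sz) {
        for (int i = 0; i < NUM_CLASSES; i++)
            if (sz <= class_size[i]) return i;
        return -1;                                 /* "large" request: not handled here */
    }

    void *my_malloc(size_t sz) {
        int c = size_to_class(sz);
        if (c < 0) return NULL;                    /* large path (e.g. via mmap) omitted */
        size_t *hdr;
        if (free_list[c] != NULL) {                /* reuse a previously freed zone */
            hdr = (size_t *)free_list[c];
            free_list[c] = free_list[c]->next;
        } else {                                   /* otherwise get a fresh zone; a real
                                                      allocator would carve it from a page */
            hdr = malloc(sizeof(size_t) + class_size[c]);
            if (hdr == NULL) return NULL;
        }
        *hdr = class_size[c];                      /* remember the rounded size */
        return hdr + 1;                            /* user data starts after the size word */
    }

    void my_free(void *p) {
        if (p == NULL) return;
        size_t *hdr = (size_t *)p - 1;             /* the size word sits just before p */
        int c = size_to_class(*hdr);               /* recover the size class */
        free_node *node = (free_node *)hdr;        /* push the zone onto its free list */
        node->next = free_list[c];
        free_list[c] = node;
    }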

Note that the details are tricky, and it can be quite hard to write a malloc implementation better than the one in your system's C library. In practice, writing a good malloc is not an easy task. You will find many research papers on the subject.

See also garbage collection techniques. Consider perhaps Boehm's conservative GC: you replace malloc with GC_MALLOC, and you stop worrying about free... Learn also about memory pools.
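
As a small illustration of the Boehm GC suggestion (assuming the libgc library is installed and the program is linked with -lgc): allocate with GC_MALLOC and simply never call free; the collector reclaims unreachable blocks itself:

    #include <gc.h>                             /* Boehm conservative GC */
    #include <stdio.h>

    int main(void) {
        GC_INIT();                              /* initialise the collector */
        for (int i = 0; i < 1000; i++) {
            char *buf = GC_MALLOC(1024);        /* allocated from the GC-managed heap */
            if (buf) buf[0] = (char)i;          /* use the block ... */
        }                                       /* ... then simply drop it: no free() */
        printf("done\n");
        return 0;
    }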


There are 3 ways to improve:

  • make it more reliable
  • optimize memory allocation
  • optimize memory allocation code

Make it more reliable

There are many common programmer errors that can be detected cheaply (for example, writing outside the allocated block). For a very simple (and relatively fast) example, you can insert "canaries" before and after the allocated block, and catch the programmer's mistake during free and realloc by checking that the canaries are intact (if they are not, the programmer wrote past the block's bounds). This only works some of the time. The problem is that (for a simple malloc implementation) the metadata is mixed in with the allocated blocks, so there is a chance the metadata gets corrupted even when the canaries do not. To fix this, it is better to separate the metadata from the allocated blocks. Also, a bare "something was damaged" message does not help as much as you would hope. Ideally you want some kind of identifier for each block (for example, the name of the function that allocated it), so that when a problem occurs the corruption message can say which allocation was affected. Of course, this can and should be done via macros, so that the identifiers can be omitted in non-debug builds.
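
A hedged sketch of the canary idea (the names are illustrative, not a complete debug allocator): a known pattern is placed before and after the user block together with an identifier, and free verifies both patterns:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CANARY 0xDEADBEEFCAFEBABEULL

    typedef struct {
        const char *identifier;                 /* e.g. the name of the allocating function */
        size_t size;
        unsigned long long front_canary;        /* sits just before the user data */
    } debug_header;

    void *debug_malloc(size_t size, const char *identifier) {
        debug_header *h = malloc(sizeof *h + size + sizeof(unsigned long long));
        if (h == NULL) return NULL;
        h->identifier = identifier;
        h->size = size;
        h->front_canary = CANARY;
        unsigned long long tail = CANARY;       /* canary just after the user data */
        memcpy((char *)(h + 1) + size, &tail, sizeof tail);
        return h + 1;
    }

    void debug_free(void *p) {
        if (p == NULL) return;
        debug_header *h = (debug_header *)p - 1;
        unsigned long long tail;
        memcpy(&tail, (char *)p + h->size, sizeof tail);
        if (h->front_canary != CANARY || tail != CANARY)
            fprintf(stderr, "heap corruption near block '%s'\n", h->identifier);
        free(h);
    }

    /* In debug builds, a macro can supply the identifier automatically: */
    #define DEBUG_MALLOC(sz) debug_malloc((sz), __func__)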

The main problem is that the interface provided by malloc is lame and broken - there is simply no way to return decent error conditions ("out of memory" is the only failure it can report), and there is no way to pass extra information in. You need something more like int malloc(void **outPointer, size_t size, char *identifier) (with similar changes to free and realloc so that they can return a status code and carry the identifier).

Optimize memory allocation

It is naive to assume that all memory is the same. It is not. Cache locality (including TLB locality) and other cache effects, and things like NUMA optimization, matter. For a simple example, imagine you are writing an application that has a structure describing a person (including a hash of their name) and a pointer to a string holding the person's name, and both the structure and the name string are allocated via malloc. The usual end result is that the structures and strings end up interleaved on the heap, so when you scan the structures (for example, trying to find the one with the right hash) you end up thrashing caches and TLBs. To optimize this, you want all the person structures packed close together on the heap. For that, malloc needs to know the difference between allocating 32 bytes for a structure and allocating 32 bytes for a name string. You need to introduce the concept of "memory pools" (for example, where everything in "memory pool number 1" is kept close together on the heap).
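
A hedged sketch of that memory-pool idea (the pool type and all names here are invented for illustration): person structures and name strings each come from their own bump region, so the structures stay contiguous and scanning them touches far fewer cache lines:

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *base;
        size_t used, capacity;
    } pool;

    static pool pool_create(size_t capacity) {
        pool p = { malloc(capacity), 0, capacity };
        return p;
    }

    static void *pool_alloc(pool *p, size_t sz) {
        sz = (sz + 15) & ~(size_t)15;              /* keep allocations 16-byte aligned */
        if (p->base == NULL || p->used + sz > p->capacity) return NULL;
        void *out = p->base + p->used;
        p->used += sz;
        return out;
    }

    typedef struct {
        unsigned long name_hash;
        const char   *name;                        /* points into the string pool */
    } person;

    static pool struct_pool, string_pool;          /* filled by pool_create() at startup */

    static person *new_person(const char *name, unsigned long hash) {
        person *p   = pool_alloc(&struct_pool, sizeof *p);
        size_t  len = strlen(name) + 1;
        char   *cpy = pool_alloc(&string_pool, len);
        if (p == NULL || cpy == NULL) return NULL;
        memcpy(cpy, name, len);                    /* name lives apart from the structs */
        p->name_hash = hash;
        p->name      = cpy;
        return p;
    }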

Another important optimization is cache coloring (see http://en.wikipedia.org/wiki/Cache_coloring ). For NUMA systems, it matters whether an allocation needs minimum latency (where you want memory from the closest NUMA domain) or maximum bandwidth (where using memory from multiple NUMA domains increases throughput).
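
If the allocator is allowed to use NUMA placement directly, something along these lines is possible (a sketch assuming Linux with libnuma, linked with -lnuma; the placement enum and function name are made up for illustration):

    #include <stddef.h>
    #include <numa.h>                              /* libnuma */

    enum placement { WANT_LOW_LATENCY, WANT_HIGH_BANDWIDTH };

    static void *numa_aware_alloc(size_t size, enum placement p) {
        if (numa_available() < 0)
            return NULL;                           /* kernel/library has no NUMA support */
        if (p == WANT_LOW_LATENCY)
            return numa_alloc_local(size);         /* pages on the local node */
        return numa_alloc_interleaved(size);       /* pages spread round-robin over nodes */
    }

    /* Memory obtained this way is released with numa_free(ptr, size). */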

Finally, it would be good (for managing heap fragmentation, etc.) to use different strategies for "temporary, most likely freed soon" allocations and for long-lived allocations (for example, where you want to do extra work to minimize fragmentation and wasted space/RAM).
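
A hedged sketch of that lifetime idea, using the same bump-allocation trick as the pool sketch above but for lifetime rather than locality: short-lived allocations come from an arena that is reset or destroyed in one go, so they never fragment the long-lived heap (all names are illustrative):

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct { char *mem; size_t used, cap; } arena;

    static arena arena_create(size_t cap) { arena a = { malloc(cap), 0, cap }; return a; }
    static void  arena_reset(arena *a)    { a->used = 0; }    /* "free" everything at once */
    static void  arena_destroy(arena *a)  { free(a->mem); a->mem = NULL; }

    static void *arena_alloc(arena *a, size_t n) {
        n = (n + 7) & ~(size_t)7;                             /* 8-byte alignment */
        if (a->mem == NULL || a->used + n > a->cap) return NULL;
        void *p = a->mem + a->used;
        a->used += n;
        return p;
    }

    /* Usage: put all scratch allocations for one request/frame in the arena,
     * then call arena_reset instead of many individual free()s; long-lived
     * data keeps using the general-purpose allocator. */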

Note: I would estimate that getting all of this right can, in some cases, make software run 20% faster, thanks to far fewer cache misses, more bandwidth where it is needed, etc.

The main problem here is that the interface provided by malloc is lame and broken - there is simply no way to pass this extra information to malloc in the first place. You need something more like int malloc(void **outPointer, size_t size, char *identifier, int pool, int optimisationFlags) (with similar changes to realloc).
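
A hedged sketch of what such an interface could look like (the status codes, flags and names are all invented for illustration; the body just wraps the standard malloc to keep the example short):

    #include <stddef.h>
    #include <stdlib.h>

    enum alloc_status { ALLOC_OK, ALLOC_BAD_ARGUMENT, ALLOC_OUT_OF_MEMORY, ALLOC_HEAP_CORRUPT };

    enum alloc_flags { ALLOC_SHORT_LIVED = 1, ALLOC_WANT_BANDWIDTH = 2, ALLOC_WANT_LOW_LATENCY = 4 };

    /* Returns a status code instead of overloading the pointer; the identifier,
     * pool and flags carry the information ordinary malloc cannot accept. */
    int my_malloc(void **outPointer, size_t size, const char *identifier,
                  int pool, int optimisationFlags) {
        (void)identifier; (void)pool; (void)optimisationFlags;   /* a real allocator would use these */
        if (outPointer == NULL || size == 0) return ALLOC_BAD_ARGUMENT;
        *outPointer = malloc(size);
        return (*outPointer != NULL) ? ALLOC_OK : ALLOC_OUT_OF_MEMORY;
    }

    /* Example call:
     *   void *p;
     *   if (my_malloc(&p, 64, "parse_request", 1, ALLOC_SHORT_LIVED) != ALLOC_OK)
     *       ...handle the specific failure...
     */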

Optimize memory allocation code

Given that memory is used far more often than it is allocated, this is the least important part (for example, it matters less than getting things like cache locality right for the allocated blocks).

Honestly, anyone who really wants decent performance or decent debugging should not be using plain malloc in the first place - generic solutions to specific problems are never perfect. With that in mind (and not forgetting that the malloc interface is lame and broken and gets in the way of everything that actually matters), I would recommend not bothering with malloc at all and building something genuinely good (but non-standard) instead; you can still adapt the algorithms used by existing malloc implementations.

