There are 3 ways to improve:
- make it more reliable
- optimize memory allocation
- optimize memory allocation code
Make more reliable
There are many common programmer errors that can be easily detected (for example, changing data outside of a highlighted block). For a very simple (and relatively quick) example, you can insert the "canaries" before and after the selected block and find the programmer during free and realloc , checking the correctness of the canary (if they are not then the programmer made a mistake by mistake). It works "sometimes possible." The problem is that (for a simple malloc implementation) metadata is mixed with allocated blocks; so there is a chance that the metadata was corrupted, even if the canary doesn’t. To fix this, it would be nice to separate the metadata from the selected blocks. Also, just the message “something was damaged” doesn't help as much as you hope. Ideally, you would like to have some kind of identifier for each block (for example, the name of the function that allocated it) so that if a problem occurs, a corruption message will appear. Of course, this can and should be done through. macro so that these identifiers can be omitted when not debugged.
The main problem is that the interface provided by malloc is lame and broken - there simply is no way to return acceptable error conditions ("failed to isolate" is the only error that it can return), and there is no way to transmit additional information. You need something more like int malloc(void **outPointer, size_t size, char *identifier) (with similar changes to free and realloc so that they can return the status code and identifier).
Optimized memory allocation
It is naive to assume that all memory is the same. Not this. Cache terrain (including TLB terrain) and other cache effects, and things like NUMA optimization, matter. For a simple example, imagine that you are writing an application that has a structure describing a person (including a hash of their name) and a pointer to a string of the person’s name; and both the structure and the name string are distributed through. taNos. The normal end result is that these structures and strings end up mixing together on the heap; so when you look at these structures (for example, try to find a structure containing the correct hash), you end up banging caches and TLBs. To optimize this, you must ensure that all structures are on the heap. To do this, malloc requires a difference between allocating 32 bytes for the structure and allocating 32 bytes for the name string. You need to introduce the concept of "memory pools" (for example, where everything in the "memory pool number 1" is stored on the heap).
Another important optimization includes cache coloring (see http://en.wikipedia.org/wiki/Cache_coloring ). For NUMA systems, it is important to know the difference between something, where max. bandwidth required (where using memory from multiple NUMA domains increases throughput).
Finally, it would be nice (to manage heap fragmentation, etc.) to use different strategies for “temporary, most likely soon released” allocations and long-term allocations (for example, when you need to do extra to minimize fragmentation and wasted space / RAM )
Note. I would appreciate that getting all of this right could mean that in some cases the software runs 20% faster, due to a much smaller number of cache misses, more bandwidth where necessary, etc.
The main problem here is that the interface provided by malloc is lame and broken - there is simply no way to pass additional information to malloc in the first place. You need something more like int malloc(void **outPointer, size_t size, char *identifier, int pool, int optimisationFlags) (with similar changes to realloc ).
Optimize memory allocation code
Considering that you can assume that memory is used more often than its allocation; this is the least important (for example, less important than properly using objects such as cache localization for allocated blocks).
Honestly, anyone who really wants decent performance or decent debugging should not use malloc to get started - common solutions to specific problems are never perfect. With that in mind (and not forgetting that the malloc interface is lame and broken and prevents everything that is important anyway), I would recommend just not to worry with malloc and create something really good (but non-standard) instead. you can adapt the algorithms used by existing malloc implementations.