I want to learn the theory of garbage collection. How should I do it?
I am also interested in garbage collection (to the point that I wrote my own garbage collection virtual machine called HLVM ). I learned to read as many research papers on garbage collection as I could, and play with ideas myself, both in my virtual machine and in writing safe high-level simulations.
The obvious answer is a compiler tutorial ... The question is, do you need to study lexical analysis, analysis and other things that usually precede garbage collection in the text?
Lexical analysis, parsing and other things are not related to garbage collection. You can get an outdated overview of garbage collection from the compilers book, but you need to read research papers to get a modern look, for example, regarding multicore.
In short, what are the prerequisites for studying the theory of garbage collection?
You should be aware of basic graph theory, pointers, stacks, threads, and (if you are interested in multithreading) low-level concurrency primitives such as locks.
Garbage collection is all about determining reachability. When a program can no longer get a reference to a value because that value has become unavailable, then the GC can restart the memory that this value occupies. Reachability is determined by traversing the heap, starting with a set of "global roots" (global variables and pointers in stream stacks and in general registers)
GC design has many aspects, but you can start with four basic garbage collection algorithms:
- Mark-and-sweep (McCarthy, 1960)
- Mark and the Compact (Haddon and Waite, 1967)
- Stop-and-copy (Cheney, 1970)
- Mark Region (McKinley et al., 2007)
Perhaps the most noticeable evolution of these basic ideas is the generational garbage collection, which has been the de facto standard for many years.
According to my personal feelings, some of the little-known garbage collection works contain as much useful information that I also highly recommend:
- Baker treadmill (beautiful GC in real time).
- VCGC (a completely different scheme of three-color marking).
You can also study the three types of recording barriers (Dijkstra, Steel and Yuasy) and take a look at the marking of cards and memorable tricks of the set.
Then you may also need to study the actual design decisions that some developers have chosen for language implementations, such as Java and .NET, as well as the SML / NJ compiler for Standard ML, the OCaml compiler, the Glasgow Haskell compiler, and others. The differences between the accepted methods are as great as the similarities between them!
There are also some great tangential link articles, such as Henderson Accurate Garbage Collection in a non-working environment . I used this technique to avoid writing a stack for HLVM.
Memorymanagement.org is an invaluable resource, especially a glossary of GC related terms.