Pointer arithmetic isn't the real problem. A GC has to deal with pointers being reassigned all the time, and pointer arithmetic is just another example of that. (Of course, if arithmetic between pointers into different buffers were allowed, it would cause problems, but it isn't. The only arithmetic you're allowed to perform on a pointer into an array A is the kind that repositions it within that array.)
The real problem is the lack of metadata. A GC has to know what is a pointer and what isn't.
If it encounters the value 0x27a2c230, it has to be able to determine whether it is
- a pointer (in which case it has to follow the pointer and recursively mark the destination as "in use")
- an integer (the same value is a perfectly valid integer, so maybe it isn't a pointer at all)
- or something else, say, a bit of a string.
It also has to be able to determine the extent of a struct. Assuming the value is a pointer and it points into another struct, the GC has to be able to determine the size and extent of that struct, so it knows which range of addresses should be scanned for more pointers.
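To make the metadata requirement concrete, here is a minimal sketch of what a precise collector would need for each object. All names (`TypeInfo`, `ObjectHeader`, `Node`, `mark`) are hypothetical and made up for illustration. Nothing like this exists in standard C++, which is exactly the point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>
#include <vector>

// Hypothetical per-type metadata: how big an object is and where its pointers live.
struct TypeInfo {
    std::size_t size;                          // extent of the object in bytes
    std::vector<std::size_t> pointer_offsets;  // byte offsets of the pointer fields
};

// Hypothetical header that every GC-managed object would have to start with.
struct ObjectHeader {
    const TypeInfo* type;
};

// Example managed type: one pointer field and one integer field.
struct Node {
    ObjectHeader header;
    ObjectHeader* next;   // a real pointer: the GC must follow it
    std::uintptr_t value; // any bit pattern is valid here: the GC must not follow it
};

const TypeInfo node_type{sizeof(Node), {offsetof(Node, next)}};

// Mark everything reachable from obj, using the metadata to find the pointer fields.
// Recursive for brevity; a real collector would use an explicit work list.
void mark(ObjectHeader* obj, std::unordered_set<ObjectHeader*>& marked) {
    if (obj == nullptr || !marked.insert(obj).second) {
        return; // null, or already visited
    }
    const auto* base = reinterpret_cast<const unsigned char*>(obj);
    for (std::size_t offset : obj->type->pointer_offsets) {
        assert(offset + sizeof(ObjectHeader*) <= obj->type->size);
        ObjectHeader* child = nullptr;
        std::memcpy(&child, base + offset, sizeof child);
        mark(child, marked);
    }
}
```

With this in place, a `Node` whose `value` field happens to contain the address of another node does not keep that node alive, because the descriptor says `value` is not a pointer. Without the descriptor, the collector has no way of telling the two fields apart.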
GC'ed languages have a lot of infrastructure to solve this problem. C++ does not.
The Boehm GC is the closest you can generally get, and it is conservative in the sense that if something could be a pointer, the GC assumes that it is one. That means some data is needlessly kept alive: it will probably hold on to data that should have been GC'ed.
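As a rough illustration of what "conservative" means, the sketch below scans a region word by word and treats any value that lands inside a known allocation as a pointer. This is not how the Boehm GC is actually implemented. The `Allocation` record and `conservative_scan` function are hypothetical, and the scan covers a single root region without tracing further.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record of one heap block known to the collector.
struct Allocation {
    std::uintptr_t begin; // first address of the block
    std::size_t size;     // size of the block in bytes
    bool marked;          // set when something that looks like a pointer refers to it
};

// Scan a region one word at a time. Any word whose value falls inside a known
// block is assumed to be a pointer, because there is no metadata to say otherwise.
void conservative_scan(const void* region, std::size_t bytes,
                       std::vector<Allocation>& heap) {
    assert(region != nullptr || bytes == 0);
    const auto* p = static_cast<const unsigned char*>(region);
    const std::size_t word_size = sizeof(std::uintptr_t);
    for (std::size_t i = 0; i + word_size <= bytes; i += word_size) {
        std::uintptr_t word;
        std::memcpy(&word, p + i, word_size);
        for (Allocation& block : heap) {
            if (word >= block.begin && word - block.begin < block.size) {
                block.marked = true; // might be a pointer, so keep the block alive
            }
        }
    }
}
```

If the scanned region contains a plain integer whose value happens to fall inside some block, that block gets marked just like one referenced by a genuine pointer. That is the needlessly retained data mentioned above.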
Alternatively, of course, all this infrastructure could in principle be added to a C++ compiler. There's no rule in the standard that says it isn't allowed to exist. The problem is that it would be a major performance hit and would rule out a lot of optimization opportunities.
jalf