Why do I need a replacement phase on a parallel GC

Question

Why do I need a replacement phase on a parallel GC

Parallel GC needs a remark phase . The remark phase role is to mark modified objects during the concurrent mark phase . But I think that if we mark only created objects during concurrent mark phase , there is no need to execute remark phase .

remark phase is required due to modified objects. Modification can be of two types. One is creating a new object, and the other is a modified pointer to another object. A new task of an object can be solved easily if we mark the newly created objects. And a modified pointer to another object is not really a problem. Because

Dead object can't revive

A dead object means that no one can specify this object. How can they be reborn? Therefore, the modified pointer must point to already marked objects. This means that there is no need to perform remark .

Someone may say that "marking a new object when creating it is too expensive, so they cannot be marked during the concurrent mark phase and that the reason why the remark phase is required". That seems reasonable. But this may raise another question. How remark not to cross all objects from the root? If the remark phase should cross all the objects from the root, the work performed by the concurrent mark phase is useless. Or, if the remark phase moves only the changed objects, information about which object was changed should be saved somewhere. I think it can be much more expensive than just labeling.

Am I really wrong? This must be wrong. But I have no idea which question is wrong.

+7

java garbage-collection concurrency

Joffrey Apr 24 '15 at 11:14

source share

2 answers

The GC should always look at every saved link that has been changed since the beginning of the current cycle, since it is possible that the link contained somewhere where the GC has not yet been viewed will be stored in the place that it viewed and deleted from its original location. A parallel GC could revise the modified links without stopping the world, but as long as it did, other threads would probably continue to modify more links.

If 10% of the objects are changed when the GC performs a full scan, it may be worth visiting this 10% without stopping the world, but while it visits these 10% of other flows, it may alarm 3%. Visiting this 3% at the same time may be worth it, but other threads may upset 2%. Further omissions may reduce the amount of work that will need to be done in stop-world mode, but the GC is unlikely to complete completely, while other threads are still in the process of breaking links. Unless all threads spontaneously stop doing anything with links, the GC will never be able to complete 100% without stopping the world.

Please note that it may be possible for the GC project to commit to never stopping the world for more than a certain time, possibly due to starving new distribution requests. There is no way for the Civil Code to know with certainty how long it will take to complete the cycle, since until the cycle is completed, it will not be possible to find out if there is still some unopened reference to the root of the large collection as -install unopened objects. On the other hand, the GC can stop the world if it reaches a point where - in the absence of such discoveries - it will wait for completion within 1 ms, but then let the world reboot if new discoveries make it take longer than 2 ms. If you restart the world, it will be necessary to stop the world before the GC can free any memory, and it would be very difficult to guarantee that no combination of events can lead to an endless cycle of aborted attempts of “final cleaning”, but if the permissible “stop world time "reasonable regarding the amount of" churning "of the code with links, failed final cleanups should be rare and repeated in exceptional cases.

0

supercat Oct 19 '16 at 14:34

source share

maaartinus · Accepted Answer · 2015-04-25T01:29:25+0000

And a modified pointer to another object is not really a problem. Because the
Dead object can't revive

They really can't help but know which objects are dead? Not! Why?

You do not know this after the initial mark phase, because you look only at the stacks of the streams and do not follow the links.

You do not know if, after the phase of the parallel sign, the following can happen:

The thread reads the ax field and stores its value in its register (either on its stack or elsewhere).
Then this thread sets ax = null (or something else).
GK comes and sees there null .
The thread then restores ax to its previous value.

Now GC skipped the ax object. Although the above scenario is not very common, it can happen, and there are more realistic (and more complex) scenarios.

So, you need to look again at the modified memory, which is the phase of the remark. Fortunately, not all memory needs to be scanned again since the map table is used.

I'm afraid this (otherwise nice) explanation is a little misleading at this point:

The remark stage is a stop world. CMS cannot correctly determine which objects are alive (flag them live) if the application is running at the same time and continues to change what is alive.

Streams change what lives, but they also change what you can see as living. And this is the problem.

This article talks about this quite clearly:

Part of the work in the remark phase involves rescanning objects that have been changed by the application stream (i.e., looking at object A to see if A has been changed by the application stream, so that A now refers to another object B and B was not previously marked as live).

I would say: when you look for one room after another, you can skip your glasses when children move them.

Script note

I am sure the above scenario is possible, it’s just not quite what the program usually does. For a fairly realistic example, consider

 void swap(Object[] a, int i, int j) { Object tmp = a[i]; a[i] = a[j]; // Now the original reference a[i] is in a register only. a[j] = tmp; }

Why do I need a replacement phase on a parallel GC

Script note

More articles: