Allocating an extremely large heap in .NET can be insanely fast, and the number of blocking collections will not prevent it from being that fast. The problems you are observing arise because you are not just allocating: you also have code that causes dependency reorganization and actual garbage collection while the allocation is going on.
There are several methods:
try using GCSettings.LatencyMode ( http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode(v=vs.110).aspx ): set it to LowLatency while you are actively loading data (see the comments on this answer as well, and the first sketch after this list);
use multiple threads
do not populate cross-references to newly allocated objects while actively loading; first go through the active allocation phase using only integer indices for cross-references, not object references; then force a full GC a couple of times so that everything ends up in Gen2, and only then populate your advanced data structures; you may need to rethink your deserialization logic for this to happen (second sketch after this list);
try to force your biggest root collections (arrays of objects, strings) into the second generation as early as possible; do this by preallocating them and forcing a full GC twice before you start populating the data (loading millions of small objects); if you use some flavor of generic Dictionary, do not forget to preallocate its capacity up front to avoid reorganizations (third sketch after this list);
any big array of references is a major source of GC overhead until both the array and the referenced objects reach Gen2; the bigger the array, the bigger the overhead; prefer arrays of indices to arrays of references, especially for temporary processing needs;
avoid having many utility or temporary objects deallocated or promoted while in the active loading phase, on any thread; carefully review your code for string concatenation, boxing and foreach iterators (the last may be automatically optimized into for loops);
if you have an array of references and a hierarchy of function calls with some long-running tight loops, avoid introducing local variables that cache the reference value taken from some position in the array; instead, cache the offset value and keep using constructs like myArrayOfObjects[offset] at all levels of your function calls; it helped me a lot with processing pre-populated, Gen2-resident big data structures; my personal theory is that this helps the GC manage the temporal dependencies on your thread-local data structures, thus improving concurrency (last sketch below).
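For the LatencyMode suggestion above, a minimal sketch; LoadAllData is a placeholder for your own loading routine:

```csharp
using System.Runtime;

class Loader
{
    // Placeholder for the actual bulk-load logic (hypothetical).
    static void LoadAllData() { }

    static void Main()
    {
        GCLatencyMode oldMode = GCSettings.LatencyMode;
        try
        {
            // Suppress full blocking collections as much as possible while loading.
            GCSettings.LatencyMode = GCLatencyMode.LowLatency;
            LoadAllData();
        }
        finally
        {
            // Always restore the previous mode.
            GCSettings.LatencyMode = oldMode;
        }
    }
}
```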
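For the index-instead-of-reference approach, a sketch of how the phases could be split; the Node type and its fields are invented for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record: during loading, the cross-reference is stored as a
// plain integer index, which the GC does not have to track.
class Node
{
    public long Payload;
    public int ParentIndex = -1; // index-based cross-reference
    public Node Parent;          // populated only after everything is in Gen2
}

class IndexFirstLoad
{
    static void Main()
    {
        // Phase 1: allocate everything, wiring cross-references as indices only.
        var nodes = new List<Node>(1000000);
        // ... deserialize: nodes.Add(new Node { Payload = ..., ParentIndex = ... });

        // Phase 2: force a couple of full collections so the data settles in Gen2.
        GC.Collect();
        GC.Collect();

        // Phase 3: only now resolve the indices into real object references.
        foreach (var node in nodes)
            if (node.ParentIndex >= 0)
                node.Parent = nodes[node.ParentIndex];
    }
}
```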
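For pre-promoting the biggest root collections, something along these lines; the ExpectedCount figure is assumed and should be sized to your data set:

```csharp
using System;
using System.Collections.Generic;

class PrePromoteRoots
{
    const int ExpectedCount = 1000000; // assumed; use your real item count

    static void Main()
    {
        // Pre-allocate the big root collections up front...
        var items = new string[ExpectedCount];
        // ...and give the dictionary its final capacity so it never rehashes mid-load.
        var index = new Dictionary<long, string>(ExpectedCount);

        // Two full blocking collections push the (still mostly empty) roots into Gen2.
        GC.Collect();
        GC.Collect();

        // Only now start the allocation-heavy load of millions of small objects.
        // ... populate items[] and index here ...
        GC.KeepAlive(items);
        GC.KeepAlive(index);
    }
}
```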
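A simplified illustration of the offset-caching trick; since this rests on the personal theory above, measure before committing to it:

```csharp
using System;

class OffsetCachingDemo
{
    class Node { public long Payload; }

    // Caching the reference in a local and passing it down the call chain
    // creates one more reference-typed slot for the GC to track per frame.
    static long ViaCachedReference(Node[] nodes, int offset)
    {
        Node n = nodes[offset];
        return Inner(n);
    }
    static long Inner(Node n) => n.Payload * 2;

    // Passing the array plus the offset and indexing at each level instead.
    static long ViaOffset(Node[] nodes, int offset) => InnerByOffset(nodes, offset);
    static long InnerByOffset(Node[] nodes, int offset) => nodes[offset].Payload * 2;

    static void Main()
    {
        var nodes = new[] { new Node { Payload = 21 } };
        Console.WriteLine(ViaCachedReference(nodes, 0)); // 42
        Console.WriteLine(ViaOffset(nodes, 0));          // 42
    }
}
```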
Here are the reasons for this behavior, as far as I could work out while filling up to 100 GB of RAM at application start, with multiple threads:
when the GC moves data from one generation to another, it actually copies it and thus modifies all the references; therefore, the fewer cross-references you have during the active loading phase, the better;
the GC maintains many internal data structures that manage references; if you make massive modifications to the references themselves - or if you have a lot of references that have to be modified during a GC - it causes substantial CPU and memory overhead during both blocking and concurrent GC; sometimes I observed the GC constantly consuming 30-80% of the CPU without any collections going on - just doing some processing, which looks weird until you realize that any time you put a reference into some array or some temporary variable in a tight loop, the GC has to modify, and sometimes reorganize, its dependency-tracking data structures;
the server GC uses thread-affinitized Gen0 segments and is capable of promoting an entire segment to the next generation (without actually copying the data - not sure about this one, though); keep this in mind when designing a multithreaded data-loading process (a quick configuration check is shown after this list);
ConcurrentDictionary, while being a great API, does not scale well in extreme multi-core scenarios once the number of objects goes beyond a few million (consider using an unmanaged hash table optimized for concurrent insertion, such as the one that ships with Intel TBB);
if possible or applicable, consider using a native pooled allocator (Intel TBB, again).
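To verify which GC flavor and latency mode the process actually runs under (server GC is switched on through the gcServer element in App.config):

```csharp
using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        // Server GC is enabled in App.config:
        //   <configuration><runtime><gcServer enabled="true" /></runtime></configuration>
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);
    }
}
```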
BTW, the latest update to .NET 4.5 has defragmentation (compaction) support for the large object heap. Another great reason to upgrade to it.
.NET 4.6 also adds an API for requesting that no GC happen at all (GC.TryStartNoGCRegion), provided certain conditions are met: https://msdn.microsoft.com/en-us/library/dn906202(v=vs.110).aspx
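A minimal usage sketch; the 64 MB budget is an assumed figure, and it must cover everything allocated inside the region:

```csharp
using System;
using System.Runtime;

class NoGcRegionDemo
{
    // Placeholder for the allocation-heavy critical section (hypothetical).
    static void LoadCriticalData() { }

    static void Main()
    {
        // The call fails if the runtime cannot commit the budget up front,
        // and the region is broken if the code inside allocates more than that.
        if (GC.TryStartNoGCRegion(64 * 1024 * 1024))
        {
            try
            {
                LoadCriticalData();
            }
            finally
            {
                // The region may already have ended (e.g. budget exceeded),
                // so check before calling EndNoGCRegion.
                if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                    GC.EndNoGCRegion();
            }
        }
    }
}
```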
Also see the related post by Maoni Stephens: https://blogs.msdn.microsoft.com/maoni/2017/04/02/no-gcs-for-your-allocations/