ThreadLocks and static constructors

Given:

An ASP.NET Web API application hosted in IIS. The application creates about 30 application domains for a plugin that does some external work.

The application serves a lot of users and works very well most of the time, but occasionally (after several days or even weeks) it suddenly freezes.

Problem:

The web application sometimes freezes, and w3wp.exe then has to be restarted.

After examining memory dumps taken in this state, we found that there are a huge number of threads at that moment (sometimes around 15,000).

In normal operation, we never observe more than a hundred threads.

DebugDiag reports that one thread is blocking the others.


We then saw that in thread 44, and in many other threads (about 90% of them), the stack ends with the same call.


The method itself contains no locks or threading code. But the class has one unusual thing: its static constructor. The constructor looks like this:

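    static TimeZoneHelper()
    {
        // timeZones is a static dictionary field of the class;
        // Resources.TimeZones is the embedded time-zone resource text
        using (StringReader reader = new StringReader(Resources.TimeZones))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // each line has the form "<key>;<time zone id>"
                string[] parts = line.Split(';');
                TimeZoneInfo timeZone = TimeZoneInfo.FindSystemTimeZoneById(parts[1]);
                timeZones[parts[0]] = timeZone;
            }
        }
    }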

In addition, the debug analysis shows that a garbage collection was in progress at that moment (and, before you ask: we never call GC.Collect manually).

Question: Is there any evidence that this kind of code is problematic in a static constructor, even though it contains no task or threading code? Could it be related to the GC process itself (since the object is disposable, even though it has no real dispose logic)?

TimeZoneHelper

I created a gist containing the main methods of this class, including the constructor and TimeZoneHelper.ToTimeZoneOffset, the method that was being called:

https://gist.github.com/Gentlehag/9d564555261da0e73366

The key point is that it ends up in Dictionary.TryGetValue, on the dictionary that was populated in the constructor.


Edit: I also want to add that an AssemblyResolve event handler is attached in every AppDomain. The code can be seen here:

https://gist.github.com/Gentlehag/4726b6d888adb149684d


Important update: I'm a colleague and want to add more information. We found another, very similar scenario. Here is the stack trace of the thread that owns the lock:

    000000c898897560 00007ff8855b7e5d System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)
    000000c8988975d0 00007ff8855b7d34 System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].TryGetValue(System.__Canon, System.__Canon ByRef)
    000000c898897610 00007ff88f6152b3 GP.Components.Extensions.AppDomains.RemotingRunner.CurrentDomain_AssemblyResolve(System.Object, System.ResolveEventArgs)
    000000c8988978a0 00007ff886f7276c System.AppDomain.OnAssemblyResolveEvent(System.Reflection.RuntimeAssembly, System.String)
    000000c898897bd0 00007ff8e4b2a7f3 [GCFrame: 000000c898897bd0]
    000000c898899b78 00007ff8e4b2a7f3 [HelperMethodFrame_PROTECTOBJ: 000000c898899b78] System.Reflection.RuntimeAssembly._nLoad(System.Reflection.AssemblyName, System.String, System.Security.Policy.Evidence, System.Reflection.RuntimeAssembly, System.Threading.StackCrawlMark ByRef, IntPtr, Boolean, Boolean, Boolean)
    000000c898899c80 00007ff886f7224e System.Reflection.RuntimeAssembly.InternalGetSatelliteAssembly(System.String, System.Globalization.CultureInfo, System.Version, Boolean, System.Threading.StackCrawlMark ByRef)
    000000c898899d60 00007ff886f716c8 System.Resources.ManifestBasedResourceGroveler.GetSatelliteAssembly(System.Globalization.CultureInfo, System.Threading.StackCrawlMark ByRef)
    000000c898899df0 00007ff885b932fb System.Resources.ManifestBasedResourceGroveler.GrovelForResourceSet(System.Globalization.CultureInfo, System.Collections.Generic.Dictionary`2, Boolean, Boolean, System.Threading.StackCrawlMark ByRef)
    000000c898899eb0 00007ff885b92ecb System.Resources.ResourceManager.InternalGetResourceSet(System.Globalization.CultureInfo, Boolean, Boolean, System.Threading.StackCrawlMark ByRef)
    000000c898899fa0 00007ff885b92b73 System.Resources.ResourceManager.InternalGetResourceSet(System.Globalization.CultureInfo, Boolean, Boolean)
    000000c898899ff0 00007ff885b92014 System.Resources.ResourceManager.GetString(System.String, System.Globalization.CultureInfo)
    000000c89889a0a0 00007ff89914aa62 NewRelic.Agent.Core.Config.ConfigurationLoader.InitializeFromXml(System.String, System.String)
    000000c89889a140 00007ff89914a838 NewRelic.Agent.Core.Config.ConfigurationLoader.Initialize(System.String)
    000000c89889a1a0 00007ff899143be9 NewRelic.Agent.Core.Config.ConfigurationLoader.Initialize()
    000000c89889a210 00007ff899123a27 NewRelic.Agent.Core.Agent+AgentSingleton.CreateInstance()
    000000c89889a280 00007ff8991239c2 NewRelic.Agent.Core.Singleton`1[[System.__Canon, mscorlib]]..ctor(System.__Canon)
    000000c89889a2c0 00007ff89912388b NewRelic.Agent.Core.Agent..cctor()
    000000c89889a700 00007ff8e4b2a7f3 [GCFrame: 000000c89889a700]
    000000c89889ce88 00007ff8e4b2a7f3 [PrestubMethodFrame: 000000c89889ce88] NewRelic.Agent.Core.Agent.get_Instance()
    000000c89889cef0 00007ff89912358c NewRelic.Agent.Core.AgentShim.GetTracer(System.String, UInt32, System.String, System.String, System.Type, System.String, System.String, System.String, System.Object, System.Object[])
    000000c89889d280 00007ff8e4b2a7f3 [DebuggerU2MCatchHandlerFrame: 000000c89889d280]

This one is not about the TimeZoneHelper class, but interestingly there is a common aspect: both classes load a resource in their static constructor (the configuration file for New Relic, or the file with time zones). So the scenario is as follows:

  • Multiple threads try to use a class
  • The first thread acquires the lock for the static constructor and starts running it
  • The resource is being loaded, and the .NET runtime attempts to load the satellite resource assembly.
  • We handle the AssemblyResolve event to load the resource assembly, and somehow this causes a deadlock. The question is: how?
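At least the blocking half of this scenario is standard CLR behaviour: while one thread is running a type's static constructor, every other thread that touches the type waits until it finishes. A minimal sketch to observe that wait, using hypothetical Signals and SlowInit types with Thread.Sleep standing in for the resource loading; it does not reproduce the deadlock itself.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper kept in a separate type, so that touching it
// does not itself trigger SlowInit's type initializer.
static class Signals
{
    public static readonly ManualResetEventSlim CtorStarted = new ManualResetEventSlim(false);
}

// Hypothetical stand-in for TimeZoneHelper or NewRelic's Agent:
// an explicit static constructor that takes a while to finish.
static class SlowInit
{
    private static readonly int Value;

    static SlowInit()
    {
        Signals.CtorStarted.Set();   // the type initializer is now running
        Thread.Sleep(500);           // simulates slow work such as loading a resource
        Value = 42;
    }

    // Calling a static method triggers the static constructor first.
    public static int GetValue() => Value;
}

static class ClassInitLockDemo
{
    // Starts a thread that triggers SlowInit's static constructor, then calls
    // SlowInit.GetValue() on the current thread and measures how long that call waited.
    // Meaningful only once per AppDomain, because a static constructor runs once.
    public static TimeSpan MeasureWait(out int value)
    {
        var first = new Thread(() => { SlowInit.GetValue(); });
        first.Start();
        Signals.CtorStarted.Wait();  // the other thread is inside the static constructor

        var stopwatch = Stopwatch.StartNew();
        value = SlowInit.GetValue(); // blocks until the static constructor has finished
        stopwatch.Stop();

        first.Join();
        return stopwatch.Elapsed;
    }
}
```

The measured wait should be close to the remaining sleep time. If such a constructor never finishes, every incoming request that touches the type parks a thread in the same way, which would be consistent with the very high thread counts seen in the dumps.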
Answer:

Here is my hunch about what's going on.

UPDATE: I think this is a recursion problem with the AssemblyResolve event. According to the comments, no stack overflow actually occurred, but the recursion problem can still be present and can be fixed, so the answer still applies.

There are signs that this error depends on the order in which resources are accessed. Most likely, it happens when the very first resource access goes through one of the static classes you mentioned.

The first time you access a resource, the AssemblyResolve event fires several times. Subsequent resource requests do not result in AssemblyResolve events. This can be demonstrated by the following code:

    AppDomain.CurrentDomain.AssemblyResolve += (sender, eventArgs) =>
    {
        Console.WriteLine("Resolve {0}", eventArgs.Name);
        return null;
    };
    Console.WriteLine(Resource1.String1);
    Console.WriteLine(Resource1.String1);

Result:

    Resolve ConsoleApplication1.resources, Version=1.0.0.0, Culture=ru-RU, PublicKeyToken=null
    Resolve ConsoleApplication1.resources, Version=1.0.0.0, Culture=ru-RU, PublicKeyToken=null
    Resolve ConsoleApplication1.resources, Version=1.0.0.0, Culture=ru, PublicKeyToken=null
    Resolve ConsoleApplication1.resources, Version=1.0.0.0, Culture=ru, PublicKeyToken=null
    Value from resource
    Value from resource

The logger (the New Relic agent) accesses resources, as this part of the stack shows:

    000000c898899ff0 00007ff885b92014 System.Resources.ResourceManager.GetString(System.String, System.Globalization.CultureInfo)
    000000c89889a0a0 00007ff89914aa62 NewRelic.Agent.Core.Config.ConfigurationLoader.InitializeFromXml(System.String, System.String)
    000000c89889a140 00007ff89914a838 NewRelic.Agent.Core.Config.ConfigurationLoader.Initialize(System.String)
    000000c89889a1a0 00007ff899143be9 NewRelic.Agent.Core.Config.ConfigurationLoader.Initialize()
    000000c89889a210 00007ff899123a27 NewRelic.Agent.Core.Agent+AgentSingleton.CreateInstance()
    000000c89889a280 00007ff8991239c2 NewRelic.Agent.Core.Singleton`1[[System.__Canon, mscorlib]]..ctor(System.__Canon)
    000000c89889a2c0 00007ff89912388b NewRelic.Agent.Core.Agent..cctor()
    000000c89889a700 00007ff8e4b2a7f3 [GCFrame: 000000c89889a700]
    000000c89889ce88 00007ff8e4b2a7f3 [PrestubMethodFrame: 000000c89889ce88] NewRelic.Agent.Core.Agent.get_Instance()
    000000c89889cef0 00007ff89912358c NewRelic.Agent.Core.AgentShim.GetTracer(System.String, UInt32, System.String, System.String, System.Type, System.String, System.String, System.String, System.Object, System.Object[])

My conclusion is that the logger can start successfully the first time without raising any AssemblyResolve event, and once its resources have been loaded that way, it will never raise an AssemblyResolve event again.

If the first access to a resource happens from inside an AssemblyResolve handler, the handler is re-entered recursively, which leads to a StackOverflowException. This is easy to simulate:

    AppDomain.CurrentDomain.AssemblyResolve += (sender, eventArgs) =>
    {
        Console.WriteLine("Resolve {0}", eventArgs.Name);
        // accessing a resource inside the handler raises AssemblyResolve again
        Console.WriteLine(Resource1.String1);
        return null;
    };
    Console.WriteLine(Resource1.String1);

And your AssemblyResolve handler contains a logger call:

    catch
    {
        context.RunnerLog.Error(string.Format(CultureInfo.InvariantCulture, "Failed to load assembly {0}.", args.Name));
        result = null;
    }

The outcome may differ depending on whether the logger was initialized before the AssemblyResolve handler was attached, or whether some other condition kept the logger from triggering a failing AssemblyResolve event.

When the first call goes to a static class and an exception occurs in AssemblyResolve, you catch it and log it; the logging call accesses a resource, which triggers another assembly resolution, and this recursion leads to a stack overflow.
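One way to break this recursion is a per-thread re-entrancy guard around the AssemblyResolve handler, so that a nested resolve request raised by the handler's own logging returns null immediately instead of recursing. A minimal sketch, assuming a hypothetical ReentrancySafeResolver wrapper that receives the actual lookup and the logger as delegates; it is not the code from the gist.

```csharp
using System;
using System.Reflection;

// Hypothetical wrapper around an AssemblyResolve handler.
public sealed class ReentrancySafeResolver
{
    // One flag per thread; [ThreadStatic] fields must not have initializers.
    [ThreadStatic] private static bool _inResolve;

    private readonly Func<string, Assembly> _resolveCore; // hypothetical: the actual lookup logic
    private readonly Action<string> _log;                 // hypothetical: the logger

    public ReentrancySafeResolver(Func<string, Assembly> resolveCore, Action<string> log)
    {
        _resolveCore = resolveCore;
        _log = log;
    }

    // Signature compatible with System.ResolveEventHandler.
    public Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        if (_inResolve)
            return null; // nested request on this thread: let the runtime carry on without us

        _inResolve = true;
        try
        {
            return _resolveCore(args.Name);
        }
        catch (Exception ex)
        {
            // Still inside the guard: if logging loads a resource and raises
            // AssemblyResolve again, that nested call returns null at the top.
            _log("Failed to load assembly " + args.Name + ": " + ex.Message);
            return null;
        }
        finally
        {
            _inResolve = false;
        }
    }
}
```

It would be attached with `AppDomain.CurrentDomain.AssemblyResolve += resolver.OnAssemblyResolve;`. Returning null from the nested call means that particular satellite lookup simply falls back, which is normally what you want for resource assemblies.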

While the first request holds the lock on the static class constructor, other requests are blocked if the recursion runs for a long time before the StackOverflowException occurs. It does not matter that they would eventually fail with a TypeInitializationException: that never happens, because the domain starts unloading after a StackOverflowException anyway.
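The TypeInitializationException part is easy to check in isolation: an exception escaping a static constructor is wrapped in TypeInitializationException, and the type then stays unusable for the rest of the AppDomain's lifetime, so every later access fails the same way. A StackOverflowException cannot be caught like this, so the sketch uses an ordinary exception in a hypothetical BrokenInit type.

```csharp
using System;

// Hypothetical type whose static constructor always fails,
// standing in for a constructor whose resource loading throws.
static class BrokenInit
{
    static BrokenInit()
    {
        throw new InvalidOperationException("resource could not be loaded");
    }

    // Calling any static method runs the static constructor first.
    public static void Touch() { }
}

static class TypeInitDemo
{
    // Calls into BrokenInit and returns whatever exception surfaced (null if none).
    public static Exception TryTouch()
    {
        try
        {
            BrokenInit.Touch();
            return null;
        }
        catch (Exception ex)
        {
            return ex;
        }
    }
}
```

Calling TryTouch twice returns a TypeInitializationException both times; the original InvalidOperationException is available through InnerException.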

The fact that the top of the stack shows a dictionary lookup does not matter either; it is probably just the last straw that pushed the stack over the limit.

First, I would recommend using a different kind of logger inside AssemblyResolve event handlers, one that does not itself access resources.

Second, I would try to avoid blocking I/O in static constructors, such as accessing resources or loading assemblies manually. Initialize only the basic state there, and use another concurrency mechanism for lazy initialization in the public methods themselves.
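As an illustration of moving the work out of the static constructor, here is a minimal sketch using Lazy<T> with LazyThreadSafetyMode.ExecutionAndPublication. LazyTimeZoneTable is a hypothetical name, and the "key;timeZoneId" line format is assumed from the constructor shown in the question; the text source and the id resolver are passed in so the sketch does not depend on the project's Resources class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical replacement for the table built in TimeZoneHelper's static constructor.
// Constructing it does no I/O; the table is parsed on the first lookup.
public sealed class LazyTimeZoneTable
{
    private readonly Lazy<Dictionary<string, TimeZoneInfo>> _zones;

    // source:  returns the raw text, one "key;timeZoneId" entry per line (e.g. Resources.TimeZones)
    // resolve: maps a time-zone id to a TimeZoneInfo (e.g. TimeZoneInfo.FindSystemTimeZoneById)
    public LazyTimeZoneTable(Func<string> source, Func<string, TimeZoneInfo> resolve)
    {
        _zones = new Lazy<Dictionary<string, TimeZoneInfo>>(
            () => Parse(source(), resolve),
            LazyThreadSafetyMode.ExecutionAndPublication); // one thread builds, the others wait for it
    }

    // The dictionary is fully built before it is published and never modified
    // afterwards, so concurrent readers are safe.
    public bool TryGet(string key, out TimeZoneInfo zone) => _zones.Value.TryGetValue(key, out zone);

    private static Dictionary<string, TimeZoneInfo> Parse(string text, Func<string, TimeZoneInfo> resolve)
    {
        var result = new Dictionary<string, TimeZoneInfo>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(';');
                if (parts.Length < 2) continue; // skip blank or malformed lines
                result[parts[0]] = resolve(parts[1]);
            }
        }
        return result;
    }
}
```

TimeZoneHelper could then hold one static readonly instance created with `() => Resources.TimeZones` and `TimeZoneInfo.FindSystemTimeZoneById`; creating it touches no resources, so nothing in the type initializer can raise AssemblyResolve. The first lookup still makes concurrent callers wait while the table is parsed. Note that ExecutionAndPublication caches an exception thrown by the factory; LazyThreadSafetyMode.PublicationOnly does not, if a failed load should be retried.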

However, I do not think the cause of the suspected stack overflow is related to static constructors.

Also, there may be no stack overflow at all if the recursion is too slow for one to occur. In that case, the domain can start unloading for other reasons, for example because an IIS resource-protection mechanism kicks in, such as a limit on the number of threads or on overall memory consumption. This is likely to happen if the requests run for a long time.
