How can I debug an internal error in .NET Runtime?

I am trying to debug some work that processes large files. The code itself works, but sporadic errors are reported from the .NET runtime itself. For context, the processing here is of a 1.5 GB file (loaded into memory only once), which is processed and released in a loop, deliberately trying to reproduce this otherwise unpredictable error.

My test snippet is basically:

    try
    {
        byte[] data = File.ReadAllBytes(path);
        for (int i = 0; i < 500; i++)
        {
            ProcessTheData(data); // deserialize and validate

            // force collection, for tidiness
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        // some more logging; StackTrace, recursive InnerException, etc
    }

(with some timing points and other things thrown in)

The loop runs for a non-deterministic number of iterations completely successfully - no problems whatsoever; then the process terminates abruptly. The exception handler is never hit. The test does involve a lot of memory use, but it saw-tooths very nicely during each iteration (there is no obvious memory leak, and I have plenty of headroom - 14 GB of unused primary memory at the worst point in the saw-tooth). The process is 64-bit.

The Windows error log contains 3 new entries, which (via exit code 80131506) indicate an Execution Engine error - a nasty little critter. A related answer suggests a GC bug, with a "fix" of disabling concurrent GC; however, this "fix" does not prevent the problem.

Clarification: this low-level error does not hit the CurrentDomain.UnhandledException event.

Clarification: the GC.Collect exists only to monitor the saw-toothing memory, to check for memory leaks, and to keep things predictable; removing it does not make the problem go away: it just means more memory is retained between iterations, and the dmp files get bigger ;p
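For reference, the per-iteration check this enables is essentially the following (a simplified sketch of the idea, not the exact code; GC.GetTotalMemory is just one convenient probe):

    // After the forced collection, the managed heap should return to roughly
    // the same baseline every iteration if nothing is leaking.
    long heapBytes = GC.GetTotalMemory(false);
    Console.WriteLine("Iteration {0}: managed heap ~{1:N0} bytes", i, heapBytes);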

By adding more console tracing, I have observed the error occurring during each of:

  • during deserialization (lots of allocations, etc.)
  • during the GC (between a GC "approach" and a GC "complete", using the GC notification API - see the sketch below)
  • during validation (just a foreach over some data) - curiously, immediately after a GC "complete", during the validation

So many different scenarios.
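For reference, the "approach"/"complete" markers above come from the full GC notification API; a simplified sketch of that pattern (thresholds and logging here are illustrative, not the actual monitoring code):

    // Requires System and System.Threading; the thresholds (10, 10) are arbitrary here.
    GC.RegisterForFullGCNotification(10, 10);

    Thread gcMonitor = new Thread(() =>
    {
        while (true)
        {
            if (GC.WaitForFullGCApproach() == GCNotificationStatus.Succeeded)
                Console.WriteLine("GC approach");
            if (GC.WaitForFullGCComplete() == GCNotificationStatus.Succeeded)
                Console.WriteLine("GC complete");
        }
    });
    gcMonitor.IsBackground = true;
    gcMonitor.Start();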

I can get crash-dump (dmp) files; how can I explore this further to see what the system does when it fails so spectacularly?

+61
c#
Jan 09 '13 at 15:28
6 answers

If you have memory dumps, I would suggest using WinDbg to look at them, assuming you are not already doing this.

Try running the command !EEStack (mixed native and managed stack trace) and see if anything pops out in the stack trace. In my test program, I found this one time as a stack trace where the FEEE (FatalExecutionEngineError) occurred (I had intentionally corrupted the heap):

 0:000> !EEStack
 ---------------------------------------------
 Thread   0
 Current frame: ntdll!NtWaitForSingleObject+0xa
 Child-SP         RetAddr          Caller, Callee
 00000089879bd3d0 000007fc586610ea KERNELBASE!WaitForSingleObjectEx+0x92, calling ntdll!NtWaitForSingleObject
 00000089879bd400 000007fc5869811c KERNELBASE!RaiseException+0x68, calling ntdll!RtlRaiseException
 [...]
 00000089879bec80 000007fc49109cf6 clr!WKS::gc_heap::gc1+0x96, calling clr!WKS::gc_heap::mark_phase
 00000089879becd0 000007fc49109c21 clr!WKS::gc_heap::garbage_collect+0x222, calling clr!WKS::gc_heap::gc1
 00000089879bed10 000007fc491092f1 clr!WKS::GCHeap::RestartEE+0xa2, calling clr!Thread::ResumeRuntime
 00000089879bed60 000007fc4910998d clr!WKS::GCHeap::GarbageCollectGeneration+0xdd, calling clr!WKS::gc_heap::garbage_collect
 00000089879bedb0 000007fc4910df9c clr!WKS::GCHeap::Alloc+0x31b, calling clr!WKS::GCHeap::GarbageCollectGeneration
 00000089879bee00 000007fc48ff82e1 clr!JIT_NewArr1+0x481

Since this may be heap corruption involving the garbage collector, I would try running the !VerifyHeap command. At the very least, you can make sure the heap is intact (and your problem lies elsewhere), or find that your problem may indeed be related to the GC or to some P/Invoke routines corrupting it.

If you find that the heap is corrupt, I would try to determine the extent of the corruption, which you might be able to do with !HeapStat. That might simply show the entire heap as corrupt from a certain point onwards, though.
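To illustrate, a session against one of the dmp files would typically start something like this (a sketch only; the command for loading SOS depends on the CLR version in the dump, and the annotations in parentheses are not part of the commands):

    0:000> .loadby sos clr     (load the SOS extension matching the CLR in the dump)
    0:000> !VerifyHeap         (walks the managed heap, reporting corrupt objects)
    0:000> !HeapStat           (per-generation heap statistics)
    0:000> !EEStack            (mixed native/managed stacks for all threads)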

It is difficult to suggest any other methods for analyzing this through WinDbg, since I do not know what your code does and how it is structured.

If you find this to be a heap issue, and therefore possibly GC weirdness, I would take a look at the CLR GC events in Event Tracing for Windows (ETW).
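If you go down that road, one way to capture the CLR GC events programmatically is the TraceEvent library; a rough sketch, assuming the Microsoft.Diagnostics.Tracing.TraceEvent NuGet package (PerfView exposes the same events without writing any code, so treat this purely as an illustration):

    // Rough sketch only - requires the Microsoft.Diagnostics.Tracing.TraceEvent
    // NuGet package and elevation (real-time ETW sessions need admin rights).
    using System;
    using Microsoft.Diagnostics.Tracing;
    using Microsoft.Diagnostics.Tracing.Parsers;
    using Microsoft.Diagnostics.Tracing.Session;

    class GcEventWatcher
    {
        static void Main()
        {
            using (var session = new TraceEventSession("GcWatchSession"))
            {
                // Enable only the GC keyword of the CLR provider.
                session.EnableProvider(ClrTraceEventParser.ProviderGuid,
                                       TraceEventLevel.Informational,
                                       (ulong)ClrTraceEventParser.Keywords.GC);

                session.Source.Clr.GCStart += data =>
                    Console.WriteLine("GC #{0} (gen {1}) start: {2}", data.Count, data.Depth, data.Reason);
                session.Source.Clr.GCStop += data =>
                    Console.WriteLine("GC #{0} stop", data.Count);

                session.Source.Process();   // blocks; run alongside the failing test
            }
        }
    }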




If the minidumps you are getting do not cut it and you are using Windows 7 / 2008 R2 or later, you can use Global Flags (gflags.exe) to attach a debugger when the process terminates without an exception, in case you are not getting a WER notification.

In the Silent Process Exit tab, enter the name of the executable, not the full path to it (i.e. TestProgram.exe). Use the following settings:

  • Check "Enable output monitoring without sound"
  • Check the launch process
  • For the monitoring process, use {path to debugging tools}\cdb.exe -server tcp:port=5005 -g -G -p %e .

And apply the settings.

When your test program crashes, cdb will attach and wait for you to connect to it. Start WinDbg, press Ctrl+R, and use the connection string: tcp:port=5005,server=localhost.

You might be able to get away with not using remote debugging and instead use {path to debugging tools}\windbg.exe %e. However, the reason I suggested the remote option is that WerFault.exe, which I believe is what reads the registry and launches the monitor process, will start the debugger in session 0.

You can make session 0 interactive and connect to the window station, but I can't remember how that is done. It is also inconvenient, because you would have to switch between sessions if you need to access any of the existing windows you had open.

+21
Jan 16 '13 at 0:02

Tools->Debugging->General->Enable .NET Framework source stepping

+

Tools->IntelliTrace->IntelliTrace events and call information

+

Tools->IntelliTrace->Store IntelliTrace recordings in this directory

and select a directory

should allow you to step INTO .NET code and trace every single function call. I tried this on a small sample project and it works.

After every debug session it is supposed to create a debug session recording in the given directory, even if the CLR dies, if I'm not mistaken.

This should give you a verbose call trace right up to the point where the CLR crashes.

+7
Jan 09 '13 at 16:20

Try writing a generic exception handler and see if there is an unhandled exception killing your application.

    AppDomain currentDomain = AppDomain.CurrentDomain;
    currentDomain.UnhandledException += new UnhandledExceptionEventHandler(MyExceptionHandler);

    static void MyExceptionHandler(object sender, UnhandledExceptionEventArgs e)
    {
        Console.WriteLine(e.ExceptionObject.ToString());
        Console.WriteLine("Press Enter to continue");
        Console.ReadLine();
        Environment.Exit(1);
    }
+3
Jan 09 '13 at 15:37

I usually track down memory issues with Valgrind and gdb.

If you run your stuff on Windows, there are many good alternatives, such as verysleepy for callgrind, as suggested here:
Is there a good replacement for Valgrind for Windows?

If you really want to debug internal errors of the .NET runtime, you have the problem that there is no source available for the class libraries or the virtual machine.

Since you cannot debug what you do not have, I suggest that (apart from decompiling the .NET Framework libraries in question with ILSpy and adding them to your project, which still does not cover the VM) you could use the Mono runtime.
There you have the source of both the class libraries and the virtual machine.
Maybe your program works fine with Mono; then your problem would be solved, at least as long as this is only a one-off task.

If not, there is an extensive debugging FAQ, including GDB support
http://www.mono-project.com/Debugging

Miguel also has this post regarding valgrind support:
http://tirania.org/blog/archive/2007/Jun-29.html

In addition, if you let it run on Linux, you can also use strace to see what is happening at the syscall level. If you do not make extensive use of WinForms or WinAPI calls, .NET programs usually work fine on Linux (for file system case-sensitivity issues, you can mount the file system case-insensitively and/or use MONO_IOMAP).

If you are bound to Windows, this post says the closest equivalent on Windows is WinDbg's Logger.exe, but the ltrace-like information is not as extensive.

Mono source code is available here:
http://download.mono-project.com/sources/

You are probably interested in the sources of the latest mono version.
http://download.mono-project.com/sources/mono/mono-3.0.3.tar.bz2

If you need the 4.5 framework, you will need Mono 3; you can find precompiled packages here:
https://www.meebey.net/posts/mono_3.0_preview_debian_ubuntu_packages/

If you want to make changes to the source code, here's how to compile it:
http://ubuntuforums.org/showthread.php?t=1591370

+3
Feb 13 '13 at 21:51

There are .NET exceptions that cannot be caught. Check out: http://msdn.microsoft.com/en-us/magazine/dd419661.aspx.
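If the uncatchable exception in question is a corrupted state exception (e.g. an AccessViolationException), .NET 4+ lets you opt back in to catching it for logging purposes; a sketch assuming a full-trust console app, where RunTheTest is a hypothetical stand-in for the processing loop. Note this will not help with a FatalExecutionEngineError, which terminates the process regardless:

    using System;
    using System.Runtime.ExceptionServices;   // HandleProcessCorruptedStateExceptions
    using System.Security;

    class Program
    {
        [HandleProcessCorruptedStateExceptions]
        [SecurityCritical]
        static void Main()
        {
            try
            {
                RunTheTest();   // hypothetical stand-in for the processing loop
            }
            catch (Exception ex)
            {
                // Corrupted state exceptions are normally not delivered to catch
                // blocks in .NET 4+; the attribute above opts this method back in.
                // Log and rethrow - do not swallow these.
                Console.WriteLine(ex);
                throw;
            }
        }

        static void RunTheTest() { /* ... */ }
    }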

+1
Jan 15 '13 at 21:27

For errors of this nature, which are non-deterministic and unpredictable, crash dump analysis with WinDbg is one of the most important analysis mechanisms; please check the following links, as they go into the details of WinDbg debugging:

http://www.debuginfo.com/articles/easywindbg.html

http://www.debuginfo.com/articles/easywindbg2.html

Also check out these informative slides on WinDbg debugging:

http://www.slideshare.net/ShanmugaSundaram12/crash-dump-analysisshanmugasundaram

As you can see from the above, for correct analysis of the crash dumps (which you can obtain using ADPlus in crash mode), the most important aspect is having the correct symbol (pdb) files, since they are what map the raw hexadecimal stack back to actual function calls and provide critical information about the method that was executing before the failure / AV was generated. The symbols are picked up from the _NT_SYMBOL_PATH environment variable. With WinDbg you do not need command-line-only tools; the visual interface is good enough to display all the stack trace information at the time of the error.

As I understand it, you have already tried enabling exceptions in VS; preferably enable all of them in the Exceptions dialog, since this is always the first debugging step and can give critical information if the program breaks on a specific exception - the first hint. WinDbg comes after that, to gain a deeper understanding of the problem, and is the best-known Windows tool for this.

However, my take on the problem is slightly different: since the program involves mapping a byte stream with a huge working set at runtime, to avoid the problem you could try the following:

  • Create smaller chunks of memory and process those; this ensures that if the error is caused by sudden memory pressure from mapping a large working set, the GC has more opportunities to collect and the overall memory pressure drops, e.g. if the 1.5 GB can be split into 3-5 smaller chunks (300 MB - 500 MB). You can do this either by reading the file in parts, or by splitting the byte stream after reading it into smaller byte[] arrays, deserializing those, and aggregating the final result (see the sketch after this list). In practice I have seen this take care of many such problems.

  • As you say, the GC.Collect calls are only there for monitoring; you certainly know that explicit GC calls give no deterministic guarantees anyway, and the GC runs on its own regardless - they merely make the program wait for a collection after each iteration. However, it seems the amount of data processed per iteration is already high enough to reproduce the problem sporadically.
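A sketch of the chunking idea from the first bullet (the chunk size and the ProcessTheChunk helper are illustrative; a real split would have to respect record boundaries in the serialized format):

    using System;
    using System.IO;

    static void ProcessInChunks(string path)
    {
        const int chunkSize = 300 * 1024 * 1024;   // ~300 MB per chunk, illustrative
        byte[] buffer = new byte[chunkSize];

        using (FileStream stream = File.OpenRead(path))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Hypothetical helper: deserialize/validate just this slice.
                // A real implementation must not cut a serialized record in half,
                // so the actual split points would need to be format-aware.
                ProcessTheChunk(buffer, read);
            }
        }
    }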

In case you want to look at the memory usage pattern of your process - since sometimes an ever-increasing working set / virtual bytes can be the root of the problem - I have a post on debugging an out-of-memory exception:

When I use Socket.IO, why did I get the error Unhandled exception like "System.OutOfMemoryException"

In your case it may be tipping over just before an OOM; but if a preliminary look in Task Manager shows the memory footprint increasing non-stop, you may need to investigate that angle further.
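If you just want a quick numeric view of that pattern without a profiler, something like this inside the loop would do (illustrative only; Task Manager or perfmon show the same counters):

    // Working set and private bytes as seen by the OS, per iteration.
    // Requires System.Diagnostics.
    Process proc = Process.GetCurrentProcess();
    proc.Refresh();   // refresh the cached counter values
    Console.WriteLine("Working set: {0:N0}  Private bytes: {1:N0}",
                      proc.WorkingSet64, proc.PrivateMemorySize64);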

In addition, although I do not know your system configuration, you could play with the Windows boot configuration switches such as /3GB and /USERVA to set the user process address space to a higher value, which may be enough to avoid such a problem, although this will require manual analysis to understand the specific point of memory pressure at which the error inevitably occurs.

0
Oct 02 '14 at 8:23


