Fast capture of stack trace on Windows / 64-bit / mixed mode

Question


As most of you probably know, there are many different mechanisms for walking call stacks, starting with the Windows API and going down into the magical world of assembly. Let me list some of the links I have already studied.

First of all, let me mention that I want a memory leak analysis mechanism for a mixed-mode (managed and unmanaged) / 64-bit + AnyCPU application. Of all the Windows API functions, CaptureStackBackTrace is the most suitable for my needs, but as far as I have analyzed, it does not support managed code. Still, this API function is the closest to what I need (since it also calculates a backtrace hash, a unique identifier for a specific call stack).

I have ruled out other approaches to detecting memory leaks: most of the software I tried either crashes, does not really work, or gives poor results.

Also, I don't want to recompile existing software and override malloc / new or another mechanism, because that is a heavy task (we have a huge code base with a lot of DLLs). I also suspect this is not a one-time job: the problem comes back on a 1-2 year cycle, depending on who coded what, so I would prefer to have memory leak detection built into the application itself (memory API hooking) instead of fighting this problem over and over again.

http://www.codeproject.com/Articles/11132/Walking-the-callstack

Uses the StackWalk64 Windows API function, but does not work with managed code. In addition, 64-bit support is not entirely clear to me: I saw some workaround for a 64-bit problem, and I suspect that this code does not fully work when the stack walk is performed within the same thread.

Then there is Process Hacker: http://processhacker.sourceforge.net/

which also uses StackWalk64, but extends its callback functions (7th and 8th parameters) to support mixed-mode stack walking. After a lot of trouble with the 7th/8th callback functions, I also managed to get StackWalk64 working with mixed-mode support (capturing the stack trace as a vector, where each pointer refers to the assembly / DLL location where the call took place). But, as you can guess, the performance of StackWalk64 is insufficient for my needs: even with a simple message box on the C# side, the application simply freezes for a while until it starts up correctly.

I have not seen such heavy delays when calling CaptureStackBackTrace, so I assume it is the performance of StackWalk64 that is insufficient for my needs.

There is also a COM-based approach to determining the stack trace, for example: http://www.codeproject.com/Articles/371137/A-Mixed-Mode-Stackwalk-with-the-IDebugClient-Inter

http://blog.steveniemitz.com/building-a-mixed-mode-stack-walker-part-1/

but I'm afraid this requires COM, and the thread needs to be COM-initialized. Because of the memory API hooking, I should not touch the COM state in any thread, since this can lead to more serious problems (for example, incorrect apartment initialization or other glitches).

Now I have reached the point where the Windows API is becoming insufficient for my needs, and I need to walk the call stack manually. Such examples can be found, for example:

http://www.codeproject.com/Articles/11221/Easy-Detection-of-Memory-Leaks (see the FillStackInfo function; 32-bit only, does not support managed code).

There are several mentions of walking the stack backwards, for example at the following links:

Links 1, 3 and 4 in particular make for interesting night reading. :-)

But even though these are quite interesting mechanisms, a fully working demo is lacking.

I assume one example is the dbghelp implementation in Wine (a Windows compatibility layer for Linux), which also shows how StackWalk64 works in the end, but I suspect it is strongly tied to the DWARF2 executable format, so it is not identical to the current Windows PE executable format.

Can someone point me to a good stack walking implementation that works on 64-bit architecture with mixed-mode support (able to track both native and managed memory allocations) and that relies purely on analysis of registers / call stack / code? (A combined implementation of links 1, 3, 4.)

Does anyone have good contacts in the Microsoft development team who could answer this question?

+6

stack-trace windows memory-leaks mixed-mode

TarmoPikaro Dec 28 '15 at 10:18


6 answers

Walking the stack on x64 is difficult, as you have already found out. An easy alternative is simply not to do it yourself and to leave the hard part to the ETW stack walker of the OS. It works, and it is much faster than anything you will ever build.

You can take advantage of this by emitting your own ETW event. Before doing so, you need to start an ETW session for your event provider and enable stack walking for it. On Windows 7 there is a catch: it does not work when managed stack frames are involved, because the x64 ETW stack walker stops as soon as it finds a stack frame that is not inside any loaded module, which is the case for JITed code.

Starting with Windows 8, the ETW stack walker always walks the first MB of the stack for stack frames, which fixes the JIT problem. When ETW tracing is enabled, the JIT compiler emits unwind info for the generated code and registers it through RtlAddGrowableFunctionTable, which allows fast stack walking from within the kernel. For compatibility reasons, things work differently when ETW tracing is not enabled.

If you are after malloc/free or new/delete memory leaks, you can also use the built-in OS heap allocation tracing, which has existed since Windows 7. See xperf -help start and https://randomascii.wordpress.com/2015/04/27/etw-heap-tracingevery-allocation-recorded/ for more information about heap allocation tracing. You can enable it for an already running process without any problems. The disadvantage is that for any real-world application the generated data is huge. But if you are only after large allocations, it can help to trace only VirtualAlloc calls, which can be enabled with minimal overhead.

Since .NET 4.5, managed code also has its own ETW stack trace provider, which delivers full stacks even on x64 Windows 7, because the runtime itself performs the full managed stack walk. More details can be found in the CoreCLR sources: see ETW::SamplingLog::SendStackTrace in https://github.com/dotnet/coreclr/blob/master/src/inc/eventtracebase.h.

This is only a rough outline of what is possible. Covering all the necessary details would probably take a whole book. And I still learn new things every day.

Here is a HeapAlloc.cmd script that you can use to track heap allocations. By default, it writes to a circular buffer of 500 MB. If your leak accumulates over longer periods of time, recording all allocation stacks without condensing them at runtime will not work with WPA. But you can post-process the huge ETL file and write your own viewer for it.

    @echo off
    setlocal enabledelayedexpansion

    REM consider using a different drive for ETL output to prevent slowing down
    REM your application and to prevent lost buffers
    set OUTDIR=C:\TEMP
    set OUTFILENAME=HeapTracing.etl

    REM Final output file
    set OUTFILE=!OUTDIR!\!OUTFILENAME!

    set CLRUNDOWNFILE=!OUTDIR!\clr_HeapDCend.etl
    set KERNELFILE=!OUTDIR!\kernel.etl
    set CLRSESSIONFILE=!OUTDIR!\clrHeapSession.etl
    set HEAPUSERFILE=!OUTDIR!\HeapUserSession.etl

    REM Default is allocation and realloc to track memory leaks
    REM HeapFree is the other option to track double free calls
    set HEAPTRACINGFLAGS=HeapAlloc+HeapRealloc

    if "%3" NEQ "" (
        echo Overriding Heap Tracing Flags with: %3
        set HEAPTRACINGFLAGS=%3
    )

    if "%1" EQU "-start" (
        call :StartTracing -PidNewProcess %2
        goto :Exit
    )

    if "%1" EQU "-attachPid" (
        call :StartTracing -Pids %2
        goto :Exit
    )

    if "%1" EQU "-startNext" (
        reg add "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\%~nx2" /v TracingFlags /t REG_DWORD /d 1 /f
        REM %errorlevel% would be expanded when the whole block is parsed, so test it this way
        if errorlevel 1 goto :failure
        call :StartTracing -Pids 0
        goto :Exit
    )

    if "%1" EQU "-stop" (
        set XPERF_CreateNGenPdbs=1
        xperf -start ClrRundownSession -on e13c0d23-ccbc-4e12-931b-d9cc2eee27e4:0x118:5+a669021c-c450-4609-a035-5af59af4df18:0x118:5 -f "!CLRUNDOWNFILE!" -buffersize 256 -minbuffers 256 -maxbuffers 512
        call :WaitUntilRundownCompleted "!CLRUNDOWNFILE!"
        xperf -stop -stop ClrSession ClrRundownSession HeapSession | findstr /V identifiable 2> NUL
        echo Merging profiles
        REM Reset symbol path to create the pdb files in a directory with the same name as our etl file
        set TMPSYMBOLPATH=!_NT_SYMBOL_PATH!
        REM Each tool is using a different pdb cache folder. If you are using them side by side
        REM you have to wait a long time to refresh the pdb cache. To spare the waiting time we use
        REM the pdb cache folder from WPR
        mkdir C:\ProgramData\WindowsPerformanceRecorder\NGenPdbs_Cache 2> NUL
        set _NT_SYMBOL_PATH=srv*C:\ProgramData\WindowsPerformanceRecorder\NGenPdbs_Cache
        mklink /D "!OUTFILE!.NGENPDB" C:\ProgramData\WindowsPerformanceRecorder\NGenPdbs_Cache 2> NUL
        echo Managed PDBs are stored at: !OUTFILE!.NGENPDB. If you want to transfer the etl do not forget to copy this directory with the pdbs as well.
        echo Merging ETL files and generating native pdbs
        xperf -merge "!KERNELFILE!" "!CLRSESSIONFILE!" "!CLRUNDOWNFILE!" "!HEAPUSERFILE!" "!OUTFILE!"
        set _NT_SYMBOL_PATH=!TMPSYMBOLPATH!
        echo !OUTFILE! was created
        if "%2" NEQ "" reg delete "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\%~nx2" /v TracingFlags /f 2> NUL
        goto :Exit
    )

    goto :Usage

    :StartTracing
    xperf -start ClrSession -on Microsoft-Windows-DotNETRuntime:5 -f "!CLRSESSIONFILE!" -buffersize 128 -minbuffers 256 -maxbuffers 512
    xperf -on PROC_THREAD+LOADER+latency+virt_alloc -stackwalk VirtualAlloc -f "%KERNELFILE%"
    xperf -start HeapSession -heap %1 %2 -BufferSize 1024 -MinBuffers 128 -MaxBuffers 1024 -stackwalk %HEAPTRACINGFLAGS% -f "!HEAPUSERFILE!" -FileMode Circular -MaxFile 1024
    exit /B

    REM Wait until writing to ETL file has stopped by checking its file size
    :WaitUntilRundownCompleted
    :StillWriting
    for %%F in (%1) do set "size=%%~zF"
    timeout /T 1 > nul
    for %%F in (%1) do set "size2=%%~zF"
    if "!size!" EQU "" goto :EndWriting
    if "!size!" NEQ "!size2!" goto StillWriting
    :EndWriting
    timeout /T 1 > nul
    exit /B

    :Usage
    echo Usage:
    echo HeapAlloc.cmd -start [executable] or -stop
    echo -start [executable] Start a trace session
    echo -startNext [executable] Start heap tracing for all subsequent calls to executable.
    echo -attachPid ddd Start a trace session for specified process
    echo -stop [executable] Stop a trace session
    echo Examples
    echo HeapAlloc.cmd -startNext devenv.exe
    echo HeapAlloc.cmd -stop devenv.exe
    echo To attach to a running process
    echo HeapAlloc.cmd -attachPid dddd
    echo HeapAlloc.cmd -stop
    echo You must call -stop for your executable if you have used -start or startNext because heap allocation tracing will stay enabled until you stop it!
    goto :Exit

    :failure
    echo Error occurred
    goto :Exit

    :Exit

0

Alois kraus Dec 29 '15 at 23:36


01/25/2016: Posting this as a separate answer, as additional information.

For the unique stack identifier, CaptureStackBackTrace uses a simple sum of all instruction pointers. The idea is taken from the Windows Research Kernel sources, WRK-v1.2\base\ntos\rtl\amd64\stkwalk.c:

    size_t hashValue = 0;
    for (int i = 0; i < nFrames; i++)
        hashValue += PtrToUlong(BackTrace[i]);
    *pBackTraceHash = (DWORD)hashValue;

I'm not sure about the final cast: some define the last parameter as DWORD, some as ULONG64, but that is not relevant here. The main problem with this calculation is that it is not unique enough: a sum does not depend on the order of its terms. Take the case of recursive function calls. If you have the call order:

    func1
    func2
    func3

then the hash of the stack trace for:

    func1
    func3
    func2

will be identical.
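The collision described above can be reproduced without any Windows API. The following is a minimal sketch, not the kernel's code: `SumHash` is a hypothetical portable stand-in for the additive formula, and the three addresses are made-up values standing in for func1, func2 and func3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Additive hash in the style of the snippet above: adds up the low 32 bits
// of every return address. SumHash is a hypothetical name for illustration.
std::uint32_t SumHash(const void* const* frames, std::size_t count)
{
    assert(frames != nullptr || count == 0);
    std::size_t hash = 0;
    for (std::size_t i = 0; i < count; ++i)
        hash += static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(frames[i]));
    return static_cast<std::uint32_t>(hash);
}

// Returns true when two different call orders get the same additive hash.
bool SumHashCollidesOnReorder()
{
    // Made-up return addresses standing in for func1, func2 and func3.
    const void* f1 = reinterpret_cast<const void*>(static_cast<std::uintptr_t>(0x401000));
    const void* f2 = reinterpret_cast<const void*>(static_cast<std::uintptr_t>(0x402000));
    const void* f3 = reinterpret_cast<const void*>(static_cast<std::uintptr_t>(0x403000));

    const void* order1[] = { f1, f2, f3 };
    const void* order2[] = { f1, f3, f2 };
    return SumHash(order1, 3) == SumHash(order2, 3);
}
```

`SumHashCollidesOnReorder()` returns true for any choice of addresses, because addition is commutative.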

From what I debugged: when detecting memory leaks, I got 62876 false hits, so this calculation of the unique stack identifier is not reliable enough.

I changed the formula a bit, to a CRC32-based one:

    // Standard CRC-32 lookup table (256 entries).
    static DWORD crc32_tab[] = {
        0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
        0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
        0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
        0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
        0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
        0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
        0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
        0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
        0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
        0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
        0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
        0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
        0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
        0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
        0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
        0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
        0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
        0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
        0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
        0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
        0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
        0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
        0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
        0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
        0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
        0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
        0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
        0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
        0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
        0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
        0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
        0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
    };

    // Fragment from inside the capture function: iFrame is the number of captured frames.
    if (pBackTraceHash)
    {
        size_t hashValue = 0;

        for( int idxFrame = 0; idxFrame < (int)iFrame; idxFrame++ )
        {
            // Feed every byte of the frame's return address into the CRC.
            unsigned char* p = (unsigned char*)&BackTrace[idxFrame];

            for( size_t i = 0; i < sizeof(void*); i++ )
                hashValue = crc32_tab[ ((hashValue ^ *p++) & 0xFF) ] ^ (hashValue >> 8);
        }

        *pBackTraceHash = (DWORD)hashValue;
    }

In my tests this algorithm produced no false hits, but it slows down execution a bit.

The memory leak statistics also differ:

    Incorrect (sum-based) algorithm: total leaked memory: 48'874'764 / in 371 allocation pools
    CRC32-based algorithm: total leaked memory: 48'874'764 / in 614 allocation pools

As you can see, the incorrect statistics pool similar call stacks together: there is less fragmentation, but the original call stacks are lost.
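Why the total stays the same while the pool count drops can be seen in a small model. This is only an illustrative sketch with hypothetical names (`Allocation`, `PoolStats`, `Summarize`), not the leak detector discussed here: leaked blocks are grouped by stack hash, so two different call stacks that share a hash end up in one pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One leaked block: the hash of its allocation call stack and its size.
struct Allocation {
    std::uint32_t stackHash;
    std::size_t bytes;
};

// Summary in the style of the statistics above.
struct PoolStats {
    std::size_t totalBytes;  // sum of all leaked bytes
    std::size_t poolCount;   // number of distinct stack hashes
};

// Groups leaked blocks by stack hash; colliding hashes merge into one pool.
PoolStats Summarize(const std::vector<Allocation>& leaks)
{
    std::map<std::uint32_t, std::size_t> pools;
    std::size_t total = 0;
    for (const Allocation& a : leaks) {
        pools[a.stackHash] += a.bytes;
        total += a.bytes;
        assert(total >= a.bytes);  // overflow guard for this sketch
    }
    return PoolStats{ total, pools.size() };
}
```

Feeding the same three blocks once with distinct hashes and once with two of them sharing a hash gives the same `totalBytes` but one pool fewer, which mirrors the 614 versus 371 pools reported above.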

Can someone suggest a faster algorithm for this?
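One candidate, offered only as a sketch and not measured against the CRC32 version, is an FNV-1a style mix applied once per frame instead of once per byte: one XOR and one multiplication per return address, with no lookup table. The function names `StackHash64` and `StackHash32` are hypothetical; the two constants are the published 64-bit FNV offset basis and prime. Like any 32-bit hash it can still collide, but unlike a plain sum it depends on frame order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Order-sensitive 64-bit hash over the captured return addresses.
std::uint64_t StackHash64(const void* const* frames, std::size_t count)
{
    assert(frames != nullptr || count == 0);
    std::uint64_t hash = 0xcbf29ce484222325ULL;   // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < count; ++i) {
        hash ^= static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(frames[i]));
        hash *= 0x100000001b3ULL;                 // FNV-1a 64-bit prime
    }
    return hash;
}

// Folds the 64-bit value into a DWORD-sized hash by XORing both halves.
std::uint32_t StackHash32(const void* const* frames, std::size_t count)
{
    const std::uint64_t h = StackHash64(frames, count);
    return static_cast<std::uint32_t>(h ^ (h >> 32));
}
```

Whether this is actually faster in a given leak detector, and whether its collision rate is acceptable, would have to be checked on real call stacks.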

0

TarmoPikaro Jan 25 '16 at 20:36


01/27/2016: A follow-up that may not quite belong to the original question: determining the call stack in 32-bit code. I asked which API to use, since CaptureStackBackTrace at least performs incomplete walks (native code only), and the RtlVirtualUnwind API function does not exist on 32-bit Windows.

    From: Noah Falk < noahfalk@microsoft.com >
    To: Tarmo Pikaro < tapika@yahoo.com >; Mike McLaughlin < mikem@microsoft.com >
    Cc: Jan Kotas < jkotas@microsoft.com >
    Sent: Tuesday, January 26, 2016 1:34 AM
    Subject: RE: Resolving managed call stack from void*

    Hi Tarmo, hope the exploration of stackwalking has been interesting. If I followed you correctly you've been successful on x64 but hoping you can extend your technique to 32 bit. Indeed the RtlCaptureVirtualUnwind techniques don't work here, and the fundamental reason behind it is that while x64 defines a specific calling convention that all code on Windows is forced to use, x86 does not. This means that there is no algorithm the OS could implement which guarantees correct unwinding when PDBs are unavailable. However you do have some options:

    1) You can use simple heuristics that work for certain kinds of code. Unoptimized code on x86 often uses EBP chaining, in which ESP in the current frame points to EBP, and EBP points to the parent frame's EBP, and so on down the stack. The return address is stored on the stack adjacent to EBP. As I recall all jitted code produced by recent versions of .Net follows these conventions, including optimized jitted code. However when a compiler performs inlining these conventions will be unable to detect it, and optimized code that does not follow this convention could easily cause the stack to become unwalkable.

    2) If you are willing to load PDBs you can use the DIA APIs to walk the stack: https://msdn.microsoft.com/en-us/library/dt06fh94.aspx. The PDB contains additional data about optimized code which allows frames that do not follow the EBP chaining convention to be correctly unwound. This is the stack walk API that Visual Studio is using when it debugs 32 bit native code on Windows.

    3) The ICorDebug APIs (https://msdn.microsoft.com/en-us/library/dd646502(v=vs.110).aspx) are a set of APIs that are designed to support managed code debuggers. Starting in .Net 4.0 the ICorDebug API supports dump debugging, however the API is designed in such a way that you don't have to serialize a dump file. This is likely to be more complicated than you would want, but it's supported to use the Windows process snapshot APIs to take a snapshot of the memory space and then direct the ICorDebug API to read from this snapshot as if it was a dump. One advantage of the ICorDebug API is that not only will it give you managed stack frames, it also allows exploring all the other kinds of data debuggers would expose such as parameters, local values, fields of objects, types of the values, etc.

    The MDbg tool (https://www.microsoft.com/en-us/download/details.aspx?id=2282) is a complete sample debugger with source included. It supports dump debugging and displaying callstacks, though it won't have any specific example about using the process snapshot APIs in place of using a dump. The main change would be replacing the implementation of ICorDebugDataTarget. MDbg has an implementation that reads from a dump file and you would need to create a new implementation that reads from a process snapshot using the windows APIs (https://msdn.microsoft.com/en-us/library/dn457825(v=vs.85).aspx). I've never written the code myself and I've heard from other tool authors that they found using the windows snapshot APIs more difficult than expected, but eventually they were successful.
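The EBP chaining from option 1 of the email can be illustrated on simulated data, without touching a real stack. This is a minimal portable sketch under stated assumptions: the `Frame` struct, `WalkFrameChain` and `DemoWalk` are hypothetical names, and the return addresses are made up. Real code would read the saved EBP and return address from actual stack memory and would have to guard against invalid pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simulated x86 frame header as left by "push ebp; mov ebp, esp":
// [ebp+0] holds the caller's saved EBP, [ebp+4] holds the return address.
struct Frame {
    const Frame* savedEbp;      // link to the parent frame, nullptr at the stack bottom
    std::uintptr_t returnAddr;  // address the function returns to, 0 if none
};

// Follows the saved-EBP links and records return addresses. Stops at
// maxFrames, at a null link, or at a null return address.
std::size_t WalkFrameChain(const Frame* ebp, std::uintptr_t* out, std::size_t maxFrames)
{
    assert(out != nullptr || maxFrames == 0);
    std::size_t n = 0;
    while (ebp != nullptr && n < maxFrames) {
        if (ebp->returnAddr == 0)   // last frame
            break;
        out[n++] = ebp->returnAddr;
        ebp = ebp->savedEbp;
    }
    return n;
}

// Builds a fake chain for main -> func1 -> func2 and walks it from func2.
std::size_t DemoWalk(std::uintptr_t* out, std::size_t maxFrames)
{
    static const Frame mainFrame  = { nullptr,     0 };         // no caller recorded
    static const Frame func1Frame = { &mainFrame,  0x401100 };  // func1 returns into main
    static const Frame func2Frame = { &func1Frame, 0x402200 };  // func2 returns into func1
    return WalkFrameChain(&func2Frame, out, maxFrames);
}
```

The walk yields the innermost return address first. A function compiled without a frame pointer would simply be missing from such a chain, which is the limitation the email points out for optimized code.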

Approach 1 inspired me a little, as I had already seen a similar approach used in another project, so I wrote my own implementation of a 32-bit stack walk:

    int CaptureStackBackTracePro( int FramesToSkip, int nFrames, PVOID* BackTrace, PDWORD pBackTraceHash )
    {
        //
        // This approach was taken from StackInfoManager.cpp / FillStackInfo
        // http://www.codeproject.com/Articles/11221/Easy-Detection-of-Memory-Leaks
        // - slightly simplified the function itself.
        //
        int regEBP;
        __asm mov regEBP, ebp;

        long *pFrame = (long*) regEBP;      // pointer to current function frame
        void* pNextInstruction;
        int iFrame = 0;

        //
        // Using __try/__except is faster than using ReadProcessMemory or VirtualProtect.
        // We return whatever frames we have collected so far after exception was encountered.
        //
        __try {
            for( ; iFrame < nFrames; iFrame++ )
            {
                pNextInstruction = (void*)(*(pFrame + 1));

                if( !pNextInstruction )     // Last frame
                    break;

                BackTrace[iFrame] = pNextInstruction;
                pFrame = (long*)(*pFrame);
            }
        }
        __except(EXCEPTION_EXECUTE_HANDLER)
        {
        }

        // pBackTraceHash fillout is missing, see the code snippet in another answer.
        return iFrame;
    } //CaptureStackBackTracePro

Brief tests show that this function is able to capture both native and managed stack frames.

Optimized code, I think, requires more in-depth analysis. Is it better to give up optimization, or to optimize only the relevant parts of the code, for better diagnostics?!

0

TarmoPikaro Jan 26 '16 at 22:44


Just a note to myself:

Apparently CaptureStackBackTrace directly or indirectly calls RtlCaptureStackBackTrace, and the source code of this function seems to be publicly available by now: you can search for "Windows Research Kernel".

I found the code by accident while digging through https://github.com/dotnet/coreclr/blob/master/src/unwinder/amd64/unwinder_amd64.cpp

where there was a reference to code borrowed from the Windows kernel:

Everything below is taken from the minkernel\ntos\rtl\amd64\exdsptch.c file from Windows

and with a bit of googling, I found more of the Windows kernel itself.

Can this function be updated to support the managed stack (using the information from Process Hacker)?

[4.1.2015]. Looking deeper, it seems that the main performance bottleneck is not CaptureStackBackTrace itself (it is a simple iteration plus a structure lookup), but the managed stack walking part, where I call C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll / OutOfProcessFunctionTableCallback. You can find its source code in the .NET distribution, and apparently it allocates memory in order to analyze the JIT-compiled structures. The problem is that the JIT-compiled code can change at any time, and the only way to get a reliable stack trace is to query the same information again and again, which can cause memory allocation overhead. I assume the code needs to be changed so that mscordacwks-like code does not allocate memory on its own, but uses the run-time structures to determine the call stack and function entries.

PS: if you downvote this answer, I would like to know the reason and what the alternative is. Even better if you have tried the alternative yourself.

-1

TarmoPikaro Dec 30 '15 at 13:01


Btw, if anyone is missing the original StackWalk implementation for Windows, it is here:

https://github.com/dotnet/coreclr/blob/master/src/utilcode/stacktrace.cpp

-1

TarmoPikaro Jan 17 '16 at 10:42


TarmoPikaro · Accepted Answer · 2016-01-09T09:41:21+0000

9-1-2015: I have found the original function that was being called by Process Hacker. It was

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll OutOfProcessFunctionTableCallback

and its source code was here: https://github.com/dotnet/coreclr/blob/master/src/debug/daccess/fntableaccess.cpp

From there I tracked down the owner of most of the changes in this source code, Jan Kotas ( jkotas@microsoft.com ), and contacted him about this issue.

    From: Jan Kotas < jkotas@microsoft.com >
    To: Tarmo Pikaro < tapika@yahoo.com >
    Sent: Friday, January 8, 2016 3:27 PM
    Subject: RE: Fast capture stack trace on windows 64 bit / mixed mode...

    ...

    The mscordacwks.dll is called mscordaccore.dll in CoreCLR / github repro. The VS project files are auto-generated for it during the build (\coreclr\bin\obj\Windows_NT.x64.Debug\src\dlls\mscordac\mscordaccore.vcxproj). You should be able to build and debug CoreCLR to understand how it works.

    ...

    From: Jan Kotas < jkotas@microsoft.com >
    To: Tarmo Pikaro < tapika@yahoo.com >
    Sent: Saturday, January 9, 2016 2:02 AM
    Subject: RE: Fast capture stack trace on windows 64 bit / mixed mode...

    > I've tried to replace
    > C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll dll loading
    > with C:\Prototyping\dotNet\coreclr-master\bin\obj\Windows_NT.x64.Debug\src\dlls\mscordac\Debug\mscordaccore.dll
    > loading (just compiled), but if previously I could get mixed mode stack trace correctly:
    > ...

    mscordacwks.dll is tightly coupled with the runtime. You cannot mix and match them between runtimes. What I meant is that you can use CoreCLR to understand how this works.

But then he recommended this solution, which worked for me:

    int CaptureStackBackTrace3(int FramesToSkip, int nFrames, PVOID* BackTrace, PDWORD pBackTraceHash)
    {
        // Start from the register state of the current function.
        CONTEXT ContextRecord;
        RtlCaptureContext(&ContextRecord);

        int iFrame;     // same type as nFrames and the return value
        for (iFrame = 0; iFrame < nFrames; iFrame++)
        {
            // Look up the unwind info for the current instruction pointer.
            DWORD64 ImageBase;
            PRUNTIME_FUNCTION pFunctionEntry = RtlLookupFunctionEntry(ContextRecord.Rip, &ImageBase, NULL);

            if (pFunctionEntry == NULL)
                break;

            // Unwind one frame; afterwards ContextRecord describes the caller.
            PVOID HandlerData;
            DWORD64 EstablisherFrame;
            RtlVirtualUnwind(UNW_FLAG_NHANDLER, ImageBase, ContextRecord.Rip, pFunctionEntry, &ContextRecord, &HandlerData, &EstablisherFrame, NULL);

            BackTrace[iFrame] = (PVOID)ContextRecord.Rip;
        }

        return iFrame;
    }

The backtrace hash calculation is still missing from this code snippet, but it can be added afterwards.

It is also very important to note that when debugging this code snippet you should use native debugging rather than mixed mode (a C# project uses mixed mode by default), since mixed-mode debugging somehow corrupts the stack trace in the debugger. (How and why this corruption happens is something still to figure out.)

There is one more missing piece of the puzzle: how to make symbol resolution fully robust against FreeLibrary / JIT code. This is something I still need to figure out.

Please note that RtlVirtualUnwind most likely only works on 64-bit architecture, and not on ARM or 32-bit.

Another funny thing: there is an RtlCaptureStackBackTrace function that somewhat resembles the Windows API function CaptureStackBackTrace, but they differ somehow, at least in naming. Also, if you check RtlCaptureStackBackTrace, it eventually calls RtlVirtualUnwind; you can check this in the Windows Research Kernel sources:

 RtlCaptureStackBackTrace > RtlWalkFrameChain > RtlpWalkFrameChain > RtlVirtualUnwind

But from what I have tested, RtlCaptureStackBackTrace does not work correctly, unlike the RtlVirtualUnwind-based function above.

This is a kind of magic. :-)

I will continue this question with a phase 2 question here:

Resolve managed and native stack trace - which API to use?
