Under heavy load, our application drives a powerful server to 100% CPU utilization. Reading a process dump and looking at the threads, I see some that have been running for 10 minutes, yet none of them tell me anything useful when I inspect them with !CLRStack.
!runaway gives me:
    0:030> !runaway
     User Mode Time
      Thread       Time
      53:2e804     0 days 0:10:04.703
      30:31894     0 days 0:07:51.593
      33:47100     0 days 0:07:24.890
      42:11e54     0 days 0:06:45.875
      35:35e18     0 days 0:06:07.578
      41:54464     0 days 0:05:49.796
      47:57700     0 days 0:05:45.000
      44:3c2d4     0 days 0:05:44.265
      32:3898c     0 days 0:05:43.593
      50:54894     0 days 0:05:41.968
      51:5bc58     0 days 0:05:40.921
      43:14af4     0 days 0:05:40.734
      48:35074     0 days 0:05:40.406
      ...
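Here the number before the colon is the debugger thread index, the number after it is the OS thread ID, and the times are accumulated user-mode CPU. To see what the hottest thread is actually doing, the usual next step is to switch to it and walk its stack (standard WinDbg/SOS commands; thread 53 comes from the output above):

    ~53s
    .reload
    !clrstack
    !dumpstack

~53s selects the thread, .reload fixes up symbols, !clrstack prints just the managed frames, and !dumpstack walks the mixed native/managed stack.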
Calling !DumpStack on one of these threads, I get:
    0000001ab442f900 00007ff9ef4c1148 KERNELBASE!WaitForSingleObjectEx+0x94, calling ntdll!NtWaitForSingleObject
    0000001ab442f980 00007ff9e920beb2 clr!SVR::gc_heap::compute_new_dynamic_data+0x17b, calling clr!SVR::gc_heap::desired_new_allocation
    0000001ab442f9a0 00007ff9e90591eb clr!CLREventWaitHelper2+0x38, calling kernel32!WaitForSingleObjectEx
    0000001ab442f9b0 00007ff9e90e0d2c clr!WriteBarrierManager::UpdateEphemeralBounds+0x1c, calling clr!WriteBarrierManager::NeedDifferentWriteBarrier
    0000001ab442f9e0 00007ff9e9059197 clr!CLREventWaitHelper+0x1f, calling clr!CLREventWaitHelper2
    0000001ab442fa40 00007ff9e9059120 clr!CLREventBase::WaitEx+0x70, calling clr!CLREventWaitHelper
    0000001ab442fa70 00007ff9ef4c149c KERNELBASE!SetEvent+0xc, calling ntdll!NtSetEvent
    0000001ab442faa0 00007ff9e90ef1e1 clr!SVR::gc_heap::set_gc_done+0x22, calling clr!CLREventBase::Set
    0000001ab442fad0 00007ff9e90e9331 clr!SVR::gc_heap::gc_thread_function+0x8a, calling clr!CLREventBase::WaitEx
    0000001ab442fb00 00007ff9e92048e7 clr!SVR::gc_heap::gc_thread_stub+0x7a, calling clr!SVR::gc_heap::gc_thread_function
    0000001ab442fb60 00007ff9e91a0318 clr!Thread::CLRSetThreadStackGuarantee+0x48, calling kernel32!SetThreadStackGuaranteeStub
    0000001ab442fb90 00007ff9e91a01ef clr!Thread::CommitThreadStack+0x10, calling clr!Thread::CLRSetThreadStackGuarantee
    0000001ab442fbd0 00007ff9e910df0b clr!ClrFlsSetValue+0x57, calling kernel32!SetLastErrorStub
    0000001ab442fc00 00007ff9e92048dc clr!SVR::gc_heap::gc_thread_stub+0x6f, calling clr!_chkstk
    0000001ab442fc40 00007ff9f0d316ad kernel32!BaseThreadInitThunk+0xd
    0000001ab442fc70 00007ff9f1e54409 ntdll!RtlUserThreadStart+0x1d
What does this tell me? I see a lot of CLR calls, but I can't figure out where the problem lies. After .reload (suggested by Thomas), I now see GC calls.
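Since the hot frames sit inside clr!SVR::gc_heap, the next question is what keeps the GC that busy. Two SOS commands help quantify it from the same dump (an inspection sketch, not a fix):

    !eeheap -gc
    !dumpheap -stat

!eeheap -gc shows the generation sizes (under Server GC, one heap per logical CPU), and !dumpheap -stat prints a type histogram sorted by total size, so the types at the bottom of that list are the ones filling the heap and driving the GC threads.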
Update 1
After running xperf, I see that each w3wp.exe process consumes about 45% of the CPU. Filtering to one of them and grouping by function, there is a function labeled "?" that accounts for 13.62%; the rest are at 2.67% or less. How do I find out what it is?
Update 2
Ran xperf again, and the function JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel is responsible for 12.31% of the CPU usage. The "?" function is still there.
Stacked by function: (screenshot of the stacked function lines omitted)
As you can see, these two functions are responsible for more than 27% of the CPU usage (per process, so it is significant).
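JIT_MonEnterWorker is the CLR helper that runs when a thread tries to enter an already-held Monitor, i.e. a contended lock statement, so those samples mean threads fighting over locks rather than doing work. A minimal sketch of the pattern that produces such samples and the usual low-contention rewrite (the cache and its names are hypothetical, not from our code):

    using System.Collections.Concurrent;
    using System.Collections.Generic;

    public class Cache
    {
        private readonly object _sync = new object();
        private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

        // Contended version: under load every caller serializes on _sync,
        // and the waiting shows up as JIT_MonEnterWorker samples.
        public string GetLocked(string key)
        {
            lock (_sync)
            {
                string value;
                if (!_items.TryGetValue(key, out value))
                {
                    value = Load(key);
                    _items[key] = value;
                }
                return value;
            }
        }

        // Low-contention version: ConcurrentDictionary synchronizes per bucket,
        // so readers rarely block each other.
        private readonly ConcurrentDictionary<string, string> _concurrent =
            new ConcurrentDictionary<string, string>();

        public string Get(string key)
        {
            return _concurrent.GetOrAdd(key, Load);
        }

        private static string Load(string key)
        {
            return "value-for-" + key; // stand-in for the real lookup
        }
    }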
Update 3
After using wpr.exe (suggested by @magicandre1981):
    wpr.exe -start cpu
    wpr.exe -stop result.etl
I found out that FormsAuthentication and some unnecessary Ninject calls on a critical path contributed about 16% of the CPU usage. I still don't understand the threads running for 10 minutes or more.
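For the Ninject part, the usual culprit is resolving from the kernel inside the request path instead of once at composition time. A hedged sketch of that fix; IPriceService and the other names are hypothetical, not from our code:

    using Ninject;

    public interface IPriceService { decimal GetPrice(int id); }

    public class PriceService : IPriceService
    {
        public decimal GetPrice(int id) { return id * 1.2m; } // stand-in logic
    }

    public static class CompositionRoot
    {
        public static readonly IKernel Kernel = new StandardKernel();

        static CompositionRoot()
        {
            // InSingletonScope avoids rebuilding the object graph per resolve.
            Kernel.Bind<IPriceService>().To<PriceService>().InSingletonScope();
        }
    }

    public class OrdersHandler
    {
        // Hot-path anti-pattern: a Kernel.Get<T>() container walk on every request.
        public static decimal HandleSlow(int id)
        {
            return CompositionRoot.Kernel.Get<IPriceService>().GetPrice(id);
        }

        // Cheaper: resolve once (constructor injection) and reuse the instance.
        private readonly IPriceService _prices;
        public OrdersHandler(IPriceService prices) { _prices = prices; }
        public decimal Handle(int id) { return _prices.GetPrice(id); }
    }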
Update 4
Tried DebugDiag (suggested by @leppie), and it just confirmed that the long-running threads look like this:
    Thread ID: 53   Total CPU Time: 00:09:11.406   Entry Point for Thread: clr!Thread::intermediateThreadProc
    Thread ID: 35   Total CPU Time: 00:07:26.046   Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
    Thread ID: 50   Total CPU Time: 00:07:01.515   Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
    Thread ID: 29   Total CPU Time: 00:06:02.264   Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
    Thread ID: 31   Total CPU Time: 00:06:41.281   Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
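Those entry points are the Server GC heap threads: clr!SVR::gc_heap::gc_thread_stub is started once per logical CPU when server GC is enabled, and the threads live for the whole process lifetime, so their CPU time in !runaway is cumulative. A quick in-process probe to confirm the GC flavor and how often it collects (GCSettings and GC.CollectionCount are standard .NET APIs):

    using System;
    using System.Runtime;

    class GcProbe
    {
        static void Main()
        {
            Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
            Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
            Console.WriteLine("Gen0/1/2 collections: {0}/{1}/{2}",
                GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2));
        }
    }

If the gen2 count climbs quickly and the ".NET CLR Memory\% Time in GC" performance counter is high, that CPU time reflects real allocation pressure; if not, it is mostly bookkeeping accumulated over a long process lifetime.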
or are inside StackExchange.Redis:
    DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef)+e1
    [[InlinedCallFrame] (StackExchange.Redis.SocketManager.select)] StackExchange.Redis.SocketManager.select(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef)
    StackExchange.Redis.SocketManager.ReadImpl()+889
    StackExchange.Redis.SocketManager.Read()+66
or
    [[GCFrame]]
    [[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
    mscorlib_ni!System.Threading.Monitor.Wait(System.Object, Int32)+19
    StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[[System.__Canon, mscorlib]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint)+24f
    StackExchange.Redis.RedisBase.ExecuteSync[[System.__Canon, mscorlib]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint)+77
    [[StubHelperFrame]]
    StackExchange.Redis.RedisDatabase.SetMembers(StackExchange.Redis.RedisKey, StackExchange.Redis.CommandFlags)+ee
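This second shape is not burning CPU: Monitor.ObjWait means the thread is parked inside ExecuteSyncImpl waiting for Redis to answer a synchronous SetMembers call. When many request threads block like that, the async API keeps them free; a minimal sketch, assuming a shared multiplexer (the endpoint string is illustrative):

    using System.Threading.Tasks;
    using StackExchange.Redis;

    public class TagStore
    {
        private static readonly ConnectionMultiplexer Redis =
            ConnectionMultiplexer.Connect("localhost:6379"); // illustrative endpoint

        public async Task<RedisValue[]> GetMembersAsync(string key)
        {
            IDatabase db = Redis.GetDatabase();
            // SetMembersAsync is the non-blocking counterpart of the SetMembers
            // call at the bottom of the stack above.
            return await db.SetMembersAsync(key);
        }
    }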