Deadlock in GC

I was investigating a deadlock that happens in our application. Here is the call stack of the main thread:

ntdll!NtWaitForSingleObject+0x14
KERNELBASE!WaitForSingleObjectEx+0x93
clr!CLREventWaitHelper2+0x3c
clr!CLREventWaitHelper+0x1f
clr!CLREventBase::WaitEx+0x71
clr!WKS::GCHeap::WaitUntilGCComplete+0x2e
clr!Thread::RareDisablePreemptiveGC+0x18f
clr!StubRareDisableHRWorker+0x38
clr!COMToCLRWorker+0x19d612
clr!GenericComCallStub+0x57
SomeDll!SomeFunction+0x62
ntdll!LdrpCallInitRoutine+0x6f
ntdll!LdrpInitializeNode+0x1c1
ntdll!LdrpInitializeGraphRecurse+0x80
ntdll!LdrpPrepareModuleForExecution+0xc5
ntdll!LdrpLoadDllInternal+0x199
ntdll!LdrpLoadDll+0xa8
ntdll!LdrLoadDll+0xe4
hmpalert!CVCCP+0x67eb
KERNELBASE!LoadLibraryExW+0x161
KERNELBASE!LoadLibraryExA+0x31
KERNELBASE!LoadLibraryA+0x3f
0x00007ffa`0fec9ece
...
clr!ExecuteEXE+0x3f
clr!_CorExeMainInternal+0xb2
clr!CorExeMain+0x14
mscoreei!CorExeMain+0x112
mscoree!CorExeMain_Exported+0x6c
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

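OK, it looks like the main thread is waiting for a GC to finish. Note that it got there from SomeDll!SomeFunction, which runs inside the DLL initialization routine invoked by LoadLibrary and calls into managed code through COM. But what is the GC doing? The GC-related code appears to be on this thread: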

ntdll!NtWaitForSingleObject+0x14
KERNELBASE!WaitForSingleObjectEx+0x93
clr!CLREventWaitHelper2+0x3c
clr!CLREventWaitHelper+0x1f
clr!CLREventBase::WaitEx+0x71
clr!`anonymous namespace'::CreateSuspendableThread+0x10c
clr!GCToEEInterface::CreateThread+0x170
clr!WKS::gc_heap::prepare_bgc_thread+0x4c
clr!WKS::gc_heap::garbage_collect+0x1836b7
clr!WKS::GCHeap::GarbageCollectGeneration+0xef
clr!WKS::GCHeap::Alloc+0x29c
clr!JIT_New+0x339
...
mscorlib_ni!System.Threading.Tasks.Task.Execute()$##6003FAD+0x47
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AEF+0x172
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AEE+0x15
mscorlib_ni!System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)$##6003FBA+0x231
mscorlib_ni!System.Threading.Tasks.Task.ExecuteEntry(Boolean)$##6003FB9+0xa1
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AEF+0x172
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003AEE+0x15
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)$##6003AED+0x55
mscorlib_ni!System.Threading.ThreadHelper.ThreadStart(System.Object)$##6003BFF+0x60
clr!Thread::intermediateThreadProc+0x8b
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

All other threads are either waiting for the GC or are system threads doing nothing suspicious.
At first glance nothing looks wrong here, yet there is a deadlock. The source of the problem is LoadLibrary. Let me explain this part in detail.

LoadLibrary internally calls the DllMain of the newly loaded DLL with DLL_PROCESS_ATTACH as the reason. It does so while holding the loader lock, so until DllMain returns all other loader notifications have to wait.

Creating a new thread calls the DllMain of every loaded DLL with the DLL_THREAD_ATTACH reason, unless a DLL opts out (with DisableThreadLibraryCalls), and most DLLs do not. If you check the second call stack, you will see a CreateSuspendableThread call followed by a wait: the GC is waiting for the new thread to start and signal an event. But the thread cannot start until it has notified all DLLs that it started, and those notifications are postponed until the DllMain called from LoadLibrary returns. That DllMain, in turn, is stuck on the main thread waiting for the GC to complete. The result is a classic deadlock.
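The cycle can be reproduced in miniature without any Windows APIs. The sketch below is only a model, not the real loader: it assumes a plain std::mutex (called loader_lock here) stands in for the loader lock, and a bounded wait stands in for the unbounded wait inside CreateSuspendableThread. The name thread_starts_in_time is made up for illustration and is not a CLR or Windows function.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Stand-in for the Windows loader lock (a model, not the real thing).
std::mutex loader_lock;

// Creates a thread and waits for it to signal that it has started.
// If caller_holds_loader_lock is true, the caller behaves like code running
// inside DllMain. Returns true if the new thread signalled before the timeout.
bool thread_starts_in_time(bool caller_holds_loader_lock,
                           std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> dll_main(loader_lock, std::defer_lock);
    if (caller_holds_loader_lock) {
        dll_main.lock();  // "LoadLibrary is calling our DllMain"
    }

    std::promise<void> started;
    std::future<void> signal = started.get_future();

    std::thread worker([&started] {
        // A new thread must deliver DLL_THREAD_ATTACH first,
        // and that requires the loader lock.
        std::lock_guard<std::mutex> attach(loader_lock);
        started.set_value();  // "thread started" event
    });

    // The real wait has no timeout; the model uses one so it can report
    // the deadlock instead of hanging.
    bool ok = signal.wait_for(timeout) == std::future_status::ready;

    if (dll_main.owns_lock()) {
        dll_main.unlock();  // "DllMain returns", which finally unblocks the worker
    }
    worker.join();
    return ok;
}
```

With the lock held, thread_starts_in_time(true, std::chrono::milliseconds(200)) returns false because the worker can never take the lock while the caller waits. Without the lock, the same call returns true as soon as the worker runs.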

And here is an important lesson that everybody should know: DllMain should do as little as possible to avoid this kind of situation. I hope this helps someone.
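One common way to keep DllMain small is to move real initialization into the first call of an exported function, which runs after LoadLibrary has returned and the loader lock has been released. The following is a minimal sketch under that assumption. The names heavy_init and some_function are hypothetical placeholders, not the code of the DLL from the stack trace, and the DllMain part compiles only on Windows.

```cpp
#include <atomic>
#include <mutex>

namespace {
std::once_flag init_once;
std::atomic<int> init_count{0};

// Hypothetical heavy initialization: anything that may call into COM or
// managed code, create threads, wait on events, or load other libraries.
void heavy_init() { ++init_count; }
}  // namespace

// Hypothetical exported entry point. Callers reach it only after LoadLibrary
// has returned, so the initialization runs outside the loader lock.
int some_function() {
    std::call_once(init_once, heavy_init);  // runs heavy_init exactly once
    return init_count.load();
}

#ifdef _WIN32
#include <windows.h>

BOOL WINAPI DllMain(HINSTANCE module, DWORD reason, LPVOID /*reserved*/) {
    if (reason == DLL_PROCESS_ATTACH) {
        // Opt this DLL out of DLL_THREAD_ATTACH / DLL_THREAD_DETACH calls.
        // Assumes the DLL does not link the static C run-time library.
        DisableThreadLibraryCalls(module);
    }
    return TRUE;  // no COM calls, no thread creation, no waiting here
}
#endif
```

Opting out of thread notifications only skips this DLL's own callbacks. It does not make it safe to block inside DllMain, so the deferred initialization is the part that matters.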