How to debug COM deadlocks in .NET with WinDbg

Recently I was investigating a COM-related deadlock in .NET. We have an engine that executes asynchronous requests. It is written in native code and uses COM-like interfaces. It is nothing like true COM, where you have registration and so on; we just use COM-like interfaces for interop between .NET and native code. The deadlocked thread had the following call stack:

00 00000001`4eeed7d8 00007ff9`184b8027 ntdll!NtWaitForMultipleObjects+0x14
01 00000001`4eeed7e0 00007ff9`1a583905 KERNELBASE!WaitForMultipleObjectsEx+0x107
02 00000001`4eeedae0 00007ff9`1a583665 combase!MTAThreadWaitForCall+0x115
03 00000001`4eeedbb0 00007ff9`1a5329c7 combase!MTAThreadDispatchCrossApartmentCall+0xc5
04 (Inline Function) --------`-------- combase!CSyncClientCall::SwitchAptAndDispatchCall+0x325
05 00000001`4eeedc00 00007ff9`1a581e94 combase!CSyncClientCall::SendReceive2+0x407
06 (Inline Function) --------`-------- combase!SyncClientCallRetryContext::SendReceiveWithRetry+0x25
07 (Inline Function) --------`-------- combase!CSyncClientCall::SendReceiveInRetryContext+0x25
08 00000001`4eeede00 00007ff9`1a530900 combase!DefaultSendReceive+0x64
09 00000001`4eeede60 00007ff9`1a581bc4 combase!CSyncClientCall::SendReceive+0x330
0a 00000001`4eeee090 00007ff9`1a5a2e4e combase!CClientChannel::SendReceive+0x84
0b 00000001`4eeee100 00007ff9`1b1d8b95 combase!NdrExtpProxySendReceive+0x4e
0c 00000001`4eeee130 00007ff9`1a5a0cbb rpcrt4!NdrpClientCall3+0x395
0d 00000001`4eeee490 00007ff9`1a61c5f2 combase!ObjectStublessClient+0x13b
0e 00000001`4eeee820 00007ff9`1a5268a9 combase!ObjectStubless+0x42
0f 00000001`4eeee870 00007ff9`1a59cd6f combase!CObjectContext::InternalContextCallback+0x259
10 00000001`4eeee990 00007ff8`c79fed60 combase!CObjectContext::ContextCallback+0x7f
11 00000001`4eeeea30 00007ff8`c79ffb42 clr!CtxEntry::EnterContext+0x295
12 00000001`4eeeec10 00007ff8`c789de97 clr!IUnkEntry::UnmarshalIUnknownForCurrContext+0xbd
13 00000001`4eeeecc0 00007ff8`c789dde2 clr!IUnkEntry::GetIUnknownForCurrContext+0x20cd8f
14 00000001`4eeeecf0 00007ff8`c76924c1 clr!RCW::SafeQueryInterfaceRemoteAware+0x20bab2
15 00000001`4eeeed50 00007ff8`c75f1f79 clr!RCW::CallQueryInterface+0x8d
16 00000001`4eeeedc0 00007ff8`c75f226f clr!RCW::GetComIPForMethodTableFromCache+0xb5
17 00000001`4eeeee80 00007ff8`c75f2716 clr!ComObject::SupportsInterface+0xfe
18 00000001`4eeeeff0 00007ff8`c75f2642 clr!Object::SupportsInterface+0x9e
19 00000001`4eeef060 00007ff8`c75f255a clr!UnmarshalObjectFromInterface+0x7a
1a 00000001`4eeef0a0 00007ff8`69a6a493 clr!StubHelpers::InterfaceMarshaler__ConvertToManaged+0xca
1b 00000001`4eeef240 00007ff8`c74b2e89 0x00007ff8`69a6a493
1c 00000001`4eeef290 00007ff8`c75f13b3 clr!COMToCLRDispatchHelper+0x39
1d 00000001`4eeef2c0 00007ff8`c74b2de7 clr!COMToCLRWorker+0x1b4
1e 00000001`4eeef380 00000001`3aeb49fb clr!GenericComCallStub+0x57
1f 00000001`4eeef410 00000001`3aeb4721 AsyncEngine.AsyncResult.PerformCallback+0x4b

Basically, to execute a request, the code creates an AsyncResult object that also supports IAsyncResult. The code can also pass an IAsyncCallback to AsyncResult. IAsyncCallback has one function, Execute, which takes a single IAsyncResult parameter to tell the caller which request completed. In C++ it looks something like this:

IAsyncCallback* m_callback;
…
void AsyncResult::PerformCallback()
{
    m_callback->Execute(this);
}

In my case, IAsyncCallback is implemented on the .NET side.
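To make the shape of these interfaces concrete, here is a minimal, portable sketch. It is not the real engine code: RequestId, RecordingCallback, and RunOneRequest are hypothetical names added for illustration, and the callback is a plain native class standing in for the .NET object that would sit behind a CCW.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the engine's COM-like interfaces.
struct IAsyncResult {
    virtual ~IAsyncResult() = default;
    virtual int RequestId() const = 0;  // illustrative accessor, an assumption
};

struct IAsyncCallback {
    virtual ~IAsyncCallback() = default;
    // Called by the engine to report which request completed.
    virtual void Execute(IAsyncResult* result) = 0;
};

class AsyncResult : public IAsyncResult {
public:
    AsyncResult(int id, IAsyncCallback* callback) : m_id(id), m_callback(callback) {}
    int RequestId() const override { return m_id; }

    // Same idea as the snippet above: hand ourselves to the callback.
    void PerformCallback() {
        if (m_callback != nullptr) {
            m_callback->Execute(this);
        }
    }

private:
    int m_id;
    IAsyncCallback* m_callback;
};

// Native callback that records completed request ids.
// In the scenario from this post, this role is played by a .NET object.
class RecordingCallback : public IAsyncCallback {
public:
    void Execute(IAsyncResult* result) override {
        assert(result != nullptr);
        completed.push_back(result->RequestId());
    }
    std::vector<int> completed;
};

// Completes one request and returns the ids the callback observed.
std::vector<int> RunOneRequest(int id) {
    RecordingCallback callback;
    AsyncResult request(id, &callback);
    request.PerformCallback();
    return callback.completed;
}
```

Calling RunOneRequest(7) returns a vector holding the single id 7. The point to notice is that Execute receives the AsyncResult itself, so when the callback is managed code, the runtime has to wrap that incoming pointer in an RCW, which is exactly where the call stack above gets stuck.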

From the deadlock call stack we can see that native code calls a CCW. You can read about CCW here: https://docs.microsoft.com/en-us/dotnet/standard/native-interop/com-callable-wrapper. But later, execution switches to an RCW (https://docs.microsoft.com/en-us/dotnet/standard/native-interop/runtime-callable-wrapper), which attempts to marshal the call to a different thread because of thread affinity. That is good to know, but there are a lot of requests going through this engine, and I need to find the .NET object that was passed as the callback and which RCW .NET is marshaling.

At first glance it looks quite hard, as you have no idea where that data is stored. But it is actually quite simple. Type the kb command in WinDbg and you will see something like this:

16 00007ff8`c75f226f : 00000000`00000000 00000000`00150000 00000001`79ca9750 00000003`00000104 : clr!RCW::GetComIPForMethodTableFromCache+0xb5
...
1e 00000001`3aeb49fb : 00000001`12e0d520 00000001`75c99b90 00000000`00000010 00000001`4bbf2e00 : clr!GenericComCallStub+0x57
1f 00000001`3aeb4721 : 00000001`75c99b20 00000001`3a61b901 00000001`4eeeff08 00000001`75c99b90 : EDMSInterface!ASyncResults.TEDM_AsyncResult.PerformCallback+0x4b

I copied only the relevant lines. Next, run the following command to load the DLL that lets WinDbg debug .NET applications:

.cordll -ve -u -l

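Then run the command that shows information about the CCW, passing it the first argument of clr!GenericComCallStub. This is an SOS command, so if the SOS extension is not loaded yet, load it first (on .NET Framework, typically with .loadby sos clr):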

!DumpCCW 00000001`12e0d520

and you will see something like this:

CCW:               0000000112e0d500
Managed object:    00000001627b2118
Outer IUnknown:    0000000000000000
Ref count:         1
Flags:             
RefCounted Handle: 0000000112de2518 (STRONG)
COM interface pointers:
              IP               MT Type
0000000112e0d520 00007ff869dd4640 AsyncLib.IAsyncCallback

Usually the “Managed object” value is displayed as a hyperlink, so you can click it or manually type the following command:

!DumpObj /d 00000001627b2118

And you will see something like this:

Name:        SomeAssembly.Classes.Utils.AsyncCallbackDxp
MethodTable: 00007ff869dd5088
EEClass:     00007ff869dbe6d0
CCW:         0000000112e0d500
Size:        24(0x18) bytes
File:        C:\Program Files\BlaBla\SomeAssembly.Classes.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8c5f4a680  4000076        8 ...bject, mscorlib]]  0 instance 00000001627b20d8 setResult

This already gives you a lot of information about where the problem happens and helps you find the code that issues the request in .NET. But what about the RCW of the parameter? For this, use the 3rd argument displayed for the function GetComIPForMethodTableFromCache:

!DumpRCW 00000001`79ca9750

And you will see something like that:

Managed object:             00000001627b2130
Creating thread:            0000000111368900
IUnknown pointer:           0000000175c99b30
COM Context:                000000000091e9d8
Managed ref count:          2
IUnknown V-table pointer :  000000013a4d2f88 (captured at RCW creation time)
Flags:                      
COM interface pointers:
              IP          Context               MT Type
0000000175c99ba0 000000000091e9d8 00007ff869dd43a8 SomeLib.IAsyncSimpleResult

So this gives you both types, and hopefully it helps you find the source of the deadlock. In my case, that IAsyncSimpleResult had already been passed to .NET on the main thread, so its RCW inherited the apartment of that thread, which is an STA. As a result, all operations on this interface are transferred to the main thread for execution. But the main thread was waiting on the result of that async operation, so there was a deadlock. Knowing which CCW and which RCW are involved greatly speeds up the investigation.
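The mechanism can be reproduced in miniature without COM. The sketch below is only a model, under stated assumptions: ApartmentQueue, CallIntoApartment, and RunScenario are hypothetical names, a plain queue stands in for the STA, and the worker uses a bounded wait so the example reports the stall instead of hanging forever as the real call did.

```cpp
#include <chrono>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for an STA: posted calls run only when
// the owning thread pumps the queue.
class ApartmentQueue {
public:
    void Post(std::function<void()> call) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_calls.push_back(std::move(call));
    }

    // Runs every queued call on the calling thread; returns how many ran.
    int Pump() {
        std::deque<std::function<void()>> calls;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            calls.swap(m_calls);
        }
        for (auto& call : calls) {
            call();
        }
        return static_cast<int>(calls.size());
    }

private:
    std::mutex m_mutex;
    std::deque<std::function<void()>> m_calls;
};

// Worker side: marshal a call to the apartment and wait for the reply.
// The wait is bounded so this sketch cannot hang; returns false on timeout.
bool CallIntoApartment(ApartmentQueue& apartment, std::chrono::milliseconds timeout) {
    auto reply = std::make_shared<std::promise<void>>();
    std::future<void> done = reply->get_future();
    apartment.Post([reply] { reply->set_value(); });
    return done.wait_for(timeout) == std::future_status::ready;
}

// Main-thread side: start the async operation, then wait for its result.
// pumpWhileWaiting == false models the deadlock described above;
// pumpWhileWaiting == true models a wait that still services incoming calls.
bool RunScenario(bool pumpWhileWaiting) {
    ApartmentQueue apartment;
    std::future<bool> worker = std::async(std::launch::async, [&apartment] {
        return CallIntoApartment(apartment, std::chrono::milliseconds(500));
    });
    if (pumpWhileWaiting) {
        while (worker.wait_for(std::chrono::milliseconds(10)) != std::future_status::ready) {
            apartment.Pump();
        }
    }
    return worker.get();  // blocks without pumping when pumpWhileWaiting is false
}
```

RunScenario(false) returns false because the worker's call is never serviced while the main thread blocks, and RunScenario(true) returns true because the main thread keeps servicing calls while it waits. In the real case there is no timeout, so the first scenario is the permanent deadlock seen in the call stack.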

I hope it helps someone.