The case of a strange COM error

Recently I was asked to investigate a strange crash in our application. The application calls a function named Test that returns an HRESULT. This function is part of an interface; let's call it IMyIntf. The caller is located in main.dll, and the interface is implemented in module.dll.

And this call failed. Initially, I checked the implementation of Test to see what could possibly go wrong there, but after some time I noticed that it returned a rather strange HRESULT. In this call stack the HRESULT was 0xd337fe00, and I could not find any description for it. I then looked for similar call stacks and found two more, with HRESULTs 0x92c7da27 and 0xec4ff970.
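Splitting these values into their HRESULT fields shows why a lookup goes nowhere. Below is a minimal sketch that follows the bit layout documented in [MS-ERREF] (S, R, C, N and X flag bits, an 11-bit facility, a 16-bit code). `HResultFields` and `decode_hresult` are hypothetical helper names, not part of the Windows SDK. All three values have the severity bit set, which is why they count as failures, while their facility and code fields come out as arbitrary-looking numbers.

```cpp
#include <cstdint>

// HRESULT layout per [MS-ERREF] 2.1:
// bit 31 = S (severity), 30 = R, 29 = C (customer), 28 = N (mapped NTSTATUS),
// 27 = X, bits 16-26 = facility (11 bits), bits 0-15 = code.
struct HResultFields {
    unsigned severity;  // 1 means failure
    unsigned reserved;  // R bit
    unsigned customer;  // C bit
    unsigned ntstatus;  // N bit
    unsigned x;         // X bit
    unsigned facility;
    unsigned code;
};

// Hypothetical helper: decodes a raw 32-bit HRESULT into its fields.
constexpr HResultFields decode_hresult(std::uint32_t hr) {
    return HResultFields{
        (hr >> 31) & 0x1u,
        (hr >> 30) & 0x1u,
        (hr >> 29) & 0x1u,
        (hr >> 28) & 0x1u,
        (hr >> 27) & 0x1u,
        (hr >> 16) & 0x7FFu,
        hr & 0xFFFFu,
    };
}
```

For example, decode_hresult(0xd337fe00u) yields severity 1, facility 0x337 and code 0xfe00.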

I checked the source code of main.dll, and it has the following definition for this interface:

struct  TestStruct
{
    int i1;
    int i2;
    int i3;
    int i4;
};

interface UUID_SPECIFIER("{747F931D-6111-4A49-88D0-FE782CF075EE}") IMyIntf : public IUnknown
{
    virtual STDMETHODIMP Test(TestStruct & result) = 0;
};

Then I checked how this function is declared in module.dll and found the following definition:

struct  TestStruct
{
    int i1;
    int i2;
    int i3;
    int i4;
};

interface UUID_SPECIFIER("{747F931D-6111-4A49-88D0-FE782CF075EE}") IMyIntf : public IUnknown
{
    virtual TestStruct STDMETHODCALLTYPE Test() = 0;
};

As a side note: you probably spotted the differences between these definitions, and you may ask why there are two different definitions at all. The actual issue happens in code written in a different language, where it is not possible to share a single definition. Moreover, in that language the definition of the interface in module.dll is almost exactly the same as the one in main.dll, except that it is missing one modifier at the end of the function declaration. As a result, the difference is much harder to spot.

This is an obvious bug. I then checked when this change was made, and it turned out to be more than 4 years ago. We shipped multiple releases of our application during those 4 years, and there were only a few crash reports. At that moment I had two questions: first, how does it work at all? And second, why does it crash only sometimes? So I started an investigation.

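I found the answer to my first question relatively fast. The caller calls Test and then checks the HRESULT, which on x64 it expects in eax, the low 32 bits of rax. It passes the address of its result parameter in rdx, since rcx holds the this pointer. The implementation in module.dll, on the other hand, returns TestStruct by value, and under the x64 calling convention a structure of this size is returned through a hidden pointer to a buffer provided by the caller. For a member function that pointer arrives in rdx, so the callee writes the structure exactly where the caller's result lives and then returns the same pointer in rax. All callers declare TestStruct on the stack, so on return from Test, register rax always points to the stack.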
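The register flow can be modelled in portable C++. This is only a simulation under stated assumptions: `module_test_model` and `main_sees_eax` are hypothetical functions standing in for what the two modules do with rdx and rax, and real x64 code does this in registers rather than through ordinary return values.

```cpp
#include <cstdint>

struct TestStruct {
    int i1;
    int i2;
    int i3;
    int i4;
};

// Hypothetical model of module.dll's Test(). On x64, a TestStruct returned by
// value travels through a hidden pointer to the caller's buffer (rdx for a
// member function, because rcx holds `this`), and the callee returns that
// same pointer in rax. Here the "rax" value is simply the return value.
std::uint64_t module_test_model(TestStruct* hidden_return_buffer) {
    *hidden_return_buffer = TestStruct{1, 2, 3, 4};
    return static_cast<std::uint64_t>(
        reinterpret_cast<std::uintptr_t>(hidden_return_buffer));
}

// Hypothetical model of main.dll's side. It passed &result in rdx believing it
// was an ordinary reference argument, then reads the HRESULT from eax, the low
// 32 bits of rax.
std::uint32_t main_sees_eax(std::uint64_t rax) {
    return static_cast<std::uint32_t>(rax);
}
```

In this model the caller's structure ends up filled correctly, which is why the bug stayed unnoticed for so long; only the "HRESULT" is garbage, namely the low half of a stack address.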

As a side note: the original problem is slightly different, because that language always returns the address of TestStruct in register rax, but you get the idea.

When I executed that block of code, it always succeeded. Moreover, the stack was always located at relatively low addresses, so the value in the rax register always looked like 0x00XXXXXX. As you know, a failed HRESULT must have the highest bit set. I ran the application many times and never saw anything close to 0x8XXXXXXX. This is the answer to my first question: the stack is located at low addresses, so the highest bit of the eax register is never set, and the SUCCEEDED macro always returns true.
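The arithmetic behind this can be checked without Windows headers. The sketch below assumes the winerror.h definition, where HRESULT is a signed 32-bit value and SUCCEEDED(hr) tests hr >= 0, which is the same as testing that bit 31 is clear. `succeeded_bits` and `eax_of` are hypothetical helpers, and the stack addresses are made-up examples rather than values from the crash reports.

```cpp
#include <cstdint>

// winerror.h defines HRESULT as a signed 32-bit type and SUCCEEDED(hr) as
// ((HRESULT)(hr)) >= 0, so SUCCEEDED is true exactly when bit 31 is clear.
constexpr bool succeeded_bits(std::uint32_t hr) {
    return (hr & 0x80000000u) == 0;
}

// main.dll reads its HRESULT from eax, i.e. the low 32 bits of rax.
constexpr std::uint32_t eax_of(std::uint64_t rax) {
    return static_cast<std::uint32_t>(rax & 0xFFFFFFFFu);
}

// Hypothetical low stack address, like the 0x00XXXXXX values seen locally.
static_assert(succeeded_bits(eax_of(0x00000000001CF6E0ull)), "low stack passes SUCCEEDED");
// The HRESULTs from the crash reports all have bit 31 set.
static_assert(!succeeded_bits(0xd337fe00u), "crash report value fails");
static_assert(!succeeded_bits(0x92c7da27u), "crash report value fails");
static_assert(!succeeded_bits(0xec4ff970u), "crash report value fails");
// Hypothetical high stack addresses: only bit 31 of the low half matters.
static_assert(!succeeded_bits(eax_of(0x000000F2A1B4C8D0ull)), "bit 31 set: fails");
static_assert(succeeded_bits(eax_of(0x000000F23F00C8D0ull)), "bit 31 clear: passes");
```

The last two checks also hint at the answer to the second question: once the stack can live at high addresses, whether the call "fails" depends on a single bit of where the stack happened to land.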

As a side note: our application is linked without the /DYNAMICBASE switch because that language does not support this feature. As a result, the stack location changes between runs, but only slightly.
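To check whether a given binary opted into ASLR, you can run `dumpbin /headers` and look for "Dynamic base" among the DLL characteristics, or read the flag straight from the PE header. Below is a minimal sketch of the latter, assuming the offsets from the PE/COFF specification (e_lfanew at file offset 0x3C, DllCharacteristics at offset 70 of the optional header for both PE32 and PE32+). `has_dynamic_base`, `read_le` and `load_file` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, set by the /DYNAMICBASE linker switch.
constexpr std::uint16_t kDynamicBase = 0x0040;

// Reads an n-byte little-endian value starting at offset off.
std::uint32_t read_le(const std::vector<unsigned char>& b, std::size_t off, int n) {
    std::uint32_t v = 0;
    for (int i = n - 1; i >= 0; --i) {
        v = (v << 8) | b[off + static_cast<std::size_t>(i)];
    }
    return v;
}

// Returns 1 if the image has the dynamic-base flag, 0 if not, -1 if it does
// not look like a PE file.
int has_dynamic_base(const std::vector<unsigned char>& image) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return -1;
    const std::size_t pe = read_le(image, 0x3C, 4);  // e_lfanew
    // "PE\0\0" (4 bytes) + COFF file header (20 bytes) + offset 70 in the
    // optional header, identical for PE32 and PE32+.
    const std::size_t dll_chars = pe + 4 + 20 + 70;
    if (dll_chars + 2 > image.size()) return -1;
    if (image[pe] != 'P' || image[pe + 1] != 'E' || image[pe + 2] != 0 || image[pe + 3] != 0) return -1;
    return (read_le(image, dll_chars, 2) & kDynamicBase) ? 1 : 0;
}

// Loads a whole file into memory; returns an empty vector if it cannot be read.
std::vector<unsigned char> load_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Calling has_dynamic_base(load_file("app.exe")) on a binary linked like ours would be expected to return 0; "app.exe" is a placeholder path.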

Now I knew why it never crashed for me, but why did it crash for our customers? The variation in the location of the stack gave me an idea. This address randomization is called Address Space Layout Randomization, or ASLR for short, and it was first introduced in Windows Vista. I started checking whether it is possible to randomize the stack more aggressively, and after some searching I found these steps:

  • Open Windows Security and go to App & browser control > Exploit protection settings
  • Go to “Program settings”, click “Add program to customize”, select “Choose exact file path”, and select the application you want to test
  • Find “Force randomization for images (Mandatory ASLR)”, click “Override system settings”, and switch it to On
  • Do the same for “Randomize memory allocations (Bottom-up ASLR)”

After these steps, the stack could land almost anywhere in the address space, and after a few runs the application crashed because the highest bit of the eax register was set.
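To watch this happen, a tiny probe can print the low 32 bits of a stack address on every run. This is a minimal sketch, assuming that the address of a local TestStruct is representative of where the real callers keep theirs; `stack_probe` and `report_stack_probe` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>

struct TestStruct {
    int i1;
    int i2;
    int i3;
    int i4;
};

// Low 32 bits of the address of a stack-allocated TestStruct: the value
// main.dll would misread as an HRESULT if Test() left this address in rax.
std::uint32_t stack_probe() {
    TestStruct local{};
    return static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(&local));
}

// Prints the probe value and how SUCCEEDED would classify it.
void report_stack_probe() {
    const std::uint32_t eax = stack_probe();
    std::printf("low 32 bits of stack address: 0x%08X -> %s\n",
                static_cast<unsigned>(eax),
                (eax & 0x80000000u) ? "FAILED" : "SUCCEEDED");
}
```

Calling report_stack_probe() from main and launching the program repeatedly, once with default settings and once with the Exploit protection overrides, one would expect FAILED lines to appear only in the second case, although the exact addresses depend on the system.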

It looks like IT departments in some organizations take security quite seriously and have cranked up Exploit Protection, and as a result this problem was revealed.

I hope this helps someone.