Case of the damaged stack pointer

Recently I was asked to investigate quite an interesting case. I had a dump file that Windows had created, so I started WinDbg, opened the dump, and saw this:

(7b9c.ab4): Access violation - code c0000005 (first/second chance not available)
For analysis of this file, run !analyze -v
MyProgram!MyModule.SomeClick+0x71:
00000014`00ace011 5b              pop     rbx

Usually, an access violation happens when there is an attempt to read from or write to memory that cannot be accessed. In this rather rare case, though, the application crashed on a pop instruction. But pop and push instructions do read from and write to memory, because the stack lives in the same memory as everything else. So my next step was to check the rsp register:

0:000> ? @rsp

And the response was:

Evaluate expression: 6422689 = 00000000`006200a1

I immediately realized that the rsp register pointed to the wrong location. The stack pointer should always be 16-byte aligned, as the Microsoft x64 stack usage documentation states:

“The stack will always be maintained 16-byte aligned, except within the prolog (for example, after the return address is pushed), and except where indicated in Function Types for a certain class of frame functions.”

Technically, the statement above is not 100% correct, because the stack pointer can be misaligned during the epilog too. But in normal execution the stack pointer should never end in 1. Its last hex digit is 0 most of the time, or 8 during the prolog and epilog. And for the nitpickers: this function wasn't one of the special function types either.
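The alignment rule is easy to check by hand, but here is a minimal Python sketch of it anyway. The helper name `misalignment` is made up for this illustration; the rsp value is the one from the dump.

```python
def misalignment(rsp: int, alignment: int = 16) -> int:
    """Return how many bytes rsp sits past the previous aligned boundary."""
    return rsp % alignment


rsp = 0x006200A1  # value reported by "? @rsp" in the dump

# 0 is the normal case, 8 can show up inside a prolog or epilog,
# anything else means rsp does not hold a sane stack pointer.
print(hex(misalignment(rsp)))  # 0x1
```

A result of 1 cannot come from ordinary push, pop, call, or ret instructions, which all move rsp by 8 bytes in 64-bit code.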

As some of you have probably guessed, my attempt to view the call stack displayed a stack with a single function, the same SomeClick that I saw at the beginning. I also checked the rbp register, hoping that it might contain a valid stack pointer:

0:000> ? @rbp

but it also looked bad:

Evaluate expression: 6422633 = 00000000`00620069

So it looked like something had damaged the stack and, as a result, corrupted the return address (or so I thought). I had the source code of SomeClick, so I could find out what calls SomeClick and what SomeClick can call. But I needed to investigate what had happened to the stack, and both registers that deal with the stack were now useless.

But each thread needs to know where its stack starts and where its limit is. That information is stored in the Thread Environment Block (TEB). To see it, I executed the following command:

0:000> !teb

And the result was:

TEB at 00000000003f2000
    ExceptionList:        0000000000000000
    StackBase:            0000000000800000
    StackLimit:           00000000007cb000

So I knew that the stack starts at 0x800000 and ends at 0x7cb000. Great. Then I opened the Memory window, entered the address 0x800000, and switched “Display format” to “Pointer and Symbol”, which resolves any return addresses on the stack. It looks like this:
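The TEB values also give a quick way to confirm that both registers are unusable. The following Python sketch uses the StackBase and StackLimit values from the !teb output and the two register values from the dump; the function name `in_stack` is only an illustration.

```python
STACK_BASE = 0x0000000000800000   # StackBase from !teb (highest address)
STACK_LIMIT = 0x00000000007CB000  # StackLimit from !teb (lowest committed address)


def in_stack(addr: int, limit: int = STACK_LIMIT, base: int = STACK_BASE) -> bool:
    """True if addr lies inside the thread's committed stack range."""
    return limit <= addr < base


for name, value in (("rsp", 0x006200A1), ("rbp", 0x00620069)):
    print(name, hex(value), in_stack(value))
# rsp 0x6200a1 False
# rbp 0x620069 False
```

The stack grows downward, so valid stack addresses sit between StackLimit and StackBase. Neither register is anywhere near that range.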

00000000`007fd248 00007ffce5310919 clr!MethodTable::GetComCallWrapperTemplate+0x21
00000000`007fd250 000000010c5ffc98
00000000`007fd258 00000001a6c85290
00000000`007fd260 0000000000000073
00000000`007fd268 0000000000000000
00000000`007fd270 0000000000000000
00000000`007fd278 00007ffce565b712 clr!InternalDispatchImpl_Invoke+0x2fe

My plan was to find the function that calls SomeClick and the function that SomeClick calls, and then investigate the stack between them. I hoped that the nature of the corruption would give me some ideas and help the investigation.

Luckily for me, the stack wasn’t too big, and I knew that SomeClick is usually very low on the stack. To my surprise, I found the return address of SomeClick itself, and I also found the address of the function that SomeClick calls:

00000000`007fe0c0 002e003400380065
00000000`007fe0c8 0034003500340036
00000000`007fe0d0 0000000000620069
00000000`007fe0d8 0000001401b327ab MyProgram!MyModule.SomeClickChild
00000000`007fe0e0 0000000000000202
00000000`007fe0e8 0000000000000000
00000000`007fe0f0 0000000080007811
…
00000000`007fe170 0000000193be1640
00000000`007fe178 00000000007fe150
00000000`007fe180 00000000007fe660
00000000`007fe148 0000001400acdff0 MyProgram!MyModule.SomeClick

And because I could see the return address, it meant that the return address wasn’t corrupted as I had initially thought, and it was still present on the stack. So I checked the stack right above the return address of the MyProgram!MyModule.SomeClickChild function.
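The search for a return address can be sketched as a simple range scan. The Python below is only an illustration under one assumption: SomeClick covers at least the bytes from its start up to the crash address, which follows from the SomeClick+0x71 shown at the top of the dump. The slot and value pairs are copied as shown in the excerpt, and the helper name is made up.

```python
# Crash site from the dump: MyProgram!MyModule.SomeClick+0x71
CRASH_IP = 0x0000001400ACE011
SOMECLICK_START = CRASH_IP - 0x71  # start of SomeClick implied by the +0x71 offset

# (slot address, value) pairs copied as shown in the stack excerpt
STACK = [
    (0x7FE0D0, 0x0000000000620069),
    (0x7FE0D8, 0x0000001401B327AB),
    (0x7FE170, 0x0000000193BE1640),
    (0x7FE178, 0x00000000007FE150),
    (0x7FE180, 0x00000000007FE660),
    (0x7FE148, 0x0000001400ACDFF0),
]


def slots_pointing_into(stack, start, end):
    """Return (slot, value, offset from start) for values inside [start, end]."""
    return [(slot, value, value - start)
            for slot, value in stack
            if start <= value <= end]


for slot, value, offset in slots_pointing_into(STACK, SOMECLICK_START, CRASH_IP):
    print(f"{slot:#x}: {value:#x} = SomeClick+{offset:#x}")
# 0x7fe148: 0x1400acdff0 = SomeClick+0x50
```

The “Pointer and Symbol” display does this lookup for every slot automatically; the sketch only shows why a surviving code address on the stack stands out.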

The value on the stack just above the SomeClickChild return address is 0000000000620069:

00000000`007fe0c8 0034003500340036
00000000`007fe0d0 0000000000620069
00000000`007fe0d8 0000001401b327ab MyProgram!MyModule.SomeClickChild

This number is very close to the value of the rsp register. Remember, at the beginning of my investigation the value of rsp was 006200a1. Then I compared it with the value of the rbp register, and it was exactly the same.
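One more observation about these values, offered as a guess and not as a confirmed finding: every second byte of the overwritten slots is zero, which is what UTF-16LE text looks like in memory. The Python sketch below decodes the three slots above the return address under that assumption; the helper name `qword_as_utf16` is made up for this illustration.

```python
import struct


def qword_as_utf16(value: int) -> str:
    """Reinterpret a 64-bit stack slot as four little-endian UTF-16 code units."""
    raw = struct.pack("<Q", value)  # the 8 bytes in memory order
    return raw.decode("utf-16-le")


# Slots copied from the stack excerpt, lowest address first
slots = [
    (0x7FE0C0, 0x002E003400380065),
    (0x7FE0C8, 0x0034003500340036),
    (0x7FE0D0, 0x0000000000620069),
]

for addr, value in slots:
    print(hex(addr), repr(qword_as_utf16(value)))
# 0x7fe0c0 'e84.'
# 0x7fe0c8 '6454'
# 0x7fe0d0 'ib\x00\x00'
```

If that reading is right, the damage is the tail of a string followed by its terminator, which is the kind of pattern a buffer overrun leaves behind.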

And then I got the whole picture. The last two instructions of the function that SomeClickChild called look like this:

00000014`01b32203 5d              pop     rbp
00000014`01b32204 c3              ret

Given the stack you saw previously, the value 0000000000620069 was written to the rbp register when the pop rbp instruction executed, and then execution returned to the SomeClickChild function. SomeClickChild does not use the rbp register and didn’t change its value, so execution returned to SomeClick with rbp still corrupted. There is no other code after the call to the SomeClickChild function, so SomeClick proceeds to return. It has the following instructions at the end:

00000014`00ace00d 488d6538        lea     rsp,[rbp+38h]
00000014`00ace011 5b              pop     rbx

That code effectively writes rbp+38h to rsp, and if you add 0x38 to 0000000000620069 you get 00000000006200a1, our original stack pointer value. Then I examined the code of the function that SomeClickChild called and found that it had really old code that writes to an array of char located on the stack without verifying the size of the data. A typical mistake of the 80s and 90s, when that code was written, and to be fair, it wasn’t really obvious.
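The chain from the overwritten slot to the bad rsp can be replayed in a few lines. This Python sketch only models the two instructions that matter, pop rbp in the callee and lea rsp,[rbp+38h] in SomeClick; the function name `rsp_after_epilog` is made up for this illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rsp_after_epilog(saved_rbp_slot: int, displacement: int = 0x38) -> int:
    """Model 'pop rbp' in the callee followed by 'lea rsp,[rbp+38h]' in SomeClick."""
    rbp = saved_rbp_slot                  # pop rbp loads the overwritten slot
    return (rbp + displacement) & MASK64  # lea rsp,[rbp+38h]


corrupted_slot = 0x0000000000620069  # value at 00000000`007fe0d0
rsp = rsp_after_epilog(corrupted_slot)
print(hex(rsp))  # 0x6200a1
```

The result matches the rsp value in the dump, so the next pop rbx reads from 006200a1 and raises the access violation.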

I have shown how to recover from stack pointer corruption using the Thread Environment Block, how to get more information from the damaged stack, and in some cases how to find out why the crash happened in the first place.

I hope it helps someone.