Patching binary code. Part 2
In the previous part, I described the problems with Example 2 that apply to every case: multithreading issues and issues with recursive calls. There is also a cost: the WriteProcessMemory function is called twice for each call to the XYZ function, so if XYZ is called often, performance will suffer. Can we improve on this? It turns out that we can, for Case 1.
Example 3 for Case 1
The main idea for this case is to call the WriteProcessMemory function only once. When we need to call the original function, we execute copies of the instructions that were overwritten by the jump instruction and then jump into the original function at an offset just past the overwritten bytes.
This is possible because in Case 1 we control when the function changes. Let's look at an example.
Imagine we have a function XYZ that takes 2 int parameters and returns an int. Here are the disassembly and machine code of the beginning of this function:
423C80 55 push rbp
423C81 4883EC30 sub rsp,$30
423C85 488BEC mov rbp,rsp
423C88 894D40 mov [rbp+$40],ecx
423C8B 895548 mov [rbp+$48],edx
423C8E 488B0DAB860000 mov rcx,[rel $000086ab]
A 32-bit relative jump is 5 bytes long on both x86 and x64: 1 byte for the opcode and 4 bytes for the relative address. As you can see, the combined length of the first 2 instructions is also 5 bytes.
So our jump instruction will destroy exactly the first 2 instructions. As a result, when we call the original version, we need to execute those 2 instructions somewhere else and then jump to address 423C85.
After we write our jump instruction, the code will look like this:
423C80 E98B000000 jmp InterceptedXYZ
423C85 488BEC mov rbp,rsp
423C88 894D40 mov [rbp+$40],ecx
423C8B 895548 mov [rbp+$48],edx
Note: if the jump instruction took 6 bytes, writing it would destroy 3 instructions, and we would need to jump to address 423C88, because the instruction at address 423C85 would be partially overwritten.
Next, we save the location to jump to in the XYZAddr variable. In our case, it is the address of XYZ plus 5 bytes, where 5 is the offset of the first instruction that is not affected by writing our jump instruction. If the jump instruction were 6 bytes long, the offset would be 8 instead of 5.
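The offset rule above can be sketched in a few lines of Python. This is only an illustration: the helper name resume_offset is hypothetical, and the instruction lengths are read off the listing by hand rather than produced by a real disassembler.

```python
def resume_offset(instruction_lengths, patch_size):
    """Return the offset of the first instruction left intact by the patch.

    instruction_lengths: lengths in bytes of the instructions at the start
    of the function, in order (here taken from the disassembly listing).
    patch_size: number of bytes overwritten by the jump instruction.
    """
    offset = 0
    for length in instruction_lengths:
        if offset >= patch_size:
            # This instruction starts at or after the end of the patch.
            return offset
        offset += length
    if offset >= patch_size:
        return offset
    raise ValueError("listing is shorter than the patch")


# Lengths of the first instructions of XYZ from the listing:
# push rbp (1), sub rsp,$30 (4), mov rbp,rsp (3), mov [rbp+$40],ecx (3)
XYZ_LENGTHS = [1, 4, 3, 3]

print(resume_offset(XYZ_LENGTHS, 5))  # 5-byte jump  -> 5
print(resume_offset(XYZ_LENGTHS, 6))  # 6-byte jump  -> 8
```

The value returned is what gets added to the address of XYZ before it is stored in XYZAddr.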
Then we create a new, pure assembly function called CallXYZ. It contains the instructions that we destroyed, followed by a jump to the address stored in XYZAddr:
423D00 55 push rbp
423D01 4883EC30 sub rsp,$30
423D05 48FF2504270100 jmp qword ptr [rel XYZAddr]
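As a rough check of the listing above, the bytes of CallXYZ can be assembled by hand. The sketch below is an assumption-laden illustration, not the actual patching code: build_call_xyz is a hypothetical helper, and the address 436410 for the XYZAddr variable is inferred from the displacement bytes 04270100 in the listing.

```python
import struct


def build_call_xyz(stolen_bytes, trampoline_addr, xyz_addr_var):
    """Build CallXYZ: the overwritten instructions followed by
    jmp qword ptr [rel XYZAddr], encoded as 48 FF 25 disp32 as in the listing.
    """
    jmp_len = 7  # 48 prefix + FF 25 + 4-byte displacement
    jmp_end = trampoline_addr + len(stolen_bytes) + jmp_len
    # The displacement is relative to the end of the jump instruction.
    disp = xyz_addr_var - jmp_end
    return stolen_bytes + b"\x48\xFF\x25" + struct.pack("<i", disp)


# push rbp; sub rsp,$30 (the two instructions destroyed in XYZ)
stolen = bytes.fromhex("55" "4883EC30")

# 0x423D00 is the address of CallXYZ in the listing;
# 0x436410 is the assumed address of the XYZAddr variable.
code = build_call_xyz(stolen, 0x423D00, 0x436410)
print(code.hex().upper())  # 554883EC3048FF2504270100
```

The XYZAddr variable itself would hold 423C85, the address of XYZ plus 5.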
The InterceptedXYZ function can contain anything you want; when you need to call the original XYZ function, you call the CallXYZ function instead.
This approach is much better because WriteProcessMemory is called only once, so the code is much faster. We can also catch recursive calls, and it works correctly with multiple threads. But I repeat: this approach works only when you control when the XYZ function changes.
How to calculate jump address
As far as I know, there is no near jump instruction that encodes an absolute target address in the instruction itself. (x86 does have a far jump that takes an absolute pointer, but it also requires a segment selector and is not valid in 64-bit mode.) Such an instruction would not be very useful anyway: if the image is loaded at a different address, the absolute address becomes incorrect and has to be fixed up, which creates unnecessary work. Relative jumps are location-agnostic. They can also be very short when you need to jump to a function that is located nearby.
Relative jumps are encoded the same way in 32-bit and 64-bit code: the opcode 0xE9 followed by a 4-byte signed offset that is relative to the address of the instruction following the jump.
How do we calculate this offset? Very simple: take the address of the InterceptedXYZ function, subtract the address where the jump is written (the start of XYZ), and subtract 5, which is the length of the jump instruction.
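Here is a minimal sketch of that calculation in Python. The helper name encode_jmp_rel32 is hypothetical, and the address 423D10 for InterceptedXYZ is an assumption inferred from the E98B000000 bytes in the earlier listing.

```python
import struct


def encode_jmp_rel32(patch_addr, target_addr):
    """Encode a 5-byte relative jump (E9 rel32) written at patch_addr."""
    # The offset is counted from the end of the 5-byte jump instruction.
    rel = target_addr - patch_addr - 5
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of range for a 32-bit relative jump")
    return b"\xE9" + struct.pack("<i", rel)


XYZ = 0x423C80               # address of XYZ from the listing
INTERCEPTED_XYZ = 0x423D10   # assumed address of InterceptedXYZ

print(encode_jmp_rel32(XYZ, INTERCEPTED_XYZ).hex().upper())  # E98B000000
```

The range check matters on 64-bit systems, where the two functions can be more than 2 GB apart.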
This works fine as long as the target is within reach of a 32-bit offset (about ±2 GB), but what if you need to jump anywhere in the 64-bit address space? Unfortunately, there is no relative jump instruction with a 64-bit offset. One option is to push an absolute 64-bit address onto the stack and execute the RET instruction. Another is to load the address into a register and jump to it. Lastly, it is possible to jump to an address that is stored in memory at a location reachable with a relative 32-bit offset. I decided to go this way: write the instruction jmp qword ptr [rel $00000000] followed immediately by the 64-bit absolute address.
The jmp qword ptr [rel $00000000] instruction has the following bytes:
0xFF, 0x25 – instruction code
0x00, 0x00, 0x00, 0x00 – address which is relative to the end of the instruction. Putting zero means that the address will immediately follow the jump instruction.
Then we write the 64-bit absolute address. In total this takes 14 bytes: the instruction itself takes 6 bytes (2 for the opcode and 4 for the relative address of the variable) and the absolute address takes another 8 bytes. Compilers for 64-bit Windows typically align functions to 16 bytes, so there should be enough room to write 14 bytes.
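The 14-byte layout can be sketched as follows. This is an illustration only: encode_jmp_abs64 is a hypothetical helper, and the target address is made up for the example.

```python
import struct


def encode_jmp_abs64(target_addr):
    """Encode jmp qword ptr [rel 0] followed by the absolute target address."""
    opcode = b"\xFF\x25"                     # jmp qword ptr [rip + disp32]
    disp = struct.pack("<i", 0)              # 0: the address follows directly
    address = struct.pack("<Q", target_addr) # 64-bit little-endian target
    return opcode + disp + address


# Hypothetical 64-bit address of the intercepting function.
patch = encode_jmp_abs64(0x00007FF612345678)
print(len(patch))          # 14
print(patch.hex().upper())
```

Because the target is stored as data right after the instruction, no register is modified and the stack is left untouched.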
In the next part, we will check more exotic ways to intercept calls.