Patching binary code. Part 1
Why
There are times when you want to change the behavior of a function that you cannot recompile. For example, there could be a bug in a runtime function of your language. Moving to a different version of the language or its runtime is typically quite a challenge, and patching that one function can be much better than such a drastic change.
As another example, a third-party library could have a bug and its author did not provide the source code. Or perhaps you want to do some pre- or post-processing: for example, save some input or output parameters, or even modify them.
For the rest of the discussion, let's assume that the function we want to modify is named XYZ and that we need to modify its binary code.
Case 1
Case 1 is when you control when the function you want to patch changes. For example, you have an application that uses a certain C++ runtime, and you or your team decide when to switch to a different runtime version. You control both the application and the runtime version, so there will be no surprises.
Even so, it is a very good idea to add code that verifies that the bytes you are about to change are the bytes your code expects. Imagine that XYZ is a function from the C++ runtime and somebody switches to a different version of that runtime without being aware of your patch. The XYZ function can be different in the new runtime, and most likely you will just corrupt it and it will crash. The worst case is when it crashes only sometimes.
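A minimal sketch of such a check is shown below. The name patch_if_expected is made up for this illustration, and the sketch assumes the patch site is ordinary writable memory; real code pages need the extra steps described under "How".

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overwrites `size` bytes at `site` with `replacement`, but only if the bytes
// currently there are exactly `expected`. On a mismatch (for example, after a
// runtime upgrade changed XYZ) it changes nothing and returns false.
// Assumption: `site` points to writable memory.
bool patch_if_expected(std::uint8_t* site,
                       const std::uint8_t* expected,
                       const std::uint8_t* replacement,
                       std::size_t size) {
    if (std::memcmp(site, expected, size) != 0) {
        return false;  // unknown version of the function: leave it alone
    }
    std::memcpy(site, replacement, size);
    return true;
}
```

A caller would typically log the failure and continue without the patch rather than write into code it does not recognize.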
Case 2
Case 2 is when you don’t know when the function you want to change will change. For example, you are writing a DLL that other people use in their applications, so you don’t know when the XYZ function will change. This also applies when XYZ is a Windows API function (more about this later).
How
Normally you cannot write into a code segment because its pages have only read and execute rights. On Windows you can use the WriteProcessMemory function for this, or first make the pages writable with VirtualProtect. You also need to know assembly language, and you need tools to write assembly code.
It is an extremely good idea to leave comments that explain what your code does: the set of bytes you want to change and their disassembly, then the new bytes and their disassembly. It will save you a lot of time when you need to repeat the process a few years later to update to a new version of that code.
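As a rough sketch, a helper around WriteProcessMemory could look like this. The name write_code_bytes is invented for this example. The non-Windows branch is only a stand-in so that the sketch builds everywhere: it uses memcpy and therefore works only on ordinary writable memory, not on code pages.

```cpp
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Copies `size` bytes from `source` to `target`.
// On Windows the write goes through WriteProcessMemory for the current
// process, and the instruction cache is flushed for the changed range.
// Elsewhere this sketch falls back to memcpy (an assumption made purely for
// portability of the example; it cannot write to read-only code pages).
bool write_code_bytes(void* target, const void* source, std::size_t size) {
#ifdef _WIN32
    HANDLE process = GetCurrentProcess();
    SIZE_T written = 0;
    if (!WriteProcessMemory(process, target, source, size, &written) ||
        written != size) {
        return false;
    }
    return FlushInstructionCache(process, target, size) != 0;
#else
    std::memcpy(target, source, size);
    return true;
#endif
}
```

Checking both the return value and the number of bytes written is cheap and catches a partially applied patch early.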
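For illustration, a documented patch definition might look like the following. The offset, the constant names and the placeholder for the runtime version are hypothetical; only the two opcodes (0x74 for a short JE, 0xEB for a short JMP) are real encodings.

```cpp
#include <cstddef>
#include <cstdint>

// Patch for XYZ (offset 0x1A is a made-up value for this example).
//
// Old bytes:  74 05    je   short +5    ; conditional jump
// New bytes:  EB 05    jmp  short +5    ; always taken
//
// Verified against: <runtime name and version go here>
constexpr std::size_t kXyzPatchOffset = 0x1A;
constexpr std::uint8_t kXyzOldBytes[] = {0x74, 0x05};
constexpr std::uint8_t kXyzNewBytes[] = {0xEB, 0x05};

// An in-place patch must not change the length of the code.
static_assert(sizeof(kXyzOldBytes) == sizeof(kXyzNewBytes),
              "old and new byte sequences must have the same length");
```

Keeping the old bytes next to the new ones also gives you the data needed for the verification step described earlier.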
Examples
Example 1 for Case 1
Imagine that the fix is very simple and does not require complex changes. This is the case where patching the code of the function in place is beneficial: effectively, we just overwrite a few bytes of that function.
For example, you can convert a conditional jump to an unconditional one. In the short form (an opcode byte followed by an 8-bit offset) both instructions have the same length, so all you need to do is change a single byte and nothing else is affected. The near forms with a 32-bit offset differ in length, so check the encoding first. Another variant of this case is changing the data of an instruction, for example an immediate 1 to 2 or 0.
Another simple case is disabling a conditional jump so that it is never taken. In this case, you replace the whole instruction with NOP instructions. NOP is a single-byte instruction (0x90), so you need as many of them as there are bytes in the original instruction.
A slightly harder case is when you need to add more instructions but there is no space for them. Sometimes you can make space by shortening other instructions. For example, you may see the following code:
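The single-byte change can be sketched as follows for the short form of the jump. The function name is illustrative, and the sketch assumes the pointer refers to writable memory holding the start of the instruction.

```cpp
#include <cstdint>

// Turns a short conditional jump (opcodes 0x70..0x7F followed by a signed
// 8-bit offset) into a short unconditional jump (opcode 0xEB) by changing the
// opcode byte only. The offset byte stays as it is.
// Returns false if `instruction` does not start with a short conditional jump.
bool make_short_jump_unconditional(std::uint8_t* instruction) {
    const std::uint8_t opcode = instruction[0];
    if (opcode < 0x70 || opcode > 0x7F) {
        return false;
    }
    instruction[0] = 0xEB;  // jmp rel8
    return true;
}
```

Refusing anything outside 0x70..0x7F means the near form (which starts with 0x0F) is rejected instead of being patched incorrectly.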
add rax, 1
and you can replace it with
inc rax
INC can be slower on some processors, but it is one byte shorter. Sometimes there is a loop that the compiler unrolled, and you can roll it back: the code will be shorter and slower, but perhaps that is ok. Sometimes the compiler inlines code that copies data, and you can try to replace it with a REP MOVS instruction.
Sometimes there are padding bytes (NOP or INT 3) between the end of the XYZ function and the beginning of the next function, especially in x64 code, where compilers typically align the start of each function. Be creative!
Also, remember to add NOP instructions when the new instruction is shorter than the one it replaces. For example, imagine you have two instructions: the first is 3 bytes long and the second is 1 byte long. You want to modify the first one, but the new instruction is just 2 bytes. In this case, you need to add a NOP after the second byte so that the second instruction is still interpreted correctly.
OK, that was the easy case. Most of the time you simply cannot change the XYZ function in place, because your changes are too big or too complex.
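The padding rule can be sketched like this. The helper name is hypothetical and the buffer is assumed to be writable. In the usage shown in the note below, the 3-byte instruction 83 C0 01 (add eax, 1) is replaced with the 2-byte FF C0 (inc eax), and the following C3 (ret) must stay where it is.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint8_t kNop = 0x90;

// Replaces an instruction that occupies `old_size` bytes with `new_size` bytes
// of new code and fills the remaining bytes with NOPs, so the instruction that
// follows still starts at the same address.
// Passing new_size == 0 turns the whole instruction into NOPs.
// Returns false (and changes nothing) if the new code does not fit.
bool replace_and_pad(std::uint8_t* instruction, std::size_t old_size,
                     const std::uint8_t* new_code, std::size_t new_size) {
    if (new_size > old_size) {
        return false;
    }
    if (new_size != 0) {
        std::memcpy(instruction, new_code, new_size);
    }
    std::memset(instruction + new_size, kNop, old_size - new_size);
    return true;
}
```

For the bytes 83 C0 01 C3, calling replace_and_pad on the first three bytes with FF C0 leaves FF C0 90 C3 in memory.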
Example 2 for any case
Calculate how many bytes a jump instruction takes on your platform. Keep in mind that on x64, in a big application, the target may be out of reach of a 32-bit relative jump, and you will need a 64-bit absolute jump instead. A jump to an absolute address on x64 is quite a long instruction sequence.
Assuming that the jump takes X bytes, save the first X bytes of the binary code of the XYZ function and then overwrite them with code that jumps to your function XYZ_Hook.
The XYZ_Hook function must have exactly the same parameters as the XYZ function. If XYZ has hidden parameters, such as the this pointer, you need to add them to XYZ_Hook explicitly in the appropriate places. Alternatively, you can make XYZ_Hook a member of a fake class, but keep in mind that its this will then point to an object of the original class, not of the fake one.
Also, remember that XYZ_Hook must not have hidden parameters of its own. For example, if XYZ_Hook is a member of a class, make sure that it is static so that it has no hidden this parameter (except for the fake-class case above).
After that, any call of the XYZ function from any place transfers control to the XYZ_Hook function. XYZ_Hook does the necessary pre-processing, restores the overwritten bytes of the XYZ function to bring the code back to its original state, and calls XYZ. Because XYZ now contains the original code, it executes normally and then returns to XYZ_Hook.
After that, XYZ_Hook can do post-processing, overwrite the first bytes again to re-install the hook, and return. Because execution arrived in XYZ_Hook via a jump instruction rather than a call, the return from XYZ_Hook goes to the original caller of XYZ.
And because XYZ_Hook has exactly the same parameters (and calling convention) as XYZ, the compiler generates the correct code to save and restore all registers, and everything works correctly.
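To make the sizes concrete, here is a sketch that builds the jump bytes for x64. It uses two common encodings: the 5-byte relative jump (E9 followed by a 32-bit offset) and a 14-byte absolute jump (FF 25 00 00 00 00, which is jmp qword ptr [rip+0], followed by the 8-byte target address). The function name is made up, and the 14-byte form is just one of several possible absolute jump sequences.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Builds the bytes of a jump placed at address `from` that lands on `to`.
// Relative form: E9 rel32                    (5 bytes), rel32 = to - (from + 5)
// Absolute form: FF 25 00 00 00 00 <addr64>  (14 bytes)
std::vector<std::uint8_t> encode_jump(std::uint64_t from, std::uint64_t to) {
    // The offset is counted from the end of the 5-byte relative jump.
    const std::int64_t distance = static_cast<std::int64_t>(to - (from + 5));
    std::vector<std::uint8_t> bytes;
    if (distance >= std::numeric_limits<std::int32_t>::min() &&
        distance <= std::numeric_limits<std::int32_t>::max()) {
        bytes.push_back(0xE9);
        const std::uint32_t rel = static_cast<std::uint32_t>(distance);
        for (int i = 0; i < 4; ++i) {
            bytes.push_back(static_cast<std::uint8_t>(rel >> (8 * i)));
        }
    } else {
        bytes = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
        for (int i = 0; i < 8; ++i) {
            bytes.push_back(static_cast<std::uint8_t>(to >> (8 * i)));
        }
    }
    return bytes;
}
```

The size of the returned vector is the X used in the next step: 5 when the hook is within reach of a 32-bit offset, 14 otherwise.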
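The save-and-overwrite step can be sketched with a small class. EntryPatch is a hypothetical name, and the sketch assumes the target is writable memory; for real code pages, the two memcpy calls into the target would be replaced with the platform's way of writing code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Remembers the first bytes of a function and swaps them with a jump.
// Assumption: `target` is writable and at least jump.size() bytes long.
class EntryPatch {
public:
    EntryPatch(std::uint8_t* target, std::vector<std::uint8_t> jump)
        : target_(target),
          saved_(target, target + jump.size()),  // first X bytes of XYZ
          jump_(std::move(jump)) {}

    // Overwrites the start of the function with the jump.
    void install() { std::memcpy(target_, jump_.data(), jump_.size()); }

    // Puts the saved original bytes back.
    void remove() { std::memcpy(target_, saved_.data(), saved_.size()); }

private:
    std::uint8_t* target_;
    std::vector<std::uint8_t> saved_;
    std::vector<std::uint8_t> jump_;
};
```

The original bytes are copied in the constructor, before anything is overwritten, so remove() can always bring the function back to its initial state.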
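As an illustration of the signature rule, suppose XYZ is a member function of a class. Widget and WidgetHooks are hypothetical names. The sketch assumes an x64 ABI, where this is passed like an ordinary first pointer argument; 32-bit MSVC passes this in ECX, so this exact shape does not carry over there.

```cpp
// Hypothetical class whose member function XYZ we want to hook.
struct Widget {
    int value;
    int XYZ(int delta) { return value + delta; }  // hidden parameter: Widget* this
};

int hook_calls = 0;  // something for the pre-processing to do

struct WidgetHooks {
    // static, so it has no hidden this of its own. The object XYZ was called
    // on arrives as the explicit first parameter instead.
    static int XYZ_Hook(Widget* self, int delta) {
        ++hook_calls;             // pre-processing
        return self->XYZ(delta);  // a real hook restores XYZ's bytes first
    }
};
```

Here the hook is called directly, for example WidgetHooks::XYZ_Hook(&widget, 5); in a real patch it would be reached through the jump written into XYZ.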
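The order of operations inside XYZ_Hook can be sketched as follows. This is only a simulation of the control flow: the array xyz_entry stands in for the first bytes of XYZ, the jump offset is a placeholder, and no real code is patched, so XYZ_Hook has to be called directly. All names except XYZ and XYZ_Hook are invented for the example.

```cpp
#include <cstdint>
#include <cstring>

// Stand-ins so that the sketch builds anywhere.
std::uint8_t xyz_entry[5] = {0x55, 0x48, 0x89, 0xE5, 0x90};        // "code" of XYZ
const std::uint8_t xyz_saved[5] = {0x55, 0x48, 0x89, 0xE5, 0x90};  // saved copy
const std::uint8_t xyz_jump[5] = {0xE9, 0x00, 0x00, 0x00, 0x00};   // jmp, placeholder offset

int last_seen_argument = 0;

int XYZ(int x) { return x * 2; }  // the function being hooked

void install_hook() { std::memcpy(xyz_entry, xyz_jump, sizeof xyz_entry); }
void remove_hook() { std::memcpy(xyz_entry, xyz_saved, sizeof xyz_entry); }

int XYZ_Hook(int x) {
    last_seen_argument = x;  // pre-processing: record the input
    remove_hook();           // XYZ has its original bytes again
    int result = XYZ(x);     // runs the original function
    install_hook();          // re-arm the hook for the next call
    return result + 1;       // post-processing: adjust the output
}
```

If XYZ can throw, the re-arming step would be skipped in this form, so a real hook would probably wrap the remove and install pair in a scope guard.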
While this approach works fine in most cases, there are certain things to watch out for.
Firstly, you must know whether other threads can call this function. If so, you need to use a critical section, because it is quite bad when one thread changes the code of a function that is being executed by a different thread, or when one thread is writing the jump instruction while another thread is restoring the original bytes.
Secondly, if XYZ is recursive, you will not be able to catch the recursive calls, because XYZ_Hook restores the original bytes before calling XYZ. For Case 1 there is hope, though: in some cases it is possible to fix this.
Thirdly, if XYZ is a Windows API function, Windows can hot-patch it. I’m not sure what happens when the bytes are different from what Windows expects, but you need to keep this in mind if you are writing a service or something else critical.
Lastly, the function must have enough bytes of code to hold the jump instruction. In some cases, especially in 32-bit code, functions are packed very tightly and the XYZ function can be very short: for example, it can be a single-byte RET instruction. In that case, writing the jump overwrites the beginning of the function that follows XYZ and corrupts it.
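For the first point, the critical section can be sketched with a scope guard. This version uses std::mutex as a portable stand-in for a Windows critical section, and ScopedUnhook and its callbacks are hypothetical names. It only serializes the writes to XYZ's first bytes; it does not help a thread that is already executing those bytes, and a thread that calls XYZ while the hook is removed simply bypasses XYZ_Hook.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// While an object of this class is alive, the hook is removed and no other
// thread using the same mutex can install or remove it.
class ScopedUnhook {
public:
    ScopedUnhook(std::mutex& patch_mutex,
                 const std::function<void()>& remove_hook,
                 std::function<void()> install_hook)
        : lock_(patch_mutex), install_hook_(std::move(install_hook)) {
        remove_hook();  // restore the original bytes under the lock
    }

    // Re-installs the hook, also when XYZ exits with an exception.
    // The lock is released afterwards, when the members are destroyed.
    ~ScopedUnhook() { install_hook_(); }

    ScopedUnhook(const ScopedUnhook&) = delete;
    ScopedUnhook& operator=(const ScopedUnhook&) = delete;

private:
    std::lock_guard<std::mutex> lock_;  // declared first, so locked first
    std::function<void()> install_hook_;
};
```

Inside XYZ_Hook, one such object would be created just before the call to the original XYZ and destroyed right after it returns.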
Next time, I will look into more complicated hooks.