Transition from 32-bit to 64-bit

About five years ago, we moved our main application to 64-bit. We quickly discovered one pattern that was everywhere, in many different languages: converting pointers to 4-byte integers and back. Here is one such example in C++:

void* address;
…
(void*)(((unsigned int)address) + 1)

This worked on 32-bit systems because pointers and integers are exactly the same size there, so no data is lost. But during the transition to 64 bits, the decision was made to keep int at 4 bytes while pointers grew to 8 bytes. At least that is the case on the systems I know of.

As a result, if you run the code above for an address above 0xFFFF_FFFF (the largest value a 4-byte unsigned integer can hold), you get an invalid address. But as long as the address is within the first 4 gigabytes of memory, everything works fine. For example, imagine that address holds 0xAB_1234_5678. Evaluating the expression above, you expect to get 0xAB_1234_5679, but you receive 0x1234_5679. If the address were 0x1234_5678 instead, everything would continue to work just fine.
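The truncation described above can be reproduced safely without touching real memory. The following is only a sketch: addresses are modelled as plain 64-bit integers, and all function names are made up for this illustration. It also shows two common ways to avoid the problem with a real pointer: byte-wise pointer arithmetic, or std::uintptr_t, which is defined to be wide enough to hold a pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Buggy pattern: a round trip through a 4-byte integer drops the upper 32 bits.
constexpr std::uint64_t next_address_truncating(std::uint64_t address)
{
    return static_cast<std::uint32_t>(address) + 1u;
}

// Same computation with the full 64-bit width preserved.
constexpr std::uint64_t next_address_full_width(std::uint64_t address)
{
    return address + 1u;
}

// With a real pointer: plain byte-wise pointer arithmetic, no integer involved.
inline void* next_byte(void* address)
{
    return static_cast<char*>(address) + 1;
}

// With a real pointer: an integer round trip that uses a pointer-sized integer.
inline void* next_byte_via_integer(void* address)
{
    return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(address) + 1);
}

void demo_truncation()
{
    const std::uint64_t high = 0xAB12345678ull; // above 4 GB
    const std::uint64_t low = 0x12345678ull;    // below 4 GB

    // The high address loses its upper bits, the low address is unaffected.
    std::printf("%llx\n", static_cast<unsigned long long>(next_address_truncating(high)));
    std::printf("%llx\n", static_cast<unsigned long long>(next_address_full_width(high)));
    std::printf("%llx\n", static_cast<unsigned long long>(next_address_truncating(low)));

    char buffer[2] = {};
    assert(next_byte(buffer) == buffer + 1);
    assert(next_byte_via_integer(buffer) == buffer + 1);
}
```

Calling demo_truncation() prints 12345679, ab12345679 and 12345679, which matches the example in the text: only the address above 4 GB is damaged.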

Our application is quite big and allocates a lot of memory, so eventually it would crash, but it was hard to reproduce the issue in a reasonable time. Moreover, the application can keep working even when addresses are truncated. Imagine that address 0xAB_1234_5678 contains a boolean variable that is typically true. In practice any non-zero byte is treated as true, so if the byte at the truncated address 0x1234_5678 is arbitrary, there are 255 chances out of 256 that it will also read as true. As a result, the application can work just fine.

As another example, imagine that address 0xAB_1234_5678 contains some structure and 0x1234_5678 contains some rarely used data. Then even writing to address 0x1234_5678 can succeed, and you will not see any issue.

We were lucky: we discovered a few of these crashes well before releasing our product. We also discovered that code like the above was in a lot of places and in many languages. Obviously, we could not find all these places just by reading the code, as we have many millions of lines of it. We employed code analyzers and found and fixed quite a lot of them, but even after that we were still finding bugs. Some of them were in third-party libraries for which we don't have the source code. We needed a better solution.

After some internal discussion, we decided to employ the same technique Microsoft used when they introduced 32-bit systems. Microsoft disabled any access to addresses from 0x0 to 0xFFFF. We decided to do the same for 64-bit code, but for the whole first 4 GB, so that any truncated pointer causes an immediate crash instead of silent corruption. The following code does that:

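// Reserve in 64 KB blocks (the usual allocation granularity on Windows), so a
// block that is already in use does not prevent reserving its neighbours.
auto blockSize = 64 * 1024;
__int64 memSize = (__int64)1024 * 1024 * 1024 * 4;
// Start at 64 KB: a NULL (0) address would let VirtualAlloc choose the location itself.
for (__int64 address = blockSize; address < memSize; address += blockSize)
{
    // Failures are ignored on purpose: the block is simply already taken.
    VirtualAlloc((LPVOID)address, blockSize, MEM_RESERVE, PAGE_NOACCESS);
}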

Please note that this is not our production code, because I cannot copy and paste that here. A person I know wrote this version for me, and it works, but check it yourself just in case.

It is not a bulletproof solution, because the stack and the main executable will still be loaded in that lower memory. There is not much that can be done about the stack, but you can change the image base address of the main executable. You can do it in the linker properties of a C++ project or via the following command line:

editbin.exe /REBASE:BASE=0x1500000000 "MyApp.exe"

Also make sure that the application does not load any non-Windows DLLs into that lower memory.
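To see what still lives in the lower memory after these steps, it can help to print a few representative addresses from inside the process. This is a minimal sketch, not part of our actual tooling: the helper names are hypothetical, and it only inspects one stack variable, one static variable and one heap block rather than walking the whole address space.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

constexpr std::uint64_t kFourGb = 0x100000000ull;

// True if the address would survive a round trip through a 4-byte integer,
// which means a truncation bug touching it would go unnoticed.
constexpr bool is_below_4gb(std::uint64_t address)
{
    return address < kFourGb;
}

inline bool pointer_is_below_4gb(const void* pointer)
{
    return is_below_4gb(static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(pointer)));
}

static int g_static_variable = 0; // lives inside the executable image

void report_low_addresses()
{
    int stack_variable = 0;
    void* heap_block = std::malloc(16);

    std::printf("stack  %p below 4 GB: %d\n",
                static_cast<void*>(&stack_variable), pointer_is_below_4gb(&stack_variable));
    std::printf("image  %p below 4 GB: %d\n",
                static_cast<void*>(&g_static_variable), pointer_is_below_4gb(&g_static_variable));
    std::printf("heap   %p below 4 GB: %d\n",
                heap_block, pointer_is_below_4gb(heap_block));

    std::free(heap_block);
}
```

Calling report_low_addresses() before and after rebasing the executable should show the image address moving above 4 GB, while the stack line shows whether the stack is still in the low range.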

The stack is usually from 1 to 4 megabytes in size, and usually only a small portion of that memory is used, so the chances that a truncated pointer hits the stack are quite low. If the whole stack is allocated, the chances are between 1 in 4000 (for a 1 MB stack) and 1 in 1000 (for a 4 MB stack). Normally the chances are closer to 1 in 40000.
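These odds are simply the ratio between the 4 GB range and the size of the stack. The sketch below spells out the arithmetic under the assumption that a truncated pointer lands at a uniformly random address below 4 GB, which is a simplification. The 100 KB figure for the stack that is actually in use is my own assumption, chosen because it is what the "1 in 40000" estimate corresponds to.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

constexpr std::uint64_t kKb = 1024;
constexpr std::uint64_t kMb = 1024 * kKb;
constexpr std::uint64_t kLowRange = 4096 * kMb; // the first 4 GB

// Returns N such that a uniformly random address below 4 GB falls inside
// a region of the given size with a chance of about 1 in N.
constexpr std::uint64_t one_in(std::uint64_t region_bytes)
{
    return kLowRange / region_bytes;
}

void print_stack_odds()
{
    std::printf("1 MB stack, fully used:  1 in %llu\n",
                static_cast<unsigned long long>(one_in(1 * kMb)));
    std::printf("4 MB stack, fully used:  1 in %llu\n",
                static_cast<unsigned long long>(one_in(4 * kMb)));
    // Assumed figure: only about 100 KB of the stack is really in use.
    std::printf("100 KB of stack in use:  1 in %llu\n",
                static_cast<unsigned long long>(one_in(100 * kKb)));
}
```

The exact values are 1 in 4096, 1 in 1024 and 1 in 41943, which round to the figures quoted above.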

After that, we spent a few months fixing crashes and eventually fixed all the ones we found. Only a few crashes in very rarely used code unfortunately made it into the release, but we fixed them promptly.

Bonus chapter

Soon after the product release, we started getting crash reports related to printer drivers. Even though we had fixed all our own code, Windows loads printer drivers into our process during printing or printer setup, and a lot of printer drivers had this issue. Most vendors had already resolved it in their latest versions, but many companies used outdated drivers. It takes some time to convince clients that the problem is in the printer driver, because the driver works everywhere else.

Today I got another such case. It was slightly different, because that driver truncates the address before it sets the window procedure. As a result, when the user tries to bring up the properties of that printer, execution jumps to memory marked NOACCESS and Windows terminates the process. That was less obvious, so I wrote a small app that reserves the first 4 GB in any running application, so that an application with this bug crashes. Here is the source code for Visual Studio:

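#include <windows.h>
#include <cstdlib>
#include <iostream>

int main(int argc, char* argv[])
{
    // argv[0] is the program name, so the process id is argv[1].
    if (argc < 2)
    {
        std::cout << "Please pass process id as argument\n";
        return 1;
    }

    int processId = atoi(argv[1]);
    if (processId == 0)
    {
        std::cout << "Cannot parse process id\n";
        return 1;
    }

    HANDLE processHandle = OpenProcess(PROCESS_VM_OPERATION, FALSE, processId);
    if (processHandle == NULL)
    {
        std::cout << "Cannot open process id\n";
        return 1;
    }

    __try
    {
        // Reserve in 64 KB blocks (the usual allocation granularity on Windows).
        auto blockSize = 64 * 1024;
        __int64 memSize = (__int64)1024 * 1024 * 1024 * 4;
        // Start at 64 KB: a NULL (0) address would let the system choose the location.
        for (__int64 address = blockSize; address < memSize; address += blockSize)
        {
            // Failures are ignored: the block is already in use by the target process.
            VirtualAllocEx(processHandle, (LPVOID)address, blockSize, MEM_RESERVE, PAGE_NOACCESS);
        }
    }
    __finally
    {
        CloseHandle(processHandle);
    }
    std::cout << "Done!\n";
    return 0;
}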

Please compile it as a 64-bit executable and pass the process ID of the target process (you can find it in Task Manager) as the argument.

I hope it helps someone.