Attacks Explained – Function Detouring

What is Function Detouring?

Function Detouring, sometimes called Function Trampolining, is a reverse engineering technique with a wide range of applications. If a piece of software was developed long ago and recent changes to a system have rendered it unusable, function detouring can be used to patch the now-broken functions and make them work again. It can also be used maliciously to circumvent security measures by redirecting execution or manipulating memory before checks run. Occasionally, it can even be a handy tool for unlocking debug information that was disabled in the final release of a product.

Function Detouring can take many forms and it’s mostly limited by the imagination of the attacker. A detour can completely overwrite a function to have it perform a different action, bounce part of the function to a segment of unused memory called a code-cave, or just make minor modifications to existing calls. Applications almost never use all of their allotted memory, which leaves gaps where nothing useful is stored or executed. Such a region of barren, unused memory is a code-cave. Code-caves are particularly important because they give the attacker extra room to work with when the modifications do not fit in place.

The flow of Function Detouring consists of getting control of memory inside the target, modifying an instruction so that it jumps to a new location, performing whatever actions the attacker wants, and returning. Let’s say that in the middle of a function, an attacker wishes to quickly get a call into evil_function, evil_function2, and evil_function3. In a 32-bit application, a near relative call is 5 bytes long in the form `0xE8 0x00 0x00 0x00 0x00`, where the last four bytes are the little-endian distance to jump, measured from the end of the call instruction (the address of the next instruction). All examples in this writeup will be 32-bit for simplicity, but the same idea applies to 64-bit. To create the call instruction that executes our malicious code, we’ll need to overwrite 5 bytes and then fix the stack and registers after the detour finishes.
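The call encoding described above can be sketched in a few lines. This is a minimal illustration rather than code from the original project: `encode_call_rel32` is a hypothetical helper name, the addresses in the worked example are made up, and C++17 is assumed so the result can be checked at compile time.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build the 5-byte near call (opcode 0xE8 followed by rel32) that, when placed
// at `call_site`, transfers control to `target`. Both are 32-bit addresses.
constexpr std::array<std::uint8_t, 5> encode_call_rel32(std::uint32_t call_site,
                                                        std::uint32_t target)
{
    // The displacement is measured from the END of the call instruction,
    // which is call_site + 5. Unsigned arithmetic wraps modulo 2^32, so a
    // backward call comes out as the correct two's-complement value.
    const std::uint32_t rel32 = target - (call_site + 5u);

    std::array<std::uint8_t, 5> bytes{};
    bytes[0] = 0xE8;
    // x86 is little-endian: least significant byte first.
    for (std::size_t i = 0; i < 4; ++i)
        bytes[1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));
    return bytes;
}

// Worked example (made-up addresses): a call at 0x00401000 to 0x00402000.
// rel32 = 0x00402000 - 0x00401005 = 0x00000FFB -> E8 FB 0F 00 00
constexpr auto kForward = encode_call_rel32(0x00401000u, 0x00402000u);
static_assert(kForward[0] == 0xE8 && kForward[1] == 0xFB && kForward[2] == 0x0F &&
              kForward[3] == 0x00 && kForward[4] == 0x00,
              "forward call encodes as E8 FB 0F 00 00");
```

The same helper handles targets that sit below the call site, since the subtraction simply wraps around.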

In this case, the 5 bytes we need to overwrite are `mov ebx, dword ptr[edx]` and `add ebx, 0x8`, which total exactly 5 bytes of instructions (2 bytes and 3 bytes respectively). This can be verified with an assembler. After overwriting the bytes with our call instruction to a code-cave we found offscreen, we need to add our calls to the various evil functions. Each of these calls is 5 bytes long and the two instructions we replaced are an additional 5 bytes, so we need to write 20 bytes to the code-cave in total. After the 20 bytes have been written, we need to write one last `ret` (0xC3). This returns execution to the instruction immediately after our original call instruction.
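To make the byte count concrete, here is a rough sketch of how the code-cave contents could be assembled: the 5 displaced bytes, three 5-byte calls, and the final `ret`, for 21 bytes in all. Several things here are assumptions for illustration only: the helper names, the cave and function addresses, and the ordering (displaced instructions first, so they still see the original `edx`). Saving and restoring registers around the calls is left out. C++17 is assumed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Bytes of the two displaced instructions from the text:
//   mov ebx, dword ptr [edx]  -> 8B 1A
//   add ebx, 0x8              -> 83 C3 08
constexpr std::array<std::uint8_t, 5> kDisplaced{0x8B, 0x1A, 0x83, 0xC3, 0x08};

// Write `call target` (E8 rel32) into `out` at index `pos`, as if out[0]
// were located at address `cave_addr` in the target process.
constexpr void put_call(std::array<std::uint8_t, 21>& out, std::size_t pos,
                        std::uint32_t cave_addr, std::uint32_t target)
{
    const std::uint32_t site  = cave_addr + static_cast<std::uint32_t>(pos);
    const std::uint32_t rel32 = target - (site + 5u); // relative to the next instruction
    out[pos] = 0xE8;
    for (std::size_t i = 0; i < 4; ++i)
        out[pos + 1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));
}

// Lay out the cave: displaced bytes, three calls, then ret (0xC3).
constexpr std::array<std::uint8_t, 21> build_cave(std::uint32_t cave_addr,
                                                  const std::array<std::uint32_t, 3>& evil)
{
    std::array<std::uint8_t, 21> cave{};
    std::size_t pos = 0;
    for (std::size_t i = 0; i < kDisplaced.size(); ++i)
        cave[pos++] = kDisplaced[i];
    for (std::size_t i = 0; i < evil.size(); ++i) {
        put_call(cave, pos, cave_addr, evil[i]); // each call is relative to its own address
        pos += 5;
    }
    cave[pos] = 0xC3; // ret: back to the instruction after the detour call
    return cave;
}
```

Note that each call inside the cave needs its own displacement, because the displacement depends on where that particular call instruction sits.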

Performing Function Detouring with an Imported Target

In this case, the target is inside the memory region of our application. It’s either the application we’re currently running, a loaded library, or some other imported resource. This case gives us ease of control as we can arbitrarily modify our own application’s code and we don’t have to worry about inter-process communication.

To show this in practice, we’ll give a short example where we’ll import the injected.dll library and call its “Execute” function.

Source.cpp

#include <iostream>
#include <Windows.h>

int main()
{
    HMODULE module = LoadLibraryA("injected.dll"); // Load the library into memory and get its module
    unsigned long addr = (unsigned long)GetProcAddress(module, "Execute"); // Get the address of the "Execute" function
    void (*Execute)() = (void(*)())addr; // Create a function pointer to it

    Execute(); // Call the function
    std::cin.get();
    return 0;
}

In a separate project we’ll create the .dll file that we’ll import into the primary application. This will be pretty barebones as the added functionality will be handled by the first application.

dllmain.cpp

#include "pch.h"
#include <stdio.h>

void Func1()
{
    printf("Hello from Function 1!");
}

extern "C" __declspec(dllexport) void Execute() // Export the "Execute" function and only make it call Func1
{
    Func1();
}

To perform a detour on Execute and move the execution away from the dll, we first need to make a place for it to go. In our Source.cpp file we’ll add the function “Func2”. We won’t have any explicit calls to this function, so for all practical purposes it’s simply dead code.

Source.cpp

void Func2()
{
    printf("Hello from Function 2!\n");
    printf("There are no normal calls to me, this was a detour!\n");
}

Now that we have the function we want to route to, we need to get a few pieces of information:
1. The address of Func2
2. The address of Execute
3. The offset from Execute where Func1 is called
4. The distance between the Func1 call and Func2.

Luckily, since we have full control over the local memory, this is all trivial to get. The address of Func2 can be obtained by our application with `(unsigned long)*Func2`. To get the address of “Execute” we’ll use `FARPROC GetProcAddress(HMODULE hModule, LPCSTR lpProcName);`. GetProcAddress returns the address of an exported function from a given module. Since we defined the function as exported in our .dll via `extern "C" __declspec(dllexport)`, we’ll be able to get it. Otherwise, we’d need the offset of the function from the base of the module, which would require finding the function in a disassembler and measuring the distance from the start of the module. The module handle we need for GetProcAddress comes from the LoadLibrary call we made when first loading the library. Once all of that is out of the way, we can open the dll in a disassembler and find the offset from “Execute” to where Func1 is called. Our goal here is to replace the call to Func1 with a call to Func2.

Opening the application in x64dbg and going to the “Execute” function from the injected.dll module, we can see the function sitting amongst a large number of other jump instructions. This is the branch table and it’s what we’re given by the GetProcAddress function. The branch table is used as a lookup table for jumping between functions and shortening call/jmp lengths. For all intents and purposes we’ll refer to this as the start of the function, since the distance between some instance of “Execute” and our call is all we need. So we’re given an initial address of 0x0F9712AD as the starting address for “Execute”.

The branch table takes us into this block of code, which is the body of the “Execute” function. Here we can see the call to Func1 at 0x0F9716A8, so the offset is 0x0F9716A8 - 0x0F9712AD = 0x3FB. Finally, the jump distance between the Func1 call and our Func2 location can be calculated as addressOfFunc2 - addressOfFunc1Call - 0x5. The 0x5 is important because the distance to jump is measured from the end of the call instruction, and call instructions are 5 bytes long. Now that we have all the information we need, we can put together the final attack application.
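The arithmetic above can be double-checked with a few compile-time assertions. The two addresses are the ones from this debugging session. The Func2 address (0x00A11230) is invented for the example, since the real value depends on where the injector is loaded, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Addresses observed in the debugger for this particular run.
constexpr std::uint32_t kExecuteAddr   = 0x0F9712AD; // what GetProcAddress returned for "Execute"
constexpr std::uint32_t kFunc1CallAddr = 0x0F9716A8; // the call to Func1 inside Execute

// The offset stays the same even if the DLL is loaded somewhere else next time.
constexpr std::uint32_t kCallOffset = kFunc1CallAddr - kExecuteAddr;
static_assert(kCallOffset == 0x3FB, "offset from Execute to the Func1 call");

// Address of the call for whatever address Execute has in the current run.
constexpr std::uint32_t call_site_for(std::uint32_t execute_addr)
{
    return execute_addr + kCallOffset;
}

// Distance to write after the 0xE8 opcode: target - call site - 5.
constexpr std::uint32_t rel32_to(std::uint32_t call_site, std::uint32_t target)
{
    return target - call_site - 5u; // wraps modulo 2^32 for backward jumps
}

// Hypothetical Func2 address, lower in memory than the call site.
constexpr std::uint32_t kFunc2Addr = 0x00A11230;
static_assert(rel32_to(kFunc1CallAddr, kFunc2Addr) == 0xF109FB83u,
              "backward displacement wraps to a large unsigned value");
// Sanity check: end of call + displacement lands on Func2 again.
static_assert(kFunc1CallAddr + 5u + rel32_to(kFunc1CallAddr, kFunc2Addr) == kFunc2Addr,
              "round trip");
```

Since the displacement is computed at run time from the real addresses, only the 0x3FB offset has to be hard-coded in the attack application.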

#include <cstdio>
#include <iostream>
#include <Windows.h>

void Func2()
{
    printf("Hello from Function 2!\n");
    printf("There are no normal calls to me, this was a detour!\n");
}

int main()
{
    HMODULE module = LoadLibraryA("injected.dll"); // Load the library into memory and get its module
    unsigned long addr = (unsigned long)GetProcAddress(module, "Execute"); // Get the address of the "Execute" function
    void (*Execute)() = (void(*)())addr; // Create a function pointer to it

    unsigned long func1call = addr + 0x3FB; // The location of the call to Func1
    unsigned long func2jmp = (unsigned long)*Func2 - func1call - 5; // How far the jump is to Func2, measured from the end of the call

    BYTE detour[5] = { 0xE8, 0x00, 0x00, 0x00, 0x00 }; // The bytecode to be written, currently CALL with a zero displacement; we'll overwrite this with our jump distance

    memcpy(detour + 1, &func2jmp, 4); // Copy the distance we need to jump into the last 4 bytes of the bytecode

    DWORD old; // Double WORD (4 bytes), holds the old protection flags

    VirtualProtectEx(GetCurrentProcess(), (void*)func1call, 5, PAGE_EXECUTE_READWRITE, &old); // Change the protection at the call's location to Read/Write/Execute so we can edit it. The default is Read/Execute and trying to write to it will cause an access violation.

    memcpy((void*)func1call, detour, 5); // Change the bytecode at that location

    VirtualProtectEx(GetCurrentProcess(), (void*)func1call, 5, old, &old); // Restore the old protections

    Execute(); // Call the Execute function from the DLL

    std::cin.get(); // Wait for user input
    return 0;
}

Executing the attack, we can see that our Func2 is successfully called.
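A hard-coded offset like 0x3FB silently breaks if the DLL is rebuilt, so before patching it can be worth confirming that the 5 bytes at the location really are a call and seeing where it currently points. The sketch below is one way that check might look. `is_call_rel32` and `decode_call_target` are hypothetical helper names, the sample bytes and addresses are made up, and C++17 is assumed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// True if the 5 bytes start with the near relative call opcode.
constexpr bool is_call_rel32(const std::array<std::uint8_t, 5>& bytes)
{
    return bytes[0] == 0xE8;
}

// Recover the absolute target of a call located at `call_site`.
constexpr std::uint32_t decode_call_target(std::uint32_t call_site,
                                           const std::array<std::uint8_t, 5>& bytes)
{
    std::uint32_t rel32 = 0;
    for (std::size_t i = 0; i < 4; ++i) // little-endian displacement
        rel32 |= static_cast<std::uint32_t>(bytes[1 + i]) << (8 * i);
    return call_site + 5u + rel32;      // relative to the end of the call
}

// Made-up sample: E8 FB 0F 00 00 at 0x00401000 calls 0x00402000.
constexpr std::array<std::uint8_t, 5> kSample{0xE8, 0xFB, 0x0F, 0x00, 0x00};
static_assert(is_call_rel32(kSample), "sample starts with E8");
static_assert(decode_call_target(0x00401000u, kSample) == 0x00402000u,
              "decoded target");
```

In the attack application, the array would be filled by copying 5 bytes from `func1call` before the overwrite, and the patch skipped if the first byte is not 0xE8.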

Performing Function Detouring with an External Target

The second case for detouring is where the target is not in the same memory space as the injector. This means that we somehow need to modify the memory of an external process to do what we require. While there are a handful of ways to do this, such as patching from kernel space, the most common method is DLL Injection. With DLL Injection we can design a DLL to seek out the specific area we want to detour and make the necessary modifications. Then, using the Windows API, we’ll use the `CreateRemoteThread` call to load the library into the target’s memory, where it will perform the necessary modifications. It’s important to remember that the reason we need to go through all this trouble is the user-mode Virtual Address Space (VAS). Our Injector process and our Target process sit in separate regions of memory that cannot directly manipulate each other. Because of this, we need to rely on some method to bridge the gap in memory, which in this case is CreateRemoteThread and the loaded malicious DLL.

Our target is fairly straightforward: we wait for user input and then call Func1.

#include <iostream>

void Func1()
{
    std::cout << "Hello from Func1\n";
}

int main()
{
    std::cout << "Press [Enter] to enter Func1!\n";
    std::cin.get();
    Func1();
    std::cin.get();
    return 0;
}

The Injector

In order to perform a DLL injection on the target, we’ll need a few things:
1. A HANDLE to the target process
2. The address of LoadLibraryA
3. A region in Target process’ memory where we can write the path to the DLL
4. The address of our call to Func1
5. The distance from the call to Func1 to Func2

Notice that we’re also making a small change from the previous example. In this version, Func1 isn’t exported so we wouldn’t be able to use GetProcAddress to find it. Our only option is to get the offset relative to the base of the module rather than relative to some exported function like “Execute” in the previous example.

In order to get the HANDLE of the target process we’ll need to rely on a Windows API call named CreateToolhelp32Snapshot. This function takes a snapshot of the system, which in our case lets us find the process ID of any running application. We give the function the arguments `TH32CS_SNAPPROCESS` to indicate we’re looking for process entries and `0` for the process ID, which is ignored for this flag because the snapshot covers every process in the system. Next we use `Process32First` to get the first process in the snapshot and we iterate through the remaining processes with `Process32Next`. Each “szExeFile” is compared against our “Target_Application.exe” until we get a match. That process ID is then taken and we open a handle to it with OpenProcess.

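#include <Windows.h>
#include <iostream>
#include <cstring>
#include <tlhelp32.h>

int main()
{
    PROCESSENTRY32 pe32;
    pe32.dwSize = sizeof(PROCESSENTRY32); // Must be set, otherwise Process32First fails
    HANDLE hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    HANDLE hProc = NULL;
    DWORD pid = 0;
    if (Process32First(hProcessSnap, &pe32))
    {
        do // do/while so the first entry is checked as well
        {
            // Assumes a multi-byte (non-UNICODE) build, where szExeFile is a char array
            if (strcmp(pe32.szExeFile, "Target_Application.exe") == 0)
            {
                pid = pe32.th32ProcessID;
                hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
                break; // Stop at the first match
            }
        } while (Process32Next(hProcessSnap, &pe32));
    }

    CloseHandle(hProcessSnap);
    return 0;
}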

Next we need the address of LoadLibraryA so that we can create a remote thread inside our target that starts there. That will force the target to load a library of our choice, which is our evil dll in this case. LoadLibraryA comes from kernel32.dll, so we’ll pass that module to GetProcAddress to get its address and save it as an LPVOID, also known as a void* (a pointer to some data, no type given). VirtualAllocEx will allocate some memory inside our target that we will use to store the path to our injected dll. We’ll write the string to the allocated space with WriteProcessMemory and finally we’ll create our remote thread with CreateRemoteThread.

#include <Windows.h>
#include <iostream>
#include <tlhelp32.h>

// ... main() and the process lookup from the previous listing ...

    CloseHandle(hProcessSnap);
    const char* dllname = "C:\\Users\\Xorus\\source\\repos\\Detouring_External\\Debug\\Injected_DLL.dll";

    LPVOID addr_LoadLibrary = (LPVOID)GetProcAddress(GetModuleHandleA("Kernel32.dll"), "LoadLibraryA"); // Get the address of LoadLibraryA
    LPVOID addr_Text = VirtualAllocEx(hProc, NULL, strlen(dllname) + 1, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE); // Reserve and commit memory in the target process for the path plus its terminating null
    WriteProcessMemory(hProc, addr_Text, dllname, strlen(dllname) + 1, NULL); // Copy the path, including the terminating null, into the target
    CreateRemoteThread(hProc, NULL, 0, (LPTHREAD_START_ROUTINE)addr_LoadLibrary, addr_Text, 0, NULL); // Start a thread in the target that runs LoadLibraryA(addr_Text)

    return 0;
}

Now that our Injector is finished we need to set up the DLL that we’ll be injecting into the target. Here the code can get quite a bit crunchier so don’t worry about having to go over it a few times if needed.

The DLL

We won’t be going through all the steps of creating a DLL, however you can find a guide linked here. We’ll start off by creating our Func2 like we had in the previous example. In this version we’ll export it for simplicity’s sake since in a realistic scenario this is something an attacker, such as ourselves, can normally control.

extern "C" __declspec(dllexport) void Func2()
{
    std::cout << "Hello from Func2!\n";
    std::cout << "This is from the detoured DLL, there are no calls to me normally!\n";
}

As mentioned a bit ago, we aren’t able to use GetProcAddress to find the function we want to detour, so we’ll have to work off the module base. Whenever a process is run, all of its different parts, such as the main executable, the loaded libraries, and any resources, get pulled into memory. We can use the address that the “target_application.exe” module is loaded at to find any part of its code, as all parts of its code will be at a fixed distance (offset) away from it. To get the address of the module we’ll use EnumProcessModules to get all of the modules, GetModuleFileNameEx to get the file name of each listed module, and GetModuleInformation to get the base address of the “target_application.exe” module.

BOOL APIENTRY DllMain( HMODULE hModule,
                     DWORD ul_reason_for_call,
                     LPVOID lpReserved
                     )
{
    // Declare our variables before the switch statement
    unsigned long addr = (unsigned long)& Func2;
    unsigned long baseaddr;
    unsigned long addr_func1call;

    HANDLE hProc;
    HMODULE hModules[1024];
    DWORD lpcbNeeded;
    MODULEINFO lpmodinfo;
    TCHAR szModName[MAX_PATH];
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        std::cout << "Attached!\n";
        hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetCurrentProcessId()); // Get the Target’s HANDLE
        EnumProcessModules(hProc, hModules, sizeof(hModules), &lpcbNeeded); // Get all of the Modules, put them in hModules
        for (int i = 0; i < (lpcbNeeded / sizeof(HMODULE)); ++i) // For each of the Modules
        {

            GetModuleFileNameEx(hProc, hModules[i], szModName, sizeof(szModName) / sizeof(TCHAR)); // Get the Module’s file name
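            if (wcsstr(szModName, L".exe") != NULL) // Does the path contain ".exe"? (wcsstr searches for a substring; lstrcmpW only orders two strings. Assumes a UNICODE build.)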
            {

                GetModuleInformation(hProc, hModules[i], &lpmodinfo, sizeof(lpmodinfo)); // Get the Module information
                baseaddr = (unsigned long)lpmodinfo.lpBaseOfDll; // Get the base address of Target_Application.exe
                std::cout << "Base: 0x" << std::hex << baseaddr << std::endl; // Output
                addr_func1call = baseaddr + 0x12610; // The address for the call to Func1 is 0xE92610, the base address is 0xE80000. That means the offset is 0xE92610 – 0xE80000 = 0x12610

                std::cout << "call target_application.11107D: 0x" << std::hex << addr_func1call << std::endl;
                break;
            }
        }
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}
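Matching on “.exe” works here because the target has only one executable module, but a stricter check on the end of the path is a little safer. The following is a sketch of a case-insensitive suffix test that could be applied to `szModName` in a UNICODE build. The helper names are hypothetical, the paths in the assertions are made up, and C++17 is assumed for `std::wstring_view`.

```cpp
#include <cstddef>
#include <string_view>

// Lower-case a single ASCII letter; every other character is returned unchanged.
constexpr wchar_t ascii_lower(wchar_t c)
{
    return (c >= L'A' && c <= L'Z') ? static_cast<wchar_t>(c - L'A' + L'a') : c;
}

// True if `text` ends with `suffix`, ignoring ASCII case.
constexpr bool ends_with_icase(std::wstring_view text, std::wstring_view suffix)
{
    if (suffix.size() > text.size())
        return false;
    const std::size_t start = text.size() - suffix.size();
    for (std::size_t i = 0; i < suffix.size(); ++i)
        if (ascii_lower(text[start + i]) != ascii_lower(suffix[i]))
            return false;
    return true;
}

// Made-up module paths for illustration.
static_assert(ends_with_icase(L"C:\\Target\\Target_Application.EXE", L".exe"),
              "executable path matches regardless of case");
static_assert(!ends_with_icase(L"C:\\Windows\\System32\\kernel32.dll", L".exe"),
              "a DLL path does not match");
```

Passing a longer suffix such as `L"\\target_application.exe"` would narrow the match to the one module we care about.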

Now that we have the address, we can add our overwrite code like in the first example. First we’ll calculate the jump from our call to Func1 to our Func2 function. Then we’ll build the call from bytecode: 0xE8 0x00 0x00 0x00 0x00, where the last 4 bytes are the distance from the end of that call instruction to Func2. The area to be overwritten needs to have its permissions modified by VirtualProtectEx, but after that everything will be ready to go and the code should execute Func2.

// dllmain.cpp : Defines the entry point for the DLL application.
#include "pch.h"
#include <iostream>
#include <Psapi.h>
extern "C" __declspec(dllexport) void Func2()
{
    std::cout << "Hello from Func2!\n";
    std::cout << "This is from the detoured DLL, there are no calls to me normally!\n";
}
BOOL APIENTRY DllMain(HMODULE hModule,
    DWORD ul_reason_for_call,
    LPVOID lpReserved
)
{
    unsigned long addr = (unsigned long)& Func2;
    unsigned long baseaddr;
    unsigned long addr_func1call;
    unsigned long jmp;
    DWORD old;
    BYTE overwrite[5] = { 0xE8, 0x00, 0x00, 0x00, 0x00 };
    HANDLE hProc;
    HMODULE hModules[1024];
    DWORD lpcbNeeded;
    MODULEINFO lpmodinfo;
    TCHAR szModName[MAX_PATH];
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        std::cout << "Attached!\n";
        hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetCurrentProcessId());
        EnumProcessModules(hProc, hModules, sizeof(hModules), &lpcbNeeded);
        for (int i = 0; i < (lpcbNeeded / sizeof(HMODULE)); ++i)
        {
            GetModuleFileNameEx(hProc, hModules[i], szModName, sizeof(szModName) / sizeof(TCHAR));
            if (wcsstr(szModName, L".exe") != NULL) // Substring search for ".exe" (assumes a UNICODE build)
            {
                GetModuleInformation(hProc, hModules[i], &lpmodinfo, sizeof(lpmodinfo));
                baseaddr = (unsigned long)lpmodinfo.lpBaseOfDll;
                std::cout << "Base: 0x" << std::hex << baseaddr << std::endl;
                addr_func1call = baseaddr + 0x12610;
                std::cout << "call target_application.11107D: 0x" << std::hex << addr_func1call << std::endl;
                jmp = addr - addr_func1call - 5;
                std::cout << "Jump distance: 0x" << std::hex << jmp + 5 << std::endl;
                std::cout << "Jump destination: 0x" << std::hex << addr_func1call + (jmp + 5) << std::endl;
                std::cout << "Func2 location: 0x" << std::hex << addr << std::endl;
                break;
            }
        }
        VirtualProtectEx(GetCurrentProcess(), (LPVOID)addr_func1call, 5, PAGE_EXECUTE_READWRITE, &old);
        memcpy(overwrite + 1, &jmp, 4);
        memcpy((LPVOID)addr_func1call, overwrite, 5);
        std::cout << "Detoured: " << "0x" << std::hex << addr_func1call << std::endl;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

Func2 executes, just as expected. This methodology can be applied to many different applications and use cases. Anything from bypassing application security to installing additional functionality can be done with Function Detouring. The major limiting factors are the elevation of the process, potential blocking of hooks by Windows, live checksum detection, and portability. Aside from these issues, Function Detouring is an essential technique for any reverse engineer, exploit developer, game hacker, or anyone with a similar skillset.
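To give a feel for why live checksum detection gets in the way, here is a toy sketch of the idea: the application hashes a region of its own code and compares the result with a known-good value, so a 5-byte patch changes the hash. The choice of FNV-1a is only an assumption for illustration, as real protections vary widely. The byte sequences are made up and C++17 is assumed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a hash over a fixed-size block of bytes.
template <std::size_t N>
constexpr std::uint32_t fnv1a(const std::array<std::uint8_t, N>& bytes)
{
    std::uint32_t h = 2166136261u; // FNV offset basis
    for (std::size_t i = 0; i < N; ++i) {
        h ^= bytes[i];
        h *= 16777619u;            // FNV prime
    }
    return h;
}

// Made-up "code" bytes: an original call and the same location after a detour.
constexpr std::array<std::uint8_t, 5> kOriginal{0xE8, 0xFB, 0x0F, 0x00, 0x00};
constexpr std::array<std::uint8_t, 5> kPatched {0xE8, 0x10, 0x20, 0x30, 0x00};

// An integrity check would store fnv1a(kOriginal) and recompute it at run time.
static_assert(fnv1a(kOriginal) != fnv1a(kPatched),
              "the patch changes the checksum");
```

A detour that has to survive such a check needs to deal with the checker as well, which is part of what makes this a limiting factor.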