->*

Challenge 4 ("UnholyDragon")

Description

This is the point in our story where the hero purges the world of the dragon's corruption. Except that hero is you, so you will probably fail.

We are given a file named UnholyDragon-150.exe. Given the exe extension, my first course of action was to open it with CFF explorer and get some basic info about the PE file. Surprisingly, the program reported that the file under scrutiny was not of the PE format, so I checked with a hex editor and found the culprit: Instead of the trusty MZ signature at offset 0, the M byte (0x4d) was replaced with 0x15. Reverting this modification is enough for the PE file to be successfully loaded and run.

Running the 32-bit EXE in a sandbox, it seemed to have created and run several copies of itself in the same directory, from UnholyDragon-151.exe to UnholyDragon-154.exe. Through experiment, I found that whatever the number at the end of the filename, the sample replicated itself with this number incremented (or simply added -1 if there was no number), and the executed the copy with CreateProcessW. However, the dropped files were not perfect copies, as their hashes differed each time. I analyzed the differences using a hex editor and found that each pair of parent and child executables differed in exactly one byte somewhere in the file. This explained why the execution chain would not go forever, as eventually, some crucial data or code was overwritten that would prevent the binary from working correctly. This, combined with the fact that our (un)executable was named UnholyDragon-150.exe, also explains the invalid MZ header in the original sample — likely the execution started with UnholyDragon.exe or UnholyDragon-1.exe and our sample was the first one that was corrupted enough not to be executed.

While it was not at all clear at this stage what the goal of the challenge was, it seemed logical to try to figure out how these one-byte changes were chosen by the executable. There was of course the risk that one of the previous samples overwrote some part of this logic, but at this point, it made sense to ignore that possibility.

To make it clear that me and the reader are on the same page, I will assert that the file being analyzed from here on is the one with SHA-256 9dd5342639eca125de115599f925d569a430c32ee1fcb38eeca3c685ee208bec, i.e. the one obtained from the archive with the first byte reverted back to 0x4d, effectively restoring UnholyDragon-149.exe (under the hypothesis describer earlier).

Based on some strings found and referenced in the binary, I identified the compiler as twinBASIC, which I unsurprisingly have never heard of before. It is a closed-source compiler and in its current state does not produce any PDBs or other artifacts helpful for debugging during the executable emission. I don't expect twinBASIC to be a self-hosted language, so it may have been easier to analyze the compiler binaries, which are available for download on GitHub, than to analyze the emitted binary itself, however, I did not do this.

BASIC code compiled using twinBASIC has its quirks, like excessive use of Variant and similar types from the oleaut32.dll, which are however not unexpected for a managed, dynamically-typed language. One omnipresent and probably the most difficult such quirk to deal with was local variable addressing. The twinBASIC compiler does not use stack frames in the traditional C sense, where a stack frame is bounded by ebp from the bottom and esp from the top, but rather stores a global stack base pointer in thread-local storage and all local variables (across frames) are addressed relative to this base pointer. For this reason, I used mainly dynamic analysis, stepping through instructions with a debugger and annotating Hex-Rays decompiler output with comments rather than local variable names and types.

Following the control flow from the entry point, it seemed the closest equivalent to a user-defined "main" function was being called at preferred virtual address (PVA) 0x40265D.

.text:00402652 mov     eax, offset sub_4A9C4E
.text:00402657 push    offset dword_4AFFD0
.text:0040265C push    eax
.text:0040265D call    main

This function takes two arguments, one of which is the address of another function (sub_4A9C4E), which is then called early on in main. My guess is that it may be a runtime-specific way of calling custom initialization code, but I did not investigate to back this up. In any case, analyzing it is key for our sample.

Apart from sub_4A9C4E, two other functions of interest are called in main. They are sub_478005, with the argument "StartupShow", which seems to be responsible for creating the window for the GUI form, and the function at 0x48077B, which I called run_event_loop.

.text:00402552                 mov     ecx, [esp+20h+var_14]
.text:00402556                 mov     edx, offset aStartupshow ; "StartupShow"
.text:0040255B                 call    sub_478005               ; window shows up
.text:00402560                 xor     dl, dl
.text:00402562                 xor     ecx, ecx
.text:00402564                 call    run_event_loop           ; GUI program runs

As it turns out, neither of these two functions are of particular significance to the challenge; the interesting code lies completely within sub_4A9C4E.

The first interesting thing this function does is the allocation and initialization of a PRNG object, which will become important later (at this point in the analysis, I of course didn't know that it was a PRNG).

.text:004A9C83                 lea     eax, [esp+50h+ptr_prng_struct]
.text:004A9C87                 push    eax             ; int
.text:004A9C88                 push    98h             ; dwBytes
.text:004A9C8D                 call    alloc_zeroed_prng_object_mem
.text:004A9C92                 test    eax, eax
.text:004A9C94                 mov     cl, 0
.text:004A9C96                 jns     short loc_4A9C9D
.text:004A9C98                 jmp     loc_4A7717
.text:004A9C9D ; ---------------------------------------------------------------------------
.text:004A9C9D loc_4A9C9D:
.text:004A9C9D                 mov     eax, [esp+50h+ptr_prng_struct] ; initialization of prng
.text:004A9CA1                 mov     dword ptr [eax+10h], 1
.text:004A9CA8                 mov     ecx, offset prng_vtable
.text:004A9CAD                 mov     [eax], ecx
.text:004A9CAF                 mov     ecx, offset off_4B0620
.text:004A9CB4                 mov     [eax+edi*4-7Ch], ecx
.text:004A9CB8                 mov     ecx, offset off_4B0634
.text:004A9CBD                 mov     [eax+edi*4-78h], ecx
.text:004A9CC1                 mov     ecx, offset off_4B06D4
.text:004A9CC6                 mov     [eax+edi*4-74h], ecx

The next significant events are the following two calls, and particularly the latter.

.text:004A9CF0                 push    0Eh
.text:004A9CF2                 push    8
.text:004A9CF4                 push    [esp+64h+var_5C]
.text:004A9CF8                 call    try_to_deserialize_form
.text:004A9CFD                 test    eax, eax
.text:004A9CFF                 mov     cl, 1
.text:004A9D01                 js      short loc_4A9C98
.text:004A9D03                 push    [esp+5Ch+prng_seed_str]
.text:004A9D07                 push    [esp+60h+var_60]
.text:004A9D0A                 call    drops_new_exe
.text:004A9D0F                 test    eax, eax

The drops_new_exe function, as I have named it, is over 3 kB in size, illustrated by the following disassembly graph overview:

Disassembly control flow graph overview in IDA

Luckily, stepping through the code with a debugger and paying attention to the function calls, it is possible to recover a lot of the semantics rather quickly. The functions called at 0x4A8F94 and 0x4A8FC6 return the directory and basename of the process executable respectively, and the various calls to VarBstrCat (cat being common programmer slang for concatenate), VarBstrFromI4 or SysFreeString are self-explanatory.

The first truly observable effect of the program's execution occurs by means of the function called at 0x4A96DD, which I've dubbed simply copy_file. This function resolves both src and dst argument paths to their canonical forms and then calls CopyFileW from kernel32.dll.

.text:004A96D5 push    dword ptr [esp+258h+lpFileName+34h] ; dst
.text:004A96D9 push    dword ptr [esp+25Ch+lpFileName+2Ch] ; src
.text:004A96DD call    copy_file

Indeed, as you can verify with a debugger, after copy_file returns, a new executable is dropped to the current directory, with the number in the filename incremented. At this point, it is an identical copy of the binary being run. Next, the copied file is opened by means of a call to sub_4A59CA and then its filesize is retrieved (using the standard seeking trick) by a function I've named get_file_size (sub_4A5DE7).

.text:004A973A push    0
.text:004A973C push    32
.text:004A973E push    5
.text:004A9740 push    OPEN_ALWAYS
.text:004A9742 push    FILE_SHARE_READ | FILE_SHARE_WRITE
.text:004A9744 push    GENERIC_READ | GENERIC_WRITE
.text:004A9749 push    128
.text:004A974E push    dword ptr [esp+27Ch+bstrRight+4]
.text:004A9752 push    [esp+280h+bstrLeft] ; filepath
.text:004A9756 call    sub_4A59CA
...
.text:004A9778 lea     eax, [esp+edi*8+25Ch+lpFileName+10h]
.text:004A977C push    eax             ; outSize
.text:004A977D mov     dx, word ptr [esp+260h+bstrLeft]
.text:004A9782 push    edx             ; hFile
.text:004A9783 call    get_file_size

Next, skipping over a function call that seemed wholly mysterious for now...

.text:004A97B6 push    dword ptr [esp+264h+some_var+4]
.text:004A97BA push    dword ptr [esp+268h+some_var]
.text:004A97BE call    sub_4A5E4C

we arrive whither we set out: the code that actually modifies the copy.

.text:004A97E2 lea     eax, [esp+edi*8+260h+random_pos]
.text:004A97E6 push    eax                              ; out
.text:004A97E7 push    [esp+264h+filesize]              ; hi
.text:004A97EB push    1                                ; lo
.text:004A97ED push    dword ptr [esp+26Ch+prng]        ; prng
.text:004A97F1 call    prng_next_int
...
.text:004A9816 push    esp
.text:004A9817 push    1                                ; NumberOfBytesRead
.text:004A9819 push    dword ptr [esp+26Ch+random_pos]  ; offset_plus_1
.text:004A981D push    [esp+270h+fd_self]               ; fd
.text:004A9821 call    file_seek
...
.text:004A982C push    1                                ; count
.text:004A982E lea     eax, [esp+edi*4+268h+og_byte]
.text:004A9832 push    eax                              ; buffer
.text:004A9833 push    dword ptr [esp+26Ch+fd_self]     ; fd
.text:004A9837 call    file_read_buffered
...
.text:004A985F lea     eax, [esp+edi*8+264h+random_xor_byte]
.text:004A9863 push    eax                              ; out
.text:004A9864 push    0FFh                             ; hi
.text:004A9869 push    1                                ; lo
.text:004A986B push    dword ptr [esp+270h+prng]        ; prng
.text:004A986F call    prng_next_int
...
.text:004A987A mov     eax, dword ptr [esp+edi*8+264h+random_xor_byte]
.text:004A987E xor     eax, dword ptr [esp+264h+og_byte]
...
.text:004A9890 push    eax                              ; writes var_268
.text:004A9891 mov     dl, byte ptr [esp+268h+var_268]
.text:004A9894 mov     [esp+edi*4+268h+new_byte], dl
...
.text:004A98A7 push    esp                              ; int
.text:004A98A8 push    0                                ; NumberOfBytesRead
.text:004A98AA push    dword ptr [esp+26Ch+random_pos]  ; offset_plus_one
.text:004A98AE push    [esp+270h+fd_copy]               ; fd
.text:004A98B2 call    file_seek
...
.text:004A98BD push    1                                ; count
.text:004A98BF lea     eax, [esp+edi*4+268h+new_byte]
.text:004A98C3 push    eax                              ; buffer
.text:004A98C4 push    dword ptr [esp+26Ch+fd_copy]     ; fd
.text:004A98C8 call    file_write_buffered

which we can simplify and rewrite in pseudo-C as roughly the following:

uint32_t random_pos = prng_next_int(&prng, 1, filesize);
file_seek(fd_self, random_pos);

uint8_t og_byte;
file_read(fd_self, &og_byte, 1);

uint8_t random_xor_byte = prng_next_int(&prng, 1, 255);
uint8_t new_byte = og_byte ^ random_xor_byte;
file_write(fd_copy, &new_byte, 1);

Now, assuming the PRNG is in itself deterministic and we can reverse-engineer the seed, we could reconstruct the modifications that produced our sample and hopefully arrive at the flag. It's therefore time to analyze the functions I have called prng_next_int and prng_set_seed more deeply.

PRNG algorithm analysis

Analyzing the prng_next_int function reveals that the PRNG is based on floating-point arithmetic pseudo-randomness. Internally, it calls prng_next_float (sub_4A79EB), which returns a single-precision float between 0 and 1, and uses it to linearly interpolate between the lo and hi arguments (both are inclusive).

Rather than to illustrate the workings of the prng_next_float function directly from the disassembly or decompiler output, I will instead show my attempt at reimplementing it in C from scratch. Unfortunately, it was an unsuccessful attempt, as my outputs at some point begin diverging from those produced by the Dragon, and I am still not quite sure why. As you will see later, I have ultimately decided for a different solution strategy.

#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdint.h>

static const double g_prng_float_multiplier = 1.103515245e9;
static const double g_prng_float_offset     = 12345.0;
static const double g_prng_float_divisor    = 4.294967296e9;

/**
 * Advances the PRNG state and produces the next normalized float.
 *
 * @param state Pointer to current PRNG state (double at offset 0x74)
 * @param out   Pointer to float to receive the [0,1) random result
 */
void prng_next_normalized_float(double *state, float *out)
{
    // Update internal state.
    *state = (*state * g_prng_float_multiplier) + g_prng_float_offset;
    // printf("[dbg] state 1: %le\n", *state);

    double v9 = *state;

    *state /= g_prng_float_divisor;
    // printf("[dbg] state 2: %lf\n", *state);

    double int_part = floor(*state);
    double factor = dbl_reinterpret_from_int(0x41F0000000000000ull);

    // printf("[dbg] (%.21le - %.21le) = %.21le\n", v9, int_part * factor, v9 - (int_part * factor));
    *state = (v9 - int_part * factor);
    // printf("[dbg] state 3: %le\n", *state);

    double tmp = *state / 65536.0;
    double tmp_int_part = floor(tmp);
    // printf("[dbg] int part: %lf\n", tmp_int_part);

    *out = tmp_int_part / 32768.0;
}

void prng_next_int_in_range(double *state, int lo, int hi, int *out)
{
    float prng_factor;
    prng_next_normalized_float(state, &prng_factor);
    // printf("[dbg] prng_factor = %.15lf (%x)\n", prng_factor, *(uint32_t*)&prng_factor);

    int range = hi - lo + 1;

    *out = (int)((double)range * prng_factor + (double)lo);
}

double dbl_reinterpret_from_int(uint64_t value)
{
    return *(double *)&value;
}

double flt_reinterpret_from_int(uint32_t value)
{
    return *(float *)&value;
}

int test(void)
{
    double state = 26576.0;
    float result;

    printf("prng_next_normalized_float(%.21lf):\n", state);

    prng_next_normalized_float(&state, &result);

    printf("  result = %.21lf (got)\n", result);
    printf("  result = %.21lf (expected)\n", flt_reinterpret_from_int(0x3EEAB400u));
    printf("  state  = %.21lf (got)\n", state);
    printf("  state  = %.21lf (expected)\n", dbl_reinterpret_from_int(0x41CD56E1E4800000ull));
    printf("\n");

    printf("prng_next_normalized_float(%.21lf):\n", state);

    prng_next_normalized_float(&state, &result);

    printf("  result = %.21lf (got)\n", result);
    printf("  result = %.21lf (expected)\n", flt_reinterpret_from_int(0x3F483200u));
    printf("  state  = %.21lf (got)\n", state);
    printf("  state  = %.21lf (expected)\n", dbl_reinterpret_from_int(0x41D90672C0000000ull));
    printf("\n");

    return 0;
}

Output:

prng_next_normalized_float(26576.000000000000000000000):
  result = 0.458404541015625000000 (got)
  result = 0.458404541015625000000 (expected)
  state  = 984466377.000000000000000000000 (got)
  state  = 984466377.000000000000000000000 (expected)

prng_next_normalized_float(984466377.000000000000000000000):
  result = 0.782012939453125000000 (got)
  result = 0.782012939453125000000 (expected)
  state  = 1679411840.000000000000000000000 (got)
  state  = 1679411968.000000000000000000000 (expected)

This test case, directly taken from the Dragon, demonstrates that one iteration produces the correct result, yet the next one fails by an ever so slight margin. The question why my code is wrong is left as an excercise to the reader; if you crack it, please let me know and I will be happy to validate your answer. ;)

In any case, the RNG does seem to be fully deterministic (and is therefore by definition a PRNG). As the code also illustrates, the internal state of the PRNG is made up of only one double-precision floating point number, which is updated in a LCG-like manner. The initial state is the seed, which we should now try to find.

Finding the PRNG seed

Since I knew that the seed was going to be written to offset 0x74 of the PRNG structure, I could simply set a watch or a hardware breakpoint on that location and see what writes to it. That's how I identified the "mysterious" sub_4A5E4C function to be responsible for setting the seed and renamed it to prng_set_seed.

.text:004A97A9 mov     eax, 6746h
.text:004A97AE xor     eax, [esp+25Ch+bstrLeft]
.text:004A97B1 push    eax
.text:004A97B2 push    dword ptr [esp+edi*8+260h+lpFileName+48h]
.text:004A97B6 push    dword ptr [esp+264h+some_var+4]
.text:004A97BA push    dword ptr [esp+268h+some_var]
.text:004A97BE call    prng_set_seed

Since the disassembly (and decompiler output) is incredibly confusing, it again makes sense to employ dynamic analysis to figure out how the seed is set. Set a breakpoint at 0x4A97BE and inspect the arguments on top of the stack:

stack:0019FC78 0091A348 -> offset prng_vtable
stack:0019FC7C 000067D0

The first argument clearly points to the PRNG object instance, while the other looks like an integer. At this point, I noticed that the argument 0x67D0 looked rather than the xor operand a few instructions back, 0x6746. Clearly, if you xor these together again, you get some rather small number. And indeed, 0x67D0 ^ 0x6746 = 0x0096, or in decimal, 150. What a coincidence!

As it turns out, it really is as simple as seeding the PRNG with the number suffix of the current executable's name. This integer is xored with 0x6746, converted to a double, and written into the PRNG state variable.

The puzzle is now complete, all that is left is to put it all together.

Finding UnholyDragon-1

As I mentioned earlier, my initial plan was to reimplement the PRNG in from scratch and simulate the program's behaviour without touching it. But since my implementation failed in such a way that I wasn't sure where the problem could have lied, and I was afraid that if I were to fix it, I might be forced to reverse engineer more variant-related twinBASIC functions, I decided to go with a more hacky, yet still efficient way of extracting the pseudorandom offsets and xor keys.

This is a technique I like to use from time to time and which I'm calling the autodebugger. The idea is to write a C/C++ program that uses the Windows API to debug processes programatically. I'm sure there are some ways to do this with x64dbg or similar software, but this requires you to learn yet another set of APIs or commands, and while I don't know these, I do know the Windows API.

The plan is hence the following:

Start with a base UnholyDragon.exe file (still the same fixed one from the archive).
For 1..149:
- Create a copy with that number and run it under a debugger.
- Set a breakpoint at both prng_next_int call sites and retrieve the results.
- Record the PRNG outputs.
Apply all of the transformations in reverse order to get the original UnholyDragon-1.exe.

I wrote the autodebugger in C:

#include <stdio.h>
#include <stdint.h>
#include <windows.h>
#define eprintf(...) fprintf(stderr, __VA_ARGS__)

// Address of instruction after first prng_next_int call.
uint32_t BREAK_ADDR_1      = 0x4a97f6;
// The offset relative to ESP where the result will be.
int32_t  READ_ESP_OFFSET_1 = 0x22c;
// The original instruction byte (will be overwritten by the breakpoint).
uint8_t  RESTORE_1         = 0x85;

// Address of instruction after second prng_next_int call.
uint32_t BREAK_ADDR_2      = 0x4a9874;
// Second offset relative to ESP where the result will be.
int32_t  READ_ESP_OFFSET_2 = 0x234;

int main()
{
    char name[256];
    ZeroMemory(name, sizeof name);

    for (unsigned i = 1; i < 150; i++)
    {
        DWORD ch_offset, ch_xor;

        sprintf(name, "UnholyDragon-%u.exe", i);
        if (!CopyFileA("UnholyDragon.exe", name, FALSE))
            return eprintf("CopyFileA\n"), 1;

        STARTUPINFOA si;
        ZeroMemory(&si, sizeof(si));
        si.cb = sizeof(STARTUPINFOA);

        PROCESS_INFORMATION pi;
        ZeroMemory(&pi, sizeof(pi));

        if (
            !CreateProcessA(name, NULL, NULL, NULL, FALSE, DEBUG_PROCESS | CREATE_SUSPENDED, NULL, NULL, &si, &pi)
            || !WriteProcessMemory(pi.hProcess, (LPVOID)BREAK_ADDR_1, "\xcc", 1, NULL)
            || !WriteProcessMemory(pi.hProcess, (LPVOID)BREAK_ADDR_2, "\xcc", 1, NULL)
            || ResumeThread(pi.hThread) == (DWORD)-1
        )
            return 1;

        DEBUG_EVENT ev;
        ZeroMemory(&ev, sizeof ev);

        CONTEXT ctx;
        ZeroMemory(&ctx, sizeof ctx);
        ctx.ContextFlags = CONTEXT_ALL;

        while (1)
        {
            if (!WaitForDebugEventEx(&ev, INFINITE))
                return 1;

            if (
                ev.dwDebugEventCode != EXCEPTION_DEBUG_EVENT
                || ev.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT
            ) {
                ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_EXCEPTION_NOT_HANDLED);
                continue;
            }

            EXCEPTION_RECORD *exception = &ev.u.Exception.ExceptionRecord;
            if (exception->ExceptionAddress == (PVOID)BREAK_ADDR_1)
            {
                if (
                    !WriteProcessMemory(pi.hProcess, (LPVOID)BREAK_ADDR_1, &RESTORE_1, 1, &written)
                    || !GetThreadContext(pi.hThread, &ctx)
                )
                    return 1;

                DWORD result_address = ctx.Esp + READ_ESP_OFFSET_1;
                ctx.Eip--;

                if (
                    !ReadProcessMemory(pi.hProcess, (LPVOID)result_address, &ch_offset, sizeof ch_offset, &read)
                    || !SetThreadContext(pi.hThread, &ctx)
                )
                    return 1;

                ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_CONTINUE);
            }
            else if (exception->ExceptionAddress == (PVOID)BREAK_ADDR_2)
            {
                if (!GetThreadContext(pi.hThread, &ctx))
                    return 1;

                DWORD result_address = ctx.Esp + READ_ESP_OFFSET_2;

                if (!ReadProcessMemory(pi.hProcess, (LPVOID)result_address, &ch_xor, sizeof ch_xor, &read))
                    return 1;

                TerminateProcess(pi.hProcess, 0);
                break;
            }
            else
                ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_EXCEPTION_NOT_HANDLED);
        }

        printf("%s %x %x\n", name, ch_offset, ch_xor);

        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
}

And the patcher in Python:

DRAGON_BIN = "..."
DIFFS = [
    (0x27330, 0x76),
    (0x157cf2, 0xe6),
    ...,
    (0x1430b2, 0xd2),
    (0x1, 0x58),
]

with open(DRAGON_BIN, "rb") as f:
    bin = bytearray(f.read())

for offset_plus_one, xor_byte in DIFFS[::-1]:
    bin[offset_plus_one - 1] ^= xor_byte

with open("OGDragon.exe", "wb") as f:
    f.write(bin)

To test out the restored binary, I ran it in a VM sandbox to see what it does. Delightfully, I was greeted with the flag being deserialized as part of the form data and displayed in the GUI.

Flag