Challenge 5 ("ntfsm")


Description

I'm not here to tell you how to do your job or anything, given that you are a top notch computer scientist who has solved four challenges already, but NTFS is in the filename. Maybe, I don't know, run it in windows on an NTFS file system?

Writeup

We are given a file named ntfsm.exe. The description (and name of both the challenge and executable) mentions NTFS, Microsoft's New Technology File System, the default filesystem on Windows. One particular feature of NTFS that stands out and is a classic Windows forensics quirk are so-called alternate data streams (ADS). This is a feature that allows multiple streams of data to be associated with a single filename. My first idea was thus to check the executable for any alternate file streams, but there were none.

PS> Get-Item -Stream * -Path ntfsm.exe

PSPath        : Microsoft.PowerShell.Core\FileSystem::C:\...\ntfsm.exe::$DATA
PSParentPath  : Microsoft.PowerShell.Core\FileSystem::C:\...
PSChildName   : ntfsm.exe::$DATA
PSDrive       : C
PSProvider    : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
FileName      : C:\...\ntfsm.exe
Stream        : :$DATA
Length        : 20151296

PSPath        : Microsoft.PowerShell.Core\FileSystem::C:\...\ntfsm.exe:Zone.Identifier
PSParentPath  : Microsoft.PowerShell.Core\FileSystem::C:\...
PSChildName   : ntfsm.exe:Zone.Identifier
PSDrive       : C
PSProvider    : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
FileName      : C:\...\ntfsm.exe
Stream        : Zone.Identifier
Length        : 62

(The :$DATA stream is the default and Zone.Identifier is a Windows artifact unrelated to the file contents.)

Since I could find nothing interesting straight away, I fired up IDA and inspected the code. The executable seemed to be a fairly standard C++ program, only with one quirk, which was rather aggressive outlining of code. I'm not sure if it was an intentional obfuscation or just a compiler artifact, but in the end, it proved to be little more than a slight annoyance.

The main function could be found at preferred virtual address (PVA) 0x14000C0B5. The function itself was too large to be decompiled or even displayed in a graph by IDA, but using a debugger, it was not difficult to get to the gist of its semantics:

  • At 0x14000C0D9, four std::strings are constructed: "state", "input", "position" and "transitions". I don't know about you, but to me, this screamed DFA (deterministic finite automaton, also known as a finite state machine).
  • Notably, these four strings are at various points passed as arguments to functions that read or write data to or from an alternate data stream of the ntfsm.exe binary:
    • The function at 0x140FF0FE0 reads a QWORD from the ADS specified as the first argument into the destination buffer, specified as the second argument.
    • The function at 0x14000AF00 reads data from the ADS specified as the first argument into the destination buffer. The number of bytes to be read is specified as a third argument.
    • Analogously, the function at 0x140FF1190 writes a QWORD into an ADS.
    • Finally, the function at 0x14000ADE0 writes data into an ADS.

The general flow of the program is as follows:

  • The password is read from the command line argument. If it is missing or if it is not 16 bytes long, an error message is displayed and the program exits.
  • Otherwise, the four aforementioned ADSs are created and initialized to state = 0, input = <password>, position = 0 and transitions = 0.
  • Next, the state is used as an index into a jump table and the control is transferred to the appropriate handler. In each of these handlers, the byte at position in the input is read and checked against some fixed values to determine the next state, this typical finite automaton behaviour further supports the DFA hypothesis. If the input character doesn't match any of the constants, the password is rejected.
  • The state and position ADSs are then updated with the new values and the DFA continues to run, albeit in a very surprising manner: Normally, we would expect to see an unconditional jump to the code switch that checks the state; However, this program instead creates a new process of itself with inherited file handles and waits for the child's execution to finish. This effectively means that each process only performs one transition of the DFA, sharing the current state via the associated ADSs.

While this may seem hard to dynamically analyze, and it probably is, these hardships can be overcome by instead analyzing the transitions statically — since all of the state "handlers" share roughly the same code structure, we can simply parse the code and reconstruct the transition table. I started writing a simple Python script that eventually evolved into a multi-threaded solver.

I imported the pefile library, mainly for translating virtual addresses to offsets and vice versa, and the capstone library for x64 instructions disassembly, and defined some useful constants.

from collections import deque
import capstone
import pefile

FILENAME = "ntfsm.exe"
JTABLE_VA = 0x140C687B8
RETURN_VA = 0x140C685EE
RSP_STATE_OFFSET = 0x58D30

The next set of helper functions allows me to get the location of each state "handler" based on the state number.

def va_to_offset(pe: pefile.PE, va: int) -> int:
    return pe.get_offset_from_rva(va - pe.OPTIONAL_HEADER.ImageBase)

def offset_to_va(pe: pefile.PE, offset: int) -> int:
    return pe.OPTIONAL_HEADER.ImageBase + pe.get_rva_from_offset(offset)

def read_dword(data: bytes, pe: pefile.PE, va: int) -> int:
    offset = va_to_offset(pe, va)
    return int.from_bytes(data[offset:offset+4], "little")

def get_loc_va_from_state(data: bytes, pe: pefile.PE, state: int) -> int:
    return pe.OPTIONAL_HEADER.ImageBase + read_dword(data, pe, JTABLE_VA + state * 4)

The most important function is the one below, which disassembles and parses the code for a given state handler and returns a dictionary with transitions (from an input character to the next state). The patterns are explained in the code comments.

def parse_transitions_for_state(data: bytes, pe: pefile.PE, state: int) -> dict[int, int]:
    state_loc = get_loc_va_from_state(data, pe, state)
    loc = va_to_offset(pe, state_loc)

    # We need to find a sequence of:
    #   mov(z/s)x   eax, byte ptr [rsp + input_char]
    #   mov         [rsp + input_char_tmp], (al/eax)
    cs = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
    cs.detail = True

    input_char = None
    input_char_tmp = None

    MAX_INSTRUCTIONS = 100
    for _ in range(MAX_INSTRUCTIONS):
        insn = next(cs.disasm(data[loc:], 0, 1))
        loc += insn.size

        if insn.mnemonic in ("movzx", "movsx") and (
            insn.operands[0].type == capstone.x86.X86_OP_REG and
            insn.operands[0].reg == capstone.x86.X86_REG_EAX and
            insn.operands[1].type == capstone.x86.X86_OP_MEM and
            insn.operands[1].mem.base == capstone.x86.X86_REG_RSP and
            insn.operands[1].mem.index == 0
        ):
            input_char = insn.operands[1].mem.disp
        else:
            continue

        insn = next(cs.disasm(data[loc:], 0, 1))
        loc += insn.size

        if insn.mnemonic == "mov" and (
            insn.operands[0].type == capstone.x86.X86_OP_MEM and
            insn.operands[0].mem.base == capstone.x86.X86_REG_RSP and
            insn.operands[0].mem.index == 0 and
            insn.operands[1].type == capstone.x86.X86_OP_REG and
            insn.operands[1].reg in (capstone.x86.X86_REG_AL, capstone.x86.X86_REG_EAX)
        ):
            input_char_tmp = insn.operands[0].mem.disp
            break
        else:
            input_char = None
            continue

    if input_char is None or input_char_tmp is None:
        raise ValueError(f"Failed to find input_char rsp offsets for state {state} ({hex(state_loc)}): {input_char=}, {input_char_tmp=}")

    # Followed by some number of:
    #   cmp     byte ptr [rsp + something_else], imm8
    #   jz      loc
    # Until eventually there is an unconditional jmp.
    transition_locs = {}

    for _ in range(MAX_INSTRUCTIONS):
        insn = next(cs.disasm(data[loc:], 0, 1))
        loc += insn.size

        if insn.mnemonic == "cmp" and (
            insn.operands[0].type == capstone.x86.X86_OP_MEM and
            insn.operands[0].mem.base == capstone.x86.X86_REG_RSP and
            insn.operands[0].mem.index == 0 and
            insn.operands[0].mem.disp == input_char_tmp and
            insn.operands[1].type == capstone.x86.X86_OP_IMM
        ):
            cmp_imm = insn.operands[1].imm
            if not (0 <= cmp_imm <= 255):
                raise ValueError(f"Cmp imm {cmp_imm} in state {state} is too large")
            if cmp_imm in transition_locs:
                raise ValueError(f"Duplicate cmp imm {cmp_imm} in state {state}")

            # Next instruction should be jz/je
            insn = next(cs.disasm(data[loc:], 0, 1))
            loc += insn.size

            if insn.mnemonic == "je" and (
                insn.operands[0].type == capstone.x86.X86_OP_IMM
            ):
                loc_va = offset_to_va(pe, loc)
                jz_target = loc_va - insn.size + insn.operands[0].imm
                transition_locs[cmp_imm] = jz_target
            else:
                raise ValueError(f"Unexpected instruction after cmp: {insn.mnemonic} {insn.op_str}")

        elif insn.mnemonic == "jmp":
            break

    # Then, for each loc, we need to find the state it goes to, i.e. at the new loc:
    #   mov     [rsp + 0x00058D30], imm32
    transitions = {}
    for cmp_imm, target_va in transition_locs.items():
        loc = va_to_offset(pe, target_va)

        insn = next(cs.disasm(data[loc:], 0, 1))
        loc += insn.size

        if insn.mnemonic == "mov" and (
            insn.operands[0].type == capstone.x86.X86_OP_MEM and
            insn.operands[0].mem.base == capstone.x86.X86_REG_RSP and
            insn.operands[0].mem.index == 0 and
            insn.operands[0].mem.disp == RSP_STATE_OFFSET and
            insn.operands[1].type == capstone.x86.X86_OP_IMM
        ):
            next_state = insn.operands[1].imm
            transitions[cmp_imm] = next_state
        else:
            raise ValueError(f"Unexpected instruction at state transition target va {hex(target_va)}: {insn.mnemonic} {insn.op_str}")

    return transitions

One gotcha worth mentioning that the instruction that reads the current input character into eax can be a movzx (zero-extended move, i.e. promotion of unsigned char to unsigned int), but also a movsx (sign-extended move or a signed char to signed int promotion). This detail initially eluded me and broke the parsing, as I simply kept disassembling instructions until a movzx was found, which would "overflow" into the next handler and parse the wrong transitions. It may have been more robust to check for unconditional jumps to avoid spilling into the next unrelated code block, but catching movsxs was enough to parse everything correctly.

One thing still needs addressing before we start constructing the transition graph and searching the state space. We still don't know what the condition is for a password to be accepted as successful — the code handling a successful password was easy to identify, just a couple lines of disassembly below the start of the "main-ish" function, but there were no apparent x-refs to it, so it did not seem that one of the state handlers would jump directly to this part.

.text:000000014000C209    lea     rcx, aCorrect   ; "correct!\n"
.text:000000014000C210    call    j_printf

The condition turned out to be even simpler: The main function simply checks if the position is equal to 16 (the length of the input), and if it is, the control flow continues to the success handler. The goal is thus to simply find an input that would not go to the "fail" state before 16 characters were read and checked.

The straightforward approach is to search through the state space using some search algorithm, the simplest being breadth-first search (BFS) or depth-first search (DFS). Based on my observations, the number of transitions for a given state ranged from 0 to 3, meaning that in the worst case, a to search through all 16-byte strings would mean visiting 31643M3^{16} \sim 43 \text{M} leaves, which is a lot, but not anywhere near impossible on current hardware — a typical computer not only processes an order of a billion CPU instructions per second, but also has multiple processor cores that can be used in parallel.

After implementing the single-threaded DFS search and seeing how slowly it went through the state space, I decided to generate some number of starting states with BFS and then run parallel DFS from each of the starting states:

# === Parallelization setup ===
initial_state = 0
starting_paths = []
q = deque([initial_state])
seen = set([initial_state])
while q:
    state = q.popleft()
    state_transitions = parse_transitions_for_state(data, pe, state)
    for inp, nxt in state_transitions.items():
        if nxt in paths:
            if len(paths[state]) + 1 > len(paths[nxt]):
                paths[nxt] = paths[state] + chr(inp)
                if len(paths[nxt]) == 5:
                    starting_paths.append((nxt, paths[nxt]))
        else:
            paths[nxt] = paths[state] + chr(inp)
            if len(paths[nxt]) == 5:
                starting_paths.append((nxt, paths[nxt]))
        if nxt not in seen and len(paths[nxt]) < 5:
            seen.add(nxt)
            q.append(nxt)
print({k: v for k, v in starting_paths})

The code above would print all valid 5-byte input prefixes and the state in which consuming that prefix would land, which I then included in the source code. Now, I could run the paralellized search:

START_PATHS = {65: 'JYCDE', 64: 'JYCDM', 63: 'JYCDW', 66: 'JYCM0', 68: 'JYCqQ', 69: 'JYCqV', 67: 'JYCqX', 71: 'UP6JK', 70: 'UP6JW', 76: 'UPeK9', 77: 'UPeKd', 75: 'UPeKi', 72: 'UPeNQ', 74: 'UPeNe', 73: 'UPeNr', 78: 'UPeXv', 87: 'UZ5Od', 86: 'UZ5bC', 89: 'UZ5me', 88: 'UZ5mz', 79: 'UZReD', 80: 'UZx65', 81: 'UZx6d', 82: 'UZxNC', 83: 'UZxd5', 84: 'UZxd7', 85: 'UZxd8', 45: 'iLAZ7', 44: 'iLAZW', 43: 'iLAzg', 48: 'iLTQZ', 47: 'iLTQa', 46: 'iLTQv', 50: 'iLTbC', 51: 'iLTbL', 49: 'iLTbw', 56: 'iq7_O', 54: 'iqg0Q', 53: 'iqg0n', 55: 'iqg0u', 52: 'iqgMK', 57: 'iqyRI', 59: 'iqyRY', 58: 'iqyRl', 62: 'iqySG', 61: 'iqySW', 60: 'iqySd'}

def explore_dfs(data: bytes, pe: pefile.PE, paths: dict[int, str], state: int):
    state_transitions = parse_transitions_for_state(data, pe, state)
    for inp, nxt in state_transitions.items():
        if nxt in paths:
            if len(paths[state]) + 1 > len(paths[nxt]):
                paths[nxt] = paths[state] + chr(inp)
                print(paths[nxt])
                if len(paths[nxt]) == 16:
                    print(f"Found password: {paths[nxt]}")
                    with open("password.txt", "w") as f:
                        f.write(paths[nxt] + "\n")
                        exit(0)
        else:
            paths[nxt] = paths[state] + chr(inp)
            print(paths[nxt])
            if len(paths[nxt]) == 16:
                print(f"Found password: {paths[nxt]}")
                with open("password.txt", "w") as f:
                    f.write(paths[nxt] + "\n")
                    exit(0)
        
        explore_dfs(data, pe, paths, nxt)

def run_one(start_state: int):
    with open(FILENAME, "rb") as f:
        data = f.read()
    pe = pefile.PE(data=data)

    paths = {start_state: START_PATHS[start_state]}
    explore_dfs(data, pe, paths, start_state)

if __name__ == "__main__":
    # Parallel DFS
    import sys
    import multiprocessing

    if len(sys.argv) == 1:
        processes = []
        for start_state in START_PATHS:
            print(f"explore_dfs({start_state})")
            p = multiprocessing.Process(target=run_one, args=(start_state,))
            p.start()
            processes.append(p)

        for p in processes:
            p.join()

    else:
        start_state = int(sys.argv[1])
        print(f"Starting DFS from state {start_state} with path {START_PATHS[start_state]}")
        run_one(start_state)

I was a little skeptical about this approach, as it still seemed a little too "brute-forcy". As I went to bed, I let it run on the background, and to my delight, the next day I woke up to a finished process and a file named password.txt.

PS> .\ntfsm.exe iqg0nSeCHnOMPm2Q
correct!
Your reward: f1n1t3_st4t3_m4ch1n3s_4r3_fun@flare-on.com

Flag

f1n1t3_st4t3_m4ch1n3s_4r3_fun@flare-on.com