Challenge 5 ("ntfsm")
Description
I'm not here to tell you how to do your job or anything, given that you are a top notch computer scientist who has solved four challenges already, but NTFS is in the filename. Maybe, I don't know, run it in windows on an NTFS file system?
Writeup
We are given a file named ntfsm.exe. The description (and name of both the challenge and executable) mentions NTFS, Microsoft's New Technology File System, the default filesystem on Windows. One particular feature of NTFS that stands out and is a classic Windows forensics quirk are so-called alternate data streams (ADS). This is a feature that allows multiple streams of data to be associated with a single filename. My first idea was thus to check the executable for any alternate file streams, but there were none.
PS> Get-Item -Stream * -Path ntfsm.exe
PSPath : Microsoft.PowerShell.Core\FileSystem::C:\...\ntfsm.exe::$DATA
PSParentPath : Microsoft.PowerShell.Core\FileSystem::C:\...
PSChildName : ntfsm.exe::$DATA
PSDrive : C
PSProvider : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
FileName : C:\...\ntfsm.exe
Stream : :$DATA
Length : 20151296
PSPath : Microsoft.PowerShell.Core\FileSystem::C:\...\ntfsm.exe:Zone.Identifier
PSParentPath : Microsoft.PowerShell.Core\FileSystem::C:\...
PSChildName : ntfsm.exe:Zone.Identifier
PSDrive : C
PSProvider : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
FileName : C:\...\ntfsm.exe
Stream : Zone.Identifier
Length : 62
(The :$DATA stream is the default and Zone.Identifier is a Windows artifact unrelated to the file contents.)
Since I could find nothing interesting straight away, I fired up IDA and inspected the code. The executable seemed to be a fairly standard C++ program, only with one quirk, which was rather aggressive outlining of code. I'm not sure if it was an intentional obfuscation or just a compiler artifact, but in the end, it proved to be little more than a slight annoyance.
The main function could be found at preferred virtual address (PVA) 0x14000C0B5. The function itself was too large to be decompiled or even displayed in a graph by IDA, but using a debugger, it was not difficult to get to the gist of its semantics:
- At
0x14000C0D9, fourstd::strings are constructed:"state","input","position"and"transitions". I don't know about you, but to me, this screamed DFA (deterministic finite automaton, also known as a finite state machine). - Notably, these four strings are at various points passed as arguments to functions that read or write data to or from an alternate data stream of the
ntfsm.exebinary:- The function at
0x140FF0FE0reads aQWORDfrom the ADS specified as the first argument into the destination buffer, specified as the second argument. - The function at
0x14000AF00reads data from the ADS specified as the first argument into the destination buffer. The number of bytes to be read is specified as a third argument. - Analogously, the function at
0x140FF1190writes aQWORDinto an ADS. - Finally, the function at
0x14000ADE0writes data into an ADS.
- The function at
The general flow of the program is as follows:
- The password is read from the command line argument. If it is missing or if it is not 16 bytes long, an error message is displayed and the program exits.
- Otherwise, the four aforementioned ADSs are created and initialized to
state = 0,input = <password>,position = 0andtransitions = 0. - Next, the
stateis used as an index into a jump table and the control is transferred to the appropriate handler. In each of these handlers, the byte atpositionin theinputis read and checked against some fixed values to determine the next state, this typical finite automaton behaviour further supports the DFA hypothesis. If the input character doesn't match any of the constants, the password is rejected. - The
stateandpositionADSs are then updated with the new values and the DFA continues to run, albeit in a very surprising manner: Normally, we would expect to see an unconditional jump to the code switch that checks the state; However, this program instead creates a new process of itself with inherited file handles and waits for the child's execution to finish. This effectively means that each process only performs one transition of the DFA, sharing the current state via the associated ADSs.
While this may seem hard to dynamically analyze, and it probably is, these hardships can be overcome by instead analyzing the transitions statically — since all of the state "handlers" share roughly the same code structure, we can simply parse the code and reconstruct the transition table. I started writing a simple Python script that eventually evolved into a multi-threaded solver.
I imported the pefile library, mainly for translating virtual addresses to offsets and vice versa, and the capstone library for x64 instructions disassembly, and defined some useful constants.
from collections import deque
import capstone
import pefile
FILENAME = "ntfsm.exe"
JTABLE_VA = 0x140C687B8
RETURN_VA = 0x140C685EE
RSP_STATE_OFFSET = 0x58D30
The next set of helper functions allows me to get the location of each state "handler" based on the state number.
def va_to_offset(pe: pefile.PE, va: int) -> int:
return pe.get_offset_from_rva(va - pe.OPTIONAL_HEADER.ImageBase)
def offset_to_va(pe: pefile.PE, offset: int) -> int:
return pe.OPTIONAL_HEADER.ImageBase + pe.get_rva_from_offset(offset)
def read_dword(data: bytes, pe: pefile.PE, va: int) -> int:
offset = va_to_offset(pe, va)
return int.from_bytes(data[offset:offset+4], "little")
def get_loc_va_from_state(data: bytes, pe: pefile.PE, state: int) -> int:
return pe.OPTIONAL_HEADER.ImageBase + read_dword(data, pe, JTABLE_VA + state * 4)
The most important function is the one below, which disassembles and parses the code for a given state handler and returns a dictionary with transitions (from an input character to the next state). The patterns are explained in the code comments.
def parse_transitions_for_state(data: bytes, pe: pefile.PE, state: int) -> dict[int, int]:
state_loc = get_loc_va_from_state(data, pe, state)
loc = va_to_offset(pe, state_loc)
# We need to find a sequence of:
# mov(z/s)x eax, byte ptr [rsp + input_char]
# mov [rsp + input_char_tmp], (al/eax)
cs = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
cs.detail = True
input_char = None
input_char_tmp = None
MAX_INSTRUCTIONS = 100
for _ in range(MAX_INSTRUCTIONS):
insn = next(cs.disasm(data[loc:], 0, 1))
loc += insn.size
if insn.mnemonic in ("movzx", "movsx") and (
insn.operands[0].type == capstone.x86.X86_OP_REG and
insn.operands[0].reg == capstone.x86.X86_REG_EAX and
insn.operands[1].type == capstone.x86.X86_OP_MEM and
insn.operands[1].mem.base == capstone.x86.X86_REG_RSP and
insn.operands[1].mem.index == 0
):
input_char = insn.operands[1].mem.disp
else:
continue
insn = next(cs.disasm(data[loc:], 0, 1))
loc += insn.size
if insn.mnemonic == "mov" and (
insn.operands[0].type == capstone.x86.X86_OP_MEM and
insn.operands[0].mem.base == capstone.x86.X86_REG_RSP and
insn.operands[0].mem.index == 0 and
insn.operands[1].type == capstone.x86.X86_OP_REG and
insn.operands[1].reg in (capstone.x86.X86_REG_AL, capstone.x86.X86_REG_EAX)
):
input_char_tmp = insn.operands[0].mem.disp
break
else:
input_char = None
continue
if input_char is None or input_char_tmp is None:
raise ValueError(f"Failed to find input_char rsp offsets for state {state} ({hex(state_loc)}): {input_char=}, {input_char_tmp=}")
# Followed by some number of:
# cmp byte ptr [rsp + something_else], imm8
# jz loc
# Until eventually there is an unconditional jmp.
transition_locs = {}
for _ in range(MAX_INSTRUCTIONS):
insn = next(cs.disasm(data[loc:], 0, 1))
loc += insn.size
if insn.mnemonic == "cmp" and (
insn.operands[0].type == capstone.x86.X86_OP_MEM and
insn.operands[0].mem.base == capstone.x86.X86_REG_RSP and
insn.operands[0].mem.index == 0 and
insn.operands[0].mem.disp == input_char_tmp and
insn.operands[1].type == capstone.x86.X86_OP_IMM
):
cmp_imm = insn.operands[1].imm
if not (0 <= cmp_imm <= 255):
raise ValueError(f"Cmp imm {cmp_imm} in state {state} is too large")
if cmp_imm in transition_locs:
raise ValueError(f"Duplicate cmp imm {cmp_imm} in state {state}")
# Next instruction should be jz/je
insn = next(cs.disasm(data[loc:], 0, 1))
loc += insn.size
if insn.mnemonic == "je" and (
insn.operands[0].type == capstone.x86.X86_OP_IMM
):
loc_va = offset_to_va(pe, loc)
jz_target = loc_va - insn.size + insn.operands[0].imm
transition_locs[cmp_imm] = jz_target
else:
raise ValueError(f"Unexpected instruction after cmp: {insn.mnemonic} {insn.op_str}")
elif insn.mnemonic == "jmp":
break
# Then, for each loc, we need to find the state it goes to, i.e. at the new loc:
# mov [rsp + 0x00058D30], imm32
transitions = {}
for cmp_imm, target_va in transition_locs.items():
loc = va_to_offset(pe, target_va)
insn = next(cs.disasm(data[loc:], 0, 1))
loc += insn.size
if insn.mnemonic == "mov" and (
insn.operands[0].type == capstone.x86.X86_OP_MEM and
insn.operands[0].mem.base == capstone.x86.X86_REG_RSP and
insn.operands[0].mem.index == 0 and
insn.operands[0].mem.disp == RSP_STATE_OFFSET and
insn.operands[1].type == capstone.x86.X86_OP_IMM
):
next_state = insn.operands[1].imm
transitions[cmp_imm] = next_state
else:
raise ValueError(f"Unexpected instruction at state transition target va {hex(target_va)}: {insn.mnemonic} {insn.op_str}")
return transitions
One gotcha worth mentioning that the instruction that reads the current input character into eax can be a movzx (zero-extended move, i.e. promotion of unsigned char to unsigned int), but also a movsx (sign-extended move or a signed char to signed int promotion). This detail initially eluded me and broke the parsing, as I simply kept disassembling instructions until a movzx was found, which would "overflow" into the next handler and parse the wrong transitions. It may have been more robust to check for unconditional jumps to avoid spilling into the next unrelated code block, but catching movsxs was enough to parse everything correctly.
One thing still needs addressing before we start constructing the transition graph and searching the state space. We still don't know what the condition is for a password to be accepted as successful — the code handling a successful password was easy to identify, just a couple lines of disassembly below the start of the "main-ish" function, but there were no apparent x-refs to it, so it did not seem that one of the state handlers would jump directly to this part.
.text:000000014000C209 lea rcx, aCorrect ; "correct!\n"
.text:000000014000C210 call j_printf
The condition turned out to be even simpler: The main function simply checks if the position is equal to 16 (the length of the input), and if it is, the control flow continues to the success handler. The goal is thus to simply find an input that would not go to the "fail" state before 16 characters were read and checked.
The straightforward approach is to search through the state space using some search algorithm, the simplest being breadth-first search (BFS) or depth-first search (DFS). Based on my observations, the number of transitions for a given state ranged from 0 to 3, meaning that in the worst case, a to search through all 16-byte strings would mean visiting leaves, which is a lot, but not anywhere near impossible on current hardware — a typical computer not only processes an order of a billion CPU instructions per second, but also has multiple processor cores that can be used in parallel.
After implementing the single-threaded DFS search and seeing how slowly it went through the state space, I decided to generate some number of starting states with BFS and then run parallel DFS from each of the starting states:
# === Parallelization setup ===
initial_state = 0
starting_paths = []
q = deque([initial_state])
seen = set([initial_state])
while q:
state = q.popleft()
state_transitions = parse_transitions_for_state(data, pe, state)
for inp, nxt in state_transitions.items():
if nxt in paths:
if len(paths[state]) + 1 > len(paths[nxt]):
paths[nxt] = paths[state] + chr(inp)
if len(paths[nxt]) == 5:
starting_paths.append((nxt, paths[nxt]))
else:
paths[nxt] = paths[state] + chr(inp)
if len(paths[nxt]) == 5:
starting_paths.append((nxt, paths[nxt]))
if nxt not in seen and len(paths[nxt]) < 5:
seen.add(nxt)
q.append(nxt)
print({k: v for k, v in starting_paths})
The code above would print all valid 5-byte input prefixes and the state in which consuming that prefix would land, which I then included in the source code. Now, I could run the paralellized search:
START_PATHS = {65: 'JYCDE', 64: 'JYCDM', 63: 'JYCDW', 66: 'JYCM0', 68: 'JYCqQ', 69: 'JYCqV', 67: 'JYCqX', 71: 'UP6JK', 70: 'UP6JW', 76: 'UPeK9', 77: 'UPeKd', 75: 'UPeKi', 72: 'UPeNQ', 74: 'UPeNe', 73: 'UPeNr', 78: 'UPeXv', 87: 'UZ5Od', 86: 'UZ5bC', 89: 'UZ5me', 88: 'UZ5mz', 79: 'UZReD', 80: 'UZx65', 81: 'UZx6d', 82: 'UZxNC', 83: 'UZxd5', 84: 'UZxd7', 85: 'UZxd8', 45: 'iLAZ7', 44: 'iLAZW', 43: 'iLAzg', 48: 'iLTQZ', 47: 'iLTQa', 46: 'iLTQv', 50: 'iLTbC', 51: 'iLTbL', 49: 'iLTbw', 56: 'iq7_O', 54: 'iqg0Q', 53: 'iqg0n', 55: 'iqg0u', 52: 'iqgMK', 57: 'iqyRI', 59: 'iqyRY', 58: 'iqyRl', 62: 'iqySG', 61: 'iqySW', 60: 'iqySd'}
def explore_dfs(data: bytes, pe: pefile.PE, paths: dict[int, str], state: int):
state_transitions = parse_transitions_for_state(data, pe, state)
for inp, nxt in state_transitions.items():
if nxt in paths:
if len(paths[state]) + 1 > len(paths[nxt]):
paths[nxt] = paths[state] + chr(inp)
print(paths[nxt])
if len(paths[nxt]) == 16:
print(f"Found password: {paths[nxt]}")
with open("password.txt", "w") as f:
f.write(paths[nxt] + "\n")
exit(0)
else:
paths[nxt] = paths[state] + chr(inp)
print(paths[nxt])
if len(paths[nxt]) == 16:
print(f"Found password: {paths[nxt]}")
with open("password.txt", "w") as f:
f.write(paths[nxt] + "\n")
exit(0)
explore_dfs(data, pe, paths, nxt)
def run_one(start_state: int):
with open(FILENAME, "rb") as f:
data = f.read()
pe = pefile.PE(data=data)
paths = {start_state: START_PATHS[start_state]}
explore_dfs(data, pe, paths, start_state)
if __name__ == "__main__":
# Parallel DFS
import sys
import multiprocessing
if len(sys.argv) == 1:
processes = []
for start_state in START_PATHS:
print(f"explore_dfs({start_state})")
p = multiprocessing.Process(target=run_one, args=(start_state,))
p.start()
processes.append(p)
for p in processes:
p.join()
else:
start_state = int(sys.argv[1])
print(f"Starting DFS from state {start_state} with path {START_PATHS[start_state]}")
run_one(start_state)
I was a little skeptical about this approach, as it still seemed a little too "brute-forcy". As I went to bed, I let it run on the background, and to my delight, the next day I woke up to a finished process and a file named password.txt.
PS> .\ntfsm.exe iqg0nSeCHnOMPm2Q
correct!
Your reward: f1n1t3_st4t3_m4ch1n3s_4r3_fun@flare-on.com
Flag
f1n1t3_st4t3_m4ch1n3s_4r3_fun@flare-on.com