Challenge 3 ("pretty_devilish_file")
Description
Here is a little change of pace for us, but still within our area of expertise. Every [now] and then we have to break apart some busted document file to scoop out the goodies. Now it is your turn.
Writeup
We are given a file named pretty_devilish_file.pdf. Let's look inside:
%PDF-2.0
%[BINARY DATA...]
% Hey there! Welcome to this source...
% Tested under the following browsers:
% Chrome, Safari, PDFjs (Firefox)
1 0 obj <<
% N0t_a_flag_but_just_a_line_comment
/Pages 2 0 R/Type/Catalog/Extensions <</ADBE <</BaseVersion/1.7/ExtensionLevel 8>>>>>>endobj
% 2 0 obj
% <<>>
% endobj
3 0 obj
<<
/Contents 4 0 R
/Parent 2 0 R
/Resources 6 0 R
/Type /Page
/MediaBox [0 0 612 130]
>>
endobj
2 0 obj
<<
/Count 1
/Kids [
3 0 R
]
/Type /Pages
>>
endobj
% 2 0 obj
% <<>>
% endobj
4 0 obj
<</Length 320/Filter /FlateDecode>>stream
[BINARY DATA...]
endstream
6 0 obj
<<
/Font <<
/ <<
/BaseFont /Arial
/Subtype /Type1
/Type /Font
>>
>>
>>
endobj
7 0 obj
<</Filter /Standard/V 5/R 6/Length 256/P -1/EncryptMetadata true/CF <</StdCF <</AuthEvent /DocOpen/CFM /AESV3/Length 32>>>>/StrF /StdCF/StmF /StdCF/U ([BINARY DATA...])/O ([BINARY DATA...])/UE ([BINARY DATA...])/OE ([BINARY DATA...])/Perms ([BINARY DATA...])>>
trailer <<
/Root 2 0 R
/#52#6F#6F#74 1
% /Size 15
0
R
/Encrypt 7 0 R
>>
Well, it looks like a PDF file. Judging by its simplicity and the presence of comments like % Hey there! Welcome to this source... or % N0t_a_flag_but_just_a_line_comment, I assume it was hand-crafted. Let's analyze the structure:
- The first line, %PDF-2.0, identifies the file as a PDF.
- The trailer at the end of the document declares the root of the document seemingly to be at 2 0 R, although there is an obfuscation that sets the root to 1 0 R right after that (#52#6F#6F#74 is just Root hex-encoded). This, to the best of my knowledge, didn't have any meaningful impact on the challenge. Next, it specifies the encryption dictionary to be located at 7 0.
- Object 2 0 declares one page defined by object 3 0.
- Object 3 0 declares a page with a MediaBox, with font declared at 6 0 and contents declared at 4 0 (which is an encrypted binary stream).
- The encryption dictionary at 7 0 specifies AES-256 as the mechanism for encrypting the PDF stream(s). Since the PDF file could be opened without a password prompt in a browser, I concluded that the file is "protected" by the empty password.
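The hex-encoded key in the trailer is easy to verify by hand. PDF names may escape any character as # followed by two hex digits, so a minimal sketch of the decoding could look like this (decode_pdf_name is a hypothetical helper written for this writeup, not part of any library):

```python
import re


def decode_pdf_name(name: str) -> str:
    """Replace each #XX escape in a PDF name with the character it encodes."""
    return re.sub(
        r"#([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        name,
    )


# The obfuscated key from the trailer dictionary
print(decode_pdf_name("/#52#6F#6F#74"))  # /Root
```

Names without escapes pass through unchanged, so the helper can be run over every key in the dump.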
The logical next step was to inspect the decrypted data stream, as it was being rendered by the browser. For this, I used the pikepdf library:
In [1]: import pikepdf
...:
...: # Open the encrypted PDF (empty password)
...: pdf_path = "pretty_devilish_file.pdf"
...: with pikepdf.open(pdf_path, password="") as pdf:
...:     # Iterate over all objects in the PDF
...:     for i, obj in enumerate(pdf.objects):
...:         if isinstance(obj, pikepdf.Stream):
...:             # Decrypts automatically
...:             stream_bytes = obj.read_bytes()
...:             print(f"Stream {i} ({len(stream_bytes)} bytes):")
...:             print(stream_bytes)
...:
Stream 3 (576 bytes):
b"q 612 0 0 10 0 -10 cm\nBI /W 37/H 1/CS/G/BPC 8/L 458/F[\n/AHx\n/DCT\n]ID\nffd8ffe000104a46494600010100000100010000ffdb00430001010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101ffc0000b080001002501011100ffc40017000100030000000000000000000000000006040708ffc400241000000209050100000000000000000000000702050608353776b6b7030436747577ffda0008010100003f00c54d3401dcbbfb9c38db8a7dd265a2159e9d945a086407383aabd52e5034c274e57179ef3bcdfca50f0af80aff00e986c64568c7ffd9\nEI Q \n\nq\nBT\n/ 140 Tf\n10 10 Td\n(Flare-On!)'\nET\nQ\n"
This looked promising and reminiscent of the PDF format. I also recognized the Flare-On! string as the text that was being displayed when the file was opened using a PDF viewer.
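Another interesting part of the data was the line consisting of a single hex string. Its format is quickly apparent to a trained eye: the bytes ff d8 and ff d9 are the start-of-image (SOI) and end-of-image (EOI) markers of a JPEG file, respectively. The /AHx and /DCT entries in the inline image's filter list (the abbreviations for ASCIIHexDecode and DCTDecode) point the same way.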
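As a rough sketch of that step, the hex line can be pulled out of the decrypted content stream and checked for the two markers with the standard library alone. The function name extract_hex_jpeg and the regular expression are my own assumptions; the pattern only covers the simple layout seen in this stream (hex data sitting between the ID and EI operators) and is not a general inline-image parser:

```python
import re


def extract_hex_jpeg(content_stream: bytes) -> bytes:
    """Hex-decode the inline image data found between the ID and EI operators."""
    match = re.search(rb"ID\s+([0-9A-Fa-f\s]+?)\s+EI", content_stream)
    if match is None:
        raise ValueError("no hex-encoded inline image found")

    # Drop any whitespace inside the hex run, then decode it to raw bytes
    hex_digits = b"".join(match.group(1).split())
    jpeg = bytes.fromhex(hex_digits.decode("ascii"))

    # A JPEG starts with the SOI marker (ff d8) and ends with EOI (ff d9)
    if not (jpeg.startswith(b"\xff\xd8") and jpeg.endswith(b"\xff\xd9")):
        raise ValueError("decoded data is not framed by JPEG SOI/EOI markers")
    return jpeg
```

Calling it on the stream_bytes value printed above should return the raw JPEG bytes, which can then be written to disk in binary mode.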
I hex-decoded the JPEG data and saved it to a file. At only 229 bytes and a resolution of 37x1, the image was clearly not included for aesthetic enjoyment; more likely, data was hidden inside it (after all, embedding malware in image files isn't uncommon either).
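The 37x1 resolution can also be read straight from the JPEG header, without an image library. The sketch below walks the marker segments until it reaches a start-of-frame (SOF) segment and unpacks the height and width stored there. jpeg_dimensions is a hypothetical helper, and the sample bytes are a trimmed header assembled from the APP0 and SOF0 segments visible in the dump (quantization tables, Huffman tables and scan data are omitted), so it is enough for this walk but is not a decodable image:

```python
import struct


def jpeg_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first SOF0-SOF2 segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (ff d8)")
    pos = 2
    while pos + 4 <= len(data):
        # Each segment is: 2-byte marker, 2-byte big-endian length, payload.
        # The length counts itself but not the marker.
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker in (0xFFC0, 0xFFC1, 0xFFC2):
            # SOF payload: precision (1 byte), height (2 bytes), width (2 bytes)
            _precision, height, width = struct.unpack(">BHH", data[pos + 4:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no SOF segment found")


# SOI + APP0 (JFIF) + SOF0 + EOI, copied from the hex string in the stream
trimmed_header = bytes.fromhex(
    "ffd8"
    "ffe000104a46494600010100000100010000"
    "ffc0000b080001002501011100"
    "ffd9"
)
print(jpeg_dimensions(trimmed_header))  # (37, 1)
```

The SOF0 segment also lists a single colour component, which matches the /CS/G (grayscale) entry in the inline image dictionary.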
Because the JPEG format uses lossy compression based on the discrete cosine transform (DCT), which I don't understand in detail, its pixel values cannot be read directly from the file. To look at the raw bitmap data, I used the Python Pillow imaging library (PIL). Since the image is single-channel grayscale, each pixel is a single byte value, and reading the one row of 37 pixels as bytes eventually revealed an ASCII-encoded flag.
In [2]: from PIL import Image
...:
...: image = Image.open("download.jpg")
...: width, height = image.size
...:
...: print(bytes([image.getpixel((i, 0)) for i in range(width)]).decode())
Puzzl1ng-D3vilish-F0rmat@flare-on.com
Flag
Puzzl1ng-D3vilish-F0rmat@flare-on.com