Commit 9eb1972

Merge pull request #1158 from HackTricks-wiki/research_update_src_generic-methodologies-and-resources_basic-forensic-methodology_specific-software-file-type-tricks_pdf-file-analysis_20250720_082412
Research Update Enhanced src/generic-methodologies-and-resou...
2 parents 1fe8fb6 + 6ec7c4e commit 9eb1972

src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/pdf-file-analysis.md

Lines changed: 88 additions & 0 deletions
@@ -17,6 +17,94 @@ For in-depth exploration or manipulation of PDFs, tools like [qpdf](https://gith

For custom PDF analysis, Python libraries like [PeepDF](https://github.com/jesparza/peepdf) can be used to craft bespoke parsing scripts. Further, the PDF's potential for hidden data storage is so vast that resources like the NSA guide on PDF risks and countermeasures, though no longer hosted at its original location, still offer valuable insights. A [copy of the guide](http://www.itsecure.hu/library/file/Biztons%C3%A1gi%20%C3%BAtmutat%C3%B3k/Alkalmaz%C3%A1sok/Hidden%20Data%20and%20Metadata%20in%20Adobe%20PDF%20Files.pdf) and a collection of [PDF format tricks](https://github.com/corkami/docs/blob/master/PDF/PDF.md) by Ange Albertini can provide further reading on the subject.

## Common Malicious Constructs

Attackers often abuse specific PDF objects and actions that execute automatically when the document is opened or interacted with. Keywords worth hunting for:

* **/OpenAction, /AA** – automatic actions executed on open or on specific events.
* **/JS, /JavaScript** – embedded JavaScript (often obfuscated or split across objects).
* **/Launch, /SubmitForm, /URI, /GoToE** – external process / URL launchers.
* **/RichMedia, /Flash, /3D** – multimedia objects that can hide payloads.
* **/EmbeddedFile, /Filespec** – file attachments (EXE, DLL, OLE, etc.).
* **/ObjStm, /XFA, /AcroForm** – object streams or forms commonly abused to hide shellcode.
* **Incremental updates** – multiple **%%EOF** markers or a very large **/Prev** offset may indicate data appended after signing to bypass AV.

When any of these tokens appears together with suspicious strings (`powershell`, `cmd.exe`, `calc.exe`, `base64`, etc.), the PDF deserves deeper analysis.

---

## Static analysis cheat-sheet

```bash
# Fast triage – keyword statistics
pdfid.py suspicious.pdf

# Deep dive – summarise, then decode the object tree
pdf-parser.py -a suspicious.pdf   # statistics: object counts per type
pdf-parser.py -f suspicious.pdf   # print all objects with stream filters applied

# Search for JavaScript, then dump and pretty-print the decoded stream
pdf-parser.py --search "/JS" suspicious.pdf
pdf-parser.py --object 15 --filter --dump obj15.js suspicious.pdf   # 15 = id reported by the search
js-beautify obj15.js

# Dump embedded files (repeat for each object id listed, e.g. 15, 16, 17)
pdf-parser.py --type /EmbeddedFile suspicious.pdf
pdf-parser.py --object 15 --filter --dump dumps/obj15.bin suspicious.pdf
# peepdf -i suspicious.pdf opens an interactive console for the same job

# Remove passwords / encryption before processing with other tools
qpdf --password='secret' --decrypt suspicious.pdf clean.pdf

# Lint the file with a Go verifier (checks structure violations)
pdfcpu validate -mode strict clean.pdf
```

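When `pdfid.py` is not available on the analysis host, its keyword statistics can be roughly approximated with `grep`. The sketch below is a stand-in under stated assumptions, not a parser: `count_token` and `pdf_keyword_counts` are hypothetical helper names, and the token list mirrors the keywords listed under Common Malicious Constructs. Tokens hidden inside compressed object streams or written with `#xx` hex escapes will not be counted.

```bash
#!/usr/bin/env bash
# Rough pdfid-style triage using only grep (illustrative helper names).

# Count literal occurrences of a token in a (binary) file.
count_token() {
  local token="$1" file="$2"
  # -a: treat binary data as text, -o: one output line per match,
  # -F: literal string, -e: keep odd-looking tokens from being read as options
  grep -a -o -F -e "$token" "$file" | wc -l | tr -d '[:space:]'
}

# Print "<token> <count>" for each keyword worth hunting for.
pdf_keyword_counts() {
  local file="$1" token
  for token in /OpenAction /AA /JavaScript /JS /Launch /SubmitForm /URI \
               /GoToE /RichMedia /EmbeddedFile /ObjStm /XFA /AcroForm '%%EOF'; do
    printf '%s %s\n' "$token" "$(count_token "$token" "$file")"
  done
}

# Usage: pdf_keyword_counts suspicious.pdf
# More than one %%EOF hints at incremental updates or appended data.
```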
Additional useful projects (actively maintained 2023-2025):
* **pdfcpu** – Go library/CLI able to *validate*, *decrypt*, *optimize* and *extract* content from PDFs.
* **pdf-inspector** – browser-based visualizer that renders the object graph and streams.
* **PyMuPDF (fitz)** – scriptable Python engine that can render pages to images and extract text, objects and embedded files for offline inspection inside a sandbox.

---

## Recent attack techniques (2023-2025)

* **MalDoc in PDF polyglot (2023)** – JPCERT/CC observed threat actors appending an MHT-based Word document with VBA macros after the final **%%EOF**, producing a file that opens as a PDF in a reader and as a document in Word. AV engines that parse only the PDF layer miss the macro. PDF keyword triage comes back clean, and `file` still reports a PDF because the `%PDF` magic is intact. Treat any PDF that also contains the string `<w:WordDocument>` as highly suspicious.
* **Shadow-incremental updates (2024)** – adversaries abuse the incremental update feature to append a second **/Catalog** carrying a malicious `/OpenAction` while the benign first revision stays signed. Tools that inspect only the first xref table miss the later revision.
* **Font parsing UAF chain – CVE-2024-30284 (Acrobat/Reader)** – a vulnerable **CoolType.dll** function can be reached from embedded CIDType2 fonts, allowing remote code execution with the privileges of the user once a crafted document is opened. Patched in APSB24-29, May 2024.

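Two indicators from the items above can be checked with a few lines of shell: Word markers inside a file that starts as a PDF, and data following the last **%%EOF**. This is a minimal sketch under stated assumptions. Both function names are hypothetical, and `grep -b -o` is assumed to report the byte offset of each match, as GNU grep does. A hit is a reason to look closer, not a verdict, because legitimately updated PDFs also carry data after an earlier **%%EOF**.

```bash
#!/usr/bin/env bash
# Quick polyglot / appended-data checks (illustrative helper names).

# Succeed when the file has the PDF magic in its first 1024 bytes
# and also contains the Word XML marker mentioned above.
looks_like_maldoc_in_pdf() {
  local file="$1"
  head -c 1024 "$file" | grep -a -q -F -e '%PDF' || return 1
  grep -a -q -i -F -e '<w:WordDocument>' "$file"
}

# Print how many bytes follow the last %%EOF marker.
# Prints nothing and returns 1 when no marker is present.
bytes_after_last_eof() {
  local file="$1" offset size
  # -b with -o prefixes every match with its byte offset ("offset:match")
  offset="$(grep -a -b -o -F -e '%%EOF' "$file" | tail -n 1 | cut -d: -f1)"
  [ -n "$offset" ] || return 1
  size="$(wc -c < "$file" | tr -d '[:space:]')"
  echo $(( size - offset - 5 ))   # 5 = length of the string "%%EOF"
}

# Usage:
#   looks_like_maldoc_in_pdf sample.pdf && echo "Word markers inside a PDF"
#   bytes_after_last_eof sample.pdf      # more than a newline or two deserves a look
```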
---

## YARA quick rule template

```yara
rule Suspicious_PDF_AutoExec {
    meta:
        description = "Generic detection of PDFs with auto-exec actions and JS"
        author      = "HackTricks"
        last_update = "2025-07-20"
    strings:
        $pdf_magic = { 25 50 44 46 }          // %PDF
        $aa        = "/AA" ascii nocase
        $openact   = "/OpenAction" ascii nocase
        $js        = "/JS" ascii nocase
    condition:
        // readers tolerate junk before the header, so search the first 1 KB
        $pdf_magic in (0..1024) and ( all of ($aa, $openact) or ($openact and $js) )
}
```

---

## Defensive tips

1. **Patch fast** – keep Acrobat/Reader on the latest Continuous track; most RCE chains observed in the wild leverage n-day vulnerabilities fixed months earlier.
2. **Strip active content at the gateway** – drop JavaScript, embedded files and launch actions from inbound PDFs. `qpdf` and `pdfcpu` can normalise, validate and decrypt a file, and `pdfcpu attachments remove` deletes attachments, but neither is known to offer a single switch that strips all active content; check the current documentation of whichever tool you deploy, or re-render the document as described in the next tip.
3. **Content Disarm & Reconstruction (CDR)** – convert PDFs to images (or PDF/A) on a sandbox host to preserve visual fidelity while discarding active objects.
4. **Block rarely-used features** – enterprise “Enhanced Security” settings in Reader allow disabling of JavaScript, multimedia and 3D rendering.
5. **User education** – social engineering (invoice & resume lures) remains the initial vector; teach employees to forward suspicious attachments to IR.

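Tip 3 can be prototyped with two common command-line tools. The sketch below assumes `pdftoppm` (from Poppler) and `img2pdf` are installed on the sandbox host; `flatten_pdf` is a hypothetical helper name and 150 dpi is an arbitrary choice. The result contains page images only, so text is no longer selectable and links are lost.

```bash
#!/usr/bin/env bash
# Rasterise every page and rebuild an image-only PDF (illustrative helper).
flatten_pdf() {
  local in="$1" out="$2" tool workdir
  [ -r "$in" ] || { echo "flatten_pdf: cannot read '$in'" >&2; return 2; }
  for tool in pdftoppm img2pdf; do
    command -v "$tool" >/dev/null 2>&1 ||
      { echo "flatten_pdf: $tool not found" >&2; return 127; }
  done
  workdir="$(mktemp -d)" || return 1
  # Render each page to a PNG named page-<n>.png, then stitch the images together
  pdftoppm -r 150 -png "$in" "$workdir/page" &&
    img2pdf "$workdir"/page-*.png -o "$out"
  local status=$?
  rm -rf "$workdir"
  return "$status"
}

# Usage: flatten_pdf inbound.pdf flattened.pdf
```

Run it only inside the sandbox, since rendering still parses the untrusted file.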
## References

* JPCERT/CC – “MalDoc in PDF – Detection bypass by embedding a malicious Word file into a PDF file” (Aug 2023)
* Adobe – Security update for Acrobat and Reader (APSB24-29, May 2024)

{{#include ../../../banners/hacktricks-training.md}}