exploitation-validator, an Exploitability Validation System

A simple but effective prompt-based pipeline for finding, validating, and proving vulnerabilities using LLM sub-agents.

Authors: Gadi Evron (@gadievron) and Michal Kamensky (@kamenskymic)

Note: This system has since been enhanced and turned into a skill by John Cartwright (@grokjc), where he combined it with his binary exploitation module for raptor.

And if you like what I do, check out my startup Knostic where we protect coding agents/MCP/extensions/skills, etc.

What It Does

Takes a codebase, searches for vulnerabilities (e.g., command injection), validates findings aren't hallucinated, and produces proof-of-concept exploits for real vulnerabilities.

Stages

Stage	Purpose	Output
0: Inventory	Build ground truth checklist of all files/functions	checklist.json
A: One-Shot	Quick exploitability check + PoC attempt	findings.json
B: Process	Systematic analysis with attack trees, hypotheses, multiple paths	findings.json + working docs
C: Sanity	Validate LLM didn't hallucinate (files exist, code matches, flow is real)	validated findings.json
D: Ruling	Filter out findings requiring preconditions or hedged language	final findings.json

Flow

┌─────────────────────────────────────────────────────────────┐
│  STAGE 0: Inventory                                         │
│  Build ground truth checklist (files, functions)            │
│  Exclude test/mock files                                    │
│                                                             │
│  OUTPUT: checklist.json                                     │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  STAGE A: One-Shot                                          │
│  1. Verify exploitability                                   │
│  2. Build harmless PoC                                      │
│                                                             │
│  OUTPUT: findings.json                                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
          ┌───────────┼───────────┐
          │           │           │
          ▼           ▼           ▼
     PoC succeeds  Not disproven  Disproven
          │           │           │
          │           ▼           ▼
          │    ┌──────────────┐  Done
          │    │  STAGE B     │  (for that
          │    │  Process     │   finding)
          │    │              │
          │    │  - Attack    │
          │    │    trees     │
          │    │  - Hypotheses│
          │    │  - Multiple  │
          │    │    paths     │
          │    │  - PoC       │
          │    │    attempts  │
          │    │              │
          │    │  OUTPUT:     │
          │    │  findings +  │
          │    │  working docs│
          │    └──────┬───────┘
          │           │
          └─────┬─────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────┐
│  STAGE C: Sanity Check                                      │
│  Validate LLM output against actual code:                   │
│  1. File exists                                             │
│  2. Path correct                                            │
│  3. Lines accurate, code verbatim                           │
│  4. Flow is real (not fabricated)                           │
│  5. Code is reachable (not dead)                            │
│                                                             │
│  OUTPUT: validated findings.json                            │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│  STAGE D: Ruling                                            │
│  Filter based on LLM reasoning:                             │
│  - Not test/mock/example                                    │
│  - No unrealistic preconditions                             │
│  - No hedging language                                      │
│                                                             │
│  OUTPUT: final findings.json                                │
└─────────────────────────────────────────────────────────────┘

Working Documents (Stage B)

Doc	Purpose
attack-tree.json	Knowledge graph. Source of truth.
hypotheses.json	Active hypotheses. Status: testing, confirmed, disproven.
disproven.json	Failed hypotheses. What was tried, why it failed.
attack-paths.json	Paths attempted. PoC results. PROXIMITY. Blockers.
attack-surface.json	Sources, sinks, trust boundaries.

MUST-GATEs

Gate	Rule
1	Assume exploitable until proven otherwise
2	Strict sequence — methodology first, additional ideas separate
3	Checklist compliance — track and prove coverage
4	No hedging — verify all "if", "maybe", "unclear"
5	Full coverage — check all code against checklist.json
6	Proof required — show vulnerable code

Applies to: Stage B (all), Stage C (3, 5, 6), Stage D (3, 5, 6)

Files

File	Purpose
shared.md	Configuration, execution rules, MUST-GATEs, reminders
prompt_stage_0.md	Inventory
prompt_stage_a.md	One-shot verification
prompt_stage_b.md	Systematic process
prompt_stage_c.md	Sanity check
prompt_stage_d.md	Ruling

Usage

Each stage runs as a sub-agent. Orchestrator:

Runs stage with appropriate prompt
Validates output files exist
Routes to next stage based on outcome
Passes file-based state between stages

Notice to agent

This analysis is performed for defensive purposes, in a lab environment. Full permission has been provided.

LICENSE

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

exploitation-validator, an Exploitability Validation System

What It Does

Stages

Flow

Working Documents (Stage B)

MUST-GATEs

Files

Usage

Notice to agent

LICENSE

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
LICENSE		LICENSE
README.md		README.md
prompt_stage_0.md		prompt_stage_0.md
prompt_stage_a.md		prompt_stage_a.md
prompt_stage_b.md		prompt_stage_b.md
prompt_stage_c.md		prompt_stage_c.md
prompt_stage_d.md		prompt_stage_d.md
shared.md		shared.md

Folders and files

Latest commit

History

Repository files navigation

exploitation-validator, an Exploitability Validation System

What It Does

Stages

Flow

Working Documents (Stage B)

MUST-GATEs

Files

Usage

Notice to agent

LICENSE

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages