Proposal: 4 prompt structure improvements to close false-positive gaps #1

@eppser

Description

Context

After using exploitation-validator on a real-world assessment of a production secrets management system (~50k LOC Go codebase), the pipeline produced 42 findings, of which only 5 survived independent validation — an 88% false-positive rate. The pipeline's core strengths are solid: Stage C's code accuracy verification, GATE-5's systematic coverage, and GATE-4's anti-hallucination checks all work well. The false positives come from structural gaps in what the pipeline asks, not from inaccuracy in what it verifies.

Below are 4 concrete prompt structure changes, ordered by impact, with ready-to-use prompt text and examples.


1. Add Stage N — Novelty & Known-Issue Cross-Reference (Score: 10/10)

Problem

No stage in the pipeline checks whether a finding matches an existing CVE, public advisory, or changelog fix. The pipeline treats "exploitable" and "novel" as identical — they are not. In our assessment, 5 findings matched existing CVEs but were all classified as novel 0-days.

GATE-1 (ASSUME-EXPLOIT) makes this worse: it actively suppresses the LLM's instinct to recognize known issues from training data.

Fix

Add Stage N between Stage C and Stage D. Every finding must pass a novelty check before classification.

Stage 0 → A → B → C → Stage N → D
                         ↑
                    "Is it novel?"

Stage N performs 4 checks per finding:

  • N-1: CVE database search — match against component, vuln class, and project (including upstream/forks)
  • N-2: Project advisory check — search CHANGELOG, security advisories, fix commits for the affected file
  • N-3: Upstream/fork inheritance — if target is a fork, check parent CVEs and backport status
  • N-4: Variant analysis — classify as DUPLICATE, INCOMPLETE FIX, INDEPENDENT, or NO MATCH

Classification output per finding:

{
  "novelty_status": "VARIANT",
  "variant_of": "CVE-XXXX-XXXXX",
  "novelty_evidence": "Root cause shared but fix did not cover this code path"
}
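The N-4 decision can be sketched mechanically. This is a minimal illustration, not the proposed prompt text: the `matches` structure, its fields, and the `NoveltyResult` type are all hypothetical stand-ins for whatever the N-1 through N-3 lookups actually return.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NoveltyResult:
    novelty_status: str            # DUPLICATE | INCOMPLETE FIX | INDEPENDENT | NO MATCH
    variant_of: Optional[str]
    novelty_evidence: str

def classify_novelty(finding, matches):
    """N-4 variant analysis over hypothetical N-1..N-3 lookup results.

    Each match is a dict with 'cve', 'known_paths' (code paths the
    advisory/fix covers), and 'same_root_cause' (bool).
    """
    if not matches:
        return NoveltyResult("NO MATCH", None,
                             "no CVE, advisory, or changelog match")
    for m in matches:
        if finding["code_path"] in m["known_paths"]:
            return NoveltyResult("DUPLICATE", m["cve"],
                                 "finding matches the advisory's own code path")
    for m in matches:
        if m["same_root_cause"]:
            return NoveltyResult("INCOMPLETE FIX", m["cve"],
                                 "root cause shared but fix did not cover this code path")
    return NoveltyResult("INDEPENDENT", None,
                         "component overlaps but root cause differs from known CVEs")
```

An INCOMPLETE FIX result here corresponds to the VARIANT output shown above: it still references the original CVE rather than being reported as a 0-day.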

Key design decision: GATE-1 must be suspended during Stage N. Accuracy of classification takes priority over discovery bias for this one stage.

Add to shared.md:

**GATE-9 [NOVELTY]:** Before classifying any finding as a 0-day,
verify it does not match an existing CVE, public advisory, or
changelog fix. Known vulnerabilities are not 0-days. Variants of
known issues must reference the original CVE. GATE-1 is suspended
during novelty checks.

2. Add Stage C-bis — Semantic Validation (Score: 9/10)

Problem

Stage C validates code accuracy ("is the code real?"). Stage D filters LLM reasoning patterns ("did the LLM hedge?"). Neither asks: "Is this actually a vulnerability?"

A finding where the code is real (passes C) and the LLM is confident (passes D) gets accepted even when it demonstrates expected, documented behavior. In the assessment, this produced ~15 false positives including:

  • Algorithm tautologies (same key producing same output = math, not a bug)
  • Specification-required behavior (unauthenticated discovery endpoints mandated by RFC 8414)
  • Features working as named (endpoint called "sign-verbatim" flagged for signing without restrictions)
  • Privilege tautologies (root can do root things)
  • Test-harness circularity (exploit leverages policies created by the setup script)

Fix

Add Stage C-bis between C and N with 5 checks:

  • CB-1: Algorithm properties — Is the behavior a mathematical property of the algorithm? (reimplemented from spec → same result = not a bug)
  • CB-2: Specification compliance — Is the behavior required by an RFC/standard?
  • CB-3: Documented design — Is this the explicitly documented purpose of the feature?
  • CB-4: Privilege tautology — Does the exploit require privileges that already imply the outcome?
  • CB-5: Precondition realism — Were preconditions created by the test harness?

If any check fires → reclassify as BY-DESIGN, severity = INFORMATIONAL.
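The gate semantics are simple enough to sketch: any firing check downgrades the finding. The predicate names and finding fields below are illustrative assumptions, not part of the proposed prompt text.

```python
# Hypothetical sketch of the CB-1..CB-5 gate: each check is a predicate
# over a finding dict; any hit reclassifies the finding as BY-DESIGN.
SEMANTIC_CHECKS = {
    "CB-1": lambda f: f.get("is_algorithm_property", False),    # math, not a bug
    "CB-2": lambda f: f.get("required_by_spec", False),         # RFC-mandated behavior
    "CB-3": lambda f: f.get("documented_design", False),        # feature working as named
    "CB-4": lambda f: f.get("privileges_imply_outcome", False), # root can do root things
    "CB-5": lambda f: f.get("precondition_from_harness", False),
}

def semantic_validate(finding):
    fired = [cid for cid, check in SEMANTIC_CHECKS.items() if check(finding)]
    if fired:
        return {**finding,
                "classification": "BY-DESIGN",
                "severity": "INFORMATIONAL",
                "fired_checks": fired}
    return finding
```

For example, the RFC 8414 discovery-endpoint finding from the assessment would trip CB-2 and leave the pipeline as INFORMATIONAL instead of being scored as a vulnerability.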


3. Restructure Stage D — Adversarial Cross-Examination (Score: 7/10)

Problem

Stage D operates by pattern matching on LLM output (test/mock context, precondition language, hedging words). This is brittle:

  • Confident prose about non-vulnerabilities passes D-3
  • Exploits built on test-harness artifacts pass D-2 (they don't require "another vulnerability" — just a root token the harness provided)

Fix

Replace the 3 pattern-matching filters with a prosecution/defense structure:

For EACH finding:

PROSECUTION (argue it IS a vulnerability):
  1. Realistic attacker profile
  2. Production attack scenario
  3. What attacker gains beyond current access
  4. Blast radius

DEFENSE (argue it is NOT a vulnerability):
  1. Privilege tautology check
  2. Test-harness precondition check
  3. Documentation/design-intent check
  4. Known CVE match check
  5. "Would a reasonable security team classify this as a bug?"

VERDICT: Which argument cites stronger evidence? → ACCEPT or REJECT

This forces the LLM to articulate why something is a vulnerability rather than just confirming code paths exist.
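The verdict rule above can be reduced to a toy scoring sketch. The evidence-counting scheme is an assumption made for illustration; in the actual prompt the LLM weighs the arguments qualitatively.

```python
# Hypothetical D* verdict: each side submits (claim, cites_concrete_evidence)
# pairs; the side with more evidence-backed points wins, ties go to the defense.
def cross_examine(prosecution_points, defense_points):
    p = sum(1 for _, has_evidence in prosecution_points if has_evidence)
    d = sum(1 for _, has_evidence in defense_points if has_evidence)
    return "ACCEPT" if p > d else "REJECT"
```

The tie-breaking choice encodes the structural point: a finding is rejected unless the prosecution's evidence strictly outweighs the defense's.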


4. Remove Stage A, Merge into Stage B (Score: 6/10)

Problem

Stage A's fast path (PoC succeeds → skip to C) bypasses Stage B entirely. Stage B builds attack trees, tracks hypotheses, logs PROXIMITY, documents disproven paths. When Stage A's one-shot PoC works, none of this structured analysis happens. The most damaging false positives were findings where a working PoC fast-tracked through A→C→D without ever being scrutinized.

Fix

Absorb Stage A into Stage B as [B-0] Quick Triage:

  • PoC succeeds → [B-FAST] abbreviated analysis (still builds lightweight attack tree, checks precondition realism)
  • PoC fails → [B-FULL] standard analysis (existing B-2)
  • Disproven → Done

No finding ever skips structured analysis.
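The B-0 routing is a three-way switch. The status strings below are illustrative, not the proposed prompt vocabulary:

```python
# Hypothetical [B-0] Quick Triage: every PoC outcome routes into some
# structured analysis path; nothing jumps straight to Stage C.
def triage(poc_status):
    if poc_status == "succeeds":
        return "B-FAST"   # abbreviated: lightweight attack tree + precondition realism
    if poc_status == "fails":
        return "B-FULL"   # standard analysis (existing B-2)
    if poc_status == "disproven":
        return "DONE"
    raise ValueError(f"unknown PoC status: {poc_status}")
```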


Combined Pipeline

BEFORE:
  Stage 0 → A → B → C → D
  (no novelty check, no semantic validation, fast-track bypass, pattern-matching filter)

AFTER:
  Stage 0 → B* → C → C-bis → N → D*
  B* = absorbs A, no bypass
  C  = code accuracy (unchanged)
  C-bis = semantic validation (is it a vuln?)
  N  = novelty cross-reference (is it new?)
  D* = adversarial cross-examination (final ruling)

| Gap closed | What slipped through | Fix |
| --- | --- | --- |
| No novelty check | Known CVEs reported as 0-days | Stage N |
| No semantic validation | By-design behaviors rated CRITICAL | Stage C-bis |
| Pattern-matching filter | Confident non-vuln prose passed D-3 | Adversarial D |
| Fast-track bypass | Working PoCs skipped analysis | B* merger |

Notes

  • Full prompt text for each new stage is available if helpful — kept this issue focused on the structural arguments
  • These recommendations come from applying the pipeline to a real target and independently validating every finding it produced
  • The pipeline's existing strengths (code referencing accuracy, systematic coverage, anti-hallucination) are solid and should be preserved
