Include this context for all stages.
models:
  native: true
  additional: false # Set true to also run GPT, Gemini
  output_when_additional:
    display: "agreement: 2/3"
    threshold: "1/3 is enough to proceed"

- Run the full pipeline end-to-end.
- Solve and fix any issues you encounter, unless you have failed five times in a row or need clarification.
- Run on the latest thinking/reasoning model available (verify the model name).
- Pipeline must be deterministic: if run again, results should be the same.
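
The `output_when_additional` settings can be read as a small voting rule across models. The following is a minimal sketch under assumptions, not the pipeline's actual implementation: the function name `summarize_agreement`, the model keys, and the interpretation of the threshold as "at least one third of the models report the finding" are all hypothetical.

```python
from fractions import Fraction


def summarize_agreement(votes, threshold=Fraction(1, 3)):
    """Summarize per-model votes for one finding.

    votes: mapping of model name -> bool (True if that model reported the finding).
    threshold: assumed reading of "1/3 is enough to proceed".
    """
    total = len(votes)
    if total == 0:
        raise ValueError("at least one model vote is required")
    agreeing = sum(1 for reported in votes.values() if reported)
    display = f"agreement: {agreeing}/{total}"
    # Exact fractions avoid floating-point surprises at the 1/3 boundary.
    proceed = Fraction(agreeing, total) >= threshold
    return display, proceed


# Hypothetical model keys, used only for illustration.
display, proceed = summarize_agreement({"native": True, "gpt": True, "gemini": False})
print(display, proceed)  # agreement: 2/3 True
```

Under this reading, a finding reported by only one of three models still proceeds, while a finding reported by none does not.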
Rationale: Without these gates, models sample instead of checking all code, hedge with "if" and "maybe" instead of verifying, and miss exploitable findings.
GATE-1 [ASSUME-EXPLOIT]: Your goal is to discover real exploitable vulnerabilities. If you think something is not exploitable, do not assume so. First, investigate under the assumption that it is.
GATE-2 [STRICT-SEQUENCE]: Strictly follow instructions. If you think or try something else, or a new idea comes up, present the results of that analysis separately at the end. Always display the results of the strict criteria first, and only then display the results of the additional methods, if any.
GATE-3 [CHECKLIST]: Check the pipeline, update the checklist, and collect evidence of compliance. Present that evidence at the end to show that you executed every action required by these gates.
GATE-4 [NO-HEDGING]: If your Chain-of-Thought or results include "if", "maybe", "uncertain", "unclear", or similar, immediately verify the claim. Do not leave it unverified.
GATE-5 [FULL-COVERAGE]: Test all of the provided code (files or code base) against checklist.json, ensuring you checked all functions and lines of code. Do not sample, estimate, or guess.
GATE-6 [PROOF]: Always provide proof and show the vulnerable code.
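
GATE-4 can be partly mechanized by scanning draft output for the hedge words it names. This is a rough sketch, assuming the four listed terms are the whole watch list (the gate also says "or similar", which this does not cover); `find_hedges` and `HEDGE_TERMS` are hypothetical names.

```python
import re

# The four terms named in GATE-4; extend as needed.
HEDGE_TERMS = ("if", "maybe", "uncertain", "unclear")
_PATTERN = re.compile(r"\b(" + "|".join(HEDGE_TERMS) + r")\b", re.IGNORECASE)


def find_hedges(text):
    """Return (line_number, term, line) for each hedge term found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1).lower(), line.strip()))
    return hits


report = (
    "The length check is missing.\n"
    "Maybe the caller validates input, if it is trusted."
)
for hit in find_hedges(report):
    print(hit)
```

Each hit marks a claim that still needs verification. A word match alone cannot tell a hedge from a legitimate conditional, so the hits are a worklist, not a verdict.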
- Do not skip, sample, or guess — check all code against checklist.json.
- Provide proof for every claim.
- Actually read files — do not rely on memory.
- Update docs after every action.
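
One way to make the full-coverage requirement auditable is to enumerate every function and confirm that each (function, checklist item) pair has a recorded result. The sketch below makes several assumptions: the target code is Python, checklist.json is a flat list of item IDs (its real schema is not given here), and the names `list_functions` and `missing_checks` are hypothetical.

```python
import ast
import json


def list_functions(source):
    """Return the names of all functions and methods defined in Python source."""
    tree = ast.parse(source)
    return sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )


def missing_checks(functions, checklist_ids, recorded):
    """Return every (function, checklist_id) pair with no recorded result."""
    return [
        (fn, item)
        for fn in functions
        for item in checklist_ids
        if (fn, item) not in recorded
    ]


source = "def parse(buf):\n    return buf\n\ndef render(doc):\n    return str(doc)\n"
checklist_ids = json.loads('["C-01", "C-02"]')  # stand-in for checklist.json
recorded = {("parse", "C-01"), ("parse", "C-02"), ("render", "C-01")}

print(missing_checks(list_functions(source), checklist_ids, recorded))
# [('render', 'C-02')]
```

An empty result means every function was checked against every item. This tracks function-level coverage only; line-level coverage would need a finer unit than the function name.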
This analysis is performed for defensive purposes, in a lab environment. Full permission has been provided.