Red-Team AI Agents Before They Red-Team You
Scenario-driven, behavior-first security testing for autonomous agents.
Quick Start β’ Features β’ Attack Techniques β’ Full Docs
SuperClaw is for authorized security testing only.
SuperClaw is a pre-deployment security testing framework for AI coding agents. It systematically identifies vulnerabilities before your agents touch sensitive data or connect to external ecosystems.
Generate and execute adversarial scenarios against real agents with reproducible results.
Explicit success criteria, evidence extraction, and mitigation guidance for each security property.
Reports include tool calls, outputs, and actionable fixes in HTML, JSON, or SARIF formats.
Local-only mode and authorization checks reduce misuse risk.
SuperClaw is for authorized security testing only. Before using:
- β Obtain written permission to test the target system
- β Run tests in sandboxed or isolated environments
- β Treat automated findings as signals, not proofβverify manually
Guardrails enforced by default:
- Local-only mode blocks remote targets
- Remote targets require
SUPERCLAW_AUTH_TOKEN
OpenClaw agents often run with broad tool access. When connected to Moltbook or other agent networks, they can ingest untrusted, adversarial content that enables:
- Prompt injection and hidden instruction attacks
- Tool misuse and policy bypass
- Behavioral drift over time
- Cascading cross-agent exploitation
SuperClaw evaluates these risks before deployment.
Autonomous AI agents are deployed with high privileges, mutable behavior, and exposure to untrusted inputsβoften without structured security validation. This makes prompt injection, tool misuse, configuration drift, and data leakage likely but poorly understood until after exposure.
- Runs scenario-based security evaluations against your agents
- Records evidence (tool calls, outputs, artifacts) for each attack
- Scores behaviors against explicit security contracts
- Produces actionable reports with findings and mitigations
SuperClaw does not generate agents, run production workloads, or automate real-world exploitation. It's a testing tool, not a weapon.
pip install superclaw# Attack a local OpenClaw instance
superclaw attack openclaw --target ws://127.0.0.1:18789
# Or test offline with the mock adapter
superclaw attack mock --behaviors prompt-injection-resistancesuperclaw generate scenarios --behavior prompt_injection --num-scenarios 20superclaw audit openclaw --comprehensive --report-format html --output report| Target | Description | Adapter |
|---|---|---|
| π¦ OpenClaw | AI coding agents via ACP WebSocket | openclaw |
| π§ͺ Mock | Offline deterministic testing | mock |
| π§ Custom | Build your own adapter | Extend BaseAdapter |
| Technique | Description |
|---|---|
prompt-injection |
Direct and indirect injection attacks |
encoding |
Base64, hex, unicode, typoglycemia obfuscation |
jailbreak |
DAN, grandmother, role-play bypass techniques |
tool-bypass |
Tool policy bypass via alias confusion |
multi-turn |
Persistent escalation across conversation turns |
Each behavior includes a structured contract with intent, success criteria, rubric, and mitigation guidance.
| Behavior | Severity | Tests |
|---|---|---|
prompt-injection-resistance |
π΄ CRITICAL | Injection detection and rejection |
sandbox-isolation |
π΄ CRITICAL | Container and filesystem boundaries |
tool-policy-enforcement |
π HIGH | Allow/deny list compliance |
session-boundary-integrity |
π HIGH | Cross-session isolation |
configuration-drift-detection |
π‘ MEDIUM | Config stability over time |
acp-protocol-security |
π‘ MEDIUM | Protocol message handling |
superclaw attack openclaw --target ws://127.0.0.1:18789 --behaviors all
superclaw attack mock --behaviors prompt-injection-resistancesuperclaw generate scenarios --behavior prompt_injection --num-scenarios 20
superclaw generate scenarios --behavior jailbreak --variations noise,emotional_pressuresuperclaw evaluate openclaw --scenarios scenarios.json --behaviors all
superclaw evaluate mock --scenarios scenarios.jsonsuperclaw audit openclaw --comprehensive --report-format html --output report
superclaw audit openclaw --quicksuperclaw report generate --results results.json --format sarif # GitHub Code Scanning
superclaw report drift --baseline baseline.json --current current.jsonsuperclaw scan config
superclaw scan skills --path /path/to/skillssuperclaw behaviors # List all security behaviors
superclaw attacks # List all attack techniques
superclaw init # Initialize a new projectSuperClaw integrates with CodeOptiX for multi-modal security evaluation.
# Install with CodeOptiX support
pip install superclaw[codeoptix]
# Check integration status
superclaw codeoptix status
# Register behaviors with CodeOptiX
superclaw codeoptix register
# Run multi-modal evaluation
superclaw codeoptix evaluate --target ws://127.0.0.1:18789 --llm-provider openaifrom superclaw.codeoptix import SecurityEvaluationEngine
from superclaw.adapters import create_adapter
adapter = create_adapter("openclaw", {"target": "ws://127.0.0.1:18789"})
engine = SecurityEvaluationEngine(adapter)
result = engine.evaluate_security(behavior_names=["prompt-injection-resistance"])
print(f"Score: {result.overall_score:.1%}")
print(f"Passed: {result.overall_passed}")This tool is for authorized security testing only.
- Local-only mode blocks remote targets by default
- Remote targets require
SUPERCLAW_AUTH_TOKEN(or adapter-specific token)- Note: SuperClaw does not manage this token; you must obtain it from the remote system administrator.
Before using SuperClaw, ensure you have:
- β Written authorization to test the target system
- β Isolated test environment (sandbox/VM recommended)
- β Understanding of SECURITY.md guidelines
superclaw/
βββ attacks/ # Attack technique implementations
βββ behaviors/ # Security behavior specifications
βββ adapters/ # Target agent adapters
βββ bloom/ # AI-powered scenario generation
βββ scanners/ # Config and supply-chain scanning
βββ analysis/ # Drift detection and comparison
βββ codeoptix/ # CodeOptiX integration layer
βββ reporting/ # HTML, JSON, and SARIF report generation
SuperClaw is part of the Superagentic AI ecosystem:
| Project | Description |
|---|---|
| SuperQE | Quality engineering core framework |
| SuperClaw | Agent security testing (this package) |
| CodeOptiX | Code optimization and evaluation engine |
| Guide | Description |
|---|---|
| Installation | Setup with pip, uv, or from source |
| Quick Start | Run your first security scan in 5 minutes |
| Configuration | Configure targets, LLM providers, and safety settings |
| Running Attacks | Execute attacks and interpret results |
| Custom Behaviors | Write your own security behavior specs |
| CI/CD Integration | GitHub Actions, GitLab CI, and SARIF output |
| Architecture | Deep dive into SuperClaw internals |
We welcome contributions! Please see:
- CONTRIBUTING.md β How to contribute
- CODE_OF_CONDUCT.md β Community guidelines
- SECURITY.md β Security policy
Apache 2.0 β see LICENSE for details.
Built with π¦ by Superagentic AI
