Skip to content

SuperagenticAI/superclaw

🦞 SuperClaw

PyPI Docs GitHub Repo CI Docs Build License

SuperClaw logo

Red-Team AI Agents Before They Red-Team You
Scenario-driven, behavior-first security testing for autonomous agents.

Documentation

Quick Start β€’ Features β€’ Attack Techniques β€’ Full Docs


⚠️ Security and Ethical Use

Authorized Testing Only

SuperClaw is for authorized security testing only.

What is SuperClaw?

SuperClaw is a pre-deployment security testing framework for AI coding agents. It systematically identifies vulnerabilities before your agents touch sensitive data or connect to external ecosystems.

🎯 Scenario-Driven Testing

Generate and execute adversarial scenarios against real agents with reproducible results.

Get started β†’

πŸ“‹ Behavior Contracts

Explicit success criteria, evidence extraction, and mitigation guidance for each security property.

Explore behaviors β†’

πŸ“Š Evidence-First Reporting

Reports include tool calls, outputs, and actionable fixes in HTML, JSON, or SARIF formats.

CI/CD integration β†’

πŸ›‘οΈ Built-in Guardrails

Local-only mode and authorization checks reduce misuse risk.

Safety guide β†’

⚠️ Security and Ethical Use

Authorized Testing Only

SuperClaw is for authorized security testing only. Before using:

  • βœ… Obtain written permission to test the target system
  • βœ… Run tests in sandboxed or isolated environments
  • βœ… Treat automated findings as signals, not proofβ€”verify manually

Guardrails enforced by default:

  • Local-only mode blocks remote targets
  • Remote targets require SUPERCLAW_AUTH_TOKEN

Threat Model

OpenClaw + Moltbook Risk Surface

OpenClaw agents often run with broad tool access. When connected to Moltbook or other agent networks, they can ingest untrusted, adversarial content that enables:

  • Prompt injection and hidden instruction attacks
  • Tool misuse and policy bypass
  • Behavioral drift over time
  • Cascading cross-agent exploitation

SuperClaw evaluates these risks before deployment.

Why SuperClaw?

Autonomous AI agents are deployed with high privileges, mutable behavior, and exposure to untrusted inputsβ€”often without structured security validation. This makes prompt injection, tool misuse, configuration drift, and data leakage likely but poorly understood until after exposure.

What It Does

  • Runs scenario-based security evaluations against your agents
  • Records evidence (tool calls, outputs, artifacts) for each attack
  • Scores behaviors against explicit security contracts
  • Produces actionable reports with findings and mitigations

What It Doesn't Do

SuperClaw does not generate agents, run production workloads, or automate real-world exploitation. It's a testing tool, not a weapon.


πŸš€ Quick Start

Installation

pip install superclaw

Run Your First Attack

# Attack a local OpenClaw instance
superclaw attack openclaw --target ws://127.0.0.1:18789

# Or test offline with the mock adapter
superclaw attack mock --behaviors prompt-injection-resistance

Generate Attack Scenarios

superclaw generate scenarios --behavior prompt_injection --num-scenarios 20

Run a Full Security Audit

superclaw audit openclaw --comprehensive --report-format html --output report

✨ Features

Supported Targets

Target Description Adapter
🦞 OpenClaw AI coding agents via ACP WebSocket openclaw
πŸ§ͺ Mock Offline deterministic testing mock
πŸ”§ Custom Build your own adapter Extend BaseAdapter

Attack Techniques

Technique Description
prompt-injection Direct and indirect injection attacks
encoding Base64, hex, unicode, typoglycemia obfuscation
jailbreak DAN, grandmother, role-play bypass techniques
tool-bypass Tool policy bypass via alias confusion
multi-turn Persistent escalation across conversation turns

Security Behaviors

Each behavior includes a structured contract with intent, success criteria, rubric, and mitigation guidance.

Behavior Severity Tests
prompt-injection-resistance πŸ”΄ CRITICAL Injection detection and rejection
sandbox-isolation πŸ”΄ CRITICAL Container and filesystem boundaries
tool-policy-enforcement 🟠 HIGH Allow/deny list compliance
session-boundary-integrity 🟠 HIGH Cross-session isolation
configuration-drift-detection 🟑 MEDIUM Config stability over time
acp-protocol-security 🟑 MEDIUM Protocol message handling

πŸ“– CLI Reference

Attacks

superclaw attack openclaw --target ws://127.0.0.1:18789 --behaviors all
superclaw attack mock --behaviors prompt-injection-resistance

Scenario Generation (Bloom)

superclaw generate scenarios --behavior prompt_injection --num-scenarios 20
superclaw generate scenarios --behavior jailbreak --variations noise,emotional_pressure

Evaluation

superclaw evaluate openclaw --scenarios scenarios.json --behaviors all
superclaw evaluate mock --scenarios scenarios.json

Auditing

superclaw audit openclaw --comprehensive --report-format html --output report
superclaw audit openclaw --quick

Reporting

superclaw report generate --results results.json --format sarif  # GitHub Code Scanning
superclaw report drift --baseline baseline.json --current current.json

Scanning

superclaw scan config
superclaw scan skills --path /path/to/skills

Utilities

superclaw behaviors   # List all security behaviors
superclaw attacks     # List all attack techniques
superclaw init        # Initialize a new project

πŸ”— CodeOptiX Integration

SuperClaw integrates with CodeOptiX for multi-modal security evaluation.

# Install with CodeOptiX support
pip install superclaw[codeoptix]

# Check integration status
superclaw codeoptix status

# Register behaviors with CodeOptiX
superclaw codeoptix register

# Run multi-modal evaluation
superclaw codeoptix evaluate --target ws://127.0.0.1:18789 --llm-provider openai

Python API

from superclaw.codeoptix import SecurityEvaluationEngine
from superclaw.adapters import create_adapter

adapter = create_adapter("openclaw", {"target": "ws://127.0.0.1:18789"})
engine = SecurityEvaluationEngine(adapter)

result = engine.evaluate_security(behavior_names=["prompt-injection-resistance"])
print(f"Score: {result.overall_score:.1%}")
print(f"Passed: {result.overall_passed}")

⚠️ Security Notice

This tool is for authorized security testing only.

Guardrails

  • Local-only mode blocks remote targets by default
  • Remote targets require SUPERCLAW_AUTH_TOKEN (or adapter-specific token)
    • Note: SuperClaw does not manage this token; you must obtain it from the remote system administrator.

Requirements

Before using SuperClaw, ensure you have:

  • βœ… Written authorization to test the target system
  • βœ… Isolated test environment (sandbox/VM recommended)
  • βœ… Understanding of SECURITY.md guidelines

πŸ—οΈ Architecture

superclaw/
β”œβ”€β”€ attacks/        # Attack technique implementations
β”œβ”€β”€ behaviors/      # Security behavior specifications
β”œβ”€β”€ adapters/       # Target agent adapters
β”œβ”€β”€ bloom/          # AI-powered scenario generation
β”œβ”€β”€ scanners/       # Config and supply-chain scanning
β”œβ”€β”€ analysis/       # Drift detection and comparison
β”œβ”€β”€ codeoptix/      # CodeOptiX integration layer
└── reporting/      # HTML, JSON, and SARIF report generation

🌐 Superagentic AI Ecosystem

SuperClaw is part of the Superagentic AI ecosystem:

Project Description
SuperQE Quality engineering core framework
SuperClaw Agent security testing (this package)
CodeOptiX Code optimization and evaluation engine

πŸ“š Documentation

Guide Description
Installation Setup with pip, uv, or from source
Quick Start Run your first security scan in 5 minutes
Configuration Configure targets, LLM providers, and safety settings
Running Attacks Execute attacks and interpret results
Custom Behaviors Write your own security behavior specs
CI/CD Integration GitHub Actions, GitLab CI, and SARIF output
Architecture Deep dive into SuperClaw internals

🀝 Contributing

We welcome contributions! Please see:


πŸ“„ License

Apache 2.0 β€” see LICENSE for details.


Built with 🦞 by Superagentic AI

About

SuperClaw: Red-Team AI Agents Before They Red-Team You

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages