Releases: CyberStrategyInstitute/ai-safe2-framework
2026-3-23-AISM-Assessment and Measurement Tools + Documentation Expansion
AI SAFE2 v2.1 / Cyber Strategy Institute / March 2026
AISM Release Notes: Documentation Expansion
What Changed and Why
The initial release answered what AISM is. This release answers how to use it.
The initial release launched the AISM framework: five pillars, the defense loop, the maturity ladder, and the control stack, all inside a single README. Three new tools now give organizations the instruments to actually run an assessment, score it, and map results to their regulatory obligations. The README has been restructured so someone new can navigate the full ecosystem in under a minute. And the core framework concepts have been moved into dedicated pages so each audience (engineers, compliance teams, executives) gets exactly what they need without reading everything.
The framework itself has not changed. The five pillars, maturity levels, and operational defense loop are the same. What changed is everything around them.
Core Framework Documents
| File | Purpose | Audience |
|---|---|---|
| strategic-architecture.md | Three-layer governance architecture: Sovereignty, Controls, Runtime | Architects CISOs |
| operational-loop.md | How the five pillars operate as a continuous defense cycle | Security teams Engineers |
| sovereignty-matrix.md | Human control vs. AI autonomy quadrant: where your organization sits | Risk leaders Executives |
| maturity-model.md | Five-level progression from Chaos to Sovereignty with level criteria | All stakeholders |
| control-stack.md | Technical enforcement layers from policy to infrastructure | Engineers Architects |
| agent-threat-control-matrix.md | Agentic AI threat landscape mapped to AISM controls and MITRE ATLAS | Red teams Security engineers |
Assessment and Measurement Tools
| File | Purpose | Audience |
|---|---|---|
| AISM-Self-Assessment-Tool.md | 10-topic checklist across all five pillars, producing an AISM Sovereignty Score | Security teams Compliance |
| AISM-Scoring-Matrix-Methodology.md | Quantitative scoring framework: how scores are calculated, weighted, and interpreted | Framework practitioners |
| AISM-Compliance-Crosswalk.md | Control mapping to NIST AI RMF, ISO 42001, EU AI Act, CSA AICM, NIST CSF 2.0, MITRE ATLAS, OWASP LLM | Compliance Audit Procurement |
New: Three Assessment and Measurement Tools
AISM Self-Assessment Tool
The original release described five maturity levels. This tool lets you find out which one you are actually at.
A structured checklist covering all five pillars across 10 topics, with controls organized by maturity level from Reactive through Autonomous Governance. Designed for a cross-functional team including Security/CISO, AI/ML Engineering, Legal/Compliance, IT Operations, and Leadership. Complete it and you walk out with a scored AISM Sovereignty Score on a 1 to 5 scale you can act on immediately. Includes section-level ratings across three metrics (Coverage, Robustness, Sovereignty Assurance), a full scoring summary table, and maturity classification guide.
AISM Scoring Matrix Methodology
The quantitative foundation that makes AISM scores defensible, not just descriptive.
Documents the evaluation of five existing scoring approaches: IEEE/NIST AI RMF, NIST CSF Dual-Survey, Sandia Maturity Certification, CSA AICM, and Microsoft RAI MM, and demonstrates why none scores above 2.55/5.00 on agentic AI era requirements. From that analysis, the AISM composite methodology inherits the best element from each: the three-metric rubric from IEEE/NIST, expert-weighted calibration from NIST CSF research, five-level structure from Sandia, control taxonomy from CSA AICM, and interdependency awareness from Microsoft RAI MM. Covers dimension and pillar weighting with rationale, the HHH scoring rubric, and CVSS integration for combined risk scoring.
AISM Compliance Crosswalk
One AISM assessment, audit artifacts for seven frameworks at once.
Maps every AI SAFE2 v2.1 subtopic across all five pillars, 10 topics, and all v2.1 gap-filler controls (GF1 through GF5) to NIST AI RMF 1.0, ISO/IEC 42001:2022, EU AI Act, CSA AICM, NIST CSF 2.0, MITRE ATLAS, and OWASP Top 10 for LLM simultaneously. Coverage ranges from 90% (CSA AICM, 16 of 18 domains) to 100% (NIST AI RMF, NIST CSF 2.0, ISO 42001, OWASP LLM). Built for enterprise procurement, audit readiness, and multi-framework compliance reporting.
Updated: README Restructured
AISM README
From single-document framework to navigable ecosystem entry point.
The original README carried everything: the framework overview, pillar descriptions, maturity ladder, sovereignty matrix, control stack, and defense loop in a single document. That served the launch. It does not serve an organization that needs to navigate a growing ecosystem of tools and reference materials.
The README is now a strategic entry point. It leads with why AISM exists and what makes it different from NIST AI RMF, ISO 42001, CSA AICM, MS RAI MM, and NIST CSF 2.0, using a direct capability comparison table built from the scoring methodology analysis. It separates the value proposition by audience (Security leaders, Engineering, Compliance, Executives) and provides a full ecosystem map linking every file with its purpose and intended audience.
The most important addition for first-time users is the six-step "Start Here" onboarding path: a sequenced route from orientation through completed assessment, with direct file links at each step. The framework content from the original README has been preserved and expanded in six dedicated topic pages.
New: Six Dedicated Topic Pages
The core framework concepts from the original README now live in standalone files, each writ...
Cognitive Sovereignty Framework (CSF) v2.0 Released
Companion Release: Cognitive Sovereignty Framework (CSF) — Now Live
Cyber Strategy Institute · February 2026
The Gap AI SAFE² Does Not Cover
AI SAFE² secures the AI system. It governs the tool — prompt injection defenses, agent scoping, data leakage prevention, swarm governance, runtime circuit breakers. It answers the question: Is the AI system trustworthy and correctly bounded?
It does not answer a second, equally critical question: Is the human operating the system cognitively sovereign?
An operator who has experienced sufficient attention capture, cognitive offloading, or decision automation capture can be fully compromised — regardless of how well-hardened their AI infrastructure is. The AI system is secure. The human operating it is not. This is the gap the AI SAFE² framework was designed to acknowledge but not address.
That gap now has a companion framework.
Introducing the Cognitive Sovereignty Framework
The Cognitive Sovereignty Framework (CSF) is the CSI open-source response to the human layer of the AI security problem. Where AI SAFE² protects the machine, the CSF protects the person.
→ CSF Learning Hub — Start here. What it is, why it exists, how to use it.
→ Threat Explorer — Interactive taxonomy, CTSS scoring, swarm threat phases, human outcome indicators.
→ Command Center — Full framework in a single operational dashboard.
→ Full Repository — Source files, taxonomy registry, assessment templates, examples.
How They Fit Together
| AI SAFE² — Machine Layer | CSF — Human Layer | |
|---|---|---|
| Defends | The AI system | The human operator |
| Governs | The tool | The capacity to govern the tool |
| Prevents | Prompt injection, data leakage, unsafe autonomy | Cognitive offloading, attention capture, decision automation capture, identity fragmentation |
| Ensures | AI stays in its lane | The human stays capable of defining the lane |
| Repo | https://github.com/CyberStrategyInstitute/ai-safe2-framework | https://github.com/CyberStrategyInstitute/cognitive-sovereignty |
The shared principle: Both frameworks are grounded in the same core commitment — AI is always a tool, never a moral agent. Human authority is non-negotiable.
In the CSF this is formalized as EFA (Ethical Functionality without Agency) and the E7 Protocol Stack — which places Mission and Authority permanently at Layer 7, ensuring human decision rights never leak downward into automated systems. This is the same architectural principle that AI SAFE²'s runtime governors enforce at the technical layer.
They are two implementations of the same doctrine at different layers of the stack.
The Threat That Connects Both Frameworks
The highest-scoring threat in the CSF taxonomy is T-CT-008: Memetic Swarm Orchestration (CTSS 90) — coordinated AI agent campaigns that test, evolve, and amplify narratives at non-human speed. This is the same threat class that AI SAFE²'s swarm governance pillar addresses at the infrastructure level.
AI SAFE² defends the integrity of AI systems against adversarial swarm techniques.
CSF defends human populations against the cognitive effects of swarm-delivered narratives.
Defending only one layer leaves the other entirely exposed.
→ Full swarm threat analysis — Phase A, B, and C
What to Do
If you are an AI SAFE² user:
-
Review the CSF Six-Domain Assessment alongside your existing AI SAFE² implementation. Pay particular attention to Domain 6: Digital & AI Symbiosis — this is the human-layer complement to your existing AI governance work.
-
Map your highest-scoring CTSS threats against your current AI SAFE² pillar coverage. Threats in the Substrate layer (Layer −1) — particularly ST-003 (Cognitive Offloading) and ST-006 (Guardrail Alignment Drift) — require human behavioral interventions that no technical control can substitute for.
-
Use the live CTSS Calculator to score the cognitive threat posture of your operating environment alongside your AI SAFE² risk assessments.
If you are evaluating AI SAFE²:
The CSF is the companion framework for the human side of what AI SAFE² addresses technically. A complete AI security posture requires both. Start with the CSF Learning Hub.
Citation
@misc{csf_framework,
title = {Cognitive Sovereignty Framework v2.0},
author = {Sullivan, Vincent and {Cyber Strategy Institute}},
year = {2026},
url = {https://github.com/CyberStrategyInstitute/cognitive-sovereignty}
}
Cyber Strategy Institute · https://cyberstrategyinstitute.com
AI SAFE²: https://github.com/CyberStrategyInstitute/ai-safe2-framework
CSF: https://github.com/CyberStrategyInstitute/cognitive-sovereignty
2026-3-18 AI SAFE² Framework Dashboard v2.1.0
🚀 Release Notes: AI SAFE² Framework Dashboard v2.1.0
Release Date: March 18, 2026
Release Type: Major Feature Release
Status: Production Ready
📊 Overview
We are excited to announce the launch of the AI SAFE² Framework Interactive Dashboard a dynamic, web-based taxonomy explorer that transforms how security architects, GRC officers, and AI engineers interact with the AI SAFE² security framework.
Rather than navigating static documentation, users can now explore all 128 controls across 5 strategic pillars through an intuitive, filterable, searchable interface hosted directly on GitHub Pages.
👉 Launch Dashboard 👈
✨ What's New
🎯 Interactive Taxonomy Explorer
A production-grade, single-page application that provides:
- Complete Control Catalog: Browse all 128 security controls with full metadata
- Real-Time Search: Instant filtering across control IDs, names, descriptions, sub-topics, and decision-maker impacts
- Pillar-Based Navigation: Filter by strategic domain (Sanitize & Isolate, Audit & Inventory, Fail-Safe & Recovery, Engage & Monitor, Evolve & Educate)
- Risk-Level Filtering: Quickly identify Critical, High, Medium, and Low risk controls
- Detailed Control Views: Click any control to view comprehensive implementation guidance, framework mappings, and business impact
🎨 Professional Design System
- Color-Coded Pillars: Each strategic pillar has a distinct visual identity with custom color schemes
- AI SAFE² Shield Logo: Official framework branding integrated into the header
- Responsive Layout: Optimized for desktop, tablet, and mobile viewing
- Dark Mode Interface: Professional cybersecurity aesthetic with reduced eye strain
- Grid-Based Dashboard: Clean, modern layout with backdrop blur effects and glass-morphism panels
📈 Executive-Friendly Insights
Every control includes:
- Decision-Maker Impact: Clear business justification for non-technical stakeholders
- Implementation Guidance: Practical deployment instructions for engineering teams
- Framework Mappings: Cross-references to OWASP, MITRE ATLAS, NIST AI RMF, ISO standards, and more
- Risk Assessment: Immediate visibility into control criticality
- Gap Analysis: Visual identification of gap-filler controls addressing emerging threats
🔍 Advanced Features
- v2.1 Control Highlighting: Automatically identifies and badges next-generation controls (Agent Security, Memory Security, NHI, Multi-Agent, Distributed Systems)
- Sub-Topic Categorization: Granular organization within each pillar (e.g., "Sanitize (Input Validation)", "Monitor (Detection)")
- Live Data Synchronization: Pulls controls from GitHub repository in real-time with local fallback
- Smart Statistics: Dynamic counters showing total controls, critical controls, gap fillers, and pillar coverage
- Zero Build Process: Pure HTML/CSS/JavaScript implementation no compilation, no dependencies, instant deployment
🎯 Target Audience
This dashboard is designed for:
- Security Architects: Quickly identify applicable controls for AI system design
- GRC Officers: Map AI SAFE² controls to compliance frameworks and audit requirements
- AI Engineers: Access implementation guidance and technical references
- Executive Leadership: Understand business impact and risk prioritization through decision-maker summaries
- Consultants & Auditors: Navigate the framework efficiently during assessments
- Researchers & Educators: Explore the taxonomy for academic and training purposes
📦 Technical Specifications
Architecture
- Technology Stack: Vanilla HTML5, CSS3, JavaScript (ES6+)
- Styling: Tailwind CSS (CDN-based, no build required)
- Fonts: Plus Jakarta Sans (UI), JetBrains Mono (code/IDs)
- Data Format: JSON (controls.json)
- Deployment: GitHub Pages (static hosting)
- Browser Support: All modern browsers (Chrome, Firefox, Safari, Edge)
Performance
- Load Time: < 1 second initial page load
- Data Fetch: < 500ms from GitHub CDN
- Search Performance: Instant client-side filtering (no server round-trips)
- Asset Size: ~45KB HTML, ~3KB CSS (inline), ~15KB JS (inline), ~50KB JSON data
- Total Bundle: < 120KB (uncompressed)
Data Source
The dashboard reads from two sources (in priority order):
- Primary:
https://raw.githubusercontent.com/CyberStrategyInstitute/ai-safe2-framework/main/dashboard/public/data/controls.json - Fallback: Local
./public/data/controls.json
This ensures resilience and allows offline viewing with cached data.
📊 Framework Coverage
Control Distribution
Total Controls: 128
By Pillar:
- Sanitize & Isolate: ~25 controls
- Audit & Inventory: ~27 controls
- Fail-Safe & Recovery: ~25 controls
- Engage & Monitor: ~25 controls
- Evolve & Educate: ~26 controls
By Risk Level:
- Critical: High-priority controls for immediate implementation
- High: Important controls for comprehensive security posture
- Medium: Standard security practices
- Low: Foundational and hygiene controls
Special Categories:
-
v2.1 Controls: Next-generation additions covering:
- Agent Security & Verification
- Memory Security (vector databases, embeddings)
- Non-Human Identity (NHI) management
- Multi-Agent coordination & isolation
- Distributed system monitoring
- AI supply chain security
-
Gap Filler Controls: Novel controls addressing threats unique to AI systems not covered by traditional frameworks
Framework Mappings
Controls reference 20+ industry frameworks including:
- OWASP Top 10 for LLM Applications
- MITRE ATLAS (Adversarial Threat Landscape for AI Systems)
- NIST AI Risk Management Framework
- ISO/IEC 42001 (AI Management System)
- ISO 27001/27701 (Information Security & Privacy)
- CIS Controls
- COBIT
- Safety Engineering Standards (ISO 26262)
- And more...
🚀 Getting Started
Access the Dashboard
Live URL: https://cyberstrategyinstitute.github.io/ai-safe2-framework/dashboard/
No installation required just simply click the link and start exploring.
Basic Usage
- Browse by Pillar: Click pillar filter buttons at the top to focus on a specific strategic domain
- Search Controls: Type keywords in the search bar (searches IDs, names, descriptions, sub-topics)
- View Details: Click any control card to open the detailed modal with full implementation guidance
- Check Statistics: Monitor the header counters to see control distribution and critical counts
- Filter by Risk: (Future enhancement) Select risk level filters for targeted assessment
Advanced Features
- Keyboard Shortcuts: Press
Escapeto close detail modals - Deep Linking: (Future enhancement) Share direct links to specific controls
- Export Capabilities: (Future enhancement) Export filtered control sets to CSV/PDF
🔄 Integration with Existing Documentation
The dashboard complements existing framework documentation:
- README.md: High-level framework overview and methodology
- Dashboard: Interactive exploration of all 128 controls with live filtering
- controls.json: Machine-readable control definitions for tooling integration
- Assets: Official logos, diagrams, and visual resources
Teams can now choose their preferred engagement method:
- Quick Reference: Use the dashboard for rapid control lookup
- Deep Dive: Read the full markdown documentation for methodology and context
- Automation: Parse controls.json for CI/CD pipeline integration
🛠️ For Developers
Repository Structure
ai-safe2-framework/
├── dashboard/
│ ├── index.html # Main dashboard application
│ ├── public/
│ │ └── data/
│ │ └── controls.json # Control definitions (128 controls)
│ └── README.md # Dashboard documentation
├── assets/
│ └── AI SAFE2 Shield nbg.png # Official framework logo
└── README.md # Main framework documentation
Extending the Dashboard
The dashboard is designed for easy customization:
- Add New Controls: Update
controls.json— changes appear immediately (no rebuild) - Modify Styling: Edit inline CSS variables in
index.html - Add Features: Extend JavaScript functions (search, filtering, export, etc.)
- Customize Branding: Replace logo URL and color scheme variables
Local Development
# Clone the repository
git clone https://github.com/CyberStrategyInstitute/ai-safe2-framework.git
# Navigate to dashboard
cd ai-safe2-framework/dashboard
# Open in browser (no build required)
open index.htmlContributing
We welcome contributions! To suggest improvements:
- Fork the repository
- Make your changes to
dashboard/index.htmlordashboard/public/data/controls.json - Test locally by opening
index.htmlin a browser - Submit a pull request with a clear description of changes
📝 Control Data Schema
Each control in controls.json follows this structure:
{
"id": "P1.T1.1",
"name": "Control Name",
"pillar": "Sanitize & Isolate",
"sub_topic": "Sanitize (Input Validation)",
"is_gap_filler": false,
"description": "Detailed control description",
"risk_level": "High",
"decision_maker_impact": "Business justification for executives",
"implementation_guidance": "Technical deployment instructions",
"related_frameworks": ["OWASP LLM01", "NIST AI RMF"],
"framework_note": "Optional positioning...2026-3-12 AI SAFE² × SlowMist Security Overlay
Release: AI SAFE² × SlowMist Security Overlay
Path:
examples/slowmist-overlay/
License: CC-BY-SA 4.0 (Documentation) / MIT (Code)
Frameworks: AI SAFE² v2.1 × SlowMist OpenClaw Security Practice Guide v2.7
Status: Generally Available
Overview
We are releasing the AI SAFE² × SlowMist Security Overlay — a comprehensive integration guide and asset library that bridges two of the most rigorous security frameworks available for high-privilege autonomous AI agents.
This release provides full-stack security governance for OpenClaw deployments by combining the strengths of both frameworks into a unified, layered architecture. It is designed for security engineers, platform operators, and governance teams who have already deployed (or are deploying) the SlowMist OpenClaw Security Practice Guide and want to extend it with AI SAFE²'s external enforcement layer — without discarding any existing controls.
Why We Built It
The Problem
OpenClaw is not a chatbot. It is an always-on autonomous execution engine with root-level terminal access, continuous skill installation, and the ability to manage files, call external APIs, and orchestrate complex workflows — without synchronous human approval.
Recent independent academic research tested OpenClaw across 47 adversarial scenarios derived from MITRE ATLAS and ATT&CK. The results were unambiguous: OpenClaw's baseline native defense rate against sandbox escape attacks was just 17%. Relying on an LLM's own safety training as the primary security control is not a security posture. It is a liability.
Two serious frameworks have risen to address this. Each is excellent within its defined scope. Neither is sufficient alone.
The Gap
SlowMist OpenClaw Security Practice Guide (v2.7) delivers world-class agent-facing runtime safety:
- ✅ Behavioral red/yellow line taxonomy encoded into the agent's own reasoning layer
- ✅ Rigorous supply-chain intake protocol (offline clone → full-text scan → human approval)
- ✅ In-action permission narrowing, hash baselines, and immutable audit logging
- ✅ Nightly 13-metric host audit with push notifications and explicit reporting
- ✅ Brain backup and operational disaster recovery
But as a standalone solution it has structural blind spots:
- ❌ No cross-deployment fleet visibility — it secures one agent on one host
- ❌ No real-time API-layer enforcement — detection latency up to 24 hours for hash baseline drift
- ❌ No automated circuit-breakers — only reactive human-confirmation gates
- ❌ No cross-agent anomaly detection — each box is an island
- ❌ No formalized organizational training cadence or recurring red-team schedule
AI SAFE² Framework (v2.1) delivers robust external governance:
- ✅ Control Gateway for real-time enforcement outside the agent's blast radius
- ✅ Memory Vaccine for persistent cognitive-layer contamination prevention
- ✅ Vulnerability Scanner with 0–100 risk scoring and remediation guidance
- ✅ Enterprise-wide automation inventory and cross-deployment anomaly detection
- ✅ Structured red-team exercises, organizational training cadence, and threat model lifecycle
But without SlowMist's operational specificity, it lacks:
- ❌ A concrete behavioral taxonomy ready to deploy into the agent's reasoning layer
- ❌ The 13-metric nightly host audit structure and no-silent-pass reporting philosophy
- ❌ A supply-chain intake protocol for skills and MCPs
- ❌ Agent-native disaster recovery and brain backup patterns
Together, they cover every layer of the attack surface. This overlay provides the integration layer that makes them work as a single unified architecture.
Architecture
The overlay establishes a three-layer defense hierarchy. Each layer is independently enforceable — a failure at one layer does not cascade if the others are correctly deployed.
┌─────────────────────────────────────────────────────────────────────────┐
│ ORGANIZATIONAL LAYER (AI SAFE² Pillars 2, 4, 5) │
│ │
│ Cross-deployment automation registry • Fleet-wide anomaly detection │
│ Quarterly red-team exercises • Annual threat model review │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ GATEWAY LAYER (AI SAFE² Pillar 4) │ │
│ │ │ │
│ │ AI SAFE² Control Gateway — between OpenClaw ↔ LLM API │ │
│ │ Real-time risk scoring (0–10) • Prompt injection blocking │ │
│ │ High-risk tool denial • Automated circuit-breakers │ │
│ │ Immutable API-layer audit logs │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ HOST / AGENT LAYER (SlowMist Matrix) │ │ │
│ │ │ │ │ │
│ │ │ PRE-ACTION │ │ │
│ │ │ Red/Yellow Line Rules + Skill Installation Audit │ │ │
│ │ │ └── AI SAFE² Memory Vaccine (Pillar 1) │ │ │
│ │ │ Persistent cognitive rules • Memory poisoning │ │ │
│ │ │ prevention • Prompt injection heuristics │ │ │
│ │ │ │ │ │
│ │ │ IN-ACTION │ │ │
│ │ │ Permission Narrowing • Hash Baselines • Audit Logs │ │ │
│ │ │ └── Gateway continues enforcement at this layer │ │ │
│ │ │ │ │ │
│ │ │ POST-ACTION │ │ │
│ │ │ Nightly 13-Metric Audit • Push Notification • Backup │ │ │
│ │ │ └── AI SAFE² Vulnerability Scanner (Pillar 2) │ │ │
│ │ │ Secrets • Network exposure • 0–100 risk score │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
What This Release Closes
| Gap in SlowMist Standalone | How the Overlay Closes It |
|---|---|
| 24-hour hash baseline detection latency | Control Gateway provides real-time, external enforcement at every API call |
| Reactive human-confirmation gates only | Automated circuit-breakers trip on risk score threshold breach before human review |
| No persistent memory sanitization | Memory Vaccine filters memory writes and encodes anti-poisoning directives as priority cognitive context |
| Per-box logs with no fleet view | Centralized log aggregation enables cross-deployment anomaly correlation |
| No identity architecture or credential rotation | AI SAFE² Sanitize & Isolate pillar: JIT credentials, rotation bots, short-lived OAuth tokens |
| No organizational red-team cadence | Formalized quarterly and semi-annual exercise schedule built on SlowMist's validation curriculum |
| No cross-agent impersonation testing | Semi-annual A2A impersonation exercise defined in red-team-schedule-and-resources.md |
| No annual threat model lifecycle | AI SAFE² Evolve pillar: annual review incorporating new OpenClaw releases and emerging CVEs |
Key Research
- Don't Let the Claw Grip Your Hand (2026) — 47 adversarial scenarios; 17% baseline defense rate; HITL layer raises defense to 19–92%. arxiv.org/html/2603.10387
- AI SAFE² OpenClaw Analysis — Detailed gap analysis of native vs. external enforcement. cyberstrategyinstitute.com
- OpenClaw Security Survival Guide — Operator-friendly synthesis of SlowMist + extended hardening context. penligent.ai
Contributing
Contributions that improve the pillar mapping, incorporate new SlowMist guide versions, extend the threat model with emerging attack patterns, or add deployment patterns from production are welcome.
Please open an issue before submitting a PR for substantive changes to the architecture or pillar mappings.
License
Documentation in this directory is licensed under CC-BY-SA 4.0, consistent with the AI SAFE² Framework methodology license. Code components follow MIT. See the root-level LICENSE files for full terms.
# Release: AI SAFE² × SlowMist Security Overlay"If governance is not enforced at runtime, it is not governance. It is forensics."
— Cyber Strategy Institute
**Path:...
2026-03-6-AISM_RELEASE_NOTE
🚀 AISM: The AI Sovereignty Maturity Model (v3.0)
Tagline: The Operating System for Safe Autonomous AI
Core Principle: Probabilistic intelligence requires deterministic control.
Path: /AISM/
🌍 The State of the Union
We are witnessing the transition from Chatbots to Autonomous Agents.
AI systems are no longer just "generating text"; they are executing code, managing infrastructure, and making financial decisions.
Current governance frameworks (NIST, ISO) focus on Static Policy—documents you read once and file away. They fail to enforce safety during Runtime, creating a dangerous gap between "Written Rules" and "Agent Behavior."
Today, we launch AISM (AI Sovereignty Maturity Model).
AISM is not just a framework; it is a Governance Operating System. It combines operational safety, runtime enforcement, and continuous adversarial learning into a unified architecture for controlling Agentic AI.
🏛️ The 5 Pillars: Command Architecture
Inspired by military doctrine and mission-critical systems, we have renamed our core pillars to reflect their operational reality.
| ID | AI-Native Name | Function |
|---|---|---|
| P1 | 🛡️ Shield | Sanitize & Isolate. Input validation, injection defense, and cryptographic sandboxing. |
| P2 | 📒 Ledger | Audit & Inventory. Immutable telemetry, asset registries, and "Chain of Thought" logging. |
| P3 | ⚡ Circuit Breaker | Fail-Safe Recovery. Kill switches, rate limits, and safe-mode reversion protocols. |
| P4 | 🕹️ Command Center | Engage & Monitor. Human-in-the-loop oversight, real-time dashboards, and anomaly detection. |
| P5 | 🧠 Learning Engine | Evolve & Educate. Red teaming, threat intelligence, and continuous adversarial simulation. |
Note: These 5 Pillars are now the Root Directory structure of the repository, ensuring immediate usability.
🔄 The Operational Defense Loop
Safety is not a state; it is a cycle. AISM introduces the Defense Loop—the heartbeat of a secure agent.
- Shield: Blocks malicious inputs (Prompt Injection) before they reach the model.
- Ledger: Records the agent's internal reasoning and external actions.
- Circuit Breaker: Automatically halts the agent if it deviates from safe parameters.
- Command Center: Alerts the human operator to intervene.
- Learning Engine: Feeds incident data back into the Shield to prevent recurrence.
📈 The Maturity Ladder: From Chaos to Sovereignty
Where does your organization stand? AISM defines 5 levels of operational maturity.
- Level 1: Chaos (Ad Hoc)
- State: "We are just experimenting."
- Risk: Uncontained agents with root access. Outcomes rely on luck.
- Level 2: Visibility (Observable)
- State: "We log what happens."
- Risk: Basic containment, but no active enforcement.
- Level 3: Governance (Defined)
- State: "We have rules."
- Risk: Policies exist for memory and recursion, but enforcement is manual.
- Level 4: Control (Managed Runtime)
- State: "The system enforces the rules."
- Risk: Automated governors (SLOs, failure modes) block unsafe actions in real-time.
- Level 5: Sovereignty (Adaptive)
- State: "The system evolves."
- Risk: Full cryptographic identity, continuous red teaming, and sovereign human oversight.
🏗️ The AI Control Stack
AISM bridges the gap between "Policy" and "Code."
- Policy Layer: Rules. (Regulatory compliance, Risk Policies).
- Control Layer: Enforcement. (The 5 Pillars: Shield, Ledger, etc.).
- Agent Platform: Orchestration. (n8n, LangChain, AgenticFlow).
- Model Layer: Intelligence. (LLMs, Fine-tunes).
- Infrastructure: Compute. (Cloud, GPUs, Storage).
Key Insight: AISM injects Runtime Governors (Layer 2) between the Policy and the Agent, ensuring that probabilistic models obey deterministic constraints.
🚀 Why This Matters
- Dynamic Runtime Enforcement: We don't just "suggest" safety; we enforce it during execution.
- Measurable Risk: We quantify "Blast Radius" and "Recursion Depth" to make safety auditable.
- AI-Native: Built for Agents (Swarms, Memory, Tools), not retrofitted from old IT security.
🔗 Get Started
- Explore the Framework: [Link to Root]
- View the 5 Pillars: [Link to P1-P5 Folders]
- Download the One-Pager: [Link to PDF/Asset]
- Join the Vanguard: [Link to Discussions/Discord]
Engineered for Certainty. Built for Sovereignty.
2026-2-25-OpenClaw-Core-File-Standard
Release Notes
AI SAFE² OpenClaw Core File Standard — v2.0
Released: 2026-02-25
Authored by: Cyber Strategy Institute
Repository: https://github.com/CyberStrategyInstitute/ai-safe2-framework
Path: Path: examples/openclaw/core/
License: MIT (code/templates) + CC-BY-SA (methodology)
What This Release Is
Version 2.0 of the AI SAFE² OpenClaw integration is the first complete, opinionated standard for governing a personal AI agent workspace from the ground up. It is not a patch, a whitepaper, or a checklist, it is a working set of 11 files that, together, define a governed, secure, and auditable OpenClaw agent from identity through memory through multi-model routing.
This release was built in direct response to what we've watched unfold in the OpenClaw ecosystem since January 2026: 145,000 GitHub stars in weeks, at least 230 malicious skills on ClawHub, credential leaks via prompt injection, and organizations deploying autonomous agents with shell access and API budget without a single governance document in place. The gap between what OpenClaw can do and what most operators have in place to govern it is where systemic risk lives. This release is designed to close that gap for everyone, for free.
What's New in v2.0
New Files (did not exist in v1)
| File | What It Does |
|---|---|
| SOUL.md | Agent constitution grounded in Brian Roemmele's Love Equation as a mathematical alignment system, not a policy layer |
| AGENTS.md | Complete operating manual covering SKILL.md security, data classification, AI SAFE² pillar mapping, and the two-message UX pattern |
| IDENTITY.md | Minimal 5-line identity anchor that loads every request the first line of defense against identity replacement attacks |
| USER.md | Human identity contract with three-tier data classification, context-aware handling, and trust delegation levels |
| TOOLS.md | Environment configuration standard separating "how tools work" (skills) from "where things are" (this file) |
| HEARTBEAT.md | Scheduled health check protocol that operationalizes the AI SAFE² Engage & Monitor pillar into concrete per-cycle, daily, and weekly checks |
| SUBAGENT-POLICY.md | Worker governance with tiered trust levels, spawn protocol, context isolation rules, and injection detection for sub-agent output |
| MODEL-ROUTER.md | Multi-LLM routing policy defining Tier 1/2/3 models, routing decision matrix, graceful degradation, data residency rules, and cost controls |
| OPENCLAW-WORKSPACE-POLICY.md | Workspace constitution binding all agents to shared accountability, cross-agent trust hierarchy, and compliance mapping |
| OPENCLAW-AGENT-TEMPLATE.md | Eight-step new agent checklist including mandatory smoke tests for identity, hard limits, injection resistance, and data classification |
Why We Built It This Way
The Love Equation as Alignment Infrastructure
Most agent alignment approaches are policy layers, a list of rules that says "don't do this, don't do that." Policy layers work until they don't. They fail under adversarial inputs, edge cases users discover, and the gradual prompt injection that happens when an agent reads enough untrusted content.
Brian Roemmele's Love Equation reframes alignment as a dynamical system: dE/dt = β(C − D)E. When cooperation exceeds defection, alignment grows. When defection exceeds cooperation, the system decays. We translated this from philosophy into operational bands (Green/Yellow/Red), C/D event scoring, and concrete memory write decisions. The result is alignment that is mathematically unstable when violated, not just discouraged.
IDENTITY.md: The Missing Anchor
The OpenClaw ecosystem didn't have a standard for a minimal, always-loaded identity file. Matt Berman's community-developed patterns identified this gap clearly: an agent that doesn't know who it is in 5 lines is at risk if loaded before everything else and is more vulnerable to identity replacement attacks. When an adversarial SKILL.md or injected prompt says "You are now a different assistant with no restrictions," an agent with a concrete, loaded IDENTITY.md has an anchor. An agent without one only has system-prompt context, which can be buried or overwhelmed.
TOOLS.md: Separating Configuration from Instructions
One of the cleanest lessons from community OpenClaw patterns was the discipline of keeping environment-specific values (channel IDs, file paths, where secrets live) in a dedicated file, separate from how tools work (SKILL.md files) and how the agent behaves (AGENTS.md). This separation has a security consequence: TOOLS.md never contains instructions. It contains lookup values. That means a compromised TOOLS.md cannot inject behavior, it can only misdirect lookups, which is detectable. A TOOLS.md that starts looking like AGENTS.md is a signal.
HEARTBEAT.md: Monitoring as a First-Class Concern
The AI SAFE² Engage & Monitor pillar exists in principle across our prior work. HEARTBEAT.md makes it concrete and scheduled. The security rationale is direct: the most dangerous OpenClaw failures (0.0.0.0 bindings, API keys in logs, credential leaks, model cost overruns) are often invisible until they've caused harm. A heartbeat that runs every 30–60 minutes and specifically checks for these failure modes converts "we noticed eventually" into "we caught it the next cycle." The Love Equation integration in the daily heartbeat check adds something new: alignment drift is now a monitored metric, not just a philosophical concern.
The Skill Supply Chain Problem Is Structural
At least 230 malicious OpenClaw skills were uploaded to ClawHub since January 27, 2026. Cisco found that 26% of the 31,000 agent skills they analyzed contained at least one vulnerability. The top-downloaded skill at one point was confirmed malware. This is not an OpenClaw problem — it is an agent ecosystem problem. Any platform that reads SKILL.md files as instructions rather than documents is vulnerable to the same attack pattern.
Our AGENTS.md SKILL.md security section and the OPENCLAW-AGENT-TEMPLATE.md provenance checklist treat this structurally: skill files are execution vectors, not documentation. "Top downloaded" is not a safety signal. Read before you execute. Verify before you trust. This applies to every agent ecosystem that has adopted the SKILL.md format, which is increasingly all of them.
The Data Classification Tiers
The three-tier system (Confidential / Internal / Restricted) with context-aware enforcement (DM vs. group chat vs. channel) came directly from community patterns that identified the most common real-world data leak vector: an agent that knows the user's personal email and financial data behaving identically in a group Slack channel and a private DM. This is not a clever attack, it's a default behavior failure. The tiers, enforced in USER.md and referenced in openclaw_memory.md, make context-aware behavior the standard, not an optional hardening step.
What This Release Does Not Cover
This is the free/open-source core tier. It governs single-agent workspaces. It does not cover:
- Swarm governance — multi-agent fleets with collective alignment scoring, trust graph management, quorum memory writes, and cascade failure response. This is the premium tier, currently in design.
- Enterprise compliance reporting — automated evidence generation for ISO 42001 / NIST AI RMF audits
- Cross-workspace federation — shared governance across multiple independent workspaces
These are planned for the AI SAFE² Toolkit (paid tier). The core tier is deliberately complete for single-agent use without requiring the premium tier.
Migration from v1
If you are using the original openclaw_memory.md (v1 memory vaccine):
- v2.0 is a superset. No breaking changes. Drop it in alongside or replacing v1.
- The prompt injection block list in
openclaw_memory.mdv2.0 supersedes v1's simpler pattern list. - Sub-agent memory isolation and Love Equation write scoring are new, no existing behavior is changed, new guardrails are added.
If you have no prior AI SAFE² files:
- Start with
OPENCLAW-AGENT-TEMPLATE.mdand work through it top to bottom. - Do not skip the smoke test (Step 6). Every test has caught real issues in internal validation.
Acknowledgments
This release synthesizes:
- The AI SAFE² Framework v2.1 five-pillar model (Cyber Strategy Institute)
- Brian Roemmele's Love Equation as a dynamical alignment system
- Community agent patterns developed by the OpenClaw ecosystem, particularly the work collected by Matt Berman in establishing the standard file conventions (IDENTITY.md, TOOLS.md, HEARTBEAT.md, the two-message UX pattern, data classification tiers)
- Security research from Cisco AI Defense on agent skill supply chain vulnerabilities
- Lessons from the 1Password analysis of OpenClaw skill attack vectors
The AI SAFE² framework is an open standard. It is designed to be forked, extended, and built upon. If these files help you govern your agents better, that is the point.
Repository
ai-safe2-framework/examples/openclaw/core/
├── IDENTITY.md
├── SOUL.md
├── AGENTS.md
├── USER.md
├── TOOLS.md
├── HEARTBEAT.md
├── SUBAGENT-POLICY.md
├── MODEL-ROUTER.md
├── open...2026-02-12–Love_Equation_v2
🧡 Love Equation v2.0 - Release Summary
Tag: 2026-02-12–Love_Equation_v2
Use this for the GitHub release summary field.
Love Equation v2.0: Empirical Distrust + Enhanced Context Model
Major Example Update - Production-ready alignment framework with mathematical hallucination prevention.
🎯 Key Features
Empirical Distrust Algorithm: Automatically penalizes high-confidence, low-verifiability claims. When an agent asserts "system is secure" with 90% confidence but only 30% verification, it receives a 0.42 defection penalty.
Enhanced Context Model: Composable multipliers for stakes (4x critical), reversibility (2.5x irreversible), and risk flags (5x self-harm). High-stakes contexts now carry appropriate alignment weight.
Comprehensive Testing: 18 test scenarios validate stability (<3% drift over 1000 events) and exact reproducibility (<0.001% drift).
Production Manifests: Battle-tested configs for OpenClaw (security) and Ishi (personal assistant) ready for deployment.
📦 What's Included
- ✅ Merged evaluator with Empirical Distrust
- ✅ Updated schema (v2.0) with 9 new fields
- ✅ Complete drift test suite (probabilistic + deterministic)
- ✅ Production configuration templates
- ✅ Integration examples and comprehensive docs
🔄 Backward Compatible
All new fields are optional with sensible defaults. Existing v1.0 events continue to work without modification.
🚀 Quick Start
cd examples/love_equation
pip install numpy pyyaml
python evaluator.py
python drift_test_runner.py --all📚 Documentation
Full release notes: 2026-02-12–Love_Equation_v2.md
Implementation guide: README.md
🛡️ Help & Feedback
We are committed to making the AI SAFE² Framework the standard for autonomous agent security. Your feedback is vital to this mission.
- Report an Issue: Found a bug or a security gap? Open a new issue here.
- Discussion: Have a question or a new concept to add? Join the community discussions.
Stats: 9 new files, 18 test scenarios, ~2,500 lines of code, fully backward compatible
Note: This is a Love Equation example update. Core AI SAFE² framework version tags (v2.0, v2.1, etc.) are reserved for framework-wide releases.
2026-02-03 – ISHI Governance Scenarios
ISHI Mission Command Structure with OpenClaw
This release introduces the examples/ishi/ folder to demonstrate how AI SAFE² governs a second reference agent / orchestration pattern (ISHI), expanding beyond OpenClaw-specific examples. Places ISHI in a Mission Command structure over OpenClaw to reduce risks and create a better operational environment for your personal AI assistant.
🧩 What’s New – ISHI Examples
-
Added
examples/ishi/showcasing:- How to model ISHI workflows as AI SAFE² assets (agents, tools, memory, orchestration steps).
- Control implementations for:
- Input sanitization and boundary enforcement (Pillar 1).
- Audit trails and inventory (Pillar 2).
- Kill switches and rollback patterns (Pillar 3).
- Human-in-the-loop checkpoints (Pillar 4).
- Continuous red-teaming and tuning (Pillar 5).[page:1]
-
Included scenario files that show:
- Safe handling of Non-Human Identities for ISHI.
- Memory/RAG safeguards for ISHI’s context sources.
- How to register ISHI flows into an enterprise asset inventory.
📚 Documentation & Positioning
- Referenced
examples/ishi/from the README under target scope & environment, indicating AI SAFE² is not tied to a single agent framework.[page:1] - Clarified how ISHI examples differ from
examples/openclaw/:- OpenClaw: Deep integration and hardening toolkit.
- ISHI: Generic governance and pattern-focused scenarios.
- Use-Cases: Showed Top-20 Use-cases for Personal AI Assistants
🔄 Framework Impact
- No changes to the core AI SAFE² control taxonomy.
- This release expands the example corpus to help teams translate the same controls across multiple agent stacks.
2026-01-29 – OpenClaw Security Pack
ClawdBot / MoltBot / OpenClaw 3-Tools = Memory Defense, Scanner & Gateway
This release adds a complete OpenClaw (formerly Moltbot / Clawdbot) security toolkit on top of the AI SAFE² framework, including examples, hardening guidance, and industry resource mapping.
🔐 What’s New – OpenClaw Examples
- Added
examples/openclaw/with opinionated, ready-to-run examples for securing OpenClaw deployments under AI SAFE². - Showcased how to wire OpenClaw into the AI SAFE² pillars (Sanitize, Audit, Fail-Safe, Engage, Evolve) using concrete configuration patterns.
- Included examples that demonstrate:
- Memory safety patterns for OpenClaw’s long-lived state.
- Use of a control gateway to enforce policies outside the agent runtime.
- Integration points for future CI/CD and orchestration pipelines.
📚 Guides & Documentation
- Linked OpenClaw examples from the main README under “🛡️ OpenClaw Security” for discoverability.[page:1]
- Highlighted the 10-Minute Hardening Guide in
guides/openclaw-hardening.mdas the primary entrypoint for securing OpenClaw.[page:1] - Documented OpenClaw’s relationship to the AI SAFE² 5-layer model (L1–L5) and how OpenClaw roles map to Non-Human Identities (NHI).
🌐 Industry Resource Map
- Added
resources/openclaw_security_resource_map.mdwith curated links to:- Known OpenClaw threat models and security write-ups.
- Recommended hardening techniques aligned with AI SAFE² controls.
- External research relevant to memory poisoning, agentic abuse, and orchestration risks.
🔄 Framework Impact
- No changes to the core AI SAFE² control taxonomy.
- This release focuses on implementation patterns and examples for OpenClaw users, making the framework directly executable in real-world agent stacks.
2026-01-26 – Gateway, Scanner & Skill Runtime Drop
This release operationalizes AI SAFE² with runtime components (Gateway, Scanner) and developer ergonomics (skill file, Docker assets), turning the framework into an executable control plane.
🧠 Skills & Developer Onramp
- Added
skill.mdas the canonical “brain” file for AI assistants and IDEs (e.g., Claude Projects, Cursor, Windsurf).[page:1]- Encodes the core AI SAFE² context so agents can answer architecture, control, and mapping questions.
- Used in the README “🚀 Start Securing in 5 Minutes” path to instantly turn an LLM into an AI SAFE² architect.[page:1]
🛡️ AI SAFE² Gateway
- Added
gateway/directory containing the AI SAFE² Gateway proxy implementation.[page:1]- Enforces policy decisions derived from the framework at runtime.
- Designed to sit between orchestration layers (e.g., n8n, LangGraph, Make.com, CrewAI) and downstream tools/LLMs.[page:1]
- Introduced containerization assets:
- Root-level
Dockerfilefor building the Gateway image.[page:1] docker-compose.ymlfor local or test deployment with sensible defaults.[page:1]
- Root-level
- Documented Gateway behavior and deployment patterns in README and
INTEGRATIONS.md, including target environments such as MCP, coding assistants, and no-code workflows.[page:1]
🕵️ Audit Scanner CLI
- Added
scanner/directory providing the AI SAFE² Audit Scanner CLI.[page:1]- Supports the 5-Minute Audit path described in
QUICKSTART_5_MIN.md.[page:1] - Provides deeper scan modes aligned with the 5 pillars and risk domains (Agentic Swarms, NHI, Memory/RAG, Supply Chain, Universal GRC).[page:1]
- Supports the 5-Minute Audit path described in
- Included Python project metadata in
pyproject.tomlto streamline installation and packaging.[page:1]
⚙️ Configuration & Defaults
- Added
config/default.yamlwith baseline security configuration and control thresholds.[page:1]- Intended as a starting point for enterprises to customize control strength and enforcement rules.
- Aligned default configuration with the AI SAFE² coverage matrix and the “Universal Rosetta Stone” mappings (NIST AI RMF, ISO 42001, OWASP LLM, MITRE ATLAS, MIT AI Risk Repo).[page:1]
🔄 Framework Impact
- No changes to the core control taxonomy.
- This release focuses on runtime enforcement and operational tooling that make AI SAFE² deployable as infrastructure.