feat: governance-guard — structural authority separation for agent actions #124

Closed
devongenerally-png wants to merge 1 commit into openclaw:main from devongenerally-png:feat/governance-guard

@devongenerally-png

Summary

governance-guard adds a deterministic governance layer that separates action proposal, decision, and execution into distinct computational pathways. Every tool call passes through a three-phase pipeline — PROPOSE → DECIDE → PROMOTE — where the decision phase is a pure function with no LLM involvement.

Zero production dependencies. 96 tests. Fail-closed semantics.

Repo: https://github.com/MetaCortex-Dynamics/governance-guard


The problem this solves

OpenClaw agents currently propose actions, evaluate those actions, and execute them through the same computational pathway. When a single system is simultaneously the proposer, the judge, and the executor, there is no structural mechanism to prevent unauthorized behavior.

This is not hypothetical. Documented incidents include:

  • MoltMatch: Agent created a dating profile on an external service without user consent
  • Cisco disclosure: Data exfiltration identified in third-party skill
  • ClawHub audit: 386 skills flagged for malicious behavior
  • Carapace incidents: Destructive shell commands executed without confirmation
  • Gateway exposure: Instances deployed with auth: none

Behavioral mitigations (system prompts, guardrails) run on the same LLM that generates the threat. The guardrail and the threat share a computational substrate. This skill addresses the architectural root cause.


How it works

Agent Intent → PROPOSE → DECIDE → PROMOTE → Execution
                  │          │         │
              serialize   evaluate   gate
              + hash      against    on
              intent      policy     verdict
                          (no LLM)

PROPOSE — Intercepts tool calls. Serializes them into a structured ActionIntent with SHA-256 binding. Malformed intents are rejected at this boundary.
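
As a rough illustration of the PROPOSE step, the sketch below validates a raw tool call and binds it to a SHA-256 hash. The field names (`action_type`, `tool`, `target`), the `propose` and `canonicalize` helpers, and the canonical form are assumptions made for this example; the actual ActionIntent schema in intent.ts is not shown in this PR.

```typescript
import { createHash } from "node:crypto";

// Hypothetical intent shape; the real ActionIntent may carry other fields.
interface ActionIntent {
  action_type: string;
  tool: string;
  target: string;
  intent_hash: string;
}

type IntentFields = Omit<ActionIntent, "intent_hash">;

// Serialize in a fixed field order so equal intents always produce equal bytes,
// regardless of the key order in the incoming object.
function canonicalize(fields: IntentFields): string {
  return JSON.stringify([fields.action_type, fields.tool, fields.target]);
}

// Reject malformed input at the boundary, then bind the intent to its hash.
function propose(raw: unknown): ActionIntent {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("malformed intent: not an object");
  }
  const r = raw as Record<string, unknown>;
  const action_type = r["action_type"];
  const tool = r["tool"];
  const target = r["target"];
  if (typeof action_type !== "string" || action_type === "") {
    throw new Error("malformed intent: missing action_type");
  }
  if (typeof tool !== "string" || tool === "") {
    throw new Error("malformed intent: missing tool");
  }
  if (typeof target !== "string" || target === "") {
    throw new Error("malformed intent: missing target");
  }
  const fields: IntentFields = { action_type, tool, target };
  const intent_hash = createHash("sha256")
    .update(canonicalize(fields))
    .digest("hex");
  return { ...fields, intent_hash };
}
```

Hashing a canonical form rather than the raw object means two calls with the same content but different key order receive the same `intent_hash`, which is what lets later stages compare hashes instead of re-parsing the intent.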

DECIDE — Evaluates the intent against a user-defined YAML policy file. This is a pure, deterministic function. No LLM invocation. No interpretation. No context-dependent reasoning. Policy + intent → verdict (approve / deny / escalate). First matching rule wins. Default verdict: deny.
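
The first-match-wins, deny-by-default evaluation described here might look roughly like the following. This is a simplified sketch, not the code in policy-engine.ts: the `Rule` shape is hypothetical, and matching uses exact `action_type` equality plus an assumed `tool_prefix` field in place of the glob patterns the skill supports.

```typescript
type Verdict = "approve" | "deny" | "escalate";

interface Intent {
  action_type: string;
  tool: string;
  target: string;
}

// Hypothetical rule shape: every field present in `match` must agree.
interface Rule {
  name: string;
  match: { action_type?: string; tool_prefix?: string };
  verdict: Verdict;
}

interface Policy {
  default_verdict: Verdict;
  rules: Rule[];
}

interface Decision {
  verdict: Verdict;
  rule: string | null; // name of the matching rule, or null for a default
}

function matches(rule: Rule, intent: Intent): boolean {
  const m = rule.match;
  if (m.action_type !== undefined && m.action_type !== intent.action_type) {
    return false;
  }
  if (m.tool_prefix !== undefined && !intent.tool.startsWith(m.tool_prefix)) {
    return false;
  }
  return true;
}

// Pure function: no I/O, no clock, no randomness, no model call.
// The same (policy, intent) pair always yields the same decision.
function decide(policy: Policy | null, intent: Intent): Decision {
  if (policy === null) {
    return { verdict: "deny", rule: null }; // missing policy fails closed
  }
  for (const rule of policy.rules) {
    if (matches(rule, intent)) {
      return { verdict: rule.verdict, rule: rule.name }; // first match wins
    }
  }
  return { verdict: policy.default_verdict, rule: null };
}
```

Because rule order determines the outcome, a narrow deny rule has to appear before any broader approve rule that would also match the same intent.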

PROMOTE — Forwards approved actions to the skill executor. Requires matching intent_hash, valid verdict, and freshness check (MAX_VERDICT_AGE). If any condition fails, the action is blocked.
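
A minimal sketch of the PROMOTE gate follows, assuming a verdict record that carries the hash it was issued for and a decision timestamp. The `VerdictRecord` shape, the `canPromote` name, and the 30-second value for the age limit are illustrative assumptions; the PR names MAX_VERDICT_AGE but does not state its value or units.

```typescript
type Verdict = "approve" | "deny" | "escalate";

// Hypothetical record produced by the decision phase.
interface VerdictRecord {
  verdict: Verdict;
  intent_hash: string;   // hash of the intent this verdict was issued for
  decided_at_ms: number; // wall-clock time of the decision, in milliseconds
}

// Assumed value for illustration only.
const MAX_VERDICT_AGE_MS = 30_000;

// All three conditions must hold; any failure blocks the action.
function canPromote(
  intentHash: string,
  record: VerdictRecord,
  nowMs: number,
): boolean {
  if (record.verdict !== "approve") return false;     // valid verdict
  if (record.intent_hash !== intentHash) return false; // hash binding
  const age = nowMs - record.decided_at_ms;
  // Reject stale, future-dated, and non-numeric timestamps alike.
  if (!Number.isFinite(age) || age < 0 || age > MAX_VERDICT_AGE_MS) {
    return false;
  }
  return true;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the gate deterministic and easy to test.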

Witness chain — Every governance decision is recorded as a hash-chained JSONL entry. The chain is tamper-evident and independently verifiable:

witness[n].prev_hash = SHA-256(witness[n-1])
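
The chaining rule above can be sketched as an append function and a verifier over JSONL lines held in memory. The entry fields, the all-zero genesis value, and the choice to hash the serialized line of the previous entry are assumptions for this example; witness.ts may define them differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; one entry is stored per JSONL line.
interface WitnessEntry {
  seq: number;
  intent_hash: string;
  verdict: string;
  prev_hash: string;
}

// Assumed sentinel used as prev_hash for the first entry.
const GENESIS = "0".repeat(64);

const sha256 = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// Returns a new log with one entry appended; prev_hash covers the
// exact serialized text of the previous line.
function appendWitness(
  lines: string[],
  intent_hash: string,
  verdict: string,
): string[] {
  const last = lines[lines.length - 1];
  const prev_hash = last === undefined ? GENESIS : sha256(last);
  const entry: WitnessEntry = { seq: lines.length, intent_hash, verdict, prev_hash };
  return [...lines, JSON.stringify(entry)];
}

// Returns the index of the first entry whose prev_hash does not match
// the hash of the line before it, or -1 if the chain is intact.
function verifyChain(lines: string[]): number {
  let expected = GENESIS;
  for (const [i, line] of lines.entries()) {
    let prev: unknown;
    try {
      prev = (JSON.parse(line) as { prev_hash?: unknown } | null)?.prev_hash;
    } catch {
      return i; // unparseable line breaks the chain
    }
    if (prev !== expected) return i;
    expected = sha256(line);
  }
  return -1;
}
```

In this sketch, editing entry n is detected at entry n+1, since that is where the stored hash stops matching. An edit to the final entry, or truncation of the tail, is therefore not caught by the chain alone and would need an external anchor such as a separately stored head hash.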

What's in the skill

governance-guard/
├── SKILL.md                    # OpenClaw skill entry point
├── scripts/
│   ├── governance.ts           # Pipeline orchestrator + CLI
│   ├── intent.ts               # PROPOSE: ActionIntent serialization
│   ├── policy-engine.ts        # DECIDE: Deterministic rule evaluation
│   ├── witness.ts              # Hash-chained audit log
│   └── yaml-parse.ts           # Zero-dep YAML subset parser
├── policies/
│   ├── minimal.yaml            # Blocks destructive + credential access
│   ├── standard.yaml           # Deny-default, explicit allows (recommended)
│   └── strict.yaml             # Allow reads only, deny everything else
├── references/
│   └── policy-schema.md        # Policy file format documentation
├── package.json
├── tsconfig.json
├── LICENSE                     # MIT
└── README.md

Dependencies

Production: None. Zero. The skill uses:

  • Node built-in crypto for SHA-256
  • Custom recursive-descent YAML parser (~240 lines)
  • Custom glob matcher (dotfile-aware)

Dev only: tsx, typescript, @types/node

Test coverage

96/96 passing:

  • T-001 – T-010: Unit tests (policy evaluation, hash computation, chain integrity, malformed input rejection)
  • I-001 – I-006: Integration tests (end-to-end pipeline, policy hot-reload, crash recovery)
  • A-001, A-003, A-005: Adversarial tests (spoofed hashes, policy file tampering, replay attacks)

Security properties

| Property | Guarantee | Mechanism |
| --- | --- | --- |
| Authority separation | PROPOSE and DECIDE never share a computational pathway | DECIDE is a pure function with no LLM invocation |
| Fail-closed | System failures → denial, never approval | Default verdict: deny; errors → deny; missing policy → deny |
| Tamper evidence | Historical decisions cannot be silently altered | Hash-chained witness log |
| Verdict freshness | Stale approvals cannot be replayed | MAX_VERDICT_AGE timeout + intent_hash binding |
| Policy atomicity | Policy changes are all-or-nothing | Validation before activation; failure retains previous policy |
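
To make the fail-closed and policy-atomicity properties concrete, here is one way they could be combined in a small policy holder. The `PolicyStore` class, its method names, and the validation checks are hypothetical and chosen for this sketch; they are not taken from the skill's source.

```typescript
type Verdict = "approve" | "deny" | "escalate";

interface Policy {
  version: string;
  default_verdict: Verdict;
  rules: unknown[];
}

function isVerdict(x: unknown): x is Verdict {
  return x === "approve" || x === "deny" || x === "escalate";
}

// Throws on any structural problem; returns a typed policy otherwise.
function validatePolicy(candidate: unknown): Policy {
  if (typeof candidate !== "object" || candidate === null) {
    throw new Error("policy must be an object");
  }
  const c = candidate as Record<string, unknown>;
  const version = c["version"];
  const default_verdict = c["default_verdict"];
  const rules = c["rules"];
  if (typeof version !== "string") throw new Error("version must be a string");
  if (!isVerdict(default_verdict)) throw new Error("invalid default_verdict");
  if (!Array.isArray(rules)) throw new Error("rules must be a list");
  return { version, default_verdict, rules };
}

class PolicyStore {
  private active: Policy | null = null;

  // Validate first, swap second: a failed reload leaves the previous
  // policy in place, so a change is applied fully or not at all.
  reload(candidate: unknown): boolean {
    try {
      this.active = validatePolicy(candidate);
      return true;
    } catch {
      return false;
    }
  }

  current(): Policy | null {
    return this.active;
  }

  // Fail closed: no policy, or an evaluator that throws, yields "deny".
  evaluate(evaluator: (policy: Policy) => Verdict): Verdict {
    if (this.active === null) return "deny";
    try {
      return evaluator(this.active);
    } catch {
      return "deny";
    }
  }
}
```

The assignment to `active` happens only after `validatePolicy` returns, so there is no point at which a partially validated policy is visible to `evaluate`.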

What this does NOT guarantee

Transparency about limitations:

  • Does not verify that the LLM's stated intent matches its actual intent (fundamental limitation of wrapping an untrusted component)
  • Does not prevent bypass through tool calls outside the intercepted pipeline (depends on executor coverage)
  • Does not perform semantic analysis of action consequences (evaluates structural properties against policy rules)
  • Witness chain is tamper-evident, not tamper-proof (filesystem access can delete the log; verification detects but does not prevent deletion)

Example policy

version: "0.1"
default_verdict: deny

rules:
  - name: allow-read-local
    match:
      action_type: read
      target_pattern: "~/**"
    verdict: approve

  - name: block-destructive
    match:
      action_type: delete
      tool_pattern: "shell.*"
    verdict: deny
    reason: "Destructive shell commands require manual approval"

  - name: escalate-network
    match:
      action_type: network
      target_pattern: "!*.local"
    verdict: escalate
    reason: "External network requests require user confirmation"

Why this matters for OpenClaw

OpenClaw's skill ecosystem is growing fast (5,700+ skills on ClawHub). The auth: none default, the malicious skill incidents, and the MoltMatch episode are symptoms of a missing governance layer, not individual skill bugs. As the platform moves toward foundation governance, structural authority separation is a necessary primitive.

This skill doesn't require changes to OpenClaw core. It operates as a pre-execution interceptor within the existing skill architecture. Users install it, choose a policy preset, and every tool call is governed from that point forward.


Author: Devon Generally / MetaCortex Dynamics
License: MIT
Node: ≥22
Tested: 96/96 pass

…t actions

Adds a deterministic governance layer that separates action proposal,
decision, and execution into distinct computational pathways. Every tool
call passes through PROPOSE → DECIDE → PROMOTE where the decision phase
is a pure function with no LLM involvement.

Repo: https://github.com/MetaCortex-Dynamics/governance-guard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openclaw-barnacle

Thanks for the pull request! This repository is read-only and is automatically synced from https://clawhub.ai, so we can’t accept changes here. Please make updates on the website instead.

@devongenerally-png
Author

AI disclosure: This skill was designed with AI assistance (Claude) for architecture review, spec drafting, and test scaffolding. All code was reviewed, understood, and tested by the author; 96/96 tests pass. The governance architecture and design decisions are the author's original work.

ClawHub: This skill is also published on ClawHub as governance-guard@0.1.0 per the contributing guidelines.
