Skip to content

feat: skills pinning — supply chain & update drift defense (AST 02 + AST 07)#89

Closed
Xaik89 wants to merge 3 commits intonode9-ai:mainfrom
Xaik89:feat/skills-pinning
Closed

feat: skills pinning — supply chain & update drift defense (AST 02 + AST 07)#89
Xaik89 wants to merge 3 commits intonode9-ai:mainfrom
Xaik89:feat/skills-pinning

Conversation

@Xaik89
Copy link
Copy Markdown

@Xaik89 Xaik89 commented Apr 13, 2026

Summary

  • Extends the existing MCP tool pinning primitive (PR feat: MCP tool pinning — rug pull defense #81 / v1.5.0) to agent skill files~/.claude/skills/, ~/.claude/CLAUDE.md, ~/.claude/rules/, project .claude/CLAUDE.md, .claude/CLAUDE.local.md, .claude/rules/, .cursor/rules/, AGENTS.md, CLAUDE.md. On the first tool call of a session the hook records SHA-256 hashes of every skill file; subsequent sessions re-hash and compare. Any drift quarantines the session and blocks every tool call until a human reviews with node9 skill pin update <rootKey>.
  • One feature, two threats covered: AST 02 Supply Chain Compromise and AST 07 Update Drift.
  • Moat: first/only runtime defense at the skills layer — competitors (Keycard / Microsoft AGT / Invariant) have no equivalent today.
  • Slim by design: ~1,150 LOC total. Mirrors src/mcp-pin.ts 1:1; no new dependencies.

(Re-opened from personal fork; supersedes closed #88.)

What's new

  • src/skill-pin.ts (~300 lines) — SHA-256 content hashing (files or recursive directories, symlink-safe, 5000 files / 50 MB caps), atomic writes to ~/.node9/skill-pins.json (mode 0o600), per-root exists flag so "appeared" and "vanished" both count as drift, batched verifyAndPinRoots().
  • src/cli/commands/skill-pin.ts (~100 lines) — node9 skill pin list | update <rootKey> | reset, mirroring node9 mcp pin. update removes the named pin so the next session re-pins; reset clears all pins and wipes session verification flags.
  • Hook integration in src/cli/commands/check.ts (+117 lines) — first tool call of a session verifies; result memoised in ~/.node9/skill-sessions/<session_id>.json so hashing runs once per session, not per tool call. Session IDs restricted to /^[A-Za-z0-9_-]{1,128}$/ to defeat path traversal. Corrupt pin file → fail-closed. Unexpected errors → fail-open (debug-logged) so a bug here cannot brick Claude Code.
  • policy.skillRoots: string[] config field — user-extensible list of extra skill paths.

Security properties

  • Fail-closed on corrupt skill-pins.json (recovery: node9 skill pin reset)
  • Symlink-safe (lstat + explicit isSymbolicLink(); never follows out of the tree)
  • Size-bounded (5000 files / 50 MB per root)
  • Path-traversal-safe session IDs
  • Atomic writes, mode 0o600

Test plan

  • 30 new tests across 4 files — all pass:
    • skill-pin.unit.test.ts (16): hash contract, order-invariance, add/remove/modify sensitivity, symlink skip, pin round-trip, exists-flip detection, mode 0o600, fail-closed
    • skill-pin-cli.integration.test.ts (6, spawnSync): list empty/populated/corrupt, update unknown/known, reset clears + wipes session flags
    • check-skill-pin.integration.test.ts (4, spawnSync): first-call pins + allows, new-session drift blocks with JSON-RPC payload + quarantines, corrupt fails closed, missing session_id skips
    • skill-roots-config.spec.ts (4): default empty, merge, dedup, defensive filter
  • Full suite: npm test1170 / 1171 pass (the one failure is a pre-existing environmental flake in hud.spec.ts — fails whenever ~/.claude/CLAUDE.md exists on the test machine; passes under isolated HOME)
  • npm run typecheck, npm run lint, npm run format:check, npm run build — all clean

End-to-end verified against a realistic scenario

Simulated a developer's real setup (global ~/.claude/CLAUDE.md + skill, project CLAUDE.md + .cursor/rules/) across two days of Claude Code sessions:

  1. Monday 9am, fresh session → exit 0, 9 roots pinned, verified flag written with mode 0o600 ✅
  2. Same session, subsequent calls → short-circuit via verified flag (~150ms incl. Node startup) ✅
  3. Overnight, attacker rewrites my-project/CLAUDE.md with a prompt-injection backdoor
  4. Tuesday 9am, new session → exit 2, JSON decision: "block", reason names the exact node9 skill pin update b73aad6c0b08ef42 recovery command, session flag persists as {state: "quarantined", detail: "changed: .../my-project/CLAUDE.md"}
  5. Developer restores file + runs skill pin update <key> → fresh session allows again ✅
  6. skill pin reset → clears all pins + wipes skill-sessions/

🤖 Generated with Claude Code

Xaik89 and others added 3 commits April 13, 2026 17:12
…AST 07)

Extends the MCP tool pinning primitive (v1.5.0 / PR node9-ai#81) to agent skill
repositories. On the first tool call of a session, SHA-256 hashes are
recorded for every skill file in known roots (~/.claude/skills/,
~/.claude/CLAUDE.md, ~/.claude/rules/, project .claude/CLAUDE.md,
.claude/CLAUDE.local.md, .claude/rules/, .cursor/rules/, AGENTS.md,
CLAUDE.md). On subsequent sessions the hook re-verifies; any drift
quarantines the session and blocks every tool call until a human
reviews via `node9 skill pin update <rootKey>`.

One feature, two threats covered in a single primitive — AST 02
Supply Chain Compromise and AST 07 Update Drift — at the skills
layer that no competitor currently defends at runtime.

Security properties:
- Fail-closed on corrupt pin file
- Symlink-safe (never follows symlinks out of the tree)
- Size-capped at 5000 files / 50 MB per root
- Path-traversal-safe session IDs (/^[A-Za-z0-9_-]{1,128}$/)
- Atomic writes, mode 0o600 for ~/.node9/skill-pins.json and flags

CLI:
- `node9 skill pin list` — show pinned roots, hashes, file counts
- `node9 skill pin update <rootKey> [--yes]` — diff + re-pin
- `node9 skill pin reset` — clear pins AND wipe session flags

Config:
- `policy.skillRoots: string[]` extends the default root set (absolute,
  `~/`-prefixed, or cwd-relative; relative paths require absolute cwd).

Tests (TDD, all green):
- src/__tests__/skill-pin.unit.test.ts         (36 tests)
- src/__tests__/skill-pin-cli.integration.test.ts (7 tests, spawnSync)
- src/__tests__/check-skill-pin.integration.test.ts (8 tests, spawnSync)
- src/__tests__/skill-roots-config.spec.ts     (4 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Feedback: ~2k lines was too much. Trim to the security-essential
surface while preserving the AST 02 + AST 07 coverage.

Dropped (not essential to the rug-pull defense):
- computePinDiff + fileManifest tracking in pin entries
- @inquirer/prompts dependency (no interactive review prompt)
- per-file diff display in `skill pin update`
- `update` now mirrors `mcp pin update` exactly: just removes the pin
  so the next session re-pins with current state
- GC loop for session flags older than 7 days (not security-essential)
- 4 integration tests covering short-circuit / unchanged-session /
  quarantine-persists / relative-cwd edge cases — the 4 load-bearing
  scenarios remain: first-call pins, drift blocks, corrupt fails
  closed, missing session_id skips

Net: -945 / +258 lines. PR now ~1150 lines (down from ~1835).

All security properties preserved: fail-closed on corrupt pin file,
symlink-safe, size-capped, path-traversal-safe session IDs, atomic
writes with mode 0o600, per-session memoisation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Feedback: section was too long. Collapse to one paragraph that
cross-references the MCP pinning section above (same pattern, same
defense, different target) instead of restating the full rationale.
14 lines → 3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@node9ai
Copy link
Copy Markdown
Contributor

node9ai commented Apr 14, 2026

Really solid engineering on this — the security properties (symlink-safe, atomic writes, path-traversal-safe session IDs, fail-closed on corrupt state, session memoization) are all correct, and the test coverage is thorough. You clearly read the MCP pinning code carefully and mirrored the pattern well.

That said, I'm going to close this for now because of a fundamental design issue that I don't think can be fixed with small tweaks.

The quarantine-on-change model doesn't fit skill files.

MCP tool pinning works because you didn't write those tool definitions — a third-party server did. When they change without your knowledge, that's suspicious by definition. Skill files are different: ~/.claude/CLAUDE.md, roject/.claude/CLAUDE.md, .claude/rules/ are files you edit constantly as part of normal workflow. The feature as designed would quarantine your next Claude Code session every time you update your own CLAUDE.md. That trains users to either disable it or treat every alert as noise — which defeats the
purpose entirely.

The threat model is also narrower than it looks. The realistic attack here is a compromised package or CI script silently modifying your CLAUDE.md. That's a real threat, but it's rare. The false positive rate (you edited your own file) would overwhelm real signals in practice.

What I'd want to see instead:

  • enabled: false by default — opt-in only, for users who specifically want this
  • A notification/review model rather than hard quarantine — show a warning and let the user decide, don't block all tool calls
  • Targets dev, not main

If you want to rework it along those lines, I'd be happy to look at a v2. The core hashing and session memoization logic is genuinely good work and worth reusing.

Help me with could agent see this
https://github.com/node9-ai/node9-pr-agent

@node9ai node9ai closed this Apr 14, 2026
Xaik89 added a commit to Xaik89/node9-proxy that referenced this pull request Apr 14, 2026
Addresses PR node9-ai#89 review feedback:
- enabled: false by default (opt-in only)
- mode 'warn' (default): /dev/tty notification on drift, tool allowed (exit 0)
- mode 'block': hard quarantine on drift, tool blocked (exit 2) — for
  installed/registry skills where changes are genuinely suspicious

Config: policy.skillPinning: { enabled, mode: 'warn'|'block', roots }
replaces the flat policy.skillRoots field.

Session flags now have three states: 'verified', 'warned', 'quarantined'.
Warn mode never writes 'quarantined' — warn + allow + memoize so the
/dev/tty notification shows once per session, not on every tool call.

Core hashing, CLI, and security properties are unchanged from v1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants