feat(core): capability-gated tool surface (--tools-only / --disable-tools) #829

@shaun0927

Description

Tier: core (additive metadata; default tools/list unchanged)
PR target: develop

Background

src/tools/index.ts registers ~70 tools at server start (the exact count depends on progressive-disclosure state via expand_tools). The resulting tools/list MCP response is large, and agents frequently mis-select low-value tools (workflow_*, oc_recording_*, crawl_sitemap). Today the Hint Engine corrects these mis-selections at runtime, which costs tokens.

playwright-mcp solves the same problem with a per-tool capability field and --caps=... opt-in (microsoft/playwright-mcp v0.0.75, filteredTools() in playwright-core/src/tools/utils/mcp/server.ts).

P2 (zero-impact extension) compliant: tool definitions and behavior unchanged, only exposure is gated. Default npm start produces a byte-identical tools/list to v1.11.0.

Interaction with expand_tools (progressive disclosure)

expand_tools is preserved. Filter order:

  1. Capability filter (--tools-only / --disable-tools) defines the maximum set the agent can ever see.
  2. expand_tools operates within that set: it can reveal capability-allowed tools that are hidden by default, but never tools the capability filter excluded.

Conflict rule: capability filter wins. --tools-only=core + expand_tools(workflow_init) → tool stays hidden, structured error CAPABILITY_DISABLED.
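The conflict rule above can be sketched as a small gate inside the expand_tools handler. This is a minimal illustration, not the project's code: the names `expandTool`, `ExpandResult`, and the `UNKNOWN_TOOL` branch are hypothetical, and only the `CAPABILITY_DISABLED` code and the `capability` field come from this issue.

```typescript
// Hypothetical sketch of the expand_tools gate. Names other than
// CAPABILITY_DISABLED and the capability values are assumptions.
type Capability =
  | "core" | "crawl" | "recording" | "workflow"
  | "storage" | "profile" | "totp" | "pilot";

interface ToolDefinition {
  name: string;
  capability?: Capability; // absent is treated as "core"
}

type ExpandResult =
  | { ok: true; revealed: string }
  | { ok: false; code: "CAPABILITY_DISABLED"; capability: Capability }
  | { ok: false; code: "UNKNOWN_TOOL" }; // assumption: not specified in the issue

function expandTool(
  registry: ToolDefinition[],
  allowed: ReadonlySet<Capability>, // result of the capability filter (step 1)
  visible: Set<string>,             // names currently exposed via tools/list
  name: string,
): ExpandResult {
  const tool = registry.find((t) => t.name === name);
  if (!tool) return { ok: false, code: "UNKNOWN_TOOL" };

  const capability = tool.capability ?? "core";
  // Capability filter wins: an excluded tool is never revealed.
  if (!allowed.has(capability)) {
    return { ok: false, code: "CAPABILITY_DISABLED", capability };
  }

  // Within the allowed set, expand_tools may reveal hidden tools.
  visible.add(name);
  return { ok: true, revealed: name };
}
```

Because the rejected branch returns before `visible` is touched, a later tools/list built from `visible` cannot contain the excluded tool, which is the behaviour Scenario 4 checks.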

Proposed Implementation

  1. Extend ToolDefinition with capability: 'core' | 'crawl' | 'recording' | 'workflow' | 'storage' | 'profile' | 'totp' | 'pilot'. Absent → 'core' for backward compat (P1).
  2. Tag every registered tool in src/tools/*.ts. Initial grouping (the PR must validate against the actual registry):
    • core: navigate, read_page, query_dom, find, inspect, interact, computer, act, fill_form, form_input, javascript_tool, page_*, screenshot, journal, tabs_*, oc_connection_health, oc_session_resume/snapshot, oc_assert, oc_evidence_bundle, oc_checkpoint, wait_for, console_capture, validate_page, request_intercept, network*, emulate_device, file_upload, drag_drop, http_auth, user_agent, geolocation, lightweight_scroll, memory, extract_data, performance_metrics, page_reload, expand_tools
    • storage: cookies, storage
    • profile: list_profiles, oc_profile_status
    • crawl: crawl, crawl_sitemap, batch_execute, batch_paginate, worker_update, worker_complete
    • recording: oc_recording_start, oc_recording_stop, oc_recording_list, oc_recording_export
    • workflow: workflow_init, workflow_status, workflow_collect, workflow_collect_partial, workflow_cleanup, execute_plan
    • totp: oc_totp_generate
  3. CLI flags (src/index.ts, commander chain — current flags at ~line 72–98):
    • --tools-only <csv> — exposes only listed capabilities
    • --disable-tools <csv> — removes listed capabilities
  4. New lint: npm run lint:tools-capabilities asserts every registered tool has a capability tag (CI-enforced).
  5. Filtering at registerTools(): apply capability filter; expand_tools enforces the gate per rule above.
  6. Backward compat: when neither flag is set, tools/list is byte-identical to v1.11.0. Snapshot committed at src/tools/__tests__/__snapshots__/tools-list.v1.11.snap.json.
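As a rough sketch of steps 1, 3, and 5 above, the capability type, the CSV flag parsing, and the filter could look like the following. The function names `parseCapabilityCsv` and `filterByCapability` are hypothetical. The behaviour when both flags are passed together is not specified in this issue; the sketch assumes --tools-only is applied first and --disable-tools then subtracts from it.

```typescript
// Hypothetical sketch; only the capability values and the "absent means core"
// rule come from the proposal.
const CAPABILITIES = [
  "core", "crawl", "recording", "workflow",
  "storage", "profile", "totp", "pilot",
] as const;
type Capability = (typeof CAPABILITIES)[number];

interface ToolDefinition {
  name: string;
  capability?: Capability; // absent is treated as "core" for backward compat
}

// Parse a value such as "workflow,recording" from --tools-only / --disable-tools.
function parseCapabilityCsv(csv: string): Set<Capability> {
  const out = new Set<Capability>();
  for (const raw of csv.split(",")) {
    const item = raw.trim();
    if (item === "") continue;
    if (!(CAPABILITIES as readonly string[]).includes(item)) {
      throw new Error(`Unknown capability: ${item}`);
    }
    out.add(item as Capability);
  }
  return out;
}

interface FilterOptions {
  toolsOnly?: Set<Capability>;    // from --tools-only
  disableTools?: Set<Capability>; // from --disable-tools
}

// With no options set, the input is returned unchanged in content and order,
// which is what keeps the default tools/list identical to the baseline.
function filterByCapability(
  tools: ToolDefinition[],
  opts: FilterOptions = {},
): ToolDefinition[] {
  return tools.filter((tool) => {
    const cap = tool.capability ?? "core";
    if (opts.toolsOnly && !opts.toolsOnly.has(cap)) return false;
    // Assumption: when both flags are given, disableTools subtracts.
    if (opts.disableTools?.has(cap)) return false;
    return true;
  });
}
```

Rejecting unknown capability names at parse time is a design choice of this sketch: a typo such as `--tools-only cor` fails loudly instead of silently exposing an empty tool surface.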

Acceptance Criteria

  • Every registered tool has a non-empty capability field (lint:tools-capabilities CI-green)
  • --tools-only, --disable-tools flags implemented in src/index.ts
  • expand_tools rejects capability-excluded tools with CAPABILITY_DISABLED error
  • Unit tests in src/tools/__tests__/capability-filter.spec.ts cover the 6 verification scenarios below
  • Default tools/list byte-identical to v1.11.0 baseline (snapshot diff is empty)
  • npm run lint:tier passes
  • CHANGELOG (Unreleased) entry
  • PR targets develop
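The first criterion could be enforced by a check along these lines. This is only a sketch of what lint:tools-capabilities might do: the `RegisteredTool` shape and both function names are assumptions, and how the real script loads the registry is not described in this issue.

```typescript
// Hypothetical sketch of the capability-tag lint.
interface RegisteredTool {
  name: string;
  capability?: string;
}

// Return the names of tools whose capability tag is missing or blank.
function findUntaggedTools(tools: RegisteredTool[]): string[] {
  return tools
    .filter((t) => typeof t.capability !== "string" || t.capability.trim() === "")
    .map((t) => t.name)
    .sort();
}

// Produce an exit code and message suitable for a CI step.
function lintToolCapabilities(tools: RegisteredTool[]): { exitCode: number; message: string } {
  const untagged = findUntaggedTools(tools);
  if (untagged.length === 0) {
    return { exitCode: 0, message: `OK: ${tools.length} tools tagged` };
  }
  return {
    exitCode: 1,
    message: `Missing capability tag: ${untagged.join(", ")}`,
  };
}
```

The lint is stricter than the runtime default on purpose: at runtime an absent tag falls back to 'core', while the lint requires every tool in the registry to carry an explicit tag.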

Verification (post-merge, executable scripts)

Setup

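# Build the latest develop and start the server over HTTP on port 9876.
git checkout develop && git pull && npm ci && npm run build
# Keep the PID so later scenarios can kill "$PID" and restart with other flags.
node dist/index.js --http 9876 &  PID=$!
# Helper: POST a tools/list JSON-RPC request and print the raw response.
mcp() { curl -s -H "content-type: application/json" \
  -d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/list\"}" \
  http://localhost:9876/mcp ; }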

Scenario 1 — default surface unchanged (regression)

mcp | jq -S '.result.tools | map(.name) | sort' > /tmp/tools-default.json
diff /tmp/tools-default.json \
     <(jq -S '.tools | map(.name) | sort' \
       src/tools/__tests__/__snapshots__/tools-list.v1.11.snap.json)

Pass: diff is empty.

Scenario 2 — --tools-only core

Restart server with --tools-only core, capture tools/list.
Pass: jq '.result.tools[].name' | grep -E '^"(workflow_|oc_recording_|crawl)' returns no matches.

Scenario 3 — --disable-tools workflow,recording

Restart server with --disable-tools workflow,recording.
Pass: no tool starts with workflow_ or oc_recording_; all core tools still present (count matches default minus the two groups exactly).
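The "count matches exactly" condition can be checked as a set comparison rather than by prefix alone, which matters because the workflow group also contains execute_plan, a name that does not start with workflow_. The sketch below assumes the three name lists are already available (for example from two tools/list captures and the grouping in the proposal); `diffDisabledSurface` is a hypothetical helper.

```typescript
// Hypothetical helper: compare the surface seen under --disable-tools with
// "default minus the disabled groups".
interface SurfaceDiff {
  missing: string[];    // expected tools that are absent
  unexpected: string[]; // tools present that should have been removed (or are new)
  pass: boolean;
}

function diffDisabledSurface(
  defaultNames: string[],  // names from the default tools/list
  disabledNames: string[], // names belonging to the disabled capabilities
  actualNames: string[],   // names from tools/list under --disable-tools
): SurfaceDiff {
  const disabled = new Set(disabledNames);
  const expected = new Set(defaultNames.filter((n) => !disabled.has(n)));
  const actual = new Set(actualNames);

  const missing = Array.from(expected).filter((n) => !actual.has(n)).sort();
  const unexpected = Array.from(actual).filter((n) => !expected.has(n)).sort();

  return { missing, unexpected, pass: missing.length === 0 && unexpected.length === 0 };
}
```

A non-empty `unexpected` list names exactly which tools leaked through the filter, which is easier to act on than a bare count mismatch.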

Scenario 4 — expand_tools respects capability gate

With --tools-only core active, call expand_tools({name: "workflow_init"}).
Pass: response is structured error { code: "CAPABILITY_DISABLED", capability: "workflow" }; subsequent tools/list does not include workflow_init.

Scenario 5 — tools/list byte reduction (explicit methodology)

# Default
DEFAULT_BYTES=$(mcp | wc -c)
# Core only (restart with --tools-only core)
CORE_BYTES=$(mcp | wc -c)
echo "default=$DEFAULT_BYTES core=$CORE_BYTES reduction=$((100 - 100*CORE_BYTES/DEFAULT_BYTES))%"

Pass: reduction ≥ 25%. Both numbers documented in the PR description.
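One detail of the methodology is worth spelling out: the shell expression uses integer division, so the retained percentage is truncated and the reported reduction is rounded up. The sketch below compares the shell-style figure with the exact one. Both function names are hypothetical, and the byte counts in the usage note are made-up illustration values, not measurements.

```typescript
// Mirrors $((100 - 100*CORE_BYTES/DEFAULT_BYTES)): integer division truncates.
function shellReductionPercent(defaultBytes: number, coreBytes: number): number {
  if (defaultBytes <= 0) throw new RangeError("defaultBytes must be positive");
  return 100 - Math.trunc((100 * coreBytes) / defaultBytes);
}

// Exact reduction as a floating-point percentage.
function exactReductionPercent(defaultBytes: number, coreBytes: number): number {
  if (defaultBytes <= 0) throw new RangeError("defaultBytes must be positive");
  return 100 * (1 - coreBytes / defaultBytes);
}

// Pass/fail against the 25% threshold, using the exact figure.
function meetsThreshold(defaultBytes: number, coreBytes: number, threshold = 25): boolean {
  return exactReductionPercent(defaultBytes, coreBytes) >= threshold;
}
```

For illustrative sizes of 1000 and 755 bytes, `shellReductionPercent(1000, 755)` returns 25 while the exact reduction is about 24.5%, so a borderline result can pass the shell check without truly reaching the threshold. Recording both raw byte counts in the PR description, as the pass criterion already requires, makes such cases visible.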

Scenario 6 — synthetic skill replay

Setup script (committed to repo at scripts/verify/cap-filter-skill.ts): record a 3-step skill using core-only tools (navigate https://example.com → read_page → interact "More information…"). Then replay it under --tools-only core.
Pass: skill completes; zero MISSING_TOOL / CAPABILITY_DISABLED errors; outcome contract verdict matches the recording.

Issue closure criteria

All 6 scenarios pass + CI green + snapshot + setup script committed.

Out of scope (deferred)

  • --tool-allowlist / --tool-blocklist per-tool overrides — capability grouping is sufficient for v1; file follow-up if needed.
  • Removing/renaming any tool (P1 violation)
  • Auto-detecting capability needs (agent-side concern)

References

  • playwright-mcp filter: microsoft/playwright/packages/playwright-core/src/tools/utils/mcp/server.ts
  • Repo: src/tools/index.ts, src/index.ts:72-98

OpenChrome live verification checklist

Re-verification completed on 2026-05-14. Only items that could be confirmed directly from the latest origin/develop code, targeted Jest/lint runs, live OpenChrome CLI calls, and localhost fixture artifacts were used as grounds for closing.

Verification targets

Verification evidence

  • npm run build passes.
  • npm run lint:tier passes: 521 modules / 1239 dependencies, no dependency violations.
  • npm run lint:tool-schemas passes: 82 baselined violations, 0 new.
  • Targeted Jest passes: 38 passed / 1 skipped suites, 436 passed / 1 skipped tests.
  • Live OpenChrome CLI calls: oc_connection_health reports connected, and navigate against the localhost fixture succeeds.
  • OpenChrome tools/list introspection confirms that the relevant default or pilot-gated tool surface exists.
  • Representative bounded diagnostic calls were confirmed to return structured success/error responses.

Per-issue code/test evidence

  • The relevant implementation, documentation, and test files exist in the latest tree and are included in the targeted verification:
    • src/config/capability-filter.ts
    • src/index.ts
    • src/mcp-server.ts
    • tests/capability-filter.test.ts

Artifacts

  • Evidence log: .omx/reverify-evidence/targeted-jest.log
  • Evidence log: .omx/reverify-evidence/lint-tier.log
  • Evidence log: .omx/reverify-evidence/lint-tool-schemas.log
  • Evidence log: .omx/reverify-evidence/openchrome-live-smoke.log

Metadata

Labels: P0 (critical), enhancement, performance
