
Auto-generated capability map for LLM preamble (build-time, drift-guarded) #826

@shaun0927


Tier: N/A (build-time docs generator; no runtime code, no harness flag required)

Why

openchrome ships ~60 MCP tools across ~19.7K LOC. That surface is too large for an LLM to hold in context, and the result is wrong-tool selection and hallucinated tool names. Inspired by the minimalism of browser-use/browser-harness (the entire harness fits in an LLM context window), this issue proposes a compact, build-time-generated capability map (~2–4 KB of markdown) that MCP clients can prepend to their system prompt. The map is generated mechanically from the existing tool registry and CI rejects a stale copy, so it cannot silently drift.

This issue does not advocate shrinking the codebase. It is a docs artifact only.

Portability-harness alignment

  • P1/P2/P3/P4/P5: no runtime change, no flag, no new dep, no new storage. Build-time only.

Scope

  • Build-time script that introspects every tool registered in src/tools/index.ts and emits docs/agent/capability-map.md.
  • For each tool: name, one-line description (from MCPToolDefinition.description), parameter names with type names (no full schema dump), category, and a pilot: marker if the tool is only registered when OPENCHROME_PILOT=1.
  • CI guard: regenerating must produce no diff against the checked-in file. Mismatch fails CI with a clear remediation message.
  • expand_tools is excluded from the map: it is a meta-tool defined in src/mcp-server.ts:657-860 (not part of registerAllTools) whose purpose is to discover the other tools, so listing it alongside them is redundant.
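A minimal sketch of how a single map entry could be rendered from a tool definition. The ToolDef and JsonSchemaObject shapes are simplified, hypothetical stand-ins for MCPToolDefinition, and the exact line layout (`- [pilot: ]name: summary (params)`) is an assumption; only the `- pilot:` prefix is taken from the grep in the "Pilot/core differentiation" check.

```typescript
// Hypothetical, simplified shapes; the real MCPToolDefinition lives in src/types/mcp.ts.
interface JsonSchemaObject {
  type?: string;
  properties?: Record<string, { type?: string | string[] }>;
  required?: string[];
}

interface ToolDef {
  name: string;
  description: string;
  inputSchema: JsonSchemaObject;
  category?: string;
}

// Render one parameter as "name:type", or "name?:type" when it is absent from `required`.
function formatParam(name: string, schema: JsonSchemaObject): string {
  const prop = schema.properties?.[name] ?? {};
  // JSON Schema allows `type` to be an array (e.g. ["string", "null"]); join it with "|".
  const type = Array.isArray(prop.type) ? prop.type.join("|") : prop.type ?? "any";
  const optional = schema.required?.includes(name) ? "" : "?";
  return `${name}${optional}:${type}`;
}

// Render one bullet line: names and types only, never the full schema.
function formatToolEntry(tool: ToolDef, isPilot: boolean): string {
  // Keep only the first line of the description to stay within the byte budget.
  const summary = tool.description.split("\n")[0].trim();
  const params = Object.keys(tool.inputSchema.properties ?? {})
    .sort() // default sort compares UTF-16 code units, so it is locale-independent
    .map((p) => formatParam(p, tool.inputSchema))
    .join(", ");
  const marker = isPilot ? "pilot: " : "";
  return `- ${marker}${tool.name}: ${summary}${params ? ` (${params})` : ""}`;
}
```

For a schema with a required `url` string and an optional `recall` boolean, this yields `url:string` and `recall?:boolean`, matching the notation used in the How section.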

Existing surface (verified)

  • src/tools/index.ts holds the explicit registry: registerAllTools(server) calls each registerXxxTool(server).
  • src/types/mcp.ts:47–54 defines MCPToolDefinition with name, description, and inputSchema (raw JSON Schema, not zod).
  • src/harness/flags.ts:43–46 defines isPilotEnabled(), which controls pilot tool registration in src/index.ts:122–126.
  • scripts/ is the existing build/utility scripts directory (e.g., scripts/lint-changed-src.js).
  • ts-node is already in the package.json devDependencies.

How

  1. Generator scripts/gen-capability-map.ts:
    • Spin up an in-memory MCP server stub that calls registerAllTools twice: once with OPENCHROME_PILOT=0, once with =1. Diff the two registrations to mark pilot: entries.
    • For each tool: emit name, description, and the sorted parameter names with their JSON Schema type string (e.g., url:string, recall?:boolean). Whether a parameter is required or optional comes from the schema's required array; optional parameters carry a ? suffix.
    • Sort: category (alpha), then tool name (alpha). Deterministic byte output (LF line endings, single trailing newline).
    • Write docs/agent/capability-map.md. Header line states generation source: <!-- generated by scripts/gen-capability-map.ts from src/tools/index.ts — do not edit -->.
  2. Categorization: add an optional category?: string field to MCPToolDefinition, defaulting to "misc". Categories: navigation, dom, interact, forms, js, tabs, storage, profile, lifecycle, observability, evidence, recording, pilot. Backfill the field for every existing tool in the same PR (one line of metadata per tool, no logic change).
  3. npm script: "gen:capability-map": "ts-node scripts/gen-capability-map.ts".
  4. CI integration: extend the existing test pipeline (npm test, npm run lint:tier, or a new dedicated step in .github/workflows/). The step must run npm run gen:capability-map && git diff --exit-code docs/agent/capability-map.md. On failure, print: "capability-map drift — run `npm run gen:capability-map` and commit the result".
  5. Preamble doc docs/agent/README.md: 30-line example showing how to load the map as a system-prompt preamble with the Anthropic SDK, plus one paragraph on tool-tier filtering (pilot vs. core).
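The generator steps above could be sketched as follows. This assumes, hypothetically, that registerAllTools accepts a server exposing a registerTool(def) method and that the pilot gate reads OPENCHROME_PILOT at call time rather than caching it; adapting to openchrome's real server type is not shown. Parameter rendering is omitted to keep the sketch short, and sorting uses plain code-unit comparison instead of localeCompare so the byte output does not depend on the machine's locale.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical, simplified stand-ins for MCPToolDefinition and the MCP server type.
interface ToolDef {
  name: string;
  description: string;
  category?: string;
}
type StubServer = { registerTool(def: ToolDef): void };
type RegisterFn = (server: StubServer) => void;

const HEADER =
  "<!-- generated by scripts/gen-capability-map.ts from src/tools/index.ts — do not edit -->";
// expand_tools is a meta-tool defined outside registerAllTools; excluded by design.
const EXCLUDED = new Set(["expand_tools"]);

// Code-unit comparison: same order on every machine, unlike localeCompare.
const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Run the registration against an in-memory stub with the pilot flag forced on or off.
function collect(register: RegisterFn, pilot: boolean): Map<string, ToolDef> {
  const previous = process.env.OPENCHROME_PILOT;
  process.env.OPENCHROME_PILOT = pilot ? "1" : "0";
  const tools = new Map<string, ToolDef>();
  try {
    register({ registerTool: (def) => void tools.set(def.name, def) });
  } finally {
    // Restore the caller's environment after each pass.
    if (previous === undefined) delete process.env.OPENCHROME_PILOT;
    else process.env.OPENCHROME_PILOT = previous;
  }
  return tools;
}

function generate(register: RegisterFn): string {
  const core = collect(register, false);
  const all = collect(register, true);
  const entries = Array.from(all.values())
    .filter((t) => !EXCLUDED.has(t.name))
    // A tool present only in the pilot pass gets the pilot: marker.
    .map((t) => ({ ...t, category: t.category ?? "misc", pilot: !core.has(t.name) }))
    .sort((a, b) => cmp(a.category, b.category) || cmp(a.name, b.name));

  const lines: string[] = [HEADER];
  let current = "";
  for (const e of entries) {
    if (e.category !== current) {
      current = e.category;
      lines.push("", `## ${current}`);
    }
    const summary = e.description.split("\n")[0].trim();
    lines.push(`- ${e.pilot ? "pilot: " : ""}${e.name}: ${summary}`);
  }
  // LF line endings and exactly one trailing newline.
  return lines.join("\n") + "\n";
}

function writeMap(register: RegisterFn, path = "docs/agent/capability-map.md"): void {
  writeFileSync(path, generate(register), { encoding: "utf8" });
}
```

Diffing two full registration passes, rather than tagging pilot tools by hand, keeps the pilot: marker tied to whatever the registration code actually does.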

Out of scope

  • Shortening tool descriptions (separate cleanup PR).
  • AI-driven capability negotiation.
  • Per-session dynamic filtering.
  • Translating descriptions to other languages.

Acceptance criteria

  • scripts/gen-capability-map.ts is deterministic: running it 5× produces byte-identical output (a single distinct SHA-256).
  • Generated file size: target ≤ 4096 bytes, hard ceiling ≤ 6144 bytes. CI asserts wc -c <= 6144.
  • The CI step is green on main/develop. A reviewer manually verifies it by editing a tool description and confirming that CI fails.
  • Adding a new tool without regenerating fails CI with the documented remediation message.
  • Renaming a tool regenerates correctly: no stale entries.
  • expand_tools is not listed in the map; explicit exclusion is commented at the top of scripts/gen-capability-map.ts.
  • MCPToolDefinition.category backfilled for every existing tool.
  • No runtime src/ file imports the generator (grep -r 'gen-capability-map' src/ returns empty).
  • No new dependency in package.json.
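The determinism and size criteria could also be checked from a small TypeScript helper instead of shell. This is a sketch: collectRuns shells out to the npm script from step 3 and reads the output path used throughout this issue, and the thresholds are the 4096/6144-byte figures above. Sizes are measured in bytes, as wc -c does, not in characters; the em dash in the generated header alone takes 3 bytes in UTF-8.

```typescript
import { createHash } from "node:crypto";
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

const TARGET_BYTES = 4096; // acceptance criterion: target
const HARD_CEILING_BYTES = 6144; // acceptance criterion: hard ceiling

interface RunCheck {
  distinctHashes: number; // must be 1 for byte-identical runs
  bytes: number; // largest output seen, in bytes
  withinTarget: boolean;
  withinCeiling: boolean;
}

// Summarize several generator outputs: how many distinct SHA-256 digests, and how big.
function checkRuns(outputs: Buffer[]): RunCheck {
  const hashes = new Set(outputs.map((b) => createHash("sha256").update(b).digest("hex")));
  const bytes = Math.max(...outputs.map((b) => b.byteLength));
  return {
    distinctHashes: hashes.size,
    bytes,
    withinTarget: bytes <= TARGET_BYTES,
    withinCeiling: bytes <= HARD_CEILING_BYTES,
  };
}

// Regenerate n times and capture the raw bytes of each result.
function collectRuns(n = 5): Buffer[] {
  const runs: Buffer[] = [];
  for (let i = 0; i < n; i++) {
    execSync("npm run gen:capability-map", { stdio: "ignore" });
    runs.push(readFileSync("docs/agent/capability-map.md"));
  }
  return runs;
}
```

A CI wrapper would call checkRuns(collectRuns()) and fail unless distinctHashes is 1 and withinCeiling is true; withinTarget can be reported without failing the build.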

Effort

S (~2 dev days).

Labels: enhancement, documentation, P2


Real verification (openchrome MCP, post-merge)

Build + generation

  • Fresh clone, npm install && npm run build && npm run gen:capability-map. Exit 0. File present at docs/agent/capability-map.md.
  • wc -c docs/agent/capability-map.md ≤ 6144.
  • for i in 1 2 3 4 5; do npm run gen:capability-map >/dev/null && sha256sum docs/agent/capability-map.md; done | sort -u | wc -l returns 1.

Pilot/core differentiation

  • With pilot tools (e.g., oc_pilot_* if registered) present, those lines carry the pilot: marker; core tools do not. Verified by grep -c '^- pilot:' docs/agent/capability-map.md equals the count from the generator's pilot diff.
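The grep count can be tightened into a set comparison, so a marker on the wrong tool is caught even when the totals happen to match. A minimal sketch, assuming pilot entries start with `- pilot: <name>` and that pilotFromDiff (a hypothetical name) is the set of names the generator's pilot diff produced.

```typescript
// Mirrors `grep -c '^- pilot:'`: number of lines carrying the pilot: marker.
function countPilotLines(map: string): number {
  return map.split("\n").filter((line) => line.startsWith("- pilot:")).length;
}

// Compare the marked names against the generator's pilot diff, in both directions.
function pilotMarkerProblems(map: string, pilotFromDiff: Set<string>): string[] {
  const marked = new Set<string>();
  for (const line of map.split("\n")) {
    const m = /^- pilot: ([a-z0-9_]+)/.exec(line);
    if (m) marked.add(m[1]);
  }
  const problems: string[] = [];
  for (const name of Array.from(pilotFromDiff)) {
    if (!marked.has(name)) problems.push(`missing pilot: marker on ${name}`);
  }
  for (const name of Array.from(marked)) {
    if (!pilotFromDiff.has(name)) problems.push(`unexpected pilot: marker on ${name}`);
  }
  return problems;
}
```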

Drift guard

  • In a throwaway branch, add a tool stub (or edit a description). Run npm run gen:capability-map. Diff appears. CI step fails with the documented remediation message. Revert; CI green.
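As an in-process alternative to the git diff --exit-code step, the drift guard could compare the checked-in file against freshly generated text and point at the first differing line. A sketch only: regenerate is a hypothetical callback returning the generator's output, and the remediation text is the message specified in the CI integration step.

```typescript
import { readFileSync } from "node:fs";

const REMEDIATION =
  "capability-map drift — run `npm run gen:capability-map` and commit the result";

// Returns null when the checked-in map matches the regenerated content.
function driftMessage(checkedIn: string, regenerated: string): string | null {
  if (checkedIn === regenerated) return null;
  const a = checkedIn.split("\n");
  const b = regenerated.split("\n");
  // Find the first differing line so the CI log says where to look.
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return (
    `${REMEDIATION}\nfirst difference at line ${i + 1}:\n` +
    `  checked in:  ${a[i] ?? "<eof>"}\n  regenerated: ${b[i] ?? "<eof>"}`
  );
}

// Set a failing exit code instead of calling process.exit, so pending output still flushes.
function runDriftGuard(path: string, regenerate: () => string): void {
  const message = driftMessage(readFileSync(path, "utf8"), regenerate());
  if (message !== null) {
    console.error(message);
    process.exitCode = 1;
  }
}
```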

Live MCP cross-check

  • Start MCP. Call mcp__openchrome__expand_tools (the meta-tool). Strip it from the runtime list (it must not appear in the map by design). For every remaining runtime tool name, assert it appears exactly once in docs/agent/capability-map.md:
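    comm -3 <(grep -oP '^- (?:pilot: )?\K[a-z_]+' docs/agent/capability-map.md | sort) \
            <(jq -r '.tools[].name' <runtime-listing> | grep -v '^expand_tools$' | sort)

    Output must be empty. The optional (?:pilot: ) group keeps pilot-marked lines from being read as a tool named "pilot".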
  • Query mcp__openchrome__oc_journal for any prior recorded session: every distinct tool_name value (except expand_tools) must appear in the map.
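The comm check can also be expressed in TypeScript, which makes the "exactly once" requirement explicit by reporting duplicates separately. A sketch under the same line-format assumption (`- [pilot: ]name...`); runtimeNames would come from the tool listing. Note that a runtime started without --pilot will report pilot tools as "extra", so compare against a --pilot listing. For the journal check, only missing matters: pass the distinct tool_name values and ignore extra.

```typescript
// Extract tool names from map entries, skipping the optional "pilot: " marker.
// The character class also accepts digits, which the shell version does not.
function mapToolNames(map: string): string[] {
  const names: string[] = [];
  for (const line of map.split("\n")) {
    const m = /^- (?:pilot: )?([a-z0-9_]+)/.exec(line);
    if (m) names.push(m[1]);
  }
  return names;
}

interface CrossCheck {
  missing: string[]; // at runtime but not in the map
  extra: string[]; // in the map but not at runtime
  duplicated: string[]; // listed more than once in the map
}

function crossCheck(map: string, runtimeNames: string[]): CrossCheck {
  const counts = new Map<string, number>();
  for (const n of mapToolNames(map)) counts.set(n, (counts.get(n) ?? 0) + 1);
  // expand_tools is excluded from the map by design.
  const runtime = new Set(runtimeNames.filter((n) => n !== "expand_tools"));
  return {
    missing: Array.from(runtime).filter((n) => !counts.has(n)).sort(),
    extra: Array.from(counts.keys()).filter((n) => !runtime.has(n)).sort(),
    duplicated: Array.from(counts).filter(([, c]) => c > 1).map(([n]) => n).sort(),
  };
}
```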

Tier filtering sanity

  • Start MCP without --pilot. The runtime tool list excludes pilot tools, but the map's pilot lines are still present (the map reflects all tools, marked). This is documented behavior, and a CI test asserts it.
  • Start MCP with --pilot. Runtime tool list includes pilot tools; map unchanged.

Token-cost measurement

  • Tokenize docs/agent/capability-map.md with the Anthropic tokenizer (optionally cross-check with a reference tokenizer such as OpenAI's tiktoken). Record the token count in the PR description. Hard cap: ≤ 1500 tokens as counted by the Anthropic tokenizer.

Smoke (informational only — not merge-blocking)

  • On a small fixed task list (5 tasks: read a page, fill a form, assert a side effect, take a screenshot, open a new tab), run an LLM with and without the preamble. Record which tool the model selects on the first call for each task, and report the delta in the PR description. This is informational: there is no hard pass threshold, and the merge-blocking criterion is byte-level deterministic generation, not LLM behavior.

Docs sanity

  • The 30-line preamble example in docs/agent/README.md is copy-pasteable and runs without modification against the Anthropic SDK (one-shot smoke).

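OpenChrome real-verification checklist

Re-verified on 2026-05-14 after applying the latest merged version. Only items that can be proven directly from OpenChrome responses, local fixtures, or build/test artifacts were kept as pass criteria. Conditions such as human review, external-site stability, or unconfirmed PR status are excluded from the pass criteria.

Verification scope

Latest version / common runtime verification

  • Applied the latest develop source and confirmed that npm run build passes.
  • Confirmed that npm run lint:tier passes.
  • npm test -- --runInBand: 504/507 suites passed (3 skipped) and 6429/6525 tests passed (96 skipped). Jest open-handle warnings were recorded separately as a runtime risk.
  • oc_connection_health returned a connected state.
  • On a local fixture, observed DOM state changes through the OpenChrome navigate/read_page/interact/javascript_tool paths.
  • Confirmed that the key results are reproducible with the same fixture and the same configuration.

Evidence of resolution for this issue

  • Implementation PR linked on the latest develop: 1195
  • Related test/source evidence exists in the latest tree: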
    • src/mcp-server.ts
    • docs/roadmap/portability-harness-contract.md
    • src/pilot/handoff/manager.ts
    • src/tools/orchestration.ts
    • src/types/mcp.ts
    • tests/actions/action-cache.test.ts
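  • The checklist contains no pass criteria that cannot be reproduced from OpenChrome responses, fixtures, or local artifacts.

Failure / hold criteria

  • If any check is unmet, do not close the issue.
  • If a failure reproduces as a defect in the latest code, record the failing OpenChrome call, a response excerpt, and the fixture state as evidence, and open a separate fix PR.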

Metadata

Assignees: No one assigned
Labels: P2 (Medium priority), documentation (Improvements or additions to documentation), enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
