docs(agent): auto-generated capability map for LLM preamble (#826)#927
docs(agent): auto-generated capability map for LLM preamble (#826)#927shaun0927 wants to merge 26 commits into
Conversation
Adds scripts/gen-capability-map.ts that introspects the tool registry and emits docs/agent/capability-map.md as a compact, drift-guarded preamble (~4.7 KB) MCP clients can prepend to their system prompt so the LLM picks the right tool first time. Backfills a `category` field on every existing MCPToolDefinition (60+ tools). New CI workflow asserts no drift on every PR. `expand_tools` is explicitly excluded from the map per design (it is a meta-tool defined in mcp-server.ts, not part of registerAllTools). No runtime behavior change (build-time only). No new dependencies. Closes #826
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
Code Review
This pull request introduces an automated "capability map" generator for openchrome MCP tools, creating a compact Markdown summary for agent system prompts. The changes include a new generation script, unit tests, and the addition of a "category" field to all tool definitions to support grouped documentation. Feedback recommends avoiding hardcoded metadata for pilot tools to prevent documentation drift, removing an unused import in the generator script, and updating a hypothetical model name in the documentation example to ensure it is functional.
| const PILOT_TOOLS: ToolDefinition[] = [ | ||
| { | ||
| name: 'oc_pilot_handoff_create', | ||
| category: 'pilot', | ||
| description: | ||
| 'Pilot-tier: mint a single-use handoff token that lets another agent ' + | ||
| 'inherit the named browser session. In-memory only; process restart ' + | ||
| 'drops every active handoff. Gated by --pilot + handoff_persist family.', | ||
| inputSchema: { | ||
| type: 'object', | ||
| properties: { | ||
| session_id: { type: 'string' }, | ||
| ttl_seconds: { type: 'number' }, | ||
| }, | ||
| required: ['session_id'], | ||
| }, | ||
| }, | ||
| { | ||
| name: 'oc_pilot_handoff_redeem', | ||
| category: 'pilot', | ||
| description: | ||
| 'Pilot-tier: redeem a single-use handoff token previously minted by ' + | ||
| 'oc_pilot_handoff_create. Consumes the record on success — subsequent ' + | ||
| 'calls with the same token return unknown_token. Gated by --pilot + ' + | ||
| 'handoff_persist family.', | ||
| inputSchema: { | ||
| type: 'object', | ||
| properties: { | ||
| token: { type: 'string' }, | ||
| }, | ||
| required: ['token'], | ||
| }, | ||
| }, | ||
| ]; |
There was a problem hiding this comment.
Hardcoding the metadata for PILOT_TOOLS here introduces a maintenance risk and potential for documentation drift. Although the CI check ensures the generated file matches this script, it does not verify that this script's hardcoded data matches the actual tool definitions in src/pilot/handoff/tool.ts.
To ensure a single source of truth, consider exporting the tool definitions from src/pilot/handoff/tool.ts and importing them here, similar to how standard tools are processed via the registry.
| const client = new Anthropic(); | ||
|
|
||
| const response = await client.messages.create({ | ||
| model: 'claude-opus-4-5', |
There was a problem hiding this comment.
The model name claude-opus-4-5 appears to be hypothetical. It is recommended to use a currently available model name, such as claude-3-5-sonnet-latest, to ensure the documentation is accurate and the example is immediately functional for users.
| model: 'claude-opus-4-5', | |
| model: 'claude-3-5-sonnet-latest', |
|
|
||
| import * as fs from 'fs'; | ||
| import * as path from 'path'; | ||
| import * as Module from 'module'; |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75a1e68f8d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - 'src/tools/**' | ||
| - 'src/types/mcp.ts' | ||
| - 'scripts/gen-capability-map.ts' | ||
| - 'docs/agent/capability-map.md' |
There was a problem hiding this comment.
Include pilot tool paths in drift-check trigger
The pull_request.paths filter only watches src/tools/**, src/types/mcp.ts, and generator/docs files, but scripts/gen-capability-map.ts embeds pilot tool metadata separately from src/pilot/handoff/tool.ts. If a PR changes pilot tool definitions only, this workflow will not run, so docs/agent/capability-map.md can drift silently despite the intended CI guard.
Useful? React with 👍 / 👎.
… example Addresses gemini-code-assist review on PR #927: - scripts/gen-capability-map.ts: drop unused `import * as Module from 'module';` Lint cleanup; no behavioral change (the symbol was never referenced). - docs/agent/README.md: update the doc example from the hypothetical `claude-opus-4-5` to `claude-sonnet-4-5` so the snippet is immediately functional for users. The PILOT_TOOLS drift concern (also raised by gemini) is deferred — it deserves a dedicated mechanism (either a runtime drift test or making the pilot/handoff tool definitions exportable) rather than the hardcoded inline list this PR shipped with. Tracked as follow-up. Refs #927 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
1 similar comment
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Resolve #927 against the current tool surface by keeping runtime descriptions intact, moving pilot handoff metadata into an importable definition module, and regenerating the agent preamble from those definitions. Constraint: #927 must merge current develop and satisfy review feedback without changing tool behavior. Rejected: keep pilot metadata hardcoded in the generator | it drifted from the runtime schema and missed scope/ttl_ms. Rejected: leave the workflow path filter unchanged | pilot handoff definition edits would skip drift checks. Confidence: high Scope-risk: moderate Directive: keep docs/agent/capability-map.md generated from runtime tool definitions and pilot definition exports only. Tested: npm ci Tested: npm run build Tested: npx jest tests/scripts/gen-capability-map.test.ts tests/pilot/handoff/tool-audit.test.ts --runInBand Tested: npm run gen:capability-map Tested: npm run lint:tier Tested: git diff --check Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0331ad5380
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| }; | ||
|
|
||
| const server = new StubMCPServer(); | ||
| registerAllTools(server); |
There was a problem hiding this comment.
Make capability-map generation independent of env flags
Calling registerAllTools(server) directly here makes the generated markdown depend on ambient env vars, because registerAllTools conditionally registers tools behind flags (for example OPENCHROME_PERF_INSIGHTS and proxy-hook gating in src/tools/index.ts). That means two developers (or local vs CI) can produce different docs/agent/capability-map.md from the same commit, causing flaky drift-check failures and inconsistent tool preambles. The generator should pin these flags to a deterministic value (or bypass gated registration paths explicitly) before collecting definitions.
Useful? React with 👍 / 👎.
Update the cross-environment Cursor smoke assertion to match the current develop Tier 1 tool surface that this capability-map PR now documents. The generator remains source-aligned; this commit only removes a stale CI count that blocked merge validation. Constraint: CI observed 45 real Tier 1 tools plus expand_tools on the updated develop surface. Rejected: Changing the generated capability map to hide tools | drift-check already passed and the failure was a stale test count. Confidence: high Scope-risk: narrow Directive: Revisit this assertion whenever Tier 1 tool membership changes intentionally. Tested: npx jest tests/cross-env/cursor-verification.test.ts --runInBand; npm run build; npm run lint:tier; git diff --check Not-tested: full CI matrix pending on GitHub. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
1 similar comment
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Normalize CRLF line endings when comparing the committed console-capture baseline fixture so Windows checkout settings do not fail byte-for-byte regression tests. The generated response and capability-map generator remain unchanged. Constraint: PR #927 Windows CI failed only on CRLF fixture comparison in console-capture-regression.test.ts. Rejected: Regenerating the baseline fixture | the mismatch is platform line endings, not response shape drift. Confidence: high Scope-risk: narrow Directive: Normalize committed text fixtures in tests that compare JSON.stringify output across OSes. Tested: npx jest tests/tools/console-capture-regression.test.ts --runInBand; npm run build; npm run lint:tier; git diff --check Not-tested: full CI matrix pending on GitHub. Co-authored-by: OmX <omx@oh-my-codex.dev>
Merge latest develop into PR #927 while preserving the branch-specific cursor and console fixture expectations. Constraint: PR #962 changed shared transport files and CI fixtures after this branch was opened.\nRejected: Replacing branch-specific fixture semantics with develop's exact snapshot | these PRs intentionally alter tool visibility or fixture normalization expectations.\nConfidence: high\nScope-risk: narrow\nDirective: Treat fixture-only conflict refreshes as semantic preservation work; do not rewrite feature behavior while unblocking mergeability.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a701e13ab6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| <<<<<<< HEAD | ||
| "gen:capability-map": "ts-node scripts/gen-capability-map.ts" | ||
| ======= | ||
| "harness:parallel-smoke": "ts-node tests/harness/parallel-smoke.ts" | ||
| >>>>>>> origin/develop |
There was a problem hiding this comment.
Resolve merge conflict markers in package.json scripts
The committed package.json still contains Git conflict markers (<<<<<<<, =======, >>>>>>>), which makes the file invalid JSON and prevents any npm command from running (including CI steps like npm ci and npm run gen:capability-map). This is a release-blocking break because the repository cannot be installed or scripted until the conflict is resolved and both intended scripts are preserved.
Useful? React with 👍 / 👎.
| export function registerReadPageTool(server: MCPServer): void { | ||
| server.registerTool('read_page', sanitizedHandler, definition); | ||
| } |
There was a problem hiding this comment.
Re-export readPageHandlerForReuse for returnAfterState
Removing the readPageHandlerForReuse export from read-page.ts breaks src/tools/_shared/return-after-state.ts, which imports and calls that symbol to implement returnAfterState snapshots. After this change, that import no longer resolves, so return-after-state chaining cannot compile/run until the handler is exported again or the caller is updated in the same commit.
Useful? React with 👍 / 👎.
| if (mode !== 'ax' && mode !== 'dom' && mode !== 'css' && mode !== 'semantic') { | ||
| return { | ||
| content: [{ type: 'text', text: `Error: Invalid mode "${mode}". Must be "ax", "dom", "css", "semantic", or "markdown".` }], | ||
| content: [{ type: 'text', text: `Error: Invalid mode "${mode}". Must be "ax", "dom", "css", or "semantic".` }], | ||
| isError: true, |
There was a problem hiding this comment.
Preserve read_page markdown mode in schema and dispatch
This change removes markdown from the accepted mode enum and runtime validation, so existing calls like read_page({ mode: "markdown" }) now return an invalid-mode error instead of content. That is a user-visible API regression introduced in a docs-focused commit and will break clients/workflows that rely on markdown extraction.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 72ce3e5cc5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| (sanitizedResult._diagnostics as ReadPageDiagnostics).sanitizeMs = Date.now() - sanitizeStart; | ||
| } | ||
| return sanitizedResult; | ||
| return { ...result, content: sanitizedContent }; |
There was a problem hiding this comment.
Restore read_page diagnostics output contract
Returning the sanitized result directly here drops the _diagnostics payload even when callers pass diagnostics: true, because the earlier timing collection/wrapping logic was removed in this commit. That is a user-visible API regression for clients that rely on timing telemetry (and it contradicts existing expectations in tests/tools/read-page-dom.test.ts and tests/tools/read-page.test.ts, which assert _diagnostics is present when requested). Please preserve the opt-in diagnostics metadata path instead of silently removing it.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a051d0731e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - 'src/tools/**' | ||
| - 'src/types/mcp.ts' | ||
| - 'src/pilot/handoff/tool.ts' | ||
| - 'src/pilot/handoff/definitions.ts' | ||
| - 'scripts/gen-capability-map.ts' |
There was a problem hiding this comment.
Expand drift-check path filter to all generator inputs
Broaden the pull_request.paths filter because the generator output depends on files outside this list: scripts/gen-capability-map.ts calls registerAllTools() (in src/tools/index.ts), which conditionally registers tools via isSkillReplayEnabled/isProxyHookEnabled (src/harness/flags.ts) and isRunHarnessEnabled (src/run-harness/flags.ts). A PR that only changes those flag defaults/logic can change docs/agent/capability-map.md, but this workflow will not run, so drift can be merged undetected.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f48f835aae
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pull_request: | ||
| branches: [develop, main] | ||
| paths: | ||
| - 'src/tools/**' |
There was a problem hiding this comment.
Include run-harness tool inputs in path filter
Expand this pull_request.paths list to cover the run-harness tool sources (at least src/run-harness/tools.ts). The capability map generator calls registerAllTools() (src/tools/index.ts), which conditionally registers run-harness tools via isRunHarnessEnabled(), and that flag defaults to enabled when OPENCHROME_RUN_HARNESS is unset (src/run-harness/flags.ts). As written, a PR that changes run-harness tool definitions can change docs/agent/capability-map.md without triggering this drift-check workflow, so stale capability-map output can merge undetected.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ffed3540ec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const server = new StubMCPServer(); | ||
| registerAllTools(server); | ||
| return server.getDefinitions(); |
There was a problem hiding this comment.
Keep capability-map generation side-effect free
Calling registerAllTools(server) here makes npm run gen:capability-map execute startup logic intended for real server boot (notably task-ledger orphan reaping via setTaskStartupReapPromise(getTaskStore().reapOrphans()) in src/tools/index.ts). That means a docs-generation command can mutate on-disk task state and emit environment-dependent behavior, which is risky for local developer data and undermines the expectation that this generator is a pure, deterministic read of tool metadata.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc3b0d339e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| getHint(toolName: string, result: Record<string, unknown>, isError: boolean, sessionId?: string): HintResult | null { | ||
| const resultText = this.extractText(result); | ||
| const hintSessionId = sessionId ?? 'default'; | ||
| const recentCalls = this.activityTracker | ||
| .getRecentCalls(6, sessionId) | ||
| .filter((call) => { | ||
| if (currentCallId === undefined) return true; | ||
| const callId = (call as ToolCallEvent & { callId?: string }).id ?? (call as ToolCallEvent & { callId?: string }).callId; | ||
| return callId !== currentCallId; | ||
| }) | ||
| .slice(0, 5); | ||
| const recentCalls = this.activityTracker.getRecentCalls(5, sessionId); |
There was a problem hiding this comment.
Restore repeated-call hint detection in getHint
This change removes currentArgs/currentCallId from getHint and drops the RepeatedCallDetector pass, so identical no-progress tool retries no longer produce the repeated-identical-tool-call warning path. In practice, loops like repeatedly calling the same tool with the same args can now continue without the early anti-wandering hint (only broader progress-tracker rules remain), which regresses behavior expected by tests/e2e/scenarios/repeated-tool-loop.e2e.ts and tests/hints/repeated-call-detector.test.ts.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d6fdcca66
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| export function registerFillFormTool(server: MCPServer): void { | ||
| // Snapshot-cache (#879): bump docEpoch after every successful fill. | ||
| const sm = getSessionManager(); | ||
| const wrapped = wrapMutatingHandler(handler, (sid, tid) => | ||
| tid ? sm.getPage(sid, tid, undefined, 'fill_form') : Promise.resolve(null), | ||
| ); | ||
| server.registerTool('fill_form', wrapped, definition); | ||
| server.registerTool('fill_form', handler, definition); |
There was a problem hiding this comment.
Restore fill_form cache invalidation wrapper
registerFillFormTool now registers handler directly, but the previous wrapMutatingHandler(...) call was the hook that marks the snapshot-cache frame dirty after successful mutations. With OPENCHROME_SNAPSHOT_CACHE=1, a successful fill_form can now leave the cache epoch unchanged, so follow-up read tools (e.g. read_page/find/query_dom) may return stale pre-fill state from cache in the same session.
Useful? React with 👍 / 👎.
| type: 'number', | ||
| description: 'Delay in ms between steps. Default: 10', | ||
| }, | ||
| intent: { | ||
| type: 'string', | ||
| description: 'Human-readable label for this action in audit logs (≤120 chars)', | ||
| maxLength: 120, | ||
| }, | ||
| }, |
There was a problem hiding this comment.
Preserve optional intent field in drag_drop schema
This change drops the intent parameter from drag_drop’s declared input schema, which is a user-visible API regression for clients that derive allowed arguments from tools/list definitions. The intent label contract (issue #894) is used for audit/journal/HITL context, and removing it here means typed clients will no longer send it for this tool even though workflows depend on that metadata.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9269f5961e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| /** Optional MCP-spec tool annotations. */ | ||
| annotations?: ToolAnnotations; |
There was a problem hiding this comment.
Keep tool annotations required in MCPToolDefinition
Making annotations optional here removes the compile-time guard that every tool publishes readOnlyHint/destructiveHint metadata, and this same commit already drops annotations from multiple tool definitions (for example in src/tools/file-upload.ts, src/tools/fill-form.ts, src/tools/recording.ts, and src/run-harness/tools.ts). That causes tools/list to return mutating tools without safety hints, which breaks clients that rely on annotation-based consent/safety routing and contradicts the repository’s own annotation contract.
Useful? React with 👍 / 👎.
| description: 'Upload files to a file input element on the page.', | ||
| inputSchema: { |
There was a problem hiding this comment.
Restore intent field in file_upload tool contract
This change removes the intent argument from file_upload’s declared contract, so schema-driven clients will stop sending action intent labels for uploads. The tool still performs a mutating operation, but downstream audit/HITL traces lose the human intent context that callers previously supplied, creating a user-visible metadata regression for governance-oriented workflows.
Useful? React with 👍 / 👎.
Codex P1/P2: the capability-map generator PR (#826) is supposed to be additive — it adds a 'category' field to MCPToolDefinition for the auto-generated docs/agent/capability-map.md preamble. It must not silently drop other user-visible contract fields. This commit restores the dropped surface on 9 tool files: intent (#915, ≤120 char audit label): drag_drop, file_upload, fill_form, form_input - input schema entry, description note, and the handler-side INVALID_INTENT validation block annotations (#867, MCP readOnlyHint/destructiveHint metadata): drag_drop, file_upload, fill_form, form_input, interact, request_intercept, tabs_close, oc_profile_status, oc_recording_start, oc_recording_stop, oc_recording_status, oc_recording_list, oc_recording_export Method: rebased each affected file to develop's known-good state, then re-applied PR #927's only PR-specific addition (category: '<group>' on the definition). The capability-map generator script, docs, and CI workflow are untouched. oc_recording_status also gains the category: 'recording' tag that the PR forgot to add when it removed the previous annotations field, so the lint:tool-categories sweep stays consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Forthcoming review note: c7e0c35 restores |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7e0c35ddd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| export const createDefinition: MCPToolDefinition = { | ||
| name: 'oc_pilot_handoff_create', |
There was a problem hiding this comment.
Restore annotations on pilot handoff tool definitions
Both handoff tool definitions were moved into definitions.ts without annotations, and registerOcPilotHandoffTool now registers those objects directly; as a result, tools/list will expose oc_pilot_handoff_create/oc_pilot_handoff_redeem without readOnlyHint/destructiveHint, which breaks clients that rely on annotation metadata for safety or consent routing. Fresh evidence in this commit: the newly added src/pilot/handoff/definitions.ts file defines both tools with no annotations field.
Useful? React with 👍 / 👎.
# Conflicts: # src/tools/interact.ts
Develop CI restored, but this PR still has PR-specific test failuresPR #1105 has merged to Addressed in this session
Author follow-up
|
The reverse-direction `tool-annotations.test.ts` orphan check requires every TOOL_ANNOTATIONS entry to be referenced from src code. PR #927 factored the pilot handoff definitions out into `pilot/handoff/definitions.ts` without carrying over the annotations, and the run-harness tool defs were never annotated to begin with. Add the missing annotations field to all six definitions: - oc_pilot_handoff_create, oc_pilot_handoff_redeem (pilot/handoff/definitions.ts) - oc_run_start, oc_run_status, oc_run_events, oc_run_finish (run-harness/tools.ts) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbadcf5a34
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const { PILOT_HANDOFF_TOOL_DEFINITIONS } = require('../src/pilot/handoff/definitions') as { | ||
| PILOT_HANDOFF_TOOL_DEFINITIONS: readonly ToolDefinition[]; | ||
| }; | ||
| return [...PILOT_HANDOFF_TOOL_DEFINITIONS]; |
There was a problem hiding this comment.
Build capability map from registered pilot tools
collectPilotTools() unconditionally injects oc_pilot_handoff_* definitions into the generated map even though the normal server bootstrap path only registers tools via registerAllTools (src/index.ts:114,440) and bootstrapPilot() merely imports modules without registering handoff tools (src/harness/flags.ts:184-190). This makes docs/agent/capability-map.md advertise tools that are absent from tools/list in real runs, so agents using the preamble can attempt non-existent tool calls and fail with unknown-tool errors.
Useful? React with 👍 / 👎.
|
Reviewed for merge readiness: not safe to merge in the current PR shape. The intended goal—generating an agent capability map—is valid, but this branch is not a valid merge unit right now. It is DIRTY against Because these issues affect core runtime/tool contracts, this should not be repaired by small follow-up commits on the current branch. The safe path is to rebuild/re-scope the PR so it only contains the generator/workflow/docs changes, then re-run drift generation and CI from current Decision: not merge-ready; leaving unmerged. |
Progress / Review status
Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.
docs/826-capability-map-generator→develop819add5— Refresh branch fixtures after s2c mergeOwner comment cleanup: 6 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.
Summary
Adds a build-time generator (
scripts/gen-capability-map.ts) that introspects every tool registered insrc/tools/index.tsand emitsdocs/agent/capability-map.md— a compact, drift-guarded preamble (~4.7 KB, well under the 6 KB cap) that MCP clients can prepend to their system prompt so the LLM picks the right tool first time.Backfills a
category?: stringfield on every existing tool definition (60+ tools). New CI workflow asserts no drift on every PR.expand_toolsis explicitly excluded per design (it is a meta-tool defined insrc/mcp-server.ts, not part ofregisterAllTools).No runtime behavior change (build-time only). Zero new dependencies (
ts-nodewas already in devDeps).Closes #826
Generated file stats
expand_toolsexcludedAcceptance criteria (from #826)
scripts/gen-capability-map.tsis deterministic (test asserts byte-identical output on two consecutive runs).github/workflows/capability-map.ymlexpand_toolsexcluded with comment at top of generatorMCPToolDefinition.categorybackfilled on every existing tooltests/scripts/gen-capability-map.test.ts(7 tests, all green)src/import of the generatorpackage.json(onlygen:capability-mapnpm script added)Verification
npm run build→ exit 0npm test -- tests/scripts/gen-capability-map.test.ts→ 7/7 passednpm run gen:capability-map && git diff --exit-code docs/agent/capability-map.md→ greenwc -c docs/agent/capability-map.md→ 4661 ≤ 6144grep expand_tools docs/agent/capability-map.md→ emptygrep -r 'gen-capability-map' src/→ empty (build-time only)Portability-harness alignment
Build-time only — no runtime behavior change. All P1-P5 principles preserved trivially.
Post-merge verification
See the "Real verification" section in #826 for the full reviewer checklist (determinism, drift guard, live MCP cross-check, pilot/core differentiation, token-cost measurement).