Skip to content

docs(agent): auto-generated capability map for LLM preamble (#826)#927

Closed
shaun0927 wants to merge 26 commits into
developfrom
docs/826-capability-map-generator
Closed

docs(agent): auto-generated capability map for LLM preamble (#826)#927
shaun0927 wants to merge 26 commits into
developfrom
docs/826-capability-map-generator

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch docs/826-capability-map-generatordevelop
Draft no
CI ✅ all 10 checks passing
Mergeable ❌ CONFLICTING
Review decision
Codex (latest) 💡 suggestions posted
Other reviewers (latest) gemini-code-assist: commented, chatgpt-codex-connector: commented
Head 819add5 — Refresh branch fixtures after s2c merge
Commits 6

Owner comment cleanup: 6 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Summary

Adds a build-time generator (scripts/gen-capability-map.ts) that introspects every tool registered in src/tools/index.ts and emits docs/agent/capability-map.md — a compact, drift-guarded preamble (~4.7 KB, well under the 6 KB cap) that MCP clients can prepend to their system prompt so the LLM picks the right tool first time.

Backfills a category?: string field on every existing tool definition (60+ tools). New CI workflow asserts no drift on every PR. expand_tools is explicitly excluded per design (it is a meta-tool defined in src/mcp-server.ts, not part of registerAllTools).

No runtime behavior change (build-time only). Zero new dependencies (ts-node was already in devDeps).

Closes #826

Generated file stats

  • Size: 4661 bytes (cap 6144) ✅
  • Tools listed: 60+; expand_tools excluded
  • Categories: navigation, dom, interact, forms, js, tabs, storage, profile, lifecycle, observability, evidence, recording, pilot, misc

Acceptance criteria (from #826)

  • scripts/gen-capability-map.ts is deterministic (test asserts byte-identical output on two consecutive runs)
  • File ≤ 6144 bytes (test asserts)
  • CI drift-check workflow at .github/workflows/capability-map.yml
  • expand_tools excluded with comment at top of generator
  • MCPToolDefinition.category backfilled on every existing tool
  • tests/scripts/gen-capability-map.test.ts (7 tests, all green)
  • No runtime src/ import of the generator
  • No new dependency in package.json (only gen:capability-map npm script added)

Verification

  • npm run build → exit 0
  • npm test -- tests/scripts/gen-capability-map.test.ts → 7/7 passed
  • npm run gen:capability-map && git diff --exit-code docs/agent/capability-map.md → green
  • wc -c docs/agent/capability-map.md → 4661 ≤ 6144
  • grep expand_tools docs/agent/capability-map.md → empty
  • grep -r 'gen-capability-map' src/ → empty (build-time only)

Portability-harness alignment

Build-time only — no runtime behavior change. All P1-P5 principles preserved trivially.

Post-merge verification

See the "Real verification" section in #826 for the full reviewer checklist (determinism, drift guard, live MCP cross-check, pilot/core differentiation, token-cost measurement).

Adds scripts/gen-capability-map.ts that introspects the tool registry and
emits docs/agent/capability-map.md as a compact, drift-guarded preamble
(~4.7 KB) MCP clients can prepend to their system prompt so the LLM
picks the right tool first time.

Backfills a `category` field on every existing MCPToolDefinition (60+
tools). New CI workflow asserts no drift on every PR. `expand_tools` is
explicitly excluded from the map per design (it is a meta-tool defined
in mcp-server.ts, not part of registerAllTools).

No runtime behavior change (build-time only). No new dependencies.

Closes #826
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an automated "capability map" generator for openchrome MCP tools, creating a compact Markdown summary for agent system prompts. The changes include a new generation script, unit tests, and the addition of a "category" field to all tool definitions to support grouped documentation. Feedback recommends avoiding hardcoded metadata for pilot tools to prevent documentation drift, removing an unused import in the generator script, and updating a hypothetical model name in the documentation example to ensure it is functional.

Comment thread scripts/gen-capability-map.ts Outdated
Comment on lines +124 to +157
const PILOT_TOOLS: ToolDefinition[] = [
{
name: 'oc_pilot_handoff_create',
category: 'pilot',
description:
'Pilot-tier: mint a single-use handoff token that lets another agent ' +
'inherit the named browser session. In-memory only; process restart ' +
'drops every active handoff. Gated by --pilot + handoff_persist family.',
inputSchema: {
type: 'object',
properties: {
session_id: { type: 'string' },
ttl_seconds: { type: 'number' },
},
required: ['session_id'],
},
},
{
name: 'oc_pilot_handoff_redeem',
category: 'pilot',
description:
'Pilot-tier: redeem a single-use handoff token previously minted by ' +
'oc_pilot_handoff_create. Consumes the record on success — subsequent ' +
'calls with the same token return unknown_token. Gated by --pilot + ' +
'handoff_persist family.',
inputSchema: {
type: 'object',
properties: {
token: { type: 'string' },
},
required: ['token'],
},
},
];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding the metadata for PILOT_TOOLS here introduces a maintenance risk and potential for documentation drift. Although the CI check ensures the generated file matches this script, it does not verify that this script's hardcoded data matches the actual tool definitions in src/pilot/handoff/tool.ts.

To ensure a single source of truth, consider exporting the tool definitions from src/pilot/handoff/tool.ts and importing them here, similar to how standard tools are processed via the registry.

Comment thread docs/agent/README.md Outdated
const client = new Anthropic();

const response = await client.messages.create({
model: 'claude-opus-4-5',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The model name claude-opus-4-5 appears to be hypothetical. It is recommended to use a currently available model name, such as claude-3-5-sonnet-latest, to ensure the documentation is accurate and the example is immediately functional for users.

Suggested change
model: 'claude-opus-4-5',
model: 'claude-3-5-sonnet-latest',

Comment thread scripts/gen-capability-map.ts Outdated

import * as fs from 'fs';
import * as path from 'path';
import * as Module from 'module';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The import of Module is unused in this script and can be safely removed to keep the code clean.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75a1e68f8d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6 to +9
- 'src/tools/**'
- 'src/types/mcp.ts'
- 'scripts/gen-capability-map.ts'
- 'docs/agent/capability-map.md'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include pilot tool paths in drift-check trigger

The pull_request.paths filter only watches src/tools/**, src/types/mcp.ts, and generator/docs files, but scripts/gen-capability-map.ts embeds pilot tool metadata separately from src/pilot/handoff/tool.ts. If a PR changes pilot tool definitions only, this workflow will not run, so docs/agent/capability-map.md can drift silently despite the intended CI guard.

Useful? React with 👍 / 👎.

… example

Addresses gemini-code-assist review on PR #927:
- scripts/gen-capability-map.ts: drop unused `import * as Module from 'module';`
  Lint cleanup; no behavioral change (the symbol was never referenced).
- docs/agent/README.md: update the doc example from the hypothetical
  `claude-opus-4-5` to `claude-sonnet-4-5` so the snippet is immediately
  functional for users.

The PILOT_TOOLS drift concern (also raised by gemini) is deferred — it
deserves a dedicated mechanism (either a runtime drift test or making
the pilot/handoff tool definitions exportable) rather than the
hardcoded inline list this PR shipped with. Tracked as follow-up.

Refs #927

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

1 similar comment
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Resolve #927 against the current tool surface by keeping runtime descriptions intact, moving pilot handoff metadata into an importable definition module, and regenerating the agent preamble from those definitions.

Constraint: #927 must merge current develop and satisfy review feedback without changing tool behavior.
Rejected: keep pilot metadata hardcoded in the generator | it drifted from the runtime schema and missed scope/ttl_ms.
Rejected: leave the workflow path filter unchanged | pilot handoff definition edits would skip drift checks.
Confidence: high
Scope-risk: moderate
Directive: keep docs/agent/capability-map.md generated from runtime tool definitions and pilot definition exports only.
Tested: npm ci
Tested: npm run build
Tested: npx jest tests/scripts/gen-capability-map.test.ts tests/pilot/handoff/tool-audit.test.ts --runInBand
Tested: npm run gen:capability-map
Tested: npm run lint:tier
Tested: git diff --check
Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0331ad5380

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

};

const server = new StubMCPServer();
registerAllTools(server);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make capability-map generation independent of env flags

Calling registerAllTools(server) directly here makes the generated markdown depend on ambient env vars, because registerAllTools conditionally registers tools behind flags (for example OPENCHROME_PERF_INSIGHTS and proxy-hook gating in src/tools/index.ts). That means two developers (or local vs CI) can produce different docs/agent/capability-map.md from the same commit, causing flaky drift-check failures and inconsistent tool preambles. The generator should pin these flags to a deterministic value (or bypass gated registration paths explicitly) before collecting definitions.

Useful? React with 👍 / 👎.

Update the cross-environment Cursor smoke assertion to match the current develop Tier 1 tool surface that this capability-map PR now documents. The generator remains source-aligned; this commit only removes a stale CI count that blocked merge validation.

Constraint: CI observed 45 real Tier 1 tools plus expand_tools on the updated develop surface.

Rejected: Changing the generated capability map to hide tools | drift-check already passed and the failure was a stale test count.

Confidence: high

Scope-risk: narrow

Directive: Revisit this assertion whenever Tier 1 tool membership changes intentionally.

Tested: npx jest tests/cross-env/cursor-verification.test.ts --runInBand; npm run build; npm run lint:tier; git diff --check

Not-tested: full CI matrix pending on GitHub.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

1 similar comment
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

shaun0927 and others added 2 commits May 13, 2026 05:16
Normalize CRLF line endings when comparing the committed console-capture baseline fixture so Windows checkout settings do not fail byte-for-byte regression tests. The generated response and capability-map generator remain unchanged.

Constraint: PR #927 Windows CI failed only on CRLF fixture comparison in console-capture-regression.test.ts.

Rejected: Regenerating the baseline fixture | the mismatch is platform line endings, not response shape drift.

Confidence: high

Scope-risk: narrow

Directive: Normalize committed text fixtures in tests that compare JSON.stringify output across OSes.

Tested: npx jest tests/tools/console-capture-regression.test.ts --runInBand; npm run build; npm run lint:tier; git diff --check

Not-tested: full CI matrix pending on GitHub.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Merge latest develop into PR #927 while preserving the branch-specific cursor and console fixture expectations.

Constraint: PR #962 changed shared transport files and CI fixtures after this branch was opened.\nRejected: Replacing branch-specific fixture semantics with develop's exact snapshot | these PRs intentionally alter tool visibility or fixture normalization expectations.\nConfidence: high\nScope-risk: narrow\nDirective: Treat fixture-only conflict refreshes as semantic preservation work; do not rewrite feature behavior while unblocking mergeability.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a701e13ab6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread package.json Outdated
Comment on lines +40 to +44
<<<<<<< HEAD
"gen:capability-map": "ts-node scripts/gen-capability-map.ts"
=======
"harness:parallel-smoke": "ts-node tests/harness/parallel-smoke.ts"
>>>>>>> origin/develop
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Resolve merge conflict markers in package.json scripts

The committed package.json still contains Git conflict markers (<<<<<<<, =======, >>>>>>>), which makes the file invalid JSON and prevents any npm command from running (including CI steps like npm ci and npm run gen:capability-map). This is a release-blocking break because the repository cannot be installed or scripted until the conflict is resolved and both intended scripts are preserved.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines 1124 to 1126
export function registerReadPageTool(server: MCPServer): void {
server.registerTool('read_page', sanitizedHandler, definition);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-export readPageHandlerForReuse for returnAfterState

Removing the readPageHandlerForReuse export from read-page.ts breaks src/tools/_shared/return-after-state.ts, which imports and calls that symbol to implement returnAfterState snapshots. After this change, that import no longer resolves, so return-after-state chaining cannot compile/run until the handler is exported again or the caller is updated in the same commit.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts Outdated
Comment on lines 222 to 225
if (mode !== 'ax' && mode !== 'dom' && mode !== 'css' && mode !== 'semantic') {
return {
content: [{ type: 'text', text: `Error: Invalid mode "${mode}". Must be "ax", "dom", "css", "semantic", or "markdown".` }],
content: [{ type: 'text', text: `Error: Invalid mode "${mode}". Must be "ax", "dom", "css", or "semantic".` }],
isError: true,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve read_page markdown mode in schema and dispatch

This change removes markdown from the accepted mode enum and runtime validation, so existing calls like read_page({ mode: "markdown" }) now return an invalid-mode error instead of content. That is a user-visible API regression introduced in a docs-focused commit and will break clients/workflows that rely on markdown extraction.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72ce3e5cc5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts Outdated
(sanitizedResult._diagnostics as ReadPageDiagnostics).sanitizeMs = Date.now() - sanitizeStart;
}
return sanitizedResult;
return { ...result, content: sanitizedContent };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore read_page diagnostics output contract

Returning the sanitized result directly here drops the _diagnostics payload even when callers pass diagnostics: true, because the earlier timing collection/wrapping logic was removed in this commit. That is a user-visible API regression for clients that rely on timing telemetry (and it contradicts existing expectations in tests/tools/read-page-dom.test.ts and tests/tools/read-page.test.ts, which assert _diagnostics is present when requested). Please preserve the opt-in diagnostics metadata path instead of silently removing it.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a051d0731e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6 to +10
- 'src/tools/**'
- 'src/types/mcp.ts'
- 'src/pilot/handoff/tool.ts'
- 'src/pilot/handoff/definitions.ts'
- 'scripts/gen-capability-map.ts'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expand drift-check path filter to all generator inputs

Broaden the pull_request.paths filter because the generator output depends on files outside this list: scripts/gen-capability-map.ts calls registerAllTools() (in src/tools/index.ts), which conditionally registers tools via isSkillReplayEnabled/isProxyHookEnabled (src/harness/flags.ts) and isRunHarnessEnabled (src/run-harness/flags.ts). A PR that only changes those flag defaults/logic can change docs/agent/capability-map.md, but this workflow will not run, so drift can be merged undetected.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f48f835aae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

pull_request:
branches: [develop, main]
paths:
- 'src/tools/**'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include run-harness tool inputs in path filter

Expand this pull_request.paths list to cover the run-harness tool sources (at least src/run-harness/tools.ts). The capability map generator calls registerAllTools() (src/tools/index.ts), which conditionally registers run-harness tools via isRunHarnessEnabled(), and that flag defaults to enabled when OPENCHROME_RUN_HARNESS is unset (src/run-harness/flags.ts). As written, a PR that changes run-harness tool definitions can change docs/agent/capability-map.md without triggering this drift-check workflow, so stale capability-map output can merge undetected.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffed3540ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +112 to +114
const server = new StubMCPServer();
registerAllTools(server);
return server.getDefinitions();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep capability-map generation side-effect free

Calling registerAllTools(server) here makes npm run gen:capability-map execute startup logic intended for real server boot (notably task-ledger orphan reaping via setTaskStartupReapPromise(getTaskStore().reapOrphans()) in src/tools/index.ts). That means a docs-generation command can mutate on-disk task state and emit environment-dependent behavior, which is risky for local developer data and undermines the expectation that this generator is a pure, deterministic read of tool metadata.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc3b0d339e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/hints/hint-engine.ts Outdated
Comment on lines +174 to +177
getHint(toolName: string, result: Record<string, unknown>, isError: boolean, sessionId?: string): HintResult | null {
const resultText = this.extractText(result);
const hintSessionId = sessionId ?? 'default';
const recentCalls = this.activityTracker
.getRecentCalls(6, sessionId)
.filter((call) => {
if (currentCallId === undefined) return true;
const callId = (call as ToolCallEvent & { callId?: string }).id ?? (call as ToolCallEvent & { callId?: string }).callId;
return callId !== currentCallId;
})
.slice(0, 5);
const recentCalls = this.activityTracker.getRecentCalls(5, sessionId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore repeated-call hint detection in getHint

This change removes currentArgs/currentCallId from getHint and drops the RepeatedCallDetector pass, so identical no-progress tool retries no longer produce the repeated-identical-tool-call warning path. In practice, loops like repeatedly calling the same tool with the same args can now continue without the early anti-wandering hint (only broader progress-tracker rules remain), which regresses behavior expected by tests/e2e/scenarios/repeated-tool-loop.e2e.ts and tests/hints/repeated-call-detector.test.ts.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d6fdcca66

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/fill-form.ts Outdated
Comment on lines +667 to +668
export function registerFillFormTool(server: MCPServer): void {
// Snapshot-cache (#879): bump docEpoch after every successful fill.
const sm = getSessionManager();
const wrapped = wrapMutatingHandler(handler, (sid, tid) =>
tid ? sm.getPage(sid, tid, undefined, 'fill_form') : Promise.resolve(null),
);
server.registerTool('fill_form', wrapped, definition);
server.registerTool('fill_form', handler, definition);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore fill_form cache invalidation wrapper

registerFillFormTool now registers handler directly, but the previous wrapMutatingHandler(...) call was the hook that marks the snapshot-cache frame dirty after successful mutations. With OPENCHROME_SNAPSHOT_CACHE=1, a successful fill_form can now leave the cache epoch unchanged, so follow-up read tools (e.g. read_page/find/query_dom) may return stale pre-fill state from cache in the same session.

Useful? React with 👍 / 👎.

Comment thread src/tools/drag-drop.ts
Comment on lines 54 to 57
type: 'number',
description: 'Delay in ms between steps. Default: 10',
},
intent: {
type: 'string',
description: 'Human-readable label for this action in audit logs (≤120 chars)',
maxLength: 120,
},
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve optional intent field in drag_drop schema

This change drops the intent parameter from drag_drop’s declared input schema, which is a user-visible API regression for clients that derive allowed arguments from tools/list definitions. The intent label contract (issue #894) is used for audit/journal/HITL context, and removing it here means typed clients will no longer send it for this tool even though workflows depend on that metadata.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9269f5961e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/types/mcp.ts
Comment on lines +143 to +144
/** Optional MCP-spec tool annotations. */
annotations?: ToolAnnotations;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep tool annotations required in MCPToolDefinition

Making annotations optional here removes the compile-time guard that every tool publishes readOnlyHint/destructiveHint metadata, and this same commit already drops annotations from multiple tool definitions (for example in src/tools/file-upload.ts, src/tools/fill-form.ts, src/tools/recording.ts, and src/run-harness/tools.ts). That causes tools/list to return mutating tools without safety hints, which breaks clients that rely on annotation-based consent/safety routing and contradicts the repository’s own annotation contract.

Useful? React with 👍 / 👎.

Comment thread src/tools/file-upload.ts Outdated
Comment on lines 63 to 64
description: 'Upload files to a file input element on the page.',
inputSchema: {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restore intent field in file_upload tool contract

This change removes the intent argument from file_upload’s declared contract, so schema-driven clients will stop sending action intent labels for uploads. The tool still performs a mutating operation, but downstream audit/HITL traces lose the human intent context that callers previously supplied, creating a user-visible metadata regression for governance-oriented workflows.

Useful? React with 👍 / 👎.

Codex P1/P2: the capability-map generator PR (#826) is supposed to be
additive — it adds a 'category' field to MCPToolDefinition for the
auto-generated docs/agent/capability-map.md preamble. It must not
silently drop other user-visible contract fields.

This commit restores the dropped surface on 9 tool files:

intent (#915, ≤120 char audit label):
  drag_drop, file_upload, fill_form, form_input
  - input schema entry, description note, and the handler-side
    INVALID_INTENT validation block

annotations (#867, MCP readOnlyHint/destructiveHint metadata):
  drag_drop, file_upload, fill_form, form_input, interact,
  request_intercept, tabs_close,
  oc_profile_status,
  oc_recording_start, oc_recording_stop, oc_recording_status,
  oc_recording_list, oc_recording_export

Method: rebased each affected file to develop's known-good state, then
re-applied PR #927's only PR-specific addition (category: '<group>' on
the definition). The capability-map generator script, docs, and CI
workflow are untouched.

oc_recording_status also gains the category: 'recording' tag that the
PR forgot to add when it removed the previous annotations field, so
the lint:tool-categories sweep stays consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun0927
Copy link
Copy Markdown
Owner Author

Forthcoming review note: c7e0c35 restores intent (#915 contract, ≤120 char audit label) on drag_drop / file_upload / fill_form / form_input and annotations: TOOL_ANNOTATIONS.* on 13 tool definitions (interact, drag_drop, file_upload, fill_form, form_input, request_intercept, tabs_close, oc_profile_status, oc_recording_start/stop/status/list/export). Method: rebased each affected file to develop's known-good state, then re-applied PR #927's only PR-specific addition — the category: '<group>' field on each definition. The capability-map generator (scripts/gen-capability-map.ts), the generated docs/agent/capability-map.md, the README, and the CI workflow are untouched. oc_recording_status also gains the category: 'recording' tag that the PR previously forgot when removing its annotations field, keeping lint:tool-categories consistent. Re-running CI once develop CI signal is restored (PR #1105).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7e0c35ddd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +3 to +4
export const createDefinition: MCPToolDefinition = {
name: 'oc_pilot_handoff_create',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restore annotations on pilot handoff tool definitions

Both handoff tool definitions were moved into definitions.ts without annotations, and registerOcPilotHandoffTool now registers those objects directly; as a result, tools/list will expose oc_pilot_handoff_create/oc_pilot_handoff_redeem without readOnlyHint/destructiveHint, which breaks clients that rely on annotation metadata for safety or consent routing. Fresh evidence in this commit: the newly added src/pilot/handoff/definitions.ts file defines both tools with no annotations field.

Useful? React with 👍 / 👎.

# Conflicts:
#	src/tools/interact.ts
@shaun0927
Copy link
Copy Markdown
Owner Author

Develop CI restored, but this PR still has PR-specific test failures

PR #1105 has merged to develop and the develop CI baseline is now green (run 25808614236 was 9/9 on the test-hardened commit). After updating this branch via update-branch, current CI status is pass=0 fail=8 — the failures are PR-specific (not inherited from develop) and need author-driven debugging.

Addressed in this session

Author follow-up

  • Inspect this PR's failed jobs in the latest CI run; the failing suites are not inherited from develop. Common patterns I observed across the Group A PRs after the develop rebase: TaskJournal/journal summary regressions, snapshot drift in registration-default.snapshot.test.ts, and PR-feature test files (extract-data-modes, output-handles, etc.).
  • No outstanding Codex P0/P1/P2 on the latest commit; the PR is otherwise content-clean and ready for a final review once the PR-feature tests pass against the updated develop baseline.

The reverse-direction `tool-annotations.test.ts` orphan check requires
every TOOL_ANNOTATIONS entry to be referenced from src code. PR #927
factored the pilot handoff definitions out into `pilot/handoff/definitions.ts`
without carrying over the annotations, and the run-harness tool defs
were never annotated to begin with. Add the missing annotations field
to all six definitions:

- oc_pilot_handoff_create, oc_pilot_handoff_redeem (pilot/handoff/definitions.ts)
- oc_run_start, oc_run_status, oc_run_events, oc_run_finish (run-harness/tools.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbadcf5a34

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +123 to +126
const { PILOT_HANDOFF_TOOL_DEFINITIONS } = require('../src/pilot/handoff/definitions') as {
PILOT_HANDOFF_TOOL_DEFINITIONS: readonly ToolDefinition[];
};
return [...PILOT_HANDOFF_TOOL_DEFINITIONS];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Build capability map from registered pilot tools

collectPilotTools() unconditionally injects oc_pilot_handoff_* definitions into the generated map even though the normal server bootstrap path only registers tools via registerAllTools (src/index.ts:114,440) and bootstrapPilot() merely imports modules without registering handoff tools (src/harness/flags.ts:184-190). This makes docs/agent/capability-map.md advertise tools that are absent from tools/list in real runs, so agents using the preamble can attempt non-existent tool calls and fail with unknown-tool errors.

Useful? React with 👍 / 👎.

@shaun0927
Copy link
Copy Markdown
Owner Author

Reviewed for merge readiness: not safe to merge in the current PR shape.

The intended goal—generating an agent capability map—is valid, but this branch is not a valid merge unit right now. It is DIRTY against develop, has a failing drift-check, and the bot review found broad regressions outside the documentation/generator scope, including unresolved conflict markers in package.json, read_page contract regressions, lost diagnostics/markdown behavior, missing tool annotations, cache invalidation regressions, and capability-map inputs that do not consistently track the real registered tool surface.

Because these issues affect core runtime/tool contracts, this should not be repaired by small follow-up commits on the current branch. The safe path is to rebuild/re-scope the PR so it only contains the generator/workflow/docs changes, then re-run drift generation and CI from current develop.

Decision: not merge-ready; leaving unmerged.

@shaun0927
Copy link
Copy Markdown
Owner Author

Closing as superseded by the fresh, narrow current-develop implementation in #1195. #927 is stale/conflicting and its drift-check is failing, while #1195 preserves the #826 scope with a smaller generated artifact + generator tests.

@shaun0927 shaun0927 closed this May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant