feat(core): optional intent label on interaction tools (#894)#915
Conversation
Adds an optional intent?: string (≤120 chars) to interact, form_input,
fill_form, drag_drop, and file_upload. When provided, intent is threaded
to:
- oc_journal recent entries: generateSummary appends [intent: "..."]
- Ralph S7 HITL escalation: HitlContext gains an optional intent field
rendered as an "Intent:" line by formatHitlResponse
When omitted, no key/line appears (byte-identical to v1.11.0 audit trail).
Empty string and >120 chars return INVALID_INTENT with no DOM action.
Deferred to a follow-up (out of scope per the spec):
- Trace JSONL top-level intent field — requires new tool_call trace
events in the MCP server wrapper which do not currently exist; will
file as Phase 2 follow-up issue.
- oc_evidence_bundle integration — EvidenceBundleSnapshot has no
context object today; a separate schema change is needed.
Tests: 11 new tests (journal summary, HITL plumbing, schema invariants)
all pass; full suite shows no regressions beyond develop baseline flakes.
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
Code Review
This pull request introduces an optional intent parameter to several interaction tools, including drag_drop, file_upload, fill_form, form_input, and interact, allowing callers to provide human-readable labels for audit logs. The changes include schema updates, validation logic, and integration with the task journal and HITL escalation responses. Feedback suggests enhancing the interact tool's summary by incorporating the query parameter and refactoring the duplicated validation logic into a shared utility function with more robust type checking to prevent potential runtime errors.
| case 'interact': return `${s} Click "${args.description || args.selector || ''}"`; | ||
| case 'navigate': base = `${s} → ${args.url || 'unknown'}`; break; | ||
| case 'read_page': base = `${s} Read page`; break; | ||
| case 'interact': base = `${s} Click "${args.description || args.selector || ''}"`; break; |
There was a problem hiding this comment.
The interact tool uses the query parameter for its natural language input, but the summary generator here is looking for description or selector. This will result in an empty string in the journal summary for interact actions. Consider adding args.query to the fallback chain.
| case 'interact': base = `${s} Click "${args.description || args.selector || ''}"`; break; | |
| case 'interact': base = `${s} Click "${args.query || args.description || args.selector || ''}"`; break; |
| if (intent !== undefined) { | ||
| if (intent === '') { | ||
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { | ||
| return { | ||
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.
| if (intent !== undefined) { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } | |
| if (typeof intent === 'string') { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } |
| if (intent !== undefined) { | ||
| if (intent === '') { | ||
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { | ||
| return { | ||
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.
| if (intent !== undefined) { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } | |
| if (typeof intent === 'string') { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } |
| if (intent !== undefined) { | ||
| if (intent === '') { | ||
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { | ||
| return { | ||
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.
| if (intent !== undefined) { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } | |
| if (typeof intent === 'string') { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } |
| if (intent !== undefined) { | ||
| if (intent === '') { | ||
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { | ||
| return { | ||
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.
| if (intent !== undefined) { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } | |
| if (typeof intent === 'string') { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } |
| if (intent !== undefined) { | ||
| if (intent === '') { | ||
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { | ||
| return { | ||
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | ||
| isError: true, | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.
| if (intent !== undefined) { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } | |
| if (typeof intent === 'string') { | |
| if (intent === '') { | |
| return { | |
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | |
| isError: true, | |
| }; | |
| } | |
| if (intent.length > 120) { | |
| return { | |
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], | |
| isError: true, | |
| }; | |
| } | |
| } |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d61ad0fa2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { |
There was a problem hiding this comment.
Guard
intent type before reading .length
The new validation assumes intent is a string when it is merely !== undefined; if a client sends intent: null (or another non-string object), intent.length throws and the tool returns a generic internal error instead of a deterministic INVALID_INTENT response. This is reachable because the server-side preflight in mcp-server only enforces required fields, not runtime types, so this branch should explicitly check typeof intent === 'string' (or reject non-strings) before using .length. The same pattern appears across the other updated interaction tools as well.
Useful? React with 👍 / 👎.
# Conflicts: # src/tools/fill-form.ts # src/tools/form-input.ts # src/tools/interact.ts
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c23126fad
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (intent.length > 120) { | ||
| return { | ||
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], |
There was a problem hiding this comment.
Validate intent type before checking
.length
drag_drop treats any defined intent as a string and immediately reads intent.length; with the current MCP preflight only enforcing required fields, a call like intent: null reaches this branch and throws before returning the intended INVALID_INTENT error. Fresh evidence versus the prior interact thread: the same unchecked pattern is now present here (and mirrored in the other newly updated interaction tools), so non-string payloads can still produce internal errors instead of deterministic validation failures.
Useful? React with 👍 / 👎.
The develop merge expanded tier-one tools and added isolated-context fields to tab creation, but the shared tests/mocks still asserted the older surface. Updating the mock context accessor and CI assertions keeps this PR focused while allowing the merged develop baseline to pass. Constraint: Fix only post-merge CI drift shared by the reviewed PRs. Rejected: Changing tabs_context production behavior | the failure was a stale mock, not a runtime regression. Confidence: high Scope-risk: narrow Directive: Keep Cursor tool-count assertions synchronized with tier configuration changes. Tested: npm test -- --runInBand tests/cross-env/cursor-verification.test.ts tests/tools/tabs.test.ts tests/tools/console-capture-regression.test.ts Not-tested: Full GitHub matrix after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9605a89658
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { |
There was a problem hiding this comment.
Guard intent type before reading
.length in form_input
This branch assumes intent is a string, but server preflight only checks required fields (src/mcp-server.ts required-arg gate) and does not enforce runtime types, so a payload like intent: null reaches intent.length and throws a TypeError instead of returning a deterministic INVALID_INTENT result. That converts a user validation error into an internal failure path for this tool.
Useful? React with 👍 / 👎.
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { |
There was a problem hiding this comment.
Guard intent type before reading
.length in fill_form
The new validation path treats any defined intent as string-like; with current call handling, non-string values can still reach this handler, and intent: null will throw at intent.length rather than returning INVALID_INTENT. This means malformed client input can trigger an internal error path instead of the expected validation response for fill_form.
Useful? React with 👍 / 👎.
| isError: true, | ||
| }; | ||
| } | ||
| if (intent.length > 120) { |
There was a problem hiding this comment.
Guard intent type before reading
.length in file_upload
Here intent is cast and then dereferenced as if it were always a string; because required-field checks do not enforce schema types at runtime, intent: null can arrive and crash at .length, producing a generic internal failure instead of a controlled INVALID_INTENT error. This is user-reachable via malformed MCP payloads to file_upload.
Useful? React with 👍 / 👎.
GitHub did not schedule a pull_request CI run for the previous pushed fix commit, so this empty commit refreshes the PR head without changing behavior. Constraint: No code changes; refresh checks only. Confidence: high Scope-risk: narrow Tested: Previous commit carried targeted local test evidence. Not-tested: No additional local tests for empty commit. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Refresh this branch onto current develop while keeping the tab mock and isolated-context expectation fixes required for the shared CI matrix. Constraint: Resolve only develop merge conflicts and preserve the PR-scoped implementation. Confidence: high Scope-risk: moderate Tested: merge conflict resolution only; prior targeted tests covered the edited mock expectations. Not-tested: Full local matrix after merge. Co-authored-by: OmX <omx@oh-my-codex.dev>
After merging the upstream isolated-context mock change, the local CI fix produced a duplicate getTargetContextName property. Keep the default context behavior in one mock field so TypeScript and tabs_context tests pass. Constraint: Test-only CI stabilization; no production behavior changes. Rejected: Returning undefined from the mock | tabs_context expects a default context for created mock tabs. Confidence: high Scope-risk: narrow Tested: npm test -- --runInBand tests/tools/tabs.test.ts tests/tools/console-capture-regression.test.ts; npm run lint:tool-schemas (issue-888 worktree) Not-tested: Full GitHub matrix after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
The intent label sentence pushed interact's merged description past the tool-description budget enforced in CI. Compress the wording while preserving the intent audit-log guidance and coordinate-mode routing hints. Constraint: Maintain #894 intent schema and tool guidance under the 400-character description limit. Rejected: Raising the description test limit | the budget is part of the token-control contract. Confidence: high Scope-risk: narrow Tested: npm test -- --runInBand tests/tool-descriptions.test.ts tests/tools/intent.test.ts Not-tested: Full GitHub matrix after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8b81d67a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; |
There was a problem hiding this comment.
Throw on invalid intent instead of returning isError
For intent: "" or intent.length > 120, this handler now returns a normal MCPResult with isError: true instead of throwing. In src/mcp-server.ts the post-handler success path still runs for non-thrown results, so these validation failures are logged as successful invocations and journaled with ok=true (including audit/metrics side effects). This breaks the stated “no side-effects on validation failure” contract and can corrupt operational telemetry for malformed interaction-tool requests.
Useful? React with 👍 / 👎.
Windows checks compare the JSON fixture after checkout with CRLF line endings, while JSON.stringify produces LF. Normalize fixture newlines at read time so the regression remains byte-stable in content without becoming platform-specific. Constraint: Test-only fix for cross-platform CI. Rejected: Regenerating the fixture with platform-specific endings | it would only move the failure between OSes. Confidence: high Scope-risk: narrow Tested: npm test -- --runInBand tests/tools/tabs.test.ts tests/tools/console-capture-regression.test.ts; npm run lint:tool-schemas (issue-888 worktree) Not-tested: Full GitHub matrix after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
The in-process CLI harness shares stdout hooks with Jest output, so unrelated async console rendering can prepend noise on Windows workers. Extract the single admin token and JSON array payloads from captured stdout instead of assuming the whole stream is clean. Constraint: Test-only stabilization for CI; CLI output behavior is unchanged. Rejected: Silencing unrelated orchestration logs globally | that would hide useful diagnostics outside this harness. Confidence: high Scope-risk: narrow Tested: npm test -- --runInBand tests/cli/admin-keys.test.ts tests/tools/console-capture-regression.test.ts (issue-888 worktree) Not-tested: Full GitHub matrix after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fab715efc8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (intent === '') { | ||
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; |
There was a problem hiding this comment.
Throw invalid intent errors instead of returning isError
The new INVALID_INTENT branch returns a normal MCPResult with isError: true, but handleToolCall treats all non-thrown handler returns as success and still writes success audit/metrics/journal entries (src/mcp-server.ts success path around logAuditEntry(... status: 'success') and journal.createEntry(..., true)). As a result, malformed intent input (empty or >120 chars) is recorded as a successful drag_drop action, which pollutes operational telemetry and journal state instead of behaving like a rejected request.
Useful? React with 👍 / 👎.
CI exposed timing-sensitive integration tests: stdio shutdown can exceed ten seconds under Node 18 load, and Windows can briefly hold a temp directory after the guardian child exits. Increase the shutdown wait and retry temp cleanup without changing production code. Constraint: Test-only stabilization for platform variance. Rejected: Treating these as product failures | the assertions passed functionally and failed on cleanup/timing boundaries. Confidence: high Scope-risk: narrow Tested: npm test -- --runInBand tests/integration/health-endpoint-gating.test.ts tests/utils/process-guardian.test.ts; npm test -- --runInBand tests/utils/process-guardian.test.ts Not-tested: Full GitHub matrix after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5b1e7227f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; |
There was a problem hiding this comment.
Throw INVALID_INTENT instead of returning isError in form_input
Returning isError: true here still goes through the MCP server success path for non-thrown handler results, so malformed intent values are audited/journaled as successful tool calls and counted in success metrics. That breaks the intended "validation failure with no side effects" behavior for form_input; this branch should throw (or otherwise enter the server error path) rather than return a normal result object.
Useful? React with 👍 / 👎.
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; |
There was a problem hiding this comment.
Throw INVALID_INTENT instead of returning isError in fill_form
This validation branch returns a normal MCP result with isError: true, which the server treats as a successful handler completion (including success audit/journal/metrics side effects). For invalid intent input, fill_form should fail through the error path instead of being recorded as a successful action.
Useful? React with 👍 / 👎.
| return { | ||
| content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }], | ||
| isError: true, | ||
| }; |
There was a problem hiding this comment.
Throw INVALID_INTENT instead of returning isError in file_upload
Returning isError: true here does not stop the MCP server from treating the call as a success, so invalid intent payloads are logged and journaled as successful file_upload invocations. This should use the thrown-error path for validation failures to avoid polluting operational telemetry and journal state.
Useful? React with 👍 / 👎.
Merge latest develop into PR #915 and keep the shared cursor/console fixture expectations aligned after the server-to-client primitive landed. Constraint: PR #962 updated the same cross-environment and console fixture tests, leaving this PR blocked by fixture-only conflicts.\nRejected: Rewriting feature code during conflict refresh | the merge conflict was limited to shared CI fixtures.\nConfidence: high\nScope-risk: narrow\nDirective: Keep conflict refreshes limited to shared fixture expectations unless feature code conflicts require design review.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00646c1c22
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (intent.length > 120) { | ||
| return { | ||
| content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }], |
There was a problem hiding this comment.
Use Unicode code-point length for intent limit
The new intent validator uses JavaScript string.length, which counts UTF-16 code units, not user-visible characters. This means valid 120-character intents containing astral Unicode (for example emoji) can be rejected as INVALID_INTENT because they count as 2 each, creating behavior that conflicts with the schema’s maxLength: 120 contract and causing avoidable tool-call failures for non-ASCII inputs.
Useful? React with 👍 / 👎.
Constraint: Pull request 915 only conflicted in tests/cli/admin-keys.test.ts after develop absorbed progress notifications. Rejected: Resolve product-code conflicts | This refresh is limited to equivalent admin-key stdout assertion logic. Confidence: high Scope-risk: narrow Directive: Keep plaintext-token assertions tolerant of unrelated Jest worker stdout noise. Tested: npm test -- --runInBand tests/cli/admin-keys.test.ts --forceExit Not-tested: Full CI matrix not run locally; GitHub Actions will rerun after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0275395282
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| default: base = `${s} ${tool}`; break; | ||
| } | ||
| const intent = typeof args.intent === 'string' && args.intent.length > 0 ? args.intent : undefined; | ||
| return intent ? `${base} [intent: "${intent}"]` : base; |
There was a problem hiding this comment.
Sanitize intent before embedding in journal summary
generateSummary appends args.intent verbatim, but the new intent validators only enforce empty/non-empty and length, so payloads containing newlines/control characters are accepted and become part of summary. Because journal recent renders each entry as a single line and then joins with \n (src/tools/journal.ts), an intent like "ok\nforged" will split one entry into multiple displayed lines, breaking the one-entry-per-line audit trail format and making logs ambiguous. Escaping or rejecting control characters before interpolation avoids this regression.
Useful? React with 👍 / 👎.
…intent guard
Two fixes:
1. src/index.ts: process.stdout.end() + 'finish' event before process.exit(0)
in --introspect-tools-list path. Bare write()+exit(0) races the OS pipe
buffer on Linux/macOS (truncates at ~64 KB). The 'finish' event fires only
after the last byte leaves the kernel pipe buffer, matching Windows behavior.
2. scripts/lint-tool-schemas.mjs: replace readFileSync(fd=0) with async
for-await stdin accumulation. readFileSync on fd 0 reads a single 64 KB
chunk on Linux/macOS; the async iterable correctly accumulates all chunks.
3. src/tools/{interact,drag-drop,fill-form,form-input,file-upload}.ts:
change `intent !== undefined` guards to `typeof intent === 'string'` to
prevent a TypeError crash when a caller sends intent: null.
4. src/journal/task-journal.ts: add args.query to the interact summary
fallback chain (interact uses query, not description/selector).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep develop's chunked-write stdout drain and readStdin one-liner. Preserve both the new ref schema property from develop and the intent property from this branch in interact.ts inputSchema. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two readStdin definitions existed after the develop merge — the second shadowed the first but was never reached. Keep the encoding-explicit variant used at the JSON.parse call site.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa0851fccb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case 'interact': return `${s} Click "${args.description || args.selector || ''}"`; | ||
| case 'navigate': base = `${s} → ${args.url || 'unknown'}`; break; | ||
| case 'read_page': base = `${s} Read page`; break; | ||
| case 'interact': base = `${s} Click "${args.query || args.description || args.selector || ''}"`; break; |
There was a problem hiding this comment.
Keep interact summary unchanged when intent is omitted
TaskJournal.generateSummary now formats interact entries with args.query first, but pre-change behavior only used description/selector. For normal interact calls that provide query and omit intent, this changes the summary text instead of keeping it byte-identical to v1.11.0, which breaks the compatibility contract described in this PR and can affect consumers/tests that rely on stable journal summaries.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0fb76fa698
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const { result: cssDomResult, verify: cssVerifyReport } = await runVerify( | ||
| page, | ||
| verifyMode, | ||
| async () => | ||
| withDomDelta( |
There was a problem hiding this comment.
Rewrap interact clicks with irreversible-action guard
In environments where contract runtime safety is enabled, this branch now executes the click path directly via runVerify/withDomDelta and skips guardIrreversibleBrowserAction, so policy hooks that block risky actions (e.g., payment/order/destructive labels) are never invoked. That means high-impact interact clicks can proceed without the previous irreversible-action gate and blocked-result path.
Useful? React with 👍 / 👎.
| const result = { | ||
| content: responseContent, | ||
| ...(extraTopLevel || {}), | ||
| } as MCPResult, verifyReport); | ||
| if (sessionId && tabId) { | ||
| await appendReturnAfterState(result, page, sessionId, tabId, returnAfterState, context); | ||
| } | ||
| return result; | ||
| } as MCPResult; | ||
| return attachVerifyReport(result, verifyReport); |
There was a problem hiding this comment.
Preserve returnAfterState snapshot chaining in interact
This helper now returns immediately without calling appendReturnAfterState, so callers that pass returnAfterState: "ax" | "dom" no longer receive the post-action state snapshot that the input-tool chaining contract previously provided. That is a behavior regression for existing automation flows and also misaligns with hint rules that treat args.returnAfterState as evidence that a fresh snapshot was already inlined.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fba09a6cdc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ok, | ||
| summary: this.generateSummary(tool, args, ok), | ||
| milestone: MILESTONE_TOOLS.has(tool) || undefined, | ||
| resultSummary: sanitizedResult, | ||
| }; |
There was a problem hiding this comment.
Restore failure classification in task journal entries
TaskJournal.createEntry no longer records resultSummary or derives failureClass/errorFingerprint, so downstream recovery diagnostics silently stop working even though mcp-server still passes result/error summaries into this API. In practice, oc_journal summary can no longer surface failure classes, repeated error fingerprints, or candidate recovery hints, which is a regression in incident triage and loop detection rather than an intent-label change.
Useful? React with 👍 / 👎.
| if (response.success) { | ||
| const successResult: MCPResult = { | ||
| return { | ||
| content: [{ type: 'text', text: (response.message || 'Value set successfully') + delta }], | ||
| }; |
There was a problem hiding this comment.
Reattach returnAfterState snapshot in form_input success path
This success branch now returns immediately without appending the requested post-action snapshot, so callers that previously relied on returnAfterState: "ax"|"dom" no longer receive chained state after form_input. That breaks existing token-saving workflows which depend on getting fresh state from the same call instead of issuing an additional read_page.
Useful? React with 👍 / 👎.
| return { | ||
| content: [ | ||
| { | ||
| type: 'text', |
There was a problem hiding this comment.
Reattach returnAfterState snapshot in fill_form success path
fill_form now returns its result directly and never appends a post-action state snapshot, so flows that pass returnAfterState lose the chaining contract and must make extra observe calls to recover state. This is a behavior regression for existing automation that depended on fill_form returning immediate post-submit/post-fill state when the call succeeds.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75fa90735f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const finalResult = redactSecrets(result); | ||
| this.recordToolOutputObservability(toolName, finalResult); | ||
|
|
||
| // Record to task journal after response redaction so arbitrary literal | ||
| // secret values cannot be persisted in journal result summaries. | ||
| try { | ||
| const journal = getTaskJournal(); | ||
| const entry = journal.createEntry( | ||
| toolName, | ||
| sessionId, | ||
| toolArgs, | ||
| Date.now() - toolStartTime, | ||
| !(finalResult as MCPResult).isError, | ||
| summarizeMcpResultForJournal(finalResult as MCPResult), | ||
| ); | ||
| journal.record(entry); | ||
| } catch { | ||
| // Best-effort journal recording | ||
| } | ||
|
|
||
| return finalResult; |
There was a problem hiding this comment.
Preserve isError-aware journal write before returning result
Returning immediately here removes the only journal path that derived ok from finalResult.isError; the remaining success-path write still calls journal.createEntry(..., true), so any non-thrown tool failure (MCPResult.isError === true) is now journaled as a success. This affects more than the new intent checks (for example swallowed/soft errors from other tools) and will underreport failures in oc_journal/summary telemetry.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e406e0669
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return text ? text.slice(0, 500) : undefined; | ||
| } | ||
|
|
||
| export class MCPServer { |
There was a problem hiding this comment.
Restore summarizeMcpResultForJournal export
This commit removes summarizeMcpResultForJournal from src/mcp-server.ts, but tests/tools/journal.test.ts still imports and executes it (import { MCPServer, summarizeMcpResultForJournal } ... at line 42 and usage at line 68). That leaves the tree in a broken state where type-check/test compilation fails with a missing export, so either the helper must remain exported here or the dependent test/callers must be updated in the same change.
Useful? React with 👍 / 👎.
Codex P1/P2: the capability-map generator PR (#826) is supposed to be additive — it adds a 'category' field to MCPToolDefinition for the auto-generated docs/agent/capability-map.md preamble. It must not silently drop other user-visible contract fields. This commit restores the dropped surface on 9 tool files: intent (#915, ≤120 char audit label): drag_drop, file_upload, fill_form, form_input - input schema entry, description note, and the handler-side INVALID_INTENT validation block annotations (#867, MCP readOnlyHint/destructiveHint metadata): drag_drop, file_upload, fill_form, form_input, interact, request_intercept, tabs_close, oc_profile_status, oc_recording_start, oc_recording_stop, oc_recording_status, oc_recording_list, oc_recording_export Method: rebased each affected file to develop's known-good state, then re-applied PR #927's only PR-specific addition (category: '<group>' on the definition). The capability-map generator script, docs, and CI workflow are untouched. oc_recording_status also gains the category: 'recording' tag that the PR forgot to add when it removed the previous annotations field, so the lint:tool-categories sweep stays consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Progress / Review status
Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.
feat/894-intent-label→developfa0851f— chore(scripts): drop duplicate readStdin helper in lint-tool-schemasOwner comment cleanup: 1 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.
Closes #894.
Summary
Adds an optional `intent?: string` (max 120 chars) to the five interaction tools — `interact`, `form_input`, `fill_form`, `drag_drop`, `file_upload`. When provided, the label flows into the audit trail; when omitted, every downstream record is byte-identical to v1.11.0.
Wiring
Validation (no DOM side-effects on failure)
Absent-vs-empty contract
When `intent` is omitted by the caller, no `intent` key appears in journal entries; the summary string is byte-identical to v1.11.0. This is asserted in test `omitted intent produces v1.11.0-identical summary`.
Deferred (Out of Scope, per the issue)
Tests (11 new, all passing)
Full `npm test`: `4425 passed` — the 11 new intent tests pass; the 29 failures are pre-existing develop CI red baseline (screenshot-class timing tests, identical failures show up on the v1.11.0 baseline).
Test plan