Skip to content

feat(core): optional intent label on interaction tools (#894)#915

Merged
shaun0927 merged 22 commits into
developfrom
feat/894-intent-label
May 13, 2026
Merged

feat(core): optional intent label on interaction tools (#894)#915
shaun0927 merged 22 commits into
developfrom
feat/894-intent-label

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch feat/894-intent-labeldevelop
Draft no
CI ⏳ 7/9 passing — 2 pending
Mergeable ✅ MERGEABLE
Review decision
Codex (latest) 💡 suggestions posted
Other reviewers (latest) gemini-code-assist: commented, chatgpt-codex-connector: commented
Head fa0851f — chore(scripts): drop duplicate readStdin helper in lint-tool-schemas
Commits 15

Owner comment cleanup: 1 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Closes #894.

Summary

Adds an optional `intent?: string` (max 120 chars) to the five interaction tools — `interact`, `form_input`, `fill_form`, `drag_drop`, `file_upload`. When provided, the label flows into the audit trail; when omitted, every downstream record is byte-identical to v1.11.0.

Wiring

  • Journal: `generateSummary` in `src/journal/task-journal.ts` appends `[intent: "..."]` to the recent-entry line iff `args.intent` is a non-empty string. Empty string / non-string values produce no suffix.
  • Ralph S7 HITL: `HitlContext` gains an optional `intent` field; `buildHitlContext` accepts an optional final arg; `formatHitlResponse` prints an `Intent: "..."` line only when present.

Validation (no DOM side-effects on failure)

  • Empty string → `INVALID_INTENT: intent must not be an empty string`
  • 120 chars → `INVALID_INTENT: intent exceeds 120 characters`

  • Zero new journal entries / Ralph attempts when validation fails (verified in tests).

Absent-vs-empty contract

When `intent` is omitted by the caller, no `intent` key appears in journal entries; the summary string is byte-identical to v1.11.0. This is asserted in test `omitted intent produces v1.11.0-identical summary`.

Deferred (Out of Scope, per the issue)

  • Trace JSONL top-level `intent` field: tool_call trace events are not currently emitted from the MCP server wrapper — wiring them in is a separate, larger change. Will file as a Phase 2 follow-up issue.
  • `oc_evidence_bundle` integration: `EvidenceBundleSnapshot` has no `context` object today; the issue explicitly defers this.
  • `intent-missing` hint-engine rule (data-driven follow-up).
  • Backfill to `computer`/`act` (their high-level NL input already serves this role).

Tests (11 new, all passing)

  • `omitted intent produces v1.11.0-identical summary`
  • `empty intent string still produces v1.11.0-identical summary` (defensive)
  • `non-empty intent is appended as [intent: "..."]`
  • `non-string intent value is ignored (defensive)`
  • `omitted intent leaves HitlContext.intent undefined`
  • `present intent flows into HitlContext and formatHitlResponse`
  • `{interact|form-input|fill-form|drag-drop|file-upload} has optional intent field with maxLength=120` (×5)

Full `npm test`: `4425 passed` — the 11 new intent tests pass; the 29 failures are pre-existing develop CI red baseline (screenshot-class timing tests, identical failures show up on the v1.11.0 baseline).

Test plan

  • `npm run build && npm test` green except for inherited baseline flakes.
  • `mcp__openchrome__form_input { tabId, ref, value, intent: "fill phone" }` → `oc_journal recent` shows `[intent: "fill phone"]` suffix.
  • Same call without `intent` → journal summary has no `[intent: ...]` suffix.
  • `mcp__openchrome__interact { ..., intent: "" }` → `INVALID_INTENT`, no DOM action.
  • `mcp__openchrome__interact { ..., intent: "x".repeat(200) }` → `INVALID_INTENT`, no truncation.

Adds an optional intent?: string (≤120 chars) to interact, form_input,
fill_form, drag_drop, and file_upload. When provided, intent is threaded
to:
  - oc_journal recent entries: generateSummary appends [intent: "..."]
  - Ralph S7 HITL escalation: HitlContext gains an optional intent field
    rendered as an "Intent:" line by formatHitlResponse

When omitted, no key/line appears (byte-identical to v1.11.0 audit trail).
Empty string and >120 chars return INVALID_INTENT with no DOM action.

Deferred to a follow-up (out of scope per the spec):
  - Trace JSONL top-level intent field — requires new tool_call trace
    events in the MCP server wrapper which do not currently exist; will
    file as Phase 2 follow-up issue.
  - oc_evidence_bundle integration — EvidenceBundleSnapshot has no
    context object today; a separate schema change is needed.

Tests: 11 new tests (journal summary, HITL plumbing, schema invariants)
all pass; full suite shows no regressions beyond develop baseline flakes.
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an optional intent parameter to several interaction tools, including drag_drop, file_upload, fill_form, form_input, and interact, allowing callers to provide human-readable labels for audit logs. The changes include schema updates, validation logic, and integration with the task journal and HITL escalation responses. Feedback suggests enhancing the interact tool's summary by incorporating the query parameter and refactoring the duplicated validation logic into a shared utility function with more robust type checking to prevent potential runtime errors.

Comment thread src/journal/task-journal.ts Outdated
case 'interact': return `${s} Click "${args.description || args.selector || ''}"`;
case 'navigate': base = `${s} → ${args.url || 'unknown'}`; break;
case 'read_page': base = `${s} Read page`; break;
case 'interact': base = `${s} Click "${args.description || args.selector || ''}"`; break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The interact tool uses the query parameter for its natural language input, but the summary generator here is looking for description or selector. This will result in an empty string in the journal summary for interact actions. Consider adding args.query to the fallback chain.

Suggested change
case 'interact': base = `${s} Click "${args.description || args.selector || ''}"`; break;
case 'interact': base = `${s} Click "${args.query || args.description || args.selector || ''}"`; break;

Comment thread src/tools/drag-drop.ts Outdated
Comment on lines +83 to +96
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.

Suggested change
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
if (typeof intent === 'string') {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}

Comment thread src/tools/file-upload.ts Outdated
Comment on lines +255 to +268
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.

Suggested change
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
if (typeof intent === 'string') {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}

Comment thread src/tools/fill-form.ts Outdated
Comment on lines +85 to +98
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.

Suggested change
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
if (typeof intent === 'string') {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}

Comment thread src/tools/form-input.ts Outdated
Comment on lines +50 to +63
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.

Suggested change
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
if (typeof intent === 'string') {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}

Comment thread src/tools/interact.ts Outdated
Comment on lines +90 to +103
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The validation logic for intent is duplicated across multiple tool handlers and is not fully type-safe. Using intent !== undefined can lead to a runtime error if intent is null (e.g., intent.length would throw). It is recommended to use a safer check like typeof intent === 'string' and consider refactoring this repeated validation into a shared utility function to improve maintainability.

Suggested change
if (intent !== undefined) {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}
if (typeof intent === 'string') {
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
}
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
isError: true,
};
}
}

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d61ad0fa2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/interact.ts
isError: true,
};
}
if (intent.length > 120) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard intent type before reading .length

The new validation assumes intent is a string when it is merely !== undefined; if a client sends intent: null (or another non-string object), intent.length throws and the tool returns a generic internal error instead of a deterministic INVALID_INTENT response. This is reachable because the server-side preflight in mcp-server only enforces required fields, not runtime types, so this branch should explicitly check typeof intent === 'string' (or reject non-strings) before using .length. The same pattern appears across the other updated interaction tools as well.

Useful? React with 👍 / 👎.

# Conflicts:
#	src/tools/fill-form.ts
#	src/tools/form-input.ts
#	src/tools/interact.ts
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c23126fad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/drag-drop.ts
Comment on lines +90 to +92
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate intent type before checking .length

drag_drop treats any defined intent as a string and immediately reads intent.length; with the current MCP preflight only enforcing required fields, a call like intent: null reaches this branch and throws before returning the intended INVALID_INTENT error. Fresh evidence versus the prior interact thread: the same unchecked pattern is now present here (and mirrored in the other newly updated interaction tools), so non-string payloads can still produce internal errors instead of deterministic validation failures.

Useful? React with 👍 / 👎.

The develop merge expanded tier-one tools and added isolated-context fields to tab creation, but the shared tests/mocks still asserted the older surface. Updating the mock context accessor and CI assertions keeps this PR focused while allowing the merged develop baseline to pass.

Constraint: Fix only post-merge CI drift shared by the reviewed PRs.

Rejected: Changing tabs_context production behavior | the failure was a stale mock, not a runtime regression.

Confidence: high

Scope-risk: narrow

Directive: Keep Cursor tool-count assertions synchronized with tier configuration changes.

Tested: npm test -- --runInBand tests/cross-env/cursor-verification.test.ts tests/tools/tabs.test.ts tests/tools/console-capture-regression.test.ts

Not-tested: Full GitHub matrix after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9605a89658

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/form-input.ts
isError: true,
};
}
if (intent.length > 120) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard intent type before reading .length in form_input

This branch assumes intent is a string, but server preflight only checks required fields (src/mcp-server.ts required-arg gate) and does not enforce runtime types, so a payload like intent: null reaches intent.length and throws a TypeError instead of returning a deterministic INVALID_INTENT result. That converts a user validation error into an internal failure path for this tool.

Useful? React with 👍 / 👎.

Comment thread src/tools/fill-form.ts
isError: true,
};
}
if (intent.length > 120) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard intent type before reading .length in fill_form

The new validation path treats any defined intent as string-like; with current call handling, non-string values can still reach this handler, and intent: null will throw at intent.length rather than returning INVALID_INTENT. This means malformed client input can trigger an internal error path instead of the expected validation response for fill_form.

Useful? React with 👍 / 👎.

Comment thread src/tools/file-upload.ts
isError: true,
};
}
if (intent.length > 120) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard intent type before reading .length in file_upload

Here intent is cast and then dereferenced as if it were always a string; because required-field checks do not enforce schema types at runtime, intent: null can arrive and crash at .length, producing a generic internal failure instead of a controlled INVALID_INTENT error. This is user-reachable via malformed MCP payloads to file_upload.

Useful? React with 👍 / 👎.

GitHub did not schedule a pull_request CI run for the previous pushed fix commit, so this empty commit refreshes the PR head without changing behavior.

Constraint: No code changes; refresh checks only.

Confidence: high

Scope-risk: narrow

Tested: Previous commit carried targeted local test evidence.

Not-tested: No additional local tests for empty commit.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

shaun0927 and others added 3 commits May 13, 2026 04:17
Refresh this branch onto current develop while keeping the tab mock and isolated-context expectation fixes required for the shared CI matrix.

Constraint: Resolve only develop merge conflicts and preserve the PR-scoped implementation.

Confidence: high

Scope-risk: moderate

Tested: merge conflict resolution only; prior targeted tests covered the edited mock expectations.

Not-tested: Full local matrix after merge.

Co-authored-by: OmX <omx@oh-my-codex.dev>
After merging the upstream isolated-context mock change, the local CI fix produced a duplicate getTargetContextName property. Keep the default context behavior in one mock field so TypeScript and tabs_context tests pass.

Constraint: Test-only CI stabilization; no production behavior changes.

Rejected: Returning undefined from the mock | tabs_context expects a default context for created mock tabs.

Confidence: high

Scope-risk: narrow

Tested: npm test -- --runInBand tests/tools/tabs.test.ts tests/tools/console-capture-regression.test.ts; npm run lint:tool-schemas (issue-888 worktree)

Not-tested: Full GitHub matrix after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
The intent label sentence pushed interact's merged description past the tool-description budget enforced in CI. Compress the wording while preserving the intent audit-log guidance and coordinate-mode routing hints.

Constraint: Maintain #894 intent schema and tool guidance under the 400-character description limit.

Rejected: Raising the description test limit | the budget is part of the token-control contract.

Confidence: high

Scope-risk: narrow

Tested: npm test -- --runInBand tests/tool-descriptions.test.ts tests/tools/intent.test.ts

Not-tested: Full GitHub matrix after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8b81d67a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/interact.ts
Comment on lines +125 to +128
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Throw on invalid intent instead of returning isError

For intent: "" or intent.length > 120, this handler now returns a normal MCPResult with isError: true instead of throwing. In src/mcp-server.ts the post-handler success path still runs for non-thrown results, so these validation failures are logged as successful invocations and journaled with ok=true (including audit/metrics side effects). This breaks the stated “no side-effects on validation failure” contract and can corrupt operational telemetry for malformed interaction-tool requests.

Useful? React with 👍 / 👎.

shaun0927 and others added 2 commits May 13, 2026 04:33
Windows checks compare the JSON fixture after checkout with CRLF line endings, while JSON.stringify produces LF. Normalize fixture newlines at read time so the regression remains byte-stable in content without becoming platform-specific.

Constraint: Test-only fix for cross-platform CI.

Rejected: Regenerating the fixture with platform-specific endings | it would only move the failure between OSes.

Confidence: high

Scope-risk: narrow

Tested: npm test -- --runInBand tests/tools/tabs.test.ts tests/tools/console-capture-regression.test.ts; npm run lint:tool-schemas (issue-888 worktree)

Not-tested: Full GitHub matrix after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
The in-process CLI harness shares stdout hooks with Jest output, so unrelated async console rendering can prepend noise on Windows workers. Extract the single admin token and JSON array payloads from captured stdout instead of assuming the whole stream is clean.

Constraint: Test-only stabilization for CI; CLI output behavior is unchanged.

Rejected: Silencing unrelated orchestration logs globally | that would hide useful diagnostics outside this harness.

Confidence: high

Scope-risk: narrow

Tested: npm test -- --runInBand tests/cli/admin-keys.test.ts tests/tools/console-capture-regression.test.ts (issue-888 worktree)

Not-tested: Full GitHub matrix after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fab715efc8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/drag-drop.ts
Comment on lines +84 to +88
if (intent === '') {
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Throw invalid intent errors instead of returning isError

The new INVALID_INTENT branch returns a normal MCPResult with isError: true, but handleToolCall treats all non-thrown handler returns as success and still writes success audit/metrics/journal entries (src/mcp-server.ts success path around logAuditEntry(... status: 'success') and journal.createEntry(..., true)). As a result, malformed intent input (empty or >120 chars) is recorded as a successful drag_drop action, which pollutes operational telemetry and journal state instead of behaving like a rejected request.

Useful? React with 👍 / 👎.

CI exposed timing-sensitive integration tests: stdio shutdown can exceed ten seconds under Node 18 load, and Windows can briefly hold a temp directory after the guardian child exits. Increase the shutdown wait and retry temp cleanup without changing production code.

Constraint: Test-only stabilization for platform variance.

Rejected: Treating these as product failures | the assertions passed functionally and failed on cleanup/timing boundaries.

Confidence: high

Scope-risk: narrow

Tested: npm test -- --runInBand tests/integration/health-endpoint-gating.test.ts tests/utils/process-guardian.test.ts; npm test -- --runInBand tests/utils/process-guardian.test.ts

Not-tested: Full GitHub matrix after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5b1e7227f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/form-input.ts
Comment on lines +52 to +55
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Throw INVALID_INTENT instead of returning isError in form_input

Returning isError: true here still goes through the MCP server success path for non-thrown handler results, so malformed intent values are audited/journaled as successful tool calls and counted in success metrics. That breaks the intended "validation failure with no side effects" behavior for form_input; this branch should throw (or otherwise enter the server error path) rather than return a normal result object.

Useful? React with 👍 / 👎.

Comment thread src/tools/fill-form.ts
Comment on lines +90 to +93
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Throw INVALID_INTENT instead of returning isError in fill_form

This validation branch returns a normal MCP result with isError: true, which the server treats as a successful handler completion (including success audit/journal/metrics side effects). For invalid intent input, fill_form should fail through the error path instead of being recorded as a successful action.

Useful? React with 👍 / 👎.

Comment thread src/tools/file-upload.ts
Comment on lines +257 to +260
return {
content: [{ type: 'text', text: 'INVALID_INTENT: intent must not be an empty string' }],
isError: true,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Throw INVALID_INTENT instead of returning isError in file_upload

Returning isError: true here does not stop the MCP server from treating the call as a success, so invalid intent payloads are logged and journaled as successful file_upload invocations. This should use the thrown-error path for validation failures to avoid polluting operational telemetry and journal state.

Useful? React with 👍 / 👎.

Merge latest develop into PR #915 and keep the shared cursor/console fixture expectations aligned after the server-to-client primitive landed.

Constraint: PR #962 updated the same cross-environment and console fixture tests, leaving this PR blocked by fixture-only conflicts.\nRejected: Rewriting feature code during conflict refresh | the merge conflict was limited to shared CI fixtures.\nConfidence: high\nScope-risk: narrow\nDirective: Keep conflict refreshes limited to shared fixture expectations unless feature code conflicts require design review.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00646c1c22

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/interact.ts
Comment on lines +130 to +132
if (intent.length > 120) {
return {
content: [{ type: 'text', text: `INVALID_INTENT: intent exceeds 120 characters (got ${intent.length})` }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use Unicode code-point length for intent limit

The new intent validator uses JavaScript string.length, which counts UTF-16 code units, not user-visible characters. This means valid 120-character intents containing astral Unicode (for example emoji) can be rejected as INVALID_INTENT because they count as 2 each, creating behavior that conflicts with the schema’s maxLength: 120 contract and causing avoidable tool-call failures for non-ASCII inputs.

Useful? React with 👍 / 👎.

Constraint: Pull request 915 only conflicted in tests/cli/admin-keys.test.ts after develop absorbed progress notifications.
Rejected: Resolve product-code conflicts | This refresh is limited to equivalent admin-key stdout assertion logic.
Confidence: high
Scope-risk: narrow
Directive: Keep plaintext-token assertions tolerant of unrelated Jest worker stdout noise.
Tested: npm test -- --runInBand tests/cli/admin-keys.test.ts --forceExit
Not-tested: Full CI matrix not run locally; GitHub Actions will rerun after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0275395282

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/journal/task-journal.ts Outdated
default: base = `${s} ${tool}`; break;
}
const intent = typeof args.intent === 'string' && args.intent.length > 0 ? args.intent : undefined;
return intent ? `${base} [intent: "${intent}"]` : base;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Sanitize intent before embedding in journal summary

generateSummary appends args.intent verbatim, but the new intent validators only enforce empty/non-empty and length, so payloads containing newlines/control characters are accepted and become part of summary. Because journal recent renders each entry as a single line and then joins with \n (src/tools/journal.ts), an intent like "ok\nforged" will split one entry into multiple displayed lines, breaking the one-entry-per-line audit trail format and making logs ambiguous. Escaping or rejecting control characters before interpolation avoids this regression.

Useful? React with 👍 / 👎.

shaun0927 and others added 3 commits May 13, 2026 18:24
…intent guard

Two fixes:

1. src/index.ts: process.stdout.end() + 'finish' event before process.exit(0)
   in --introspect-tools-list path. Bare write()+exit(0) races the OS pipe
   buffer on Linux/macOS (truncates at ~64 KB). The 'finish' event fires only
   after the last byte leaves the kernel pipe buffer, matching Windows behavior.

2. scripts/lint-tool-schemas.mjs: replace readFileSync(fd=0) with async
   for-await stdin accumulation. readFileSync on fd 0 reads a single 64 KB
   chunk on Linux/macOS; the async iterable correctly accumulates all chunks.

3. src/tools/{interact,drag-drop,fill-form,form-input,file-upload}.ts:
   change `intent !== undefined` guards to `typeof intent === 'string'` to
   prevent a TypeError crash when a caller sends intent: null.

4. src/journal/task-journal.ts: add args.query to the interact summary
   fallback chain (interact uses query, not description/selector).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep develop's chunked-write stdout drain and readStdin one-liner.
Preserve both the new ref schema property from develop and the intent
property from this branch in interact.ts inputSchema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two readStdin definitions existed after the develop merge — the second
shadowed the first but was never reached. Keep the encoding-explicit
variant used at the JSON.parse call site.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa0851fccb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/journal/task-journal.ts Outdated
case 'interact': return `${s} Click "${args.description || args.selector || ''}"`;
case 'navigate': base = `${s} → ${args.url || 'unknown'}`; break;
case 'read_page': base = `${s} Read page`; break;
case 'interact': base = `${s} Click "${args.query || args.description || args.selector || ''}"`; break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep interact summary unchanged when intent is omitted

TaskJournal.generateSummary now formats interact entries with args.query first, but pre-change behavior only used description/selector. For normal interact calls that provide query and omit intent, this changes the summary text instead of keeping it byte-identical to v1.11.0, which breaks the compatibility contract described in this PR and can affect consumers/tests that rely on stable journal summaries.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0fb76fa698

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/interact.ts
Comment on lines +957 to +961
const { result: cssDomResult, verify: cssVerifyReport } = await runVerify(
page,
verifyMode,
async () =>
withDomDelta(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Rewrap interact clicks with irreversible-action guard

In environments where contract runtime safety is enabled, this branch now executes the click path directly via runVerify/withDomDelta and skips guardIrreversibleBrowserAction, so policy hooks that block risky actions (e.g., payment/order/destructive labels) are never invoked. That means high-impact interact clicks can proceed without the previous irreversible-action gate and blocked-result path.

Useful? React with 👍 / 👎.

Comment thread src/tools/interact.ts
Comment on lines +302 to +306
const result = {
content: responseContent,
...(extraTopLevel || {}),
} as MCPResult, verifyReport);
if (sessionId && tabId) {
await appendReturnAfterState(result, page, sessionId, tabId, returnAfterState, context);
}
return result;
} as MCPResult;
return attachVerifyReport(result, verifyReport);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve returnAfterState snapshot chaining in interact

This helper now returns immediately without calling appendReturnAfterState, so callers that pass returnAfterState: "ax" | "dom" no longer receive the post-action state snapshot that the input-tool chaining contract previously provided. That is a behavior regression for existing automation flows and also misaligns with hint rules that treat args.returnAfterState as evidence that a fresh snapshot was already inlined.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fba09a6cdc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 84 to 87
ok,
summary: this.generateSummary(tool, args, ok),
milestone: MILESTONE_TOOLS.has(tool) || undefined,
resultSummary: sanitizedResult,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore failure classification in task journal entries

TaskJournal.createEntry no longer records resultSummary or derives failureClass/errorFingerprint, so downstream recovery diagnostics silently stop working even though mcp-server still passes result/error summaries into this API. In practice, oc_journal summary can no longer surface failure classes, repeated error fingerprints, or candidate recovery hints, which is a regression in incident triage and loop detection rather than an intent-label change.

Useful? React with 👍 / 👎.

Comment thread src/tools/form-input.ts
Comment on lines 428 to 431
if (response.success) {
const successResult: MCPResult = {
return {
content: [{ type: 'text', text: (response.message || 'Value set successfully') + delta }],
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reattach returnAfterState snapshot in form_input success path

This success branch now returns immediately without appending the requested post-action snapshot, so callers that previously relied on returnAfterState: "ax"|"dom" no longer receive chained state after form_input. That breaks existing token-saving workflows which depend on getting fresh state from the same call instead of issuing an additional read_page.

Useful? React with 👍 / 👎.

Comment thread src/tools/fill-form.ts
Comment on lines +644 to 647
return {
content: [
{
type: 'text',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reattach returnAfterState snapshot in fill_form success path

fill_form now returns its result directly and never appends a post-action state snapshot, so flows that pass returnAfterState lose the chaining contract and must make extra observe calls to recover state. This is a behavior regression for existing automation that depended on fill_form returning immediate post-submit/post-fill state when the call succeeds.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75fa90735f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
Comment on lines 2015 to 2017
const finalResult = redactSecrets(result);
this.recordToolOutputObservability(toolName, finalResult);

// Record to task journal after response redaction so arbitrary literal
// secret values cannot be persisted in journal result summaries.
try {
const journal = getTaskJournal();
const entry = journal.createEntry(
toolName,
sessionId,
toolArgs,
Date.now() - toolStartTime,
!(finalResult as MCPResult).isError,
summarizeMcpResultForJournal(finalResult as MCPResult),
);
journal.record(entry);
} catch {
// Best-effort journal recording
}

return finalResult;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve isError-aware journal write before returning result

Returning immediately here removes the only journal path that derived ok from finalResult.isError; the remaining success-path write still calls journal.createEntry(..., true), so any non-thrown tool failure (MCPResult.isError === true) is now journaled as a success. This affects more than the new intent checks (for example swallowed/soft errors from other tools) and will underreport failures in oc_journal/summary telemetry.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e406e0669

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mcp-server.ts
return text ? text.slice(0, 500) : undefined;
}

export class MCPServer {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore summarizeMcpResultForJournal export

This commit removes summarizeMcpResultForJournal from src/mcp-server.ts, but tests/tools/journal.test.ts still imports and executes it (import { MCPServer, summarizeMcpResultForJournal } ... at line 42 and usage at line 68). That leaves the tree in a broken state where type-check/test compilation fails with a missing export, so either the helper must remain exported here or the dependent test/callers must be updated in the same change.

Useful? React with 👍 / 👎.

@shaun0927 shaun0927 merged commit 18b7233 into develop May 13, 2026
shaun0927 pushed a commit that referenced this pull request May 13, 2026
Codex P1/P2: the capability-map generator PR (#826) is supposed to be
additive — it adds a 'category' field to MCPToolDefinition for the
auto-generated docs/agent/capability-map.md preamble. It must not
silently drop other user-visible contract fields.

This commit restores the dropped surface on 9 tool files:

intent (#915, ≤120 char audit label):
  drag_drop, file_upload, fill_form, form_input
  - input schema entry, description note, and the handler-side
    INVALID_INTENT validation block

annotations (#867, MCP readOnlyHint/destructiveHint metadata):
  drag_drop, file_upload, fill_form, form_input, interact,
  request_intercept, tabs_close,
  oc_profile_status,
  oc_recording_start, oc_recording_stop, oc_recording_status,
  oc_recording_list, oc_recording_export

Method: rebased each affected file to develop's known-good state, then
re-applied PR #927's only PR-specific addition (category: '<group>' on
the definition). The capability-map generator script, docs, and CI
workflow are untouched.

oc_recording_status also gains the category: 'recording' tag that the
PR forgot to add when it removed the previous annotations field, so
the lint:tool-categories sweep stays consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant