feat(harness): task envelope budgets for browser work (#1034) by shaun0927 · Pull Request #1082 · shaun0927/openchrome

shaun0927 · 2026-05-12T15:14:32Z

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field	Value
Branch	`feat/1034-task-envelope` → `feat/855-task-ledger`
Draft	no
CI	—
Mergeable	❌ CONFLICTING
Review decision	—
Codex (latest)	💡 suggestions posted
Other reviewers (latest)	chatgpt-codex-connector: commented
Head	`8a7f392` — fix(harness): skip session init for task ledger tools (#1034)
Commits	10

_{Owner comment cleanup: 11 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.}

Summary

Implements the first, bounded slice of #1034 on top of #855:

adds host-driven browser_task envelopes to oc_task_start when kind is omitted
records OpenChrome tool calls against a task when callers pass taskId / task_id
tracks deterministic wandering budgets: total calls, same-tool repetition, observation streak, failure streak, same-URL navigations, wall time
exposes current counters, budget status, exceeded keys, recommendations, and recent events through oc_task_get
adds oc_task_update and oc_task_finish for host-controlled phase/terminal lifecycle
keeps direct tool behavior unchanged when no task id is supplied

Direction / duplication review before implementation

Base branch is feat/855-task-ledger because feat(harness): task envelope budgets to bound browser-agent wandering #1034 explicitly depends on the task ledger from PR feat(core): oc_task_ledger — persistent async task table with cancel & wait (#855) #911.
Open PR overlap checked: feat(core): oc_task_ledger — persistent async task table with cancel & wait (#855) #911 owns persistent async task storage; this PR composes with it instead of adding another ledger. feat(cli): oc playbook — declarative YAML scenario runner with inline Outcome Contracts (#854) #933/feat(core): 2-stage fetch with output handles for large-output tools (closes #887) #938/feat(core): ToolAnnotations on every tool (#867) #945/feat(core): notifications/progress for long-running tools (#869) #954/feat(tools): dryRun mode for cookies and storage (#878 partial) #957 are adjacent but not duplicated.
The implementation stays within OpenChrome's direction: deterministic browser-harness facts only, no server-side LLM calls, no autonomous task planning, no extra Chrome process policy.
oc_task_update / oc_task_finish are restricted to browser_task envelopes so they cannot race or override background jobs launched by feat(core): oc_task_ledger — persistent async task table with cancel & wait (mcp-browser-use adoption A) #855.

Validation

npm test -- --runTestsByPath tests/core/task-ledger/store.test.ts tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts
npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts
npm run build
npm run lint:tier
npm run lint -- --quiet

Live verification status

Attempted a real OpenChrome script against a local fixture (navigate + repeated read_page with taskId). The local runtime's attached Chrome on port 9222 disconnected during navigate (Not connected to Chrome. Call connect() first.), so full browser live verification is not claimed in this PR. Unit/build/lint evidence is fresh; live transcript gap is documented rather than hidden.

Notes for reviewers

This PR intentionally does not implement automatic checkpoints (#1035), evidence digest (#1036), browser lanes (#1037), or irreversible policy (#1038). Those should stack on this task envelope after #855 lands.

Closes #1034 when merged after #855.

Add an opt-in task envelope on top of the existing task ledger: host-declared objective/phase/policy, deterministic budget counters, recent task events, and finish/update verbs. Normal tool calls are unchanged unless a valid taskId is supplied. Constraint: OpenChrome remains an MCP browser harness; no server-side LLM planning or autonomous execution was added. Rejected: Full Goose-style agent loop inside openchrome | would violate the facts-vs-decisions boundary and duplicate host-agent responsibilities. Confidence: medium Scope-risk: moderate Directive: Keep future task-harness work deterministic and compose with the #855 task ledger instead of adding parallel storage. Tested: npm test -- --runTestsByPath tests/core/task-ledger/store.test.ts tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts; npm run build; npm run lint:tier; npm run lint -- --quiet Not-tested: Full browser live verification failed in this local runtime because the attached Chrome on port 9222 disconnected during navigate; PR includes unit/build evidence and documents the live-verification gap.

gemini-code-assist · 2026-05-12T15:14:57Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

qodo-code-review · 2026-05-12T15:14:58Z

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cff77493b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-12T15:20:14Z

+        description: 'Reserved for #1036; currently returns the same task envelope summary fields.',
+      },
    },
    required: ['task_id'],


Accept taskId in oc_task_get schema requirements

The new taskId alias is unreachable because the tool schema still requires task_id, and MCPServer rejects missing required fields before the handler can apply params.task_id ?? params.taskId. In practice, calls that send only taskId now fail at argument validation with Missing required argument(s): task_id, so the alias added in this change does not work for clients and breaks compatibility with camelCase callers.

Useful? React with 👍 / 👎.

Remove the schema-only task_id requirement so oc_task_get can honor its documented taskId alias, then cover the alias at both schema and handler level. Constraint: PR #1082 Codex review found JSON-schema validation rejected taskId before the handler alias path could run. Rejected: Dropping the taskId alias | it is already documented and matches adjacent task envelope helpers. Confidence: high Scope-risk: narrow Directive: Keep schema validation aligned with handler-level aliases for MCP tools. Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts --runInBand Tested: npm run build Tested: git diff --check Not-tested: Full CI matrix before push. Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T00:49:35Z

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Resolve PR #1082's stacked-base conflicts by preserving task ownership and secret-redaction behavior from feat/855 while keeping browser-task envelopes, budget recording, and the oc_task_get taskId alias fix. Constraint: GitHub marked PR #1082 as CONFLICTING against feat/855-task-ledger after review fixes were pushed. Rejected: Removing taskId alias or browser_task envelopes | both are documented PR behavior and covered by tests. Confidence: high Scope-risk: moderate Directive: Keep task envelope budget recording and task ownership checks composed when the task ledger evolves. Tested: npm ci Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts --runInBand Tested: npm run build Tested: git diff --check Not-tested: Full remote CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3856455599

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T01:07:52Z

+  const meta = store.readMetaSync(taskId);
+  if (!meta) return errorResult(`oc_task_update: unknown task ${taskId}`);
+  if (meta.kind !== 'browser_task') return errorResult('oc_task_update: only host-driven browser_task envelopes can be updated');
+  const updated = await store.update(taskId, (cur) => {


Enforce task ownership before oc_task_update mutations

oc_task_update never checks whether the caller can access the target task, unlike oc_task_get/cancel/wait, so any session that learns a valid task_id can change another session's browser_task phase/notes. Because _sessionId is ignored and the handler proceeds directly from readMetaSync to store.update, this is a cross-session integrity regression for task state.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T01:07:52Z

+  const meta = store.readMetaSync(taskId);
+  if (!meta) return errorResult(`oc_task_finish: unknown task ${taskId}`);
+  if (meta.kind !== 'browser_task') return errorResult('oc_task_finish: only host-driven browser_task envelopes can be finished directly');
+  const endedAt = Date.now();
+  const updated = await store.update(taskId, (cur) => {


Enforce task ownership before oc_task_finish transitions

oc_task_finish also skips the session/tenant visibility check that other task-ledger tools apply, so a caller with another task's id can force it to COMPLETED/FAILED/CANCELLED. This lets one session terminate another session's running envelope and can corrupt host control flow that relies on task isolation.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T01:07:52Z

+  if (meta.status === 'COMPLETED' || meta.status === 'FAILED' || meta.status === 'CANCELLED') return;
+  try {
+    await store.update(taskId, (cur) => applyToolCallToTask(cur, call));


Re-check terminal status inside task envelope update lock

The terminal-state guard is done only before acquiring the TaskStore.update lock, then the mutator unconditionally applies applyToolCallToTask. If a task is finished/cancelled between the pre-read and lock acquisition, this path still mutates counters/recent events on a terminal row, violating the ledger's terminal immutability contract and producing post-terminal activity.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T01:09:49Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Gate oc_task_update and oc_task_finish with the same session/tenant visibility checks as get/list/wait/cancel, and re-check terminal status inside the task envelope update lock before appending budget events. Constraint: Latest Codex review on PR #1082 flagged unauthorised task mutation as P1 and a terminal-state race as P2. Rejected: Relying on prior readMeta visibility | the mutating tools need explicit access checks before state changes. Confidence: high Scope-risk: narrow Directive: Any new task-ledger mutator must call canAccessTask before modifying a task. Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts --runInBand Tested: npm run build Tested: git diff --check Not-tested: Full remote CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T01:29:11Z

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 627829e915

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T01:34:51Z

+  const meta = store.readMetaSync(taskId);
+  if (!meta) return;


Enforce owner checks before recording envelope tool calls

recordTaskToolCall updates any task whose 16-hex id appears in tool args, but it never verifies that the caller owns that task. Because MCPServer invokes this path for every non-oc_task_* tool call, a different session/tenant that knows a valid task_id can inject tool-call events and mutate another task’s counters/budget state (for example, driving budget_exceeded on someone else’s envelope). Add the same canAccessTask ownership gate used by task mutators before applying applyToolCallToTask/appending events.

Useful? React with 👍 / 👎.

Propagate caller ownership context into task envelope tool-call recording and skip counter/event mutations when the task owner does not match the calling session or tenant. Constraint: Latest Codex review on PR #1082 flagged cross-session budget/event injection through recordTaskToolCall as P1. Rejected: Checking only task_id shape | a valid leaked id is sufficient to corrupt another envelope without owner comparison. Confidence: high Scope-risk: narrow Directive: Task envelope side effects must verify owner before mutating meta or appending events. Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts --runInBand Tested: npm run build Tested: git diff --check Not-tested: Full remote CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>

Resolve the latest feat/855 base drift while preserving browser_task envelope start semantics and the oc_task_get taskId alias schema fix. Constraint: GitHub marked PR #1082 dirty after feat/855 advanced with stricter task tool schema descriptions. Rejected: Restoring task_id as schema-required | that would make the documented taskId alias unreachable again. Confidence: high Scope-risk: moderate Directive: Keep task tool schemas aligned with handler aliases when merging the stacked base. Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts --runInBand Tested: npm run build Tested: git diff --check Not-tested: Full remote CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4f05ee874

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T02:16:58Z

+  registerOcTaskUpdateTool(server);
+  registerOcTaskFinishTool(server);


Block task-ledger update/finish from async self-scheduling

oc_task_update and oc_task_finish are now registered as normal tools, but oc_task_start's denylist still only blocks the older oc_task_* entries. That means callers can enqueue these ledger-management tools via oc_task_start(kind=...), which bypasses the intended “no task-ledger tool scheduling” guard and allows asynchronous envelope mutation paths that the guard was meant to prevent. Add the new tool names to the denylist (or switch to an explicit allowlist for schedulable kinds).

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T02:16:59Z

+    if (kind === 'browser_task') {
+      store.appendEvent(taskId, { ts: createdAt, kind: 'started', data: { objective, phase } });
+      return {
+        content: [{ type: 'text', text: `task_id=${taskId} status=RUNNING kind=browser_task phase=${phase}` }],


Make browser_task cancellation reach a terminal state

browser_task envelopes are created directly in RUNNING and return early without spawning runTask. oc_task_cancel only turns RUNNING tasks into cancel_requested_at and relies on the runner to finalize cancellation, so these envelopes can remain stuck in RUNNING forever after cancel requests. This affects the new default path (omitted kind) and can leave oc_task_wait/polling clients blocked until a separate oc_task_finish call is made.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4f05ee874

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T02:23:08Z

+  if (navUrl) {
+    checkLimit('maxSameUrlNavigations', counters.sameUrlNavigations[navUrl] ?? 0, policy.maxSameUrlNavigations, exceeded, warnings);
+  }


Keep same-URL budget exceeded after non-navigate calls

evaluateBudget only checks maxSameUrlNavigations when the current tool call has a navUrl, so once a URL has already exceeded its cap, any subsequent non-navigate call skips that check and can downgrade budget_status back to ok/warning even though counters.sameUrlNavigations is still over limit. This causes oc_task_get to hide a previously exceeded wandering constraint until the same URL is navigated again.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T02:23:08Z

+  registerOcTaskUpdateTool(server);
+  registerOcTaskFinishTool(server);


Block new task-ledger tools from async scheduling

This commit registers oc_task_update and oc_task_finish as normal tools, but the oc_task_start denylist still only blocks the older task-ledger set. As a result, oc_task_start can now enqueue these ledger-mutating tools as background jobs, which bypasses the existing “refuse task-ledger scheduling” guard and creates nested ledger tasks instead of direct envelope updates/finishes.

Useful? React with 👍 / 👎.

Make oc_task_cancel close browser_task envelopes as CANCELLED immediately, because those host-managed tasks do not have a background runner that can observe cancel_requested_at and finish the transition. Constraint: browser_task is the default task-envelope path and can otherwise stay RUNNING forever after cancellation. Rejected: Requiring clients to call oc_task_finish after oc_task_cancel | That leaves wait/poll clients blocked and weakens the cancel contract. Confidence: high Scope-risk: narrow Directive: Runner-backed tasks may remain cooperative; host-driven browser_task envelopes must terminalize inside ledger verbs. Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts --runInBand; npm run build; git diff --check Not-tested: CI matrix pending after push Co-authored-by: OmX <omx@oh-my-codex.dev>

Evaluate same-URL navigation counts across the accumulated counter map on every recorded tool call, so later non-navigation calls cannot downgrade a previously exceeded wandering budget. Constraint: Codex review flagged budget_status regression after maxSameUrlNavigations is exceeded. Rejected: Checking only the current navigate URL | That makes budget state depend on the most recent tool type instead of cumulative task state. Confidence: high Scope-risk: narrow Directive: Budget decisions should be derived from full task counters, not only fields touched by the latest call. Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts --runInBand; npm run build; git diff --check Not-tested: CI matrix pending after push Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 347d250149

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T02:32:27Z

+  const taskId = String(params.task_id ?? params.taskId ?? '');
  if (!taskId) {
    return { isError: true, content: [{ type: 'text', text: 'oc_task_get: task_id is required' }] };


Restore schema-level task_id validation for oc_task_get

task_id/taskId validation now happens only inside the handler, so malformed calls reach handleToolsCall's session-initialization path before being rejected. In practice, an oc_task_get call without an id can still trigger expensive session setup (and potentially auto-launch Chrome) before returning oc_task_get: task_id is required, which is a regression from the previous early rejection behavior.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T02:32:27Z

+const handler: ToolHandler = async (sessionId, params, context): Promise<MCPResult> => {
+  const taskId = taskIdFrom(params);
+  if (!taskId) return errorResult('oc_task_update: task_id is required');
+  const phase = params.phase !== undefined ? normalizeTaskPhase(params.phase) : undefined;


Reject invalid phase values instead of coercing to explore

This coercion silently maps any unrecognized phase input to explore; because tool-call argument enforcement only guarantees required fields, a typo like "verfiy" will overwrite the task's current phase rather than erroring. That makes state transitions non-deterministic for callers and can incorrectly reset host workflow state.

Useful? React with 👍 / 👎.

Block oc_task_update and oc_task_finish from being scheduled through oc_task_start, and reject invalid oc_task_update phases instead of silently coercing them to explore. Constraint: Codex review flagged task-ledger control tools as unsafe async targets and invalid phase coercion as misleading state mutation. Rejected: Relying on handler failures after scheduling | Control tools should never enter the background ledger as work items. Confidence: high Scope-risk: narrow Directive: Task-ledger verbs are control-plane operations; keep them synchronous and validate host-declared envelope state explicitly. Tested: npm test -- --runTestsByPath tests/core/task-ledger/tools.test.ts tests/core/task-ledger/budget.test.ts --runInBand; npm run build; git diff --check Not-tested: CI matrix pending after push Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T03:14:51Z

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

`oc_task_*` tools are pure ledger operations or persist meta rows before background work begins — they never touch the browser themselves. Adding them to SKIP_SESSION_INIT_TOOLS prevents malformed task-id input from triggering Chrome auto-launch via the session-init path, addressing the regression noted by Codex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a7f3923e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T09:31:04Z

+  checkLimit('maxToolCalls', counters.toolCalls, policy.maxToolCalls, exceeded, warnings);
+  checkLimit('maxConsecutiveSameTool', counters.consecutiveSameTool, policy.maxConsecutiveSameTool, exceeded, warnings);
+  checkLimit('maxObservationStreak', counters.observationStreak, policy.maxObservationStreak, exceeded, warnings);
+  checkLimit('maxFailureStreak', counters.failureStreak, policy.maxFailureStreak, exceeded, warnings);
+  checkSameUrlNavigationLimit(counters.sameUrlNavigations, policy.maxSameUrlNavigations, exceeded, warnings);
+  if (policy.maxWallMs) {
+    checkLimit('maxWallMs', Date.now() - meta.created_at, policy.maxWallMs, exceeded, warnings);


Enforce all declared task policy constraints

normalizeTaskPolicy stores allowedDomains and checkpointEveryCalls, and oc_task_start advertises both as part of the deterministic policy, but evaluateBudget never consults either field. As a result, a host can set those limits and still get budget_status: "ok" while navigating outside the allowed domain set or missing checkpoint cadence, which silently disables part of the policy contract for browser-task envelopes.

Useful? React with 👍 / 👎.

# Conflicts: # src/mcp-server.ts

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d285e2844e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T11:02:45Z

+      status: kind === 'browser_task' ? 'RUNNING' : 'PENDING',
      pid: process.pid,
      created_at: createdAt,


Exclude browser_task from orphan PID failure path

oc_task_start now creates browser_task rows directly in RUNNING with pid=process.pid, but these tasks do not spawn a runner process. On the next server restart, startup orphan reaping (TaskStore.reapOrphans) treats that stale PID as dead and force-transitions every still-open browser_task to FAILED with orphaned, even if the host intended to continue/update/finish it. This creates deterministic data loss for any host-driven envelope that spans a process restart.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T11:02:45Z

    meta,
+    counters: meta.counters,
+    budget_status: meta.budget_status,
+    budget_exceeded: meta.budget_exceeded,


Recompute maxWallMs budget when serving oc_task_get

oc_task_get returns the stored budget_status/budget_exceeded verbatim, but wall-clock budget evaluation is only performed when a tool call is recorded (applyToolCallToTask in budget.ts). If a task sits idle past policy.maxWallMs, polling oc_task_get will still report the old status until another tool call happens, so wall-time overruns are silently hidden during the exact polling workflow this API is meant to support.

Useful? React with 👍 / 👎.

shaun0927 · 2026-05-13T15:14:10Z

Deferred from this review pass: at 8,355 additions / 706 deletions across 106 files, this PR is too large to safely batch-review and merge in a single session. The stack also bottoms out here (no parent PR for the task-ledger base). Recommend opening a dedicated review pass for the task-envelope budget changes, validated against the harness/task-ledger tests, before merging. (No Codex P0/P1/P2 outstanding per current snapshot.)

Constraint: PR #1082 had no current GitHub status checks after review fixes, while merge readiness requires CI evidence. Rejected: Merging without a fresh branch check | User requested CI checks to succeed before merge. Confidence: high Scope-risk: narrow Directive: Keep this branch on a fresh CI run before merge. Tested: npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts tests/auth/scope-policy.test.ts --runInBand; npm run build Not-tested: Full CI matrix pending on GitHub

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1db641b1c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T17:29:25Z

+  const store = getTaskStore();
+  const meta = store.readMetaSync(taskId);
+  if (!meta) return errorResult(`oc_task_finish: unknown task ${taskId}`);


Wait for startup orphan reap before finishing envelopes

oc_task_finish reads and mutates task rows immediately, unlike oc_task_get/list/cancel/wait which block on waitForTaskStartupReap(). On server startup, this lets a caller finish a stale RUNNING row from a previous process before orphan reaping marks it FAILED, breaking the startup invariant that orphan state is resolved first. The same race exists in oc_task_update, so both host-driven mutators should wait for startup reaping before reading meta.

Useful? React with 👍 / 👎.

Retargeting the stacked PR to develop changed the actual merge gate from a zero-check branch comparison to the repository CI matrix, so this empty commit exists only to request that signal. Constraint: PR #1082 was retargeted from an already-merged stacked base to develop and needed a fresh GitHub CI run. Rejected: Merging without develop-base CI | the PR had zero checks after retargeting. Confidence: high Scope-risk: narrow Directive: No code behavior change; use CI result as the merge gate. Tested: No code changes in this commit; prior validation covered task envelope budget tests and build. Not-tested: Full CI pending after push Co-authored-by: OmX <omx@oh-my-codex.dev>

Retargeting the task-envelope PR to develop exposed the current schema-lint contract for required fields, so the host-driven finish outcome now carries the required prefix without changing behavior. Constraint: CI failed lint:tool-schemas on oc_task_finish.outcome required_prefix after develop-base validation started. Rejected: Baselining the new violation | the field is genuinely required and can satisfy the existing schema contract directly. Confidence: high Scope-risk: narrow Directive: Keep task envelope schema descriptions aligned with required-field lint before merge. Tested: npm run build; npm run lint:tool-schemas; npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts tests/auth/scope-policy.test.ts --runInBand Not-tested: Full CI pending after push Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a71c663b57

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T17:48:49Z

+const OBSERVATION_TOOLS = new Set([
+  'read_page',
+  'find',
+  'tabs_context',
+  'page_screenshot',
+]);


Expand observation tool whitelist for budget streaks

The observation budget is currently blind to several read-only browser tools (query_dom, inspect, page_content, wait_for) because only four names are treated as observations here. In a loop that repeatedly uses one of those omitted tools, observationStreak never increases and maxObservationStreak will not trigger, so wandering-detection can stay ok even when the agent is stuck gathering observations. This also misclassifies those calls as actions, skewing the envelope counters returned by oc_task_get.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T17:48:49Z

      // CDPClient may not be initialized — proceed with normal flow
    }

+    const taskEnvelopeId = toolName.startsWith('oc_task_') ? undefined : extractTaskId(toolArgs);


Exclude non-browser tools from envelope budget accounting

Task-envelope recording is enabled for every tool except oc_task_*, so any call that carries taskId (including control/polling tools like oc_run_status, oc_progress_status, or oc_journal) increments task budgets as if it were browser work. In environments where callers attach taskId broadly, repeated status polling can trip maxToolCalls/maxConsecutiveSameTool without any browser interaction, producing false budget_exceeded states and misleading recommendations.

Useful? React with 👍 / 👎.

The task-envelope PR now carries develop's task evidence digest contract so retargeting to develop keeps includeDigest/include_digest behavior while preserving budget envelope changes. Constraint: #1082 became dirty after #1146 merged task digest support into develop. Rejected: Keeping the reserved digest placeholder | it would regress the newly merged oc_task_get digest API. Confidence: high Scope-risk: moderate Directive: When stacked task-ledger PRs are retargeted, preserve newly merged aliases and evidence surfaces. Tested: npm run build; npm run lint:tool-schemas; npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts tests/auth/scope-policy.test.ts tests/core/task-ledger/digest.test.ts --runInBand Not-tested: Full CI pending after push Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af6dd95dcb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T17:57:22Z

  ): Promise<MCPResult> => {
-    const kind = String(params.kind ?? '');
+    const rawKind = params.kind;
+    const kind = typeof rawKind === 'string' && rawKind.length > 0 ? rawKind : 'browser_task';


Reject invalid kind values instead of defaulting to browser_task

This fallback makes malformed oc_task_start calls create real browser_task rows instead of failing fast: when kind is present but invalid (e.g. typoed key leading to undefined, empty string, number, or object), the handler coerces it to 'browser_task' and returns a RUNNING task. That is a behavior regression from the previous validation path (invalid kinds were rejected as unregistered), and it can silently spawn unintended envelopes that users must later clean up or cancel.

Useful? React with 👍 / 👎.

The task envelope PR adds host-driven finish/update tools, so the registered default tool list, capability map, and schema lint contract need to agree before CI can validate the branch. Constraint: CI failed capability-filter because oc_task_finish and oc_task_update were registered but absent from TOOL_CAPABILITY_MAP and the v1.11 tool snapshot. Rejected: Hiding the tools from default registration | they are part of the task-envelope host contract being introduced. Confidence: high Scope-risk: narrow Directive: Every newly registered tool must be added to both the capability map and the default surface snapshot in the same PR. Tested: npm test -- --runTestsByPath tests/capability-filter.test.ts --runInBand; npm run build; npm run lint:tool-schemas Not-tested: Full CI pending after push Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7a4dac47c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T18:07:10Z

+        principalMode: principal?.mode,
+        args: toolArgs,
+        durationMs: Date.now() - toolStartTime,
+        ok: false,


Preserve recoverable timeout outcomes in task envelope logs

This catch-path call hardcodes ok: false, which misclassifies recoverable timeout tools as failures in envelope budgets. In handleToolsCall, errorIsError allows timeout-recoverable tools (currently navigate) to return isError: false, but this line still increments failureCalls/failureStreak and can trigger maxFailureStreak for calls the client saw as non-errors. Use the same outcome signal used to build errResult.isError when recording envelope events.

Useful? React with 👍 / 👎.

Constraint: Host-driven task envelopes must only account browser work and must preserve recoverable outcomes and startup ownership guards. Rejected: Recording every task_id-bearing tool call | non-browser utility calls can accidentally mutate browser budget counters. Confidence: high Scope-risk: moderate Directive: Keep task envelope accounting limited to browser-facing tools and wait for startup orphan reaping before terminal mutations. Tested: npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts tests/auth/scope-policy.test.ts tests/core/task-ledger/digest.test.ts --runInBand; npm run build; npm run lint:tool-schemas Not-tested: full CI matrix pending Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81bd12601d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T18:22:37Z

+  if (meta.status === 'COMPLETED' || meta.status === 'FAILED' || meta.status === 'CANCELLED') return;
+  try {
+    const updated = await store.update(taskId, (cur) => {
+      if (cur.status === 'COMPLETED' || cur.status === 'FAILED' || cur.status === 'CANCELLED') return undefined;
+      return applyToolCallToTask(cur, call);


Restrict envelope recording to browser_task rows

recordTaskToolCall currently updates any non-terminal task that passes ownership checks, without verifying meta.kind. If a caller includes a running non-envelope task id (for example, a crawl task) in taskId/task_id on a browser tool call, this path will still mutate that task’s counters, budget_status, and recent_events, which pollutes async task state and can surface misleading budget guidance in oc_task_get. Add a browser_task kind guard (including inside the update closure) before calling applyToolCallToTask.

Useful? React with 👍 / 👎.

Constraint: Task envelope budget counters are for host-driven browser_task rows, not arbitrary async task records. Rejected: Updating any ledger row whose id appears in tool args | long-running non-browser jobs can be miscounted as browser budget activity. Confidence: high Scope-risk: narrow Directive: Keep browser-task envelope accounting separated from background task execution records. Tested: npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts tests/auth/scope-policy.test.ts tests/core/task-ledger/digest.test.ts --runInBand; npm run build; npm run lint:tool-schemas Not-tested: full CI matrix pending Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4931234739

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T18:32:18Z

+      return errorResult('oc_task_start: kind must be a non-empty string when provided');
    }
+    const kind = typeof rawKind === 'string' ? rawKind.trim() : 'browser_task';
+    const args = (params.args ?? {}) as Record<string, unknown>;


Reject missing args when scheduling non-browser tasks

oc_task_start now defaults params.args to {} for every kind, so a call like oc_task_start({"kind":"crawl"}) is accepted and persists a task row instead of failing fast on invalid input. Because handleToolsCall only enforces top-level required fields, the real argument error is deferred to the background run, which turns into an asynchronous FAILED task and leaves noisy ledger state for what should be a synchronous validation error. This regression affects callers that omit args (or send a non-object) when launching legacy background kinds.

Useful? React with 👍 / 👎.

Constraint: Codex review found oc_task_start could schedule non-browser tools with missing args by silently defaulting to an empty object. Rejected: Treating missing args as equivalent to an explicit empty object for tool tasks | callers must intentionally provide the argument envelope for asynchronous tool invocation. Confidence: high Scope-risk: narrow Directive: Keep omitted args valid only for host-managed browser_task envelopes, not inner tool scheduling. Tested: npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts tests/auth/scope-policy.test.ts tests/core/task-ledger/digest.test.ts --runInBand; npm run build; npm run lint:tool-schemas; git diff --check Not-tested: Live MCP async scheduling of a production tool beyond unit handler coverage. Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T18:43:18Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

Task envelopes advertised allowed domain and checkpoint cadence policies, but only some budget fields affected evaluation. Enforce URL scope as an exceeded budget and surface checkpoint cadence as a warning so hosts can rely on every declared policy knob.\n\nConstraint: #1034 task envelopes must be deterministic and budget decisions must match their declared policy contract.\nRejected: Documenting allowedDomains/checkpointEveryCalls as advisory-only | the schema exposes them as policy constraints and Codex review flagged the gap.\nConfidence: high\nScope-risk: narrow\nDirective: When adding task policy fields, update both normalization and budget evaluation/tests together.\nTested: npm test -- --runTestsByPath tests/core/task-ledger/budget.test.ts tests/core/task-ledger/tools.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; npm run lint:tool-schemas; git diff --check\nNot-tested: Full GitHub Actions matrix pending after push.\n\nCo-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T19:58:55Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

Merged develop into the task envelope branch while preserving declared task-envelope ledger recording and current run-harness telemetry wiring. Constraint: PR #1082 became dirty after develop advanced through validated merges. Rejected: Dropping task-envelope audit recording | that is the branch's core safety contract. Confidence: high Scope-risk: moderate Directive: Keep task envelope policy checks and tool-call ledger recording aligned with registered tool surfaces. Tested: npm test -- --runTestsByPath tests/run-harness/tools-budget.test.ts tests/capability-filter.test.ts tests/unit/tool-annotations.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; npm run lint:tool-schemas; node scripts/lint-tools-capabilities.js; git diff --check Not-tested: Full repository test suite. Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T20:49:28Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

shaun0927 · 2026-05-13T22:49:51Z

Reviewed and validated this PR for merge.

This PR implements the bounded #1034 task-envelope slice on top of the task ledger: host-driven browser_task envelopes, taskId/task_id tool-call accounting, deterministic wandering-budget counters, and oc_task_update / oc_task_finish lifecycle controls scoped to browser-task envelopes. The scope is aligned with the repository direction because it adds deterministic harness facts and budget accounting without introducing autonomous planning, server-side LLM calls, or new Chrome process policy.

I reviewed the Codex bot history. Earlier Codex reviews reported no major issues, and the later bot messages are usage-limit notices rather than actionable code feedback. The PR is now cleanly mergeable and the complete CI matrix is green across Ubuntu, macOS, and Windows for Node 18/20/22. The PR also documents its live-browser verification gap instead of overclaiming it. Given the bounded harness scope, explicit non-goals, passing matrix, and mergeable state, this is safe to merge.

chatgpt-codex-connector Bot reviewed May 12, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

shaun0927 and others added 2 commits May 13, 2026 11:10

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

shaun0927 and others added 2 commits May 13, 2026 11:25

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

shaun0927 mentioned this pull request May 13, 2026

feat(harness): task envelope budgets to bound browser-agent wandering #1034

Closed

21 tasks

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

Merge develop into feat/1034-task-envelope

d285e28

# Conflicts: # src/mcp-server.ts

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

shaun0927 changed the base branch from feat/855-task-ledger to develop May 13, 2026 17:35

shaun0927 and others added 2 commits May 14, 2026 02:39

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

shaun0927 merged commit 90ffef3 into develop May 13, 2026
9 checks passed

shaun0927 deleted the feat/1034-task-envelope branch May 14, 2026 13:21

		registerOcTaskUpdateTool(server);
		registerOcTaskFinishTool(server);

Conversation

shaun0927 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress / Review status

Summary

Direction / duplication review before implementation

Validation

Live verification status

Notes for reviewers

Uh oh!

gemini-code-assist Bot commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 13, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

shaun0927 commented May 12, 2026 •

edited

Loading