feat(task-run): persist goal-level lifecycle state (#1039) by shaun0927 · Pull Request #1083 · shaun0927/openchrome

shaun0927 · 2026-05-12T15:15:06Z

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field	Value
Branch	`feat/1039-task-run-lifecycle` → `develop`
Draft	no
CI	❌ 8/9 passing — 1 failing
Mergeable	❌ CONFLICTING
Review decision	—
Codex (latest)	💡 suggestions posted
Other reviewers (latest)	chatgpt-codex-connector: commented
Head	`4ba6bf6` — fix(task-run): lock concurrent mutations, reject NEEDS_HELP via updat…
Commits	3

_{Owner comment cleanup: 3 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.}

Summary

Closes #1039.

Adds an opt-in goal-level TaskRun lifecycle layer that stores user-goal progress outside browser/session state:

src/core/task-run/* file-backed TaskRun store under OPENCHROME_HOME/task-runs or ~/.openchrome/task-runs
seven MCP tools: oc_task_run_start, oc_task_run_update, oc_task_run_checkpoint, oc_task_run_needs_help, oc_task_run_complete, oc_task_run_get, oc_task_run_list
redacted metadata/event persistence, terminal-state immutability, NEEDS_HELP resume guard, bounded item arrays, and checkpoint files

Fit / non-duplication review

Does not duplicate feat(core): oc_task_ledger — persistent async task table with cancel & wait (mcp-browser-use adoption A) #855 / PR feat(core): oc_task_ledger — persistent async task table with cancel & wait (#855) #911: this PR does not schedule or execute async background work; it only tracks goal-level state and optional ledger ids.
Does not duplicate feat(core): MCP progress notifications for long-running tools (mcp-browser-use adoption D) #863 / PR feat(core): MCP progress notifications for long-running tools (#863) #940: this PR emits no MCP progress notifications.
Does not duplicate feat(dashboard): task & skill ledger views (mcp-browser-use adoption F) #865 / PR feat(dashboard): task & skill ledger views (#865) #955: this PR adds no dashboard UI.
Keeps OpenChrome MCP/CDP-first: no Postgres, no noVNC, no Bytebot desktop agent, no new native dependencies.

Validation

npm test -- --runTestsByPath tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts --runInBand
npm run build
npm run lint:changed

Notes

Live MCP round-trip against a running Chrome daemon was not performed in this branch; the tool surface and persistence behavior are covered by unit/registration tests.

gemini-code-assist · 2026-05-12T15:15:47Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

qodo-code-review · 2026-05-12T15:15:47Z

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

chatgpt-codex-connector · 2026-05-13T00:45:33Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

chatgpt-codex-connector · 2026-05-13T00:57:47Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eae252c83b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T01:51:42Z

+    const eventsPath = this.eventsPath(runId);
+    if (!fs.existsSync(eventsPath)) return [];
+    const text = await fs.promises.readFile(eventsPath, 'utf8');
+    return text.split('\n').filter(Boolean).map(line => JSON.parse(line) as TaskRunEvent);


Tolerate malformed event lines when reading TaskRun logs

Guard readEvents against malformed JSONL entries instead of failing the whole request. As written, a single partial/corrupt line in events.jsonl (for example after an interrupted write or manual file damage) makes JSON.parse throw and causes oc_task_run_get with include_events=true to return an error even when the TaskRun metadata is otherwise valid. This turns a non-critical log corruption into a hard read failure for operators.

Useful? React with 👍 / 👎.

Introduce an opt-in TaskRun store and MCP tool surface so long-running user goals can persist progress, help state, checkpoints, and evidence without changing existing browser tool behavior. Constraint: Builds on issue #1039 while avoiding duplication of open task-ledger PR #911. Rejected: Requiring Postgres or a Bytebot-style desktop task service | conflicts with OpenChrome's local MCP/CDP-first design. Confidence: high Scope-risk: moderate Directive: Keep TaskRun as goal-level metadata; do not move browser execution or #855 async task scheduling into this layer. Tested: npm test -- --runTestsByPath tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts --runInBand; npm run build; npm run lint:changed Not-tested: Live MCP round-trip against a running Chrome daemon. Co-authored-by: OmX <omx@oh-my-codex.dev>

Constraint: CI enforces REQUIRED prefixes and pipes the full tools manifest through lint-tool-schemas. Rejected: Updating the schema lint baseline | new TaskRun tools can satisfy the existing contract directly. Confidence: high Scope-risk: narrow Directive: New required tool fields must include REQUIRED descriptions and introspection output must drain before exit. Tested: npm test -- --runInBand tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts; npm run build; npm run lint:tool-schemas Not-tested: Full GitHub CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>

…e, accumulate truncation counts, tolerate malformed JSONL, fix schema and scope-policy - P1: Wrap update/checkpoint/needsHelp/complete with per-run acquireLock so concurrent calls on the same run_id serialize rather than overwriting each other (read-modify-write race). - P2: Reject status=NEEDS_HELP via update(); callers must use oc_task_run_needs_help which guarantees reason+requested_at payload. - P2: Make completed_items_truncated / failed_items_truncated cumulative across multiple overflow calls instead of resetting to latest overflow only. - P2: Remove PENDING from oc_task_run_update schema enum (runtime already rejects it; schema now matches behaviour). - P2: Add oc_task_run_get + oc_task_run_list to READ_TOOLS in scope-policy so read-only API keys can query TaskRun state. - P2: Tolerate malformed JSONL lines in readEvents(); skip and log instead of throwing for the entire call. - Tests: concurrent serialization, NEEDS_HELP rejection, cumulative truncation, malformed JSONL tolerance, scope-policy read classification. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ba6bf6898

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T09:39:49Z

+const listHandler: ToolHandler = async (_sessionId, args) => {
+  try {
+    const task_runs = await store.list(args as TaskRunListFilter);


Scope TaskRun listing to caller-owned data

oc_task_run_list ignores the caller session (_sessionId) and directly returns store.list(...), which scans the shared on-disk task-run directory. In API-key multi-tenant deployments, mcp-server only enforces tenant ownership for session IDs, not arbitrary task-run IDs, so a read-scoped tenant can enumerate other tenants’ run metadata (goal/progress/run_id) from the same daemon process.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T09:39:49Z

+    const runId = String(args.run_id || '');
+    const meta = await store.update(runId, args as UpdateTaskRunInput);
+    return jsonResult({ task_run: meta });


Enforce ownership before TaskRun mutation

Mutating handlers update state using only run_id and never check whether the run belongs to the calling session/tenant. In a shared server, any write-capable tenant that learns a foreign run_id (for example via the global list endpoint) can change another tenant’s TaskRun status/progress, causing cross-tenant state corruption.

Useful? React with 👍 / 👎.

# Conflicts: # src/tools/index.ts

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74f85b01b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T10:48:40Z

+        ...current,
+        status: nextStatus,
+        workflow_id: optionalString(input.workflow_id) || current.workflow_id,
+        ledger_task_ids: uniqueStrings([...(current.ledger_task_ids || []), ...(input.ledger_task_ids || [])]),


Normalize ledger_task_ids before merging updates

TaskRunStore.update() builds ledger_task_ids with ...(input.ledger_task_ids || []), but MCP argument validation only enforces required fields (no type check), so a caller can pass a non-array value. A string value is iterable and gets split into single characters ("abc" -> ["a","b","c"]), silently corrupting stored linkage metadata; non-iterables throw and fail the call. Please sanitize input.ledger_task_ids via uniqueStrings directly before spreading/merging.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T10:48:40Z

+    await fs.promises.mkdir(this.runDir(runId), { recursive: true });
+    const release = await acquireLock(this.lockPath(runId));


Prevent orphan lock dirs on unknown run IDs

withRunLock() creates runDir and meta.lock before verifying the TaskRun exists, while callers can supply arbitrary 16-hex run_ids. Repeated typo/malicious update/checkpoint/needs_help/complete calls therefore create permanent empty directories for nonexistent runs, causing unbounded filesystem growth under task-runs and extra work on future list() scans. Existence should be checked (or cleanup performed) before persisting lock artifacts for unknown IDs.

Useful? React with 👍 / 👎.

PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR merge commit now drags src/tools/task-run.ts onto this branch and tsc fails with TS2741 on 7 tools missing the required annotations field. Add annotations to each of the 7 oc_task_run_* tool definitions and catalog them in TOOL_ANNOTATIONS: - start/update/checkpoint/needs_help → MUTATES - complete → DESTRUCTIVE (terminal) - get/list → READ_ONLY

* feat(core): ToolAnnotations on every tool (#867) Adds the MCP-spec ToolAnnotations field (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) to every tool exposed by the MCP server. Pure metadata — zero behavior change, no new dependencies, no native modules, no outbound traffic. Compliant with Portability-Harness Contract P1/P2/P3/P4/P5. Approach (hybrid): - src/types/tool-annotations.ts (NEW) — canonical TOOL_ANNOTATIONS table, single source of truth. - MCPToolDefinition.annotations is a REQUIRED field — TypeScript enforces presence at compile time for every future tool. - Each tool file declares `annotations: TOOL_ANNOTATIONS.<name>` (one import + one reference line) rather than duplicating literal values. 70 definitions across 58 tool files + 2 pilot/handoff tools + the inline expand_tools virtual tool. Worst-case rule (consistent across the table): annotations describe the most destructive / least pure behavior across all valid inputs. javascript_tool, batch_execute, act, network, and request_intercept are all flagged destructive + open-world because their worst-case inputs (arbitrary JS, batch dispatch, NL action routing, network emulation/blocking) can mutate or block observable state. Verification: - npm run build, npm run lint, npm test pass. - tests/unit/tool-annotations.test.ts (NEW, 9 cases) covers shape, destructive/read-only/open-world expectations, requireAnnotations, source-level injection check, and reverse-direction orphan check. - Only pre-existing flaky stress timing test fails under parallel load; passes in isolation — unrelated to this change. docs/mcp/tool-annotations.md mirrors the runtime table for human review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(annotations): reclassify validate_page, batch_paginate, console_capture (codex P1/P2 from #945) Codex flagged three tools incorrectly mapped to READ_ONLY: - validate_page navigates via smartGoto, can create a tab, waits on live page state — open-world, mutating, non-idempotent. - batch_paginate presses keys, clicks, scrolls; in 'url' mode creates tabs and navigates to generated URLs — mutating, non-idempotent, potentially open-world. - console_capture supports start/stop/clear; clear deletes buffered logs — destructive, non-idempotent. Updates docs/mcp/tool-annotations.md mirror to match. Addresses Codex review feedback on PR #945. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Allow manifest entries to carry annotations Update the worker tool manifest type so tests and generated manifests can include MCP ToolAnnotations added by the tool-annotation PR. Constraint: Preserve existing manifest filtering behavior and make annotations optional for older manifests. Rejected: Removing annotations from manifest tests | workers should be able to receive annotation metadata when present. Confidence: high Scope-risk: narrow Directive: Keep tool manifest typing aligned with MCPToolDefinition when new shared fields are surfaced. Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts Not-tested: Full GitHub Actions matrix before push * Annotate develop-era tools for MCP safety hints Constraint: PR #945 now requires every MCP tool definition, including tools merged after the original branch, to carry canonical annotations.\nRejected: Making annotations optional for newer tools | that would weaken the issue #867 contract and hide future omissions.\nConfidence: high\nScope-risk: narrow\nDirective: Add TOOL_ANNOTATIONS and docs rows whenever a new tool definition is registered.\nTested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts\nNot-tested: full CI matrix * Align Cursor tier-one count with current tool surface Constraint: CI runs the Cursor cross-env suite with OPENCHROME_RUN_CROSS_ENV=1, where develop now exposes 46 non-expand tier-one tools.\nRejected: Reverting new tier-one registrations | those came from current develop and are outside PR #945's scope.\nConfidence: high\nScope-risk: narrow\nDirective: Update this count when intentional tier-one tool exposure changes.\nTested: npx jest --config jest.config.js --runInBand tests/cross-env/cursor-verification.test.ts (skipped locally without cross-env flag)\nNot-tested: full CI matrix * Stabilize CI baselines for current tool surface Constraint: This branch inherits develop's 45-tool tier-one surface and Windows CI checks out fixtures with CRLF.\nRejected: Reverting develop tool registrations | outside PR #945 scope and would reintroduce merge drift.\nConfidence: high\nScope-risk: narrow\nDirective: Keep Cursor tier-one counts branch-specific and normalize fixture newlines for generated JSON.\nTested: npx jest --config jest.config.js --runInBand tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts\nNot-tested: full CI matrix * Harden admin key stdout assertion against worker noise Constraint: Windows CI can interleave unrelated Jest worker output into the in-process stdout hook.\nRejected: Requiring stdout to contain only one physical line | that assertion flakes when shared process output is polluted outside the CLI path.\nConfidence: high\nScope-risk: narrow\nDirective: Assert the secret token appears exactly once and never in stderr; do not depend on unrelated captured stdout being empty.\nTested: npx jest --config jest.config.js --runInBand tests/cli/admin-keys.test.ts\nNot-tested: full CI matrix * fix(annotations): reclassify execute_plan, workflow_init, memory Codex P1/P2 findings on PR #945: - execute_plan: open-world AND destructive (resolves arbitrary tool names; worst-case dispatch set includes destructive tools + non-loopback nav) - workflow_init: open-world (dnsResolve + worker page.goto to non-loopback) - memory: destructive (validate-prune path deletes memory entries) Mirrors the docs table in docs/mcp/tool-annotations.md. * feat(annotations): annotate develop-era tools merged after this PR PR #945 enforces ToolAnnotations on every MCPToolDefinition by typing `annotations` as required, but the tool surface on develop has grown by 15 entries since the PR was branched — crawl_*, oc_reflect, oc_task_*, oc_skill_replay, oc_run_*, and the dynamic-skill synthesizer. The tsc build accordingly fails on all 9 OS/Node combos with TS2741. Add catalog entries to TOOL_ANNOTATIONS and wire each new tool's definition to its annotation: - crawl_start → OPEN_WORLD; crawl_cancel → DESTRUCTIVE; crawl_status → READ_ONLY - oc_reflect, oc_task_start, oc_task_cancel/get/list/wait classified by worst-case input - oc_skill_replay → OPEN_WORLD_DESTRUCTIVE (matches execute_plan envelope) - oc_run_start (MUTATES), oc_run_finish (DESTRUCTIVE), status/events (READ_ONLY) - Synthesized skill-replay tools inline OPEN_WORLD_DESTRUCTIVE since they may navigate to arbitrary origins under the replay engine. Restores green CI on all 9 OS/Node combos. * fix(945): add annotations for develop-era tools * fix(945): include run-harness dir in TOOL_ANNOTATIONS reverse test * fix(945): restore develop's recording.ts incl. oc_recording_status * fix(945): add annotations to recording tools incl. oc_recording_status * test(annotations): sync expected counts and dynamic-skill fixture for develop merges Two follow-ups after merging develop into this branch: 1. tests/unit/tool-annotations.test.ts now expects recording.ts to have 5 annotated tools (oc_recording_status landed on develop after this PR branched). Bump the spot-check expectation from 4 → 5. 2. tests/pilot/dynamic-skills/registry.test.ts builds a synthetic RegistryEntry whose `definition` is typed as MCPToolDefinition. After this PR makes `annotations` required on the type, the literal in makeEntry() must carry an annotations field too — add the same OPEN_WORLD_DESTRUCTIVE shape the synthesizer emits. * feat(annotations): annotate oc_task_run_* tools landed from develop PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR merge commit now drags src/tools/task-run.ts onto this branch and tsc fails with TS2741 on 7 tools missing the required annotations field. Add annotations to each of the 7 oc_task_run_* tool definitions and catalog them in TOOL_ANNOTATIONS: - start/update/checkpoint/needs_help → MUTATES - complete → DESTRUCTIVE (terminal) - get/list → READ_ONLY * test(pilot/curator): bump cross-process serialize test timeout for Windows CI Same fix applied to feat/983-best-first (#1065). The 'separate writer processes serialize read/merge/write updates' test exceeds the default 10s jest deadline on hosted windows-22 runners because ts-node startup × 8 children overruns the budget before the assertions can run. Bump the per-test timeout to 60s. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: OmX <omx@oh-my-codex.dev>

This comment was marked as outdated.

Sign in to view

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

shaun0927 mentioned this pull request May 13, 2026

feat(task-run): goal-level lifecycle and evidence state on top of task ledger #1039

Closed

25 tasks

shaun0927 and others added 3 commits May 13, 2026 18:04

shaun0927 force-pushed the feat/1039-task-run-lifecycle branch from eae252c to 4ba6bf6 Compare May 13, 2026 09:29

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

Merge develop into feat/1039-task-run-lifecycle

74f85b0

# Conflicts: # src/tools/index.ts

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

shaun0927 merged commit e1282af into develop May 13, 2026
9 checks passed

		await fs.promises.mkdir(this.runDir(runId), { recursive: true });
		const release = await acquireLock(this.lockPath(runId));

Conversation

shaun0927 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress / Review status

Summary

Fit / non-duplication review

Validation

Notes

Uh oh!

gemini-code-assist Bot commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

chatgpt-codex-connector Bot commented May 13, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

chatgpt-codex-connector Bot commented May 13, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shaun0927 commented May 12, 2026 •

edited

Loading