feat(task-run): persist goal-level lifecycle state (#1039)#1083
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eae252c83b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const eventsPath = this.eventsPath(runId); | ||
| if (!fs.existsSync(eventsPath)) return []; | ||
| const text = await fs.promises.readFile(eventsPath, 'utf8'); | ||
| return text.split('\n').filter(Boolean).map(line => JSON.parse(line) as TaskRunEvent); |
There was a problem hiding this comment.
Tolerate malformed event lines when reading TaskRun logs
Guard readEvents against malformed JSONL entries instead of failing the whole request. As written, a single partial/corrupt line in events.jsonl (for example after an interrupted write or manual file damage) makes JSON.parse throw and causes oc_task_run_get with include_events=true to return an error even when the TaskRun metadata is otherwise valid. This turns a non-critical log corruption into a hard read failure for operators.
Useful? React with 👍 / 👎.
Introduce an opt-in TaskRun store and MCP tool surface so long-running user goals can persist progress, help state, checkpoints, and evidence without changing existing browser tool behavior. Constraint: Builds on issue #1039 while avoiding duplication of open task-ledger PR #911. Rejected: Requiring Postgres or a Bytebot-style desktop task service | conflicts with OpenChrome's local MCP/CDP-first design. Confidence: high Scope-risk: moderate Directive: Keep TaskRun as goal-level metadata; do not move browser execution or #855 async task scheduling into this layer. Tested: npm test -- --runTestsByPath tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts --runInBand; npm run build; npm run lint:changed Not-tested: Live MCP round-trip against a running Chrome daemon. Co-authored-by: OmX <omx@oh-my-codex.dev>
Constraint: CI enforces REQUIRED prefixes and pipes the full tools manifest through lint-tool-schemas. Rejected: Updating the schema lint baseline | new TaskRun tools can satisfy the existing contract directly. Confidence: high Scope-risk: narrow Directive: New required tool fields must include REQUIRED descriptions and introspection output must drain before exit. Tested: npm test -- --runInBand tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts; npm run build; npm run lint:tool-schemas Not-tested: Full GitHub CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
…e, accumulate truncation counts, tolerate malformed JSONL, fix schema and scope-policy - P1: Wrap update/checkpoint/needsHelp/complete with per-run acquireLock so concurrent calls on the same run_id serialize rather than overwriting each other (read-modify-write race). - P2: Reject status=NEEDS_HELP via update(); callers must use oc_task_run_needs_help which guarantees reason+requested_at payload. - P2: Make completed_items_truncated / failed_items_truncated cumulative across multiple overflow calls instead of resetting to latest overflow only. - P2: Remove PENDING from oc_task_run_update schema enum (runtime already rejects it; schema now matches behaviour). - P2: Add oc_task_run_get + oc_task_run_list to READ_TOOLS in scope-policy so read-only API keys can query TaskRun state. - P2: Tolerate malformed JSONL lines in readEvents(); skip and log instead of throwing for the entire call. - Tests: concurrent serialization, NEEDS_HELP rejection, cumulative truncation, malformed JSONL tolerance, scope-policy read classification. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eae252c to
4ba6bf6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ba6bf6898
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const listHandler: ToolHandler = async (_sessionId, args) => { | ||
| try { | ||
| const task_runs = await store.list(args as TaskRunListFilter); |
There was a problem hiding this comment.
Scope TaskRun listing to caller-owned data
oc_task_run_list ignores the caller session (_sessionId) and directly returns store.list(...), which scans the shared on-disk task-run directory. In API-key multi-tenant deployments, mcp-server only enforces tenant ownership for session IDs, not arbitrary task-run IDs, so a read-scoped tenant can enumerate other tenants’ run metadata (goal/progress/run_id) from the same daemon process.
Useful? React with 👍 / 👎.
| const runId = String(args.run_id || ''); | ||
| const meta = await store.update(runId, args as UpdateTaskRunInput); | ||
| return jsonResult({ task_run: meta }); |
There was a problem hiding this comment.
Enforce ownership before TaskRun mutation
Mutating handlers update state using only run_id and never check whether the run belongs to the calling session/tenant. In a shared server, any write-capable tenant that learns a foreign run_id (for example via the global list endpoint) can change another tenant’s TaskRun status/progress, causing cross-tenant state corruption.
Useful? React with 👍 / 👎.
# Conflicts: # src/tools/index.ts
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74f85b01b8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ...current, | ||
| status: nextStatus, | ||
| workflow_id: optionalString(input.workflow_id) || current.workflow_id, | ||
| ledger_task_ids: uniqueStrings([...(current.ledger_task_ids || []), ...(input.ledger_task_ids || [])]), |
There was a problem hiding this comment.
Normalize ledger_task_ids before merging updates
TaskRunStore.update() builds ledger_task_ids with ...(input.ledger_task_ids || []), but MCP argument validation only enforces required fields (no type check), so a caller can pass a non-array value. A string value is iterable and gets split into single characters ("abc" -> ["a","b","c"]), silently corrupting stored linkage metadata; non-iterables throw and fail the call. Please sanitize input.ledger_task_ids via uniqueStrings directly before spreading/merging.
Useful? React with 👍 / 👎.
| await fs.promises.mkdir(this.runDir(runId), { recursive: true }); | ||
| const release = await acquireLock(this.lockPath(runId)); |
There was a problem hiding this comment.
Prevent orphan lock dirs on unknown run IDs
withRunLock() creates runDir and meta.lock before verifying the TaskRun exists, while callers can supply arbitrary 16-hex run_ids. Repeated typo/malicious update/checkpoint/needs_help/complete calls therefore create permanent empty directories for nonexistent runs, causing unbounded filesystem growth under task-runs and extra work on future list() scans. Existence should be checked (or cleanup performed) before persisting lock artifacts for unknown IDs.
Useful? React with 👍 / 👎.
PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR merge commit now drags src/tools/task-run.ts onto this branch and tsc fails with TS2741 on 7 tools missing the required annotations field. Add annotations to each of the 7 oc_task_run_* tool definitions and catalog them in TOOL_ANNOTATIONS: - start/update/checkpoint/needs_help → MUTATES - complete → DESTRUCTIVE (terminal) - get/list → READ_ONLY
* feat(core): ToolAnnotations on every tool (#867) Adds the MCP-spec ToolAnnotations field (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) to every tool exposed by the MCP server. Pure metadata — zero behavior change, no new dependencies, no native modules, no outbound traffic. Compliant with Portability-Harness Contract P1/P2/P3/P4/P5. Approach (hybrid): - src/types/tool-annotations.ts (NEW) — canonical TOOL_ANNOTATIONS table, single source of truth. - MCPToolDefinition.annotations is a REQUIRED field — TypeScript enforces presence at compile time for every future tool. - Each tool file declares `annotations: TOOL_ANNOTATIONS.<name>` (one import + one reference line) rather than duplicating literal values. 70 definitions across 58 tool files + 2 pilot/handoff tools + the inline expand_tools virtual tool. Worst-case rule (consistent across the table): annotations describe the most destructive / least pure behavior across all valid inputs. javascript_tool, batch_execute, act, network, and request_intercept are all flagged destructive + open-world because their worst-case inputs (arbitrary JS, batch dispatch, NL action routing, network emulation/blocking) can mutate or block observable state. Verification: - npm run build, npm run lint, npm test pass. - tests/unit/tool-annotations.test.ts (NEW, 9 cases) covers shape, destructive/read-only/open-world expectations, requireAnnotations, source-level injection check, and reverse-direction orphan check. - Only pre-existing flaky stress timing test fails under parallel load; passes in isolation — unrelated to this change. docs/mcp/tool-annotations.md mirrors the runtime table for human review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(annotations): reclassify validate_page, batch_paginate, console_capture (codex P1/P2 from #945) Codex flagged three tools incorrectly mapped to READ_ONLY: - validate_page navigates via smartGoto, can create a tab, waits on live page state — open-world, mutating, non-idempotent. - batch_paginate presses keys, clicks, scrolls; in 'url' mode creates tabs and navigates to generated URLs — mutating, non-idempotent, potentially open-world. - console_capture supports start/stop/clear; clear deletes buffered logs — destructive, non-idempotent. Updates docs/mcp/tool-annotations.md mirror to match. Addresses Codex review feedback on PR #945. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Allow manifest entries to carry annotations Update the worker tool manifest type so tests and generated manifests can include MCP ToolAnnotations added by the tool-annotation PR. Constraint: Preserve existing manifest filtering behavior and make annotations optional for older manifests. Rejected: Removing annotations from manifest tests | workers should be able to receive annotation metadata when present. Confidence: high Scope-risk: narrow Directive: Keep tool manifest typing aligned with MCPToolDefinition when new shared fields are surfaced. Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts Not-tested: Full GitHub Actions matrix before push * Annotate develop-era tools for MCP safety hints Constraint: PR #945 now requires every MCP tool definition, including tools merged after the original branch, to carry canonical annotations.\nRejected: Making annotations optional for newer tools | that would weaken the issue #867 contract and hide future omissions.\nConfidence: high\nScope-risk: narrow\nDirective: Add TOOL_ANNOTATIONS and docs rows whenever a new tool definition is registered.\nTested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts\nNot-tested: full CI matrix * Align Cursor tier-one count with current tool surface Constraint: CI runs the Cursor cross-env suite with OPENCHROME_RUN_CROSS_ENV=1, where develop now exposes 46 non-expand tier-one tools.\nRejected: Reverting new tier-one registrations | those came from current develop and are outside PR #945's scope.\nConfidence: high\nScope-risk: narrow\nDirective: Update this count when intentional tier-one tool exposure changes.\nTested: npx jest --config jest.config.js --runInBand tests/cross-env/cursor-verification.test.ts (skipped locally without cross-env flag)\nNot-tested: full CI matrix * Stabilize CI baselines for current tool surface Constraint: This branch inherits develop's 45-tool tier-one surface and Windows CI checks out fixtures with CRLF.\nRejected: Reverting develop tool registrations | outside PR #945 scope and would reintroduce merge drift.\nConfidence: high\nScope-risk: narrow\nDirective: Keep Cursor tier-one counts branch-specific and normalize fixture newlines for generated JSON.\nTested: npx jest --config jest.config.js --runInBand tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts\nNot-tested: full CI matrix * Harden admin key stdout assertion against worker noise Constraint: Windows CI can interleave unrelated Jest worker output into the in-process stdout hook.\nRejected: Requiring stdout to contain only one physical line | that assertion flakes when shared process output is polluted outside the CLI path.\nConfidence: high\nScope-risk: narrow\nDirective: Assert the secret token appears exactly once and never in stderr; do not depend on unrelated captured stdout being empty.\nTested: npx jest --config jest.config.js --runInBand tests/cli/admin-keys.test.ts\nNot-tested: full CI matrix * fix(annotations): reclassify execute_plan, workflow_init, memory Codex P1/P2 findings on PR #945: - execute_plan: open-world AND destructive (resolves arbitrary tool names; worst-case dispatch set includes destructive tools + non-loopback nav) - workflow_init: open-world (dnsResolve + worker page.goto to non-loopback) - memory: destructive (validate-prune path deletes memory entries) Mirrors the docs table in docs/mcp/tool-annotations.md. * feat(annotations): annotate develop-era tools merged after this PR PR #945 enforces ToolAnnotations on every MCPToolDefinition by typing `annotations` as required, but the tool surface on develop has grown by 15 entries since the PR was branched — crawl_*, oc_reflect, oc_task_*, oc_skill_replay, oc_run_*, and the dynamic-skill synthesizer. The tsc build accordingly fails on all 9 OS/Node combos with TS2741. Add catalog entries to TOOL_ANNOTATIONS and wire each new tool's definition to its annotation: - crawl_start → OPEN_WORLD; crawl_cancel → DESTRUCTIVE; crawl_status → READ_ONLY - oc_reflect, oc_task_start, oc_task_cancel/get/list/wait classified by worst-case input - oc_skill_replay → OPEN_WORLD_DESTRUCTIVE (matches execute_plan envelope) - oc_run_start (MUTATES), oc_run_finish (DESTRUCTIVE), status/events (READ_ONLY) - Synthesized skill-replay tools inline OPEN_WORLD_DESTRUCTIVE since they may navigate to arbitrary origins under the replay engine. Restores green CI on all 9 OS/Node combos. * fix(945): add annotations for develop-era tools * fix(945): include run-harness dir in TOOL_ANNOTATIONS reverse test * fix(945): restore develop's recording.ts incl. oc_recording_status * fix(945): add annotations to recording tools incl. oc_recording_status * test(annotations): sync expected counts and dynamic-skill fixture for develop merges Two follow-ups after merging develop into this branch: 1. tests/unit/tool-annotations.test.ts now expects recording.ts to have 5 annotated tools (oc_recording_status landed on develop after this PR branched). Bump the spot-check expectation from 4 → 5. 2. tests/pilot/dynamic-skills/registry.test.ts builds a synthetic RegistryEntry whose `definition` is typed as MCPToolDefinition. After this PR makes `annotations` required on the type, the literal in makeEntry() must carry an annotations field too — add the same OPEN_WORLD_DESTRUCTIVE shape the synthesizer emits. * feat(annotations): annotate oc_task_run_* tools landed from develop PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR merge commit now drags src/tools/task-run.ts onto this branch and tsc fails with TS2741 on 7 tools missing the required annotations field. Add annotations to each of the 7 oc_task_run_* tool definitions and catalog them in TOOL_ANNOTATIONS: - start/update/checkpoint/needs_help → MUTATES - complete → DESTRUCTIVE (terminal) - get/list → READ_ONLY * test(pilot/curator): bump cross-process serialize test timeout for Windows CI Same fix applied to feat/983-best-first (#1065). The 'separate writer processes serialize read/merge/write updates' test exceeds the default 10s jest deadline on hosted windows-22 runners because ts-node startup × 8 children overruns the budget before the assertions can run. Bump the per-test timeout to 60s. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: OmX <omx@oh-my-codex.dev>
Progress / Review status
Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.
feat/1039-task-run-lifecycle→develop4ba6bf6— fix(task-run): lock concurrent mutations, reject NEEDS_HELP via updat…Owner comment cleanup: 3 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.
Summary
Closes #1039.
Adds an opt-in goal-level TaskRun lifecycle layer that stores user-goal progress outside browser/session state:
src/core/task-run/*file-backed TaskRun store underOPENCHROME_HOME/task-runsor~/.openchrome/task-runsoc_task_run_start,oc_task_run_update,oc_task_run_checkpoint,oc_task_run_needs_help,oc_task_run_complete,oc_task_run_get,oc_task_run_listFit / non-duplication review
Validation
npm test -- --runTestsByPath tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts --runInBandnpm run buildnpm run lint:changedNotes
Live MCP round-trip against a running Chrome daemon was not performed in this branch; the tool surface and persistence behavior are covered by unit/registration tests.