Skip to content

feat(task-run): persist goal-level lifecycle state (#1039)#1083

Merged
shaun0927 merged 4 commits into
developfrom
feat/1039-task-run-lifecycle
May 13, 2026
Merged

feat(task-run): persist goal-level lifecycle state (#1039)#1083
shaun0927 merged 4 commits into
developfrom
feat/1039-task-run-lifecycle

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch feat/1039-task-run-lifecycledevelop
Draft no
CI ❌ 8/9 passing — 1 failing
Mergeable ❌ CONFLICTING
Review decision
Codex (latest) 💡 suggestions posted
Other reviewers (latest) chatgpt-codex-connector: commented
Head 4ba6bf6 — fix(task-run): lock concurrent mutations, reject NEEDS_HELP via updat…
Commits 3

Owner comment cleanup: 3 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Summary

Closes #1039.

Adds an opt-in goal-level TaskRun lifecycle layer that stores user-goal progress outside browser/session state:

  • src/core/task-run/* file-backed TaskRun store under OPENCHROME_HOME/task-runs or ~/.openchrome/task-runs
  • seven MCP tools: oc_task_run_start, oc_task_run_update, oc_task_run_checkpoint, oc_task_run_needs_help, oc_task_run_complete, oc_task_run_get, oc_task_run_list
  • redacted metadata/event persistence, terminal-state immutability, NEEDS_HELP resume guard, bounded item arrays, and checkpoint files

Fit / non-duplication review

Validation

  • npm test -- --runTestsByPath tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts --runInBand
  • npm run build
  • npm run lint:changed

Notes

Live MCP round-trip against a running Chrome daemon was not performed in this branch; the tool surface and persistence behavior are covered by unit/registration tests.

@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

chatgpt-codex-connector[bot]

This comment was marked as outdated.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

chatgpt-codex-connector[bot]

This comment was marked as outdated.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

chatgpt-codex-connector[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eae252c83b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/core/task-run/storage.ts Outdated
const eventsPath = this.eventsPath(runId);
if (!fs.existsSync(eventsPath)) return [];
const text = await fs.promises.readFile(eventsPath, 'utf8');
return text.split('\n').filter(Boolean).map(line => JSON.parse(line) as TaskRunEvent);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Tolerate malformed event lines when reading TaskRun logs

Guard readEvents against malformed JSONL entries instead of failing the whole request. As written, a single partial/corrupt line in events.jsonl (for example after an interrupted write or manual file damage) makes JSON.parse throw and causes oc_task_run_get with include_events=true to return an error even when the TaskRun metadata is otherwise valid. This turns a non-critical log corruption into a hard read failure for operators.

Useful? React with 👍 / 👎.

shaun0927 and others added 3 commits May 13, 2026 18:04
Introduce an opt-in TaskRun store and MCP tool surface so long-running user goals can persist progress, help state, checkpoints, and evidence without changing existing browser tool behavior.

Constraint: Builds on issue #1039 while avoiding duplication of open task-ledger PR #911.

Rejected: Requiring Postgres or a Bytebot-style desktop task service | conflicts with OpenChrome's local MCP/CDP-first design.

Confidence: high

Scope-risk: moderate

Directive: Keep TaskRun as goal-level metadata; do not move browser execution or #855 async task scheduling into this layer.

Tested: npm test -- --runTestsByPath tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts --runInBand; npm run build; npm run lint:changed

Not-tested: Live MCP round-trip against a running Chrome daemon.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Constraint: CI enforces REQUIRED prefixes and pipes the full tools manifest through lint-tool-schemas.
Rejected: Updating the schema lint baseline | new TaskRun tools can satisfy the existing contract directly.
Confidence: high
Scope-risk: narrow
Directive: New required tool fields must include REQUIRED descriptions and introspection output must drain before exit.
Tested: npm test -- --runInBand tests/core/task-run/storage.test.ts tests/tools/task-run-tools.test.ts; npm run build; npm run lint:tool-schemas
Not-tested: Full GitHub CI matrix pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
…e, accumulate truncation counts, tolerate malformed JSONL, fix schema and scope-policy

- P1: Wrap update/checkpoint/needsHelp/complete with per-run acquireLock so
  concurrent calls on the same run_id serialize rather than overwriting each
  other (read-modify-write race).
- P2: Reject status=NEEDS_HELP via update(); callers must use
  oc_task_run_needs_help which guarantees reason+requested_at payload.
- P2: Make completed_items_truncated / failed_items_truncated cumulative
  across multiple overflow calls instead of resetting to latest overflow only.
- P2: Remove PENDING from oc_task_run_update schema enum (runtime already
  rejects it; schema now matches behaviour).
- P2: Add oc_task_run_get + oc_task_run_list to READ_TOOLS in scope-policy so
  read-only API keys can query TaskRun state.
- P2: Tolerate malformed JSONL lines in readEvents(); skip and log instead of
  throwing for the entire call.
- Tests: concurrent serialization, NEEDS_HELP rejection, cumulative truncation,
  malformed JSONL tolerance, scope-policy read classification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shaun0927 shaun0927 force-pushed the feat/1039-task-run-lifecycle branch from eae252c to 4ba6bf6 Compare May 13, 2026 09:29
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ba6bf6898

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/task-run.ts
Comment on lines +240 to +242
const listHandler: ToolHandler = async (_sessionId, args) => {
try {
const task_runs = await store.list(args as TaskRunListFilter);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Scope TaskRun listing to caller-owned data

oc_task_run_list ignores the caller session (_sessionId) and directly returns store.list(...), which scans the shared on-disk task-run directory. In API-key multi-tenant deployments, mcp-server only enforces tenant ownership for session IDs, not arbitrary task-run IDs, so a read-scoped tenant can enumerate other tenants’ run metadata (goal/progress/run_id) from the same daemon process.

Useful? React with 👍 / 👎.

Comment thread src/tools/task-run.ts
Comment on lines +185 to +187
const runId = String(args.run_id || '');
const meta = await store.update(runId, args as UpdateTaskRunInput);
return jsonResult({ task_run: meta });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce ownership before TaskRun mutation

Mutating handlers update state using only run_id and never check whether the run belongs to the calling session/tenant. In a shared server, any write-capable tenant that learns a foreign run_id (for example via the global list endpoint) can change another tenant’s TaskRun status/progress, causing cross-tenant state corruption.

Useful? React with 👍 / 👎.

# Conflicts:
#	src/tools/index.ts
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74f85b01b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

...current,
status: nextStatus,
workflow_id: optionalString(input.workflow_id) || current.workflow_id,
ledger_task_ids: uniqueStrings([...(current.ledger_task_ids || []), ...(input.ledger_task_ids || [])]),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize ledger_task_ids before merging updates

TaskRunStore.update() builds ledger_task_ids with ...(input.ledger_task_ids || []), but MCP argument validation only enforces required fields (no type check), so a caller can pass a non-array value. A string value is iterable and gets split into single characters ("abc" -> ["a","b","c"]), silently corrupting stored linkage metadata; non-iterables throw and fail the call. Please sanitize input.ledger_task_ids via uniqueStrings directly before spreading/merging.

Useful? React with 👍 / 👎.

Comment on lines +315 to +316
await fs.promises.mkdir(this.runDir(runId), { recursive: true });
const release = await acquireLock(this.lockPath(runId));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prevent orphan lock dirs on unknown run IDs

withRunLock() creates runDir and meta.lock before verifying the TaskRun exists, while callers can supply arbitrary 16-hex run_ids. Repeated typo/malicious update/checkpoint/needs_help/complete calls therefore create permanent empty directories for nonexistent runs, causing unbounded filesystem growth under task-runs and extra work on future list() scans. Existence should be checked (or cleanup performed) before persisting lock artifacts for unknown IDs.

Useful? React with 👍 / 👎.

@shaun0927 shaun0927 merged commit e1282af into develop May 13, 2026
9 checks passed
shaun0927 added a commit that referenced this pull request May 13, 2026
PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR
merge commit now drags src/tools/task-run.ts onto this branch and
tsc fails with TS2741 on 7 tools missing the required annotations
field.

Add annotations to each of the 7 oc_task_run_* tool definitions and
catalog them in TOOL_ANNOTATIONS:
- start/update/checkpoint/needs_help → MUTATES
- complete                          → DESTRUCTIVE (terminal)
- get/list                          → READ_ONLY
shaun0927 added a commit that referenced this pull request May 13, 2026
* feat(core): ToolAnnotations on every tool (#867)

Adds the MCP-spec ToolAnnotations field (readOnlyHint, destructiveHint,
idempotentHint, openWorldHint) to every tool exposed by the MCP server.
Pure metadata — zero behavior change, no new dependencies, no native
modules, no outbound traffic. Compliant with Portability-Harness
Contract P1/P2/P3/P4/P5.

Approach (hybrid):
- src/types/tool-annotations.ts (NEW) — canonical TOOL_ANNOTATIONS
  table, single source of truth.
- MCPToolDefinition.annotations is a REQUIRED field — TypeScript
  enforces presence at compile time for every future tool.
- Each tool file declares `annotations: TOOL_ANNOTATIONS.<name>` (one
  import + one reference line) rather than duplicating literal values.
  70 definitions across 58 tool files + 2 pilot/handoff tools + the
  inline expand_tools virtual tool.

Worst-case rule (consistent across the table): annotations describe the
most destructive / least pure behavior across all valid inputs.
javascript_tool, batch_execute, act, network, and request_intercept
are all flagged destructive + open-world because their worst-case
inputs (arbitrary JS, batch dispatch, NL action routing, network
emulation/blocking) can mutate or block observable state.

Verification:
- npm run build, npm run lint, npm test pass.
- tests/unit/tool-annotations.test.ts (NEW, 9 cases) covers shape,
  destructive/read-only/open-world expectations, requireAnnotations,
  source-level injection check, and reverse-direction orphan check.
- Only pre-existing flaky stress timing test fails under parallel
  load; passes in isolation — unrelated to this change.

docs/mcp/tool-annotations.md mirrors the runtime table for human
review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(annotations): reclassify validate_page, batch_paginate, console_capture (codex P1/P2 from #945)

Codex flagged three tools incorrectly mapped to READ_ONLY:
- validate_page navigates via smartGoto, can create a tab, waits on live
  page state — open-world, mutating, non-idempotent.
- batch_paginate presses keys, clicks, scrolls; in 'url' mode creates tabs
  and navigates to generated URLs — mutating, non-idempotent, potentially
  open-world.
- console_capture supports start/stop/clear; clear deletes buffered logs —
  destructive, non-idempotent.

Updates docs/mcp/tool-annotations.md mirror to match.

Addresses Codex review feedback on PR #945.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Allow manifest entries to carry annotations

Update the worker tool manifest type so tests and generated manifests can include MCP ToolAnnotations added by the tool-annotation PR.

Constraint: Preserve existing manifest filtering behavior and make annotations optional for older manifests.

Rejected: Removing annotations from manifest tests | workers should be able to receive annotation metadata when present.

Confidence: high

Scope-risk: narrow

Directive: Keep tool manifest typing aligned with MCPToolDefinition when new shared fields are surfaced.

Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts

Not-tested: Full GitHub Actions matrix before push

* Annotate develop-era tools for MCP safety hints

Constraint: PR #945 now requires every MCP tool definition, including tools merged after the original branch, to carry canonical annotations.\nRejected: Making annotations optional for newer tools | that would weaken the issue #867 contract and hide future omissions.\nConfidence: high\nScope-risk: narrow\nDirective: Add TOOL_ANNOTATIONS and docs rows whenever a new tool definition is registered.\nTested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts\nNot-tested: full CI matrix

* Align Cursor tier-one count with current tool surface

Constraint: CI runs the Cursor cross-env suite with OPENCHROME_RUN_CROSS_ENV=1, where develop now exposes 46 non-expand tier-one tools.\nRejected: Reverting new tier-one registrations | those came from current develop and are outside PR #945's scope.\nConfidence: high\nScope-risk: narrow\nDirective: Update this count when intentional tier-one tool exposure changes.\nTested: npx jest --config jest.config.js --runInBand tests/cross-env/cursor-verification.test.ts (skipped locally without cross-env flag)\nNot-tested: full CI matrix

* Stabilize CI baselines for current tool surface

Constraint: This branch inherits develop's 45-tool tier-one surface and Windows CI checks out fixtures with CRLF.\nRejected: Reverting develop tool registrations | outside PR #945 scope and would reintroduce merge drift.\nConfidence: high\nScope-risk: narrow\nDirective: Keep Cursor tier-one counts branch-specific and normalize fixture newlines for generated JSON.\nTested: npx jest --config jest.config.js --runInBand tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts\nNot-tested: full CI matrix

* Harden admin key stdout assertion against worker noise

Constraint: Windows CI can interleave unrelated Jest worker output into the in-process stdout hook.\nRejected: Requiring stdout to contain only one physical line | that assertion flakes when shared process output is polluted outside the CLI path.\nConfidence: high\nScope-risk: narrow\nDirective: Assert the secret token appears exactly once and never in stderr; do not depend on unrelated captured stdout being empty.\nTested: npx jest --config jest.config.js --runInBand tests/cli/admin-keys.test.ts\nNot-tested: full CI matrix

* fix(annotations): reclassify execute_plan, workflow_init, memory

Codex P1/P2 findings on PR #945:
- execute_plan: open-world AND destructive (resolves arbitrary tool names;
  worst-case dispatch set includes destructive tools + non-loopback nav)
- workflow_init: open-world (dnsResolve + worker page.goto to non-loopback)
- memory: destructive (validate-prune path deletes memory entries)

Mirrors the docs table in docs/mcp/tool-annotations.md.

* feat(annotations): annotate develop-era tools merged after this PR

PR #945 enforces ToolAnnotations on every MCPToolDefinition by typing
`annotations` as required, but the tool surface on develop has grown by
15 entries since the PR was branched — crawl_*, oc_reflect, oc_task_*,
oc_skill_replay, oc_run_*, and the dynamic-skill synthesizer. The tsc
build accordingly fails on all 9 OS/Node combos with TS2741.

Add catalog entries to TOOL_ANNOTATIONS and wire each new tool's
definition to its annotation:
- crawl_start → OPEN_WORLD; crawl_cancel → DESTRUCTIVE; crawl_status → READ_ONLY
- oc_reflect, oc_task_start, oc_task_cancel/get/list/wait classified by worst-case input
- oc_skill_replay → OPEN_WORLD_DESTRUCTIVE (matches execute_plan envelope)
- oc_run_start (MUTATES), oc_run_finish (DESTRUCTIVE), status/events (READ_ONLY)
- Synthesized skill-replay tools inline OPEN_WORLD_DESTRUCTIVE since they
  may navigate to arbitrary origins under the replay engine.

Restores green CI on all 9 OS/Node combos.

* fix(945): add annotations for develop-era tools

* fix(945): include run-harness dir in TOOL_ANNOTATIONS reverse test

* fix(945): restore develop's recording.ts incl. oc_recording_status

* fix(945): add annotations to recording tools incl. oc_recording_status

* test(annotations): sync expected counts and dynamic-skill fixture for develop merges

Two follow-ups after merging develop into this branch:

1. tests/unit/tool-annotations.test.ts now expects recording.ts to have
   5 annotated tools (oc_recording_status landed on develop after this
   PR branched). Bump the spot-check expectation from 4 → 5.

2. tests/pilot/dynamic-skills/registry.test.ts builds a synthetic
   RegistryEntry whose `definition` is typed as MCPToolDefinition. After
   this PR makes `annotations` required on the type, the literal in
   makeEntry() must carry an annotations field too — add the same
   OPEN_WORLD_DESTRUCTIVE shape the synthesizer emits.

* feat(annotations): annotate oc_task_run_* tools landed from develop

PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR
merge commit now drags src/tools/task-run.ts onto this branch and
tsc fails with TS2741 on 7 tools missing the required annotations
field.

Add annotations to each of the 7 oc_task_run_* tool definitions and
catalog them in TOOL_ANNOTATIONS:
- start/update/checkpoint/needs_help → MUTATES
- complete                          → DESTRUCTIVE (terminal)
- get/list                          → READ_ONLY

* test(pilot/curator): bump cross-process serialize test timeout for Windows CI

Same fix applied to feat/983-best-first (#1065). The
'separate writer processes serialize read/merge/write updates' test
exceeds the default 10s jest deadline on hosted windows-22 runners
because ts-node startup × 8 children overruns the budget before the
assertions can run. Bump the per-test timeout to 60s.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: OmX <omx@oh-my-codex.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant