
feat(core): oc_task_ledger — persistent async task table with cancel & wait (mcp-browser-use adoption A) #855

@shaun0927

Description


Why

OpenChrome currently exposes several long-running tools: crawl (src/tools/crawl.ts, registered at src/tools/index.ts:85), crawl_sitemap (src/tools/crawl-sitemap.ts, registered at line 86), recording (src/tools/recording.ts, registered at line 193), oc_evidence_bundle (registered at line 215), and oc_session_snapshot (registered at line 179). Every one of these handlers returns only after its work has finished, so the MCP client must hold its single in-flight request open for the entire duration. If the host LLM times out, drops the connection, or simply wants to interleave another tool call, there is no recovery: the work is lost and no record of partial progress survives.

mcp-browser-use solves this by treating long-running operations as background tasks backed by a persistent ledger, with explicit start / list / get / cancel / wait verbs. The host LLM fires and forgets, polls when convenient, and can cancel without tearing down the MCP session. The pattern also unlocks issue D (progress notifications) and issue F (dashboard task view).

OpenChrome already has the right primitives to implement this without new native deps: the JSONL+meta.json+proper-lockfile pattern in src/core/trace/storage.ts (lines 1-493) is directly reusable, and src/utils/atomic-file.ts provides acquireLock / writeFileAtomicSafe. The oc_reap_orphans PID-lock pattern (recently extended in commit 334fa4d) already handles crash-resilient reaping.
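A minimal sketch of the three storage primitives the ledger needs, assuming a simplified exclusive-create lock in place of proper-lockfile (no stale-lock detection) and a temp-file-plus-rename in place of writeFileAtomicSafe. The helper names ledgerRoot, withLock, writeMetaAtomic and appendEvent are hypothetical; the real code would call acquireLock and writeFileAtomicSafe from src/utils/atomic-file.ts, whose exact signatures are not reproduced here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Default ledger root: ~/.openchrome/tasks, resolved via os.homedir()
// so it also works on Windows.
function ledgerRoot(): string {
  return path.join(os.homedir(), ".openchrome", "tasks");
}

// Hypothetical stand-in for acquireLock: "wx" creates the lock file,
// or fails with EEXIST if another writer already holds it.
function withLock<T>(taskDir: string, fn: () => T): T {
  const lockPath = path.join(taskDir, "lock");
  const fd = fs.openSync(lockPath, "wx");
  try {
    return fn();
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath); // runs even if fn throws
  }
}

// Hypothetical stand-in for writeFileAtomicSafe: on POSIX filesystems,
// readers see either the old meta.json or the new one, never a partial write.
function writeMetaAtomic(taskDir: string, meta: object): void {
  const target = path.join(taskDir, "meta.json");
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(meta, null, 2));
  fs.renameSync(tmp, target);
}

// Events are append-only: one JSON object per line, never rewritten.
function appendEvent(taskDir: string, event: { ts: number; kind: string; data?: object }): void {
  fs.appendFileSync(path.join(taskDir, "events.jsonl"), JSON.stringify(event) + "\n");
}
```

The lock is released in a finally block, so a throwing worker cannot leave it behind, and it guards only the meta.json rewrite rather than the long body of work.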

What

Add a persistent task ledger and five new MCP tools — oc_task_start, oc_task_list, oc_task_get, oc_task_cancel, oc_task_wait — that wrap the five long-running tools listed above so they may run as background jobs.

Boundary:

  • src/core/task-ledger/{storage.ts,types.ts,runner.ts,index.ts} (new directory). The storage layer mirrors the on-disk semantics of src/core/trace/storage.ts exactly.
  • src/tools/oc-task-{start,list,get,cancel,wait}.ts (new) registered in src/tools/index.ts.
  • Modify the five existing tool handlers to expose a thin "wrappable inner runner" so oc_task_start can invoke them without code duplication. No behavioural change when the tools are called directly.
  • Default ledger root: ~/.openchrome/tasks/ (sibling to ~/.openchrome/traces/).
  • Core tier (P1 host-pluggable, P2 OS-portable, P5 no native deps).
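One way to shape the "wrappable inner runner" is to thread a small context object through the tool's core loop. Everything below is a hypothetical sketch: RunContext, runCrawlInner, crawlDirect, the fetchPage parameter and the fixed url list are illustrative assumptions (the real crawl discovers links as it goes). The direct-call handler passes a no-op context, so its output is unchanged; oc_task_start would pass a context backed by meta.json.

```typescript
// Hypothetical per-run hooks threaded into a long-running tool's core loop.
interface RunContext {
  isCancelled(): boolean;                                  // checked between work units
  onProgress(unitsDone: number, unitsTotal: number): void; // would feed 'progress' events
}

// What the synchronous, direct-call handler passes: never cancelled, progress ignored.
const directContext: RunContext = {
  isCancelled: () => false,
  onProgress: () => {},
};

interface CrawlArgs { urls: string[]; maxPages: number } // simplified: real crawl discovers links
interface CrawlResult { pages: { url: string; body: string }[] }

// Hypothetical inner runner shared by `crawl` and oc_task_start({ kind: "crawl" }).
async function runCrawlInner(
  args: CrawlArgs,
  fetchPage: (url: string) => Promise<string>,
  ctx: RunContext,
): Promise<CrawlResult> {
  const pages: CrawlResult["pages"] = [];
  const total = Math.min(args.urls.length, args.maxPages);
  for (let i = 0; i < total; i++) {
    // Cooperative cancellation: checked before each page, so at most the
    // one in-flight page finishes after cancellation is requested.
    if (ctx.isCancelled()) break;
    pages.push({ url: args.urls[i], body: await fetchPage(args.urls[i]) });
    ctx.onProgress(pages.length, total);
  }
  return { pages }; // same shape whether or not the run was cancelled
}

// Direct path: unchanged behaviour, no ledger involved.
const crawlDirect = (args: CrawlArgs, fetchPage: (url: string) => Promise<string>) =>
  runCrawlInner(args, fetchPage, directContext);
```

Because the result keeps the same (assumed) { pages } shape in both paths, a cancelled task can write its partial pages to result.json, and the ledger wrapper, not the runner, decides whether the final status is COMPLETED or CANCELLED.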

Contract

```typescript
// src/core/task-ledger/types.ts
export type TaskStatus = 'PENDING' | 'RUNNING' | 'COMPLETED' | 'FAILED' | 'CANCELLED';
export type TaskKind = 'crawl' | 'crawl_sitemap' | 'recording' | 'oc_evidence_bundle' | 'oc_session_snapshot';

export interface TaskMeta {
  task_id: string;            // 16-hex SHA-256(kind\x00args_json\x00created_at)
  kind: TaskKind;
  status: TaskStatus;
  pid: number;                // owner pid; reaper checks aliveness
  created_at: number;
  started_at?: number;
  ended_at?: number;
  args_summary: Record<string, unknown>;   // redacted, ≤2 KiB
  result_path?: string;        // "<root>/<task_id>/result.json" iff COMPLETED
  error?: { message: string; code?: string };
  cancel_requested_at?: number;
}

export interface TaskEvent {
  ts: number;
  kind: 'started' | 'progress' | 'log' | 'completed' | 'failed' | 'cancelled' | 'cancel_requested';
  data?: Record<string, unknown>;   // for progress: { unitsDone, unitsTotal, message? }
}
```
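The task_id comment in TaskMeta can be read as the derivation below. This is a sketch assuming args_json is plain JSON.stringify output (not canonicalised, so key order affects the id) and that created_at is a millisecond timestamp; deriveTaskId is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: first 16 hex chars of SHA-256(kind \x00 args_json \x00 created_at).
// Assumes args_json = JSON.stringify(args), so key order changes the id.
function deriveTaskId(kind: string, args: unknown, createdAt: number): string {
  const argsJson = JSON.stringify(args);
  return createHash("sha256")
    .update(`${kind}\x00${argsJson}\x00${createdAt}`)
    .digest("hex")
    .slice(0, 16);
}
```

Because created_at is part of the input, resubmitting identical args in a later millisecond yields a fresh id; two identical submissions in the same millisecond would collide, which the storage layer would need to detect (for example, by failing if the task directory already exists).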

Invariants:

  1. State machine: PENDING → RUNNING → (COMPLETED | FAILED | CANCELLED). Terminal states are immutable.
  2. On startup, any task whose meta.json says RUNNING and whose pid is no longer alive (per process.kill(pid, 0)) is reaped to FAILED with error.code = "orphaned". This must run before any new task is accepted.
  3. Disk layout: <root>/<task_id>/{meta.json, events.jsonl, result.json?, lock}. meta.json is updated via writeFileAtomicSafe; events are appended; the lock is held only during meta.json mutation, never for the long body of work.
  4. oc_task_cancel is best-effort: it sets cancel_requested_at in meta.json and emits a cancel_requested event; the runner cooperates by checking meta.json between work units. Cancellation latency ≤ one work unit (≤ one crawled page for crawl).
  5. oc_task_wait(task_id, timeout_ms?) returns the final TaskMeta once status is terminal, or throws a typed timeout error. Default timeout_ms = 60_000. Wait does NOT block other tool calls (uses fs.watch + bounded poll, not a CPU spin).
  6. Calling crawl (or any of the other four wrapped tools) directly still works and returns synchronously; oc_task_start({ kind: "crawl", args }) is the opt-in background path.
  7. Result payload has the same shape as the synchronous tool's response. oc_task_get(task_id, { include_result: true }) additionally reads and returns result.json; without that flag, only meta.json is returned (bounding response size).
  8. oc_task_list supports { status?, kind?, limit?, since? }; default limit = 50, default sort by created_at descending.
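Invariants 1 and 2 can be sketched together: a transition table that rejects any move out of a terminal state, and a reaper that uses process.kill(pid, 0) as an aliveness probe. The names NEXT, assertTransition, isPidAlive, MetaLike and reapOrphans are hypothetical, and this reaper works on in-memory metas; the real one would rewrite each meta.json under its lock. Treating EPERM as "alive" and ignoring PID reuse are assumptions of this sketch.

```typescript
type TaskStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED" | "CANCELLED";

// Allowed transitions per invariant 1; terminal states have no outgoing edges.
const NEXT: Record<TaskStatus, readonly TaskStatus[]> = {
  PENDING: ["RUNNING"],
  RUNNING: ["COMPLETED", "FAILED", "CANCELLED"],
  COMPLETED: [],
  FAILED: [],
  CANCELLED: [],
};

function assertTransition(from: TaskStatus, to: TaskStatus): void {
  if (!NEXT[from].includes(to)) throw new Error(`illegal transition ${from} -> ${to}`);
}

function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only checks that the process exists
    return true;
  } catch (err) {
    // ESRCH: no such process. EPERM: it exists but we may not signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

interface MetaLike {
  status: TaskStatus;
  pid: number;
  ended_at?: number;
  error?: { message: string; code?: string };
}

// Reap RUNNING tasks whose owner pid is gone; returns how many were reaped.
function reapOrphans(metas: MetaLike[], now = Date.now()): number {
  let reaped = 0;
  for (const meta of metas) {
    if (meta.status === "RUNNING" && !isPidAlive(meta.pid)) {
      assertTransition(meta.status, "FAILED");
      meta.status = "FAILED";
      meta.ended_at = now;
      meta.error = { message: `owner pid ${meta.pid} is no longer alive`, code: "orphaned" };
      reaped++;
    }
  }
  return reaped;
}
```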

Acceptance criteria

  • All five new tools registered in src/tools/index.ts and surfaced in the tool catalog.
  • Five long-running tools refactored to share an inner runner with oc_task_start; existing direct-call behaviour byte-identical (regression fixture in tests/tools/crawl.parity.test.ts).
  • Crash-resilience test in tests/core/task-ledger/orphan-reap.test.ts: spawn a child process that creates a RUNNING task and then calls process.exit(1) mid-run; on the next ledger open, the task is reaped to FAILED with error.code = "orphaned".
  • Cancel test in tests/core/task-ledger/cancel.test.ts: oc_task_start({kind: "crawl", args: {url: "file://<fixture>", maxPages: 5}}), then oc_task_cancel after ≥1 page completes; final status CANCELLED, result.json contains the partial pages already crawled (no data loss).
  • oc_task_wait returns within 200 ms of terminal transition (timing test on a fast fixture).
  • Lock file is released even when the worker throws — verified by a fault-injection test (failOnPage = 2 in crawl args).
  • Default ledger root resolves via os.homedir() (Windows-compatible per CLAUDE.md).
  • npm run build && npm test green.
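The wait invariant and the 200 ms timing criterion suggest pairing fs.watch with a bounded poll: the watcher gives low latency when events arrive, and the interval caps latency when they do not (fs.watch delivery varies by platform). The sketch watches the task directory rather than meta.json itself, because an atomic rename replaces the file a watcher would be attached to. waitForTerminal, TaskWaitTimeoutError, WaitOptions and the 100 ms default poll are hypothetical choices, not the merged implementation.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const TERMINAL = new Set(["COMPLETED", "FAILED", "CANCELLED"]);

// Hypothetical typed error so callers can tell a timeout from other failures.
class TaskWaitTimeoutError extends Error {
  readonly taskId: string;
  readonly timeoutMs: number;
  constructor(taskId: string, timeoutMs: number) {
    super(`task ${taskId} not terminal after ${timeoutMs} ms`);
    this.name = "TaskWaitTimeoutError";
    this.taskId = taskId;
    this.timeoutMs = timeoutMs;
  }
}

interface WaitOptions { root?: string; timeoutMs?: number; pollMs?: number }

function readMeta(taskDir: string): { status: string } | undefined {
  try {
    return JSON.parse(fs.readFileSync(path.join(taskDir, "meta.json"), "utf8"));
  } catch {
    return undefined; // missing or unreadable right now: try again on the next tick
  }
}

function waitForTerminal(taskId: string, opts: WaitOptions = {}): Promise<{ status: string }> {
  const root = opts.root ?? path.join(os.homedir(), ".openchrome", "tasks");
  const timeoutMs = opts.timeoutMs ?? 60_000; // default from invariant 5
  const pollMs = opts.pollMs ?? 100;          // caps latency if fs.watch misses an event
  const taskDir = path.join(root, taskId);

  return new Promise((resolve, reject) => {
    let done = false;
    let watcher: fs.FSWatcher | undefined;
    const finish = (settle: () => void): void => {
      if (done) return;
      done = true;
      clearInterval(poll);
      clearTimeout(deadline);
      watcher?.close();
      settle();
    };
    const check = (): void => {
      const meta = readMeta(taskDir);
      if (meta && TERMINAL.has(meta.status)) finish(() => resolve(meta));
    };
    const poll = setInterval(check, pollMs);
    const deadline = setTimeout(
      () => finish(() => reject(new TaskWaitTimeoutError(taskId, timeoutMs))),
      timeoutMs,
    );
    try {
      watcher = fs.watch(taskDir, check);
      watcher.on("error", () => { /* fall back to polling only */ });
    } catch {
      // Directory missing or fs.watch unsupported: the poll still runs.
    }
    check(); // the task may already be terminal
  });
}
```

Nothing here spins the CPU: between events the process sleeps on a timer and a watcher, so other tool calls keep being served.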

Real verification (post-merge, via openchrome MCP)

  1. mcp__openchrome__navigate to https://example.com to confirm the daemon is healthy.
  2. mcp__openchrome__oc_task_start with { kind: "crawl", args: { url: "https://example.com", maxPages: 5, sameOrigin: true } } → assert response shape { task_id: <16-hex>, status: "PENDING" | "RUNNING" }.
  3. mcp__openchrome__oc_task_list → assert the new task_id appears with status in {PENDING, RUNNING} and kind = "crawl".
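  4. After ≥1 page completes (poll with oc_task_get every 250 ms), call mcp__openchrome__oc_task_cancel({ task_id }) → assert the task reaches status CANCELLED within 2 s (cancellation is cooperative, so re-poll oc_task_get if the cancel response still reports RUNNING).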
  5. mcp__openchrome__oc_task_get({ task_id, include_result: true }) → assert result.json contains a non-empty pages[] array (partial output retained).
  6. Restart the openchrome daemon (mcp__openchrome__oc_stop then re-launch). Re-issue oc_task_list → previous CANCELLED task is still listed (persistence across restart).
  7. Negative path: launch a second crawl task, then kill -9 the daemon mid-run, restart, and assert that task's status is FAILED with error.code = "orphaned" (orphan reaper).
  8. mcp__openchrome__oc_task_wait({ task_id: <a fast fixture task id>, timeout_ms: 30000 }) → returns terminal meta within bound.
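For a reproducer, steps 2 and 4 reduce to a response-shape check and a bounded poll. Both helpers are hypothetical sketches (the contents of scripts/verify/oc-task-ledger.mjs are not shown in this issue); the probe passed to pollUntil would wrap an oc_task_get call made through whatever MCP client the script uses.

```typescript
// Hypothetical shape check for step 2: { task_id: <16-hex>, status: "PENDING" | "RUNNING" }.
interface StartResponse { task_id: string; status: "PENDING" | "RUNNING" }

function isStartResponse(value: unknown): value is StartResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.task_id === "string" &&
    /^[0-9a-f]{16}$/.test(v.task_id) &&
    (v.status === "PENDING" || v.status === "RUNNING")
  );
}

// Hypothetical bounded poll for step 4 (250 ms interval by default): resolves with
// the first defined probe value, or rejects once timeoutMs has elapsed.
async function pollUntil<T>(
  probe: () => Promise<T | undefined>,
  intervalMs = 250,
  timeoutMs = 2_000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) throw new Error(`condition not met within ${timeoutMs} ms`);
    await new Promise<void>((r) => setTimeout(r, intervalMs));
  }
}
```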

A reproducer script lives at scripts/verify/oc-task-ledger.mjs.

Out of scope

  • Cross-process priority queueing or fair scheduling.
  • Resuming RUNNING → RUNNING after restart (orphans always reap to FAILED; resume is a separate concern).
  • Web-UI / dashboard wiring (covered by issue F).
  • MCP notifications/progress emission (covered by issue D).

Dependencies

  • None blocking. Soft dep on oc_reap_orphans patterns (already merged in 334fa4d).
  • Hard prerequisite for issues D and F.

Effort

L (~6+ dev days). New storage subsystem, five new tools, refactor of five existing handlers, crash-resilience tests.

References

  • mcp-browser-use README (https://github.com/Saik0s/mcp-browser-use) — task-ledger pattern.
  • Internal comparison analysis (chat thread, 2026-05-12).
  • src/core/trace/storage.ts (reusable JSONL+meta.json+lock pattern).
  • src/utils/atomic-file.ts (acquireLock, writeFileAtomicSafe).
  • Recent reap-orphans PID-lock work (commit 334fa4d).

Revision history

  • 2026-05-12 r1: Original draft.
  • 2026-05-12 r2: Critic-driven revision. Tightened cancel semantics to "≤ one work unit" with concrete fixture. Added explicit byte-parity regression test for direct-call path. Required oc_task_wait to use fs.watch+bounded poll (not CPU spin). Constrained args_summary to ≤2 KiB redacted. Removed earlier ambiguous "tasks SHOULD survive restart" wording; replaced with explicit "persisted; orphans reap to FAILED" invariant.

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: persistent async task ledger.
  • Primary deliverable: oc_task_ledger persistent async task table with start/list/status/wait/cancel semantics for long-running tools.
  • Linked PR: feat(core): oc_task_ledger — persistent async task table with cancel & wait (#855) #911 (branch feat/855-task-ledger, merged). Verify that the merged implementation still matches this issue before closing; continue there and avoid duplicate work.
  • Non-goal: agent planning, replacing long-running tool logic, or converting every tool in one PR without staged coverage.

Overlap and conflict resolution

Implementation checklist

  • Verify the merged PR includes durable task records with IDs, status, progress, result/error, created/updated times, and cancellation where scoped.
  • Ensure wait/list/status surfaces are documented and restart-safe as intended.
  • Add/confirm tests for completion, failure, cancellation, wait timeout, restart persistence, and tool integration coverage.
  • Document which tools are ledger-enabled in v1 and which remain synchronous.
  • Ensure redaction and bounded result storage for ledger outputs.
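The redaction and size bound on args_summary (redacted, ≤2 KiB per the TaskMeta contract) could look like the sketch below. The secret-key pattern and the truncation strategy (shrink the largest top-level values first) are assumptions, not a documented policy, and key-name matching will not catch secrets embedded in values such as URL query strings. redactDeep and summarizeArgs are hypothetical names.

```typescript
// Hypothetical redaction policy: keys matching this pattern are masked at any depth.
const SECRET_KEY = /pass(word)?|token|secret|cookie|authorization|api[_-]?key/i;
const MAX_SUMMARY_BYTES = 2048; // "≤2 KiB" from the TaskMeta contract

const jsonBytes = (v: unknown): number => Buffer.byteLength(JSON.stringify(v) ?? "", "utf8");

function redactDeep(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactDeep);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        k,
        SECRET_KEY.test(k) ? "[REDACTED]" : redactDeep(v),
      ]),
    );
  }
  return value;
}

function summarizeArgs(args: Record<string, unknown>): Record<string, unknown> {
  const out = redactDeep(args) as Record<string, unknown>; // fresh copy; args untouched
  // Shrink the largest top-level values first until the summary fits the budget.
  const keysBySize = Object.keys(out).sort((a, b) => jsonBytes(out[b]) - jsonBytes(out[a]));
  for (const key of keysBySize) {
    if (jsonBytes(out) <= MAX_SUMMARY_BYTES) return out;
    out[key] = `[truncated ${jsonBytes(out[key])} bytes]`;
  }
  // Pathological case (very many keys): fall back to a constant-size marker.
  return jsonBytes(out) <= MAX_SUMMARY_BYTES ? out : { _truncated: true, key_count: keysBySize.length };
}
```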

Success criteria

  • Long-running work can be observed/recovered after host timeout or reconnect.
  • Cancellation/wait semantics are deterministic and tested.
  • Ledger state survives restart where promised.
  • Non-ledger tools remain backward-compatible.

Post-merge OpenChrome live verification checklist

  • Run a ledger-enabled long-running local fixture, list/status/wait it, and verify final result.
  • Cancel a running fixture and verify status/result.
  • Restart if supported and verify task record remains accessible.
  • Record task IDs, status transitions, and artifact paths in merge notes.

Metadata


Assignees

No one assigned

Labels

P1 (P1 high), enhancement (New feature or request), observability (Observability), reliability (Reliability and stability improvement)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
