feat(observability): trajectory bundle unifying journal, recording, checkpoints, and contracts #1059

@shaun0927

Tier: core observability, additive only
PR target: develop
Source analysis: CUA-style replayable trajectories, adapted to OpenChrome's existing journal / recording / checkpoint / contract evidence.

Why

OpenChrome currently records useful long-run facts in several places:

  • task journal JSONL records every tool call,
  • recording stores action timelines and screenshots,
  • checkpoint stores task progress and tab state,
  • Outcome Contracts produce assertion evidence,
  • hints/progress logic detects stalls and recovery suggestions.

These artifacts are valuable but fragmented. When a long-running task fails, it should leave behind a single portable trajectory bundle for that episode, one that lets a maintainer answer:

  1. What did the agent try?
  2. What browser state changed after each step?
  3. Where did progress stop?
  4. Which contract failed?
  5. Which checkpoint can be used to resume or reproduce?

CUA's transferable idea is its event/image trajectory format. OpenChrome should adopt a deterministic, file-based bundle that unifies existing artifacts without changing tool behavior.

Directionality / fit check

This issue is observability-only:

  • It records facts; it does not make LLM decisions.
  • It does not introduce server-side agent loops.
  • It does not require external services or credentials.
  • It must degrade to no-op if storage is unavailable.
  • Existing tools and recording behavior remain compatible.

Proposed implementation

Add an optional trajectory bundle writer under src/trajectory/ and expose it through the existing recording/journal/checkpoint paths.

Bundle layout

Default root:

~/.openchrome/trajectories/<trajectory_id>/
  meta.json
  events.jsonl
  screenshots/
    000001-before.png
    000001-after.png
  checkpoints/
    000010.json
  contracts/
    000012.json
  report.json

trajectory_id format: traj-YYYYMMDD-HHMMSS-<6hex>.
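
As an illustration of the ID format above, a minimal sketch of a generator follows. The function name `makeTrajectoryId`, the use of UTC for the timestamp, and the injectable random source are assumptions made for this example; the proposal only fixes the `traj-YYYYMMDD-HHMMSS-<6hex>` shape. A real implementation might prefer a cryptographic random source.

```typescript
// Hypothetical helper: the name, the UTC choice, and the injectable
// random source are assumptions for illustration only.
export function makeTrajectoryId(
  now: Date = new Date(),
  random: () => number = Math.random,
): string {
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  const date = `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}${pad(now.getUTCSeconds())}`;
  // 24 random bits rendered as exactly six lowercase hex digits.
  const suffix = Math.floor(random() * 0x1000000).toString(16).padStart(6, "0");
  return `traj-${date}-${time}-${suffix}`;
}
```

Passing the clock and random source as parameters keeps the helper deterministic under test while leaving the default call site a plain `makeTrajectoryId()`.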

Event schema

Each event is one JSONL row:

export interface TrajectoryEvent {
  version: 1;
  trajectory_id: string;
  seq: number;
  ts: number;
  sessionId: string;
  tabId?: string;
  event: 'tool_call_start' | 'tool_call_end' | 'checkpoint' | 'contract' | 'hint' | 'recovery' | 'error';
  tool?: string;
  ok?: boolean;
  durationMs?: number;
  argsSummary?: Record<string, unknown>;
  resultSummary?: Record<string, unknown>;
  state?: {
    url?: string;
    title?: string;
    domTextHash?: string;
    screenshotHash?: string;
  };
  progress?: {
    status?: 'progressing' | 'stalling' | 'stuck';
    noProgressStreak?: number;
    rule?: string;
    severity?: 'info' | 'warning' | 'critical';
  };
  refs?: {
    beforeScreenshot?: string;
    afterScreenshot?: string;
    checkpoint?: string;
    contractEvidence?: string;
  };
}

Bounds:

  • argsSummary and resultSummary ≤ 4 KiB each after redaction.
  • At most two screenshots per tool call.
  • Screenshot capture is best-effort and must have a 5s timeout.
  • If the bundle writer fails, log once and continue the original tool call.
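
The bounds above could be enforced with two small helpers, sketched here under stated assumptions. `boundSummary`, `bestEffort`, and `utf8ByteLength` are hypothetical names, and the `{ truncated, originalBytes, preview }` fallback shape is one possible choice rather than something the proposal specifies. The summary is assumed to be JSON-serializable and already redacted.

```typescript
// Hypothetical helpers; names and the truncation shape are assumptions.
const encoder = new TextEncoder();

export function utf8ByteLength(text: string): number {
  return encoder.encode(text).length;
}

// Returns the summary unchanged if its JSON form fits in maxBytes,
// otherwise a small placeholder object whose JSON form fits.
export function boundSummary(
  summary: Record<string, unknown>,
  maxBytes = 4096,
): Record<string, unknown> {
  const json = JSON.stringify(summary);
  const originalBytes = utf8ByteLength(json);
  if (originalBytes <= maxBytes) return summary;

  const wrap = (preview: string): Record<string, unknown> => ({
    truncated: true,
    originalBytes,
    preview,
  });
  let preview = json.slice(0, maxBytes);
  // Escaping and multi-byte characters can inflate the preview, so shrink
  // it until the serialized placeholder is within the limit.
  while (preview.length > 0 && utf8ByteLength(JSON.stringify(wrap(preview))) > maxBytes) {
    preview = preview.slice(0, Math.floor(preview.length * 0.75));
  }
  return wrap(preview);
}

// Resolves to undefined on timeout or rejection instead of throwing, so a
// failed screenshot cannot fail the surrounding tool call.
export async function bestEffort<T>(
  task: Promise<T>,
  timeoutMs = 5000,
): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => resolve(undefined), timeoutMs);
  });
  try {
    return await Promise.race([task, timeout]);
  } catch {
    return undefined;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that `bestEffort` only stops waiting; it does not cancel the underlying capture, which may still complete in the background.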

Start/stop behavior

Use the existing recording tool as the activation point. The PR must not add a separate trajectory lifecycle tool.

  • recording start accepts optional { "trajectoryBundle": true }.
  • If trajectoryBundle is absent or false, no trajectory bundle is written.
  • recording status includes { trajectoryBundle: { enabled, trajectory_id?, dir? } } when a bundle is active.
  • recording stop finalizes report.json before returning.
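
A minimal sketch of the default-off activation check and the status fragment described above. The helper names `wantsTrajectoryBundle` and `statusFragment` are hypothetical, and how the existing recording tool parses its arguments is not shown in this issue, so the input is treated as `unknown`.

```typescript
// Hypothetical helpers; only the option name and status shape come from the proposal.
interface TrajectoryBundleStatus {
  enabled: boolean;
  trajectory_id?: string;
  dir?: string;
}

// Only a literal `true` enables the bundle; absent, false, or any other
// value keeps the default recording behavior.
export function wantsTrajectoryBundle(startArgs: unknown): boolean {
  return (
    typeof startArgs === "object" &&
    startArgs !== null &&
    (startArgs as Record<string, unknown>).trajectoryBundle === true
  );
}

// Returns the extra fields for `recording status`; empty when no bundle is active.
export function statusFragment(
  active: { trajectory_id: string; dir: string } | undefined,
): { trajectoryBundle?: TrajectoryBundleStatus } {
  return active ? { trajectoryBundle: { enabled: true, ...active } } : {};
}
```

Treating anything other than a literal `true` as "off" keeps the default-off invariant easy to test.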

Redaction

Reuse the redaction rules from TaskJournal:

  • redact password/token/secret/credential/api-key style keys,
  • redact full args for sensitive tools such as cookies/http_auth,
  • never write raw password input values into events.
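
The TaskJournal rules themselves are not reproduced in this issue, so the following is only a sketch of what key-based redaction along these lines might look like. The regular expression, the `[REDACTED]` marker, the tool names in `SENSITIVE_TOOLS`, and the function names are all assumptions. Key-based matching alone does not cover the third rule: a password typed through a generic `text` argument would need to be flagged by the calling tool.

```typescript
// Hypothetical sketch; the real rules live in TaskJournal and may differ.
const SENSITIVE_KEY = /password|passwd|token|secret|credential|api[-_]?key/i;
const SENSITIVE_TOOLS = new Set(["cookies", "http_auth"]);
const REDACTED = "[REDACTED]";

// Recursively replaces values stored under sensitive-looking keys.
function redactValue(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactValue);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? REDACTED : redactValue(inner);
    }
    return out;
  }
  return value;
}

// Sensitive tools lose their whole argument object; other tools are
// redacted key by key.
export function redactArgs(
  tool: string,
  args: Record<string, unknown>,
): Record<string, unknown> {
  if (SENSITIVE_TOOLS.has(tool)) return { redacted: REDACTED };
  return redactValue(args) as Record<string, unknown>;
}
```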

Report

report.json aggregates:

{
  trajectory_id,
  started_at,
  ended_at,
  total_events,
  tool_calls,
  failures,
  progress: { stalling_events, stuck_events },
  contracts: { pass, fail, inconclusive },
  artifacts: { events, screenshots, checkpoints, contracts }
}
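
One way to derive these counts from the event stream is sketched below. The proposal does not pin down field types or counting rules, so several choices here are assumptions: timestamps are taken from the first and last event `ts`, `tool_calls` counts `tool_call_end` events, `failures` counts `tool_call_end` events with `ok === false`, a contract event without `ok` is treated as inconclusive, and `artifacts` holds counts rather than paths. `buildReport` and `ReportableEvent` are hypothetical names.

```typescript
// Hypothetical aggregation; the counting rules are assumptions (see lead-in).
interface ReportableEvent {
  seq: number;
  ts: number;
  event: "tool_call_start" | "tool_call_end" | "checkpoint" | "contract" | "hint" | "recovery" | "error";
  ok?: boolean;
  progress?: { status?: "progressing" | "stalling" | "stuck" };
  refs?: { beforeScreenshot?: string; afterScreenshot?: string; checkpoint?: string; contractEvidence?: string };
}

export function buildReport(trajectoryId: string, events: ReportableEvent[]) {
  const report = {
    trajectory_id: trajectoryId,
    started_at: events.length > 0 ? events[0].ts : null,
    ended_at: events.length > 0 ? events[events.length - 1].ts : null,
    total_events: events.length,
    tool_calls: 0,
    failures: 0,
    progress: { stalling_events: 0, stuck_events: 0 },
    contracts: { pass: 0, fail: 0, inconclusive: 0 },
    artifacts: { events: events.length, screenshots: 0, checkpoints: 0, contracts: 0 },
  };
  for (const e of events) {
    if (e.event === "tool_call_end") {
      report.tool_calls += 1;
      if (e.ok === false) report.failures += 1;
    }
    if (e.event === "contract") {
      if (e.ok === true) report.contracts.pass += 1;
      else if (e.ok === false) report.contracts.fail += 1;
      else report.contracts.inconclusive += 1;
    }
    if (e.progress?.status === "stalling") report.progress.stalling_events += 1;
    if (e.progress?.status === "stuck") report.progress.stuck_events += 1;
    // Artifact counts follow the refs recorded on each event.
    if (e.refs?.beforeScreenshot) report.artifacts.screenshots += 1;
    if (e.refs?.afterScreenshot) report.artifacts.screenshots += 1;
    if (e.refs?.checkpoint) report.artifacts.checkpoints += 1;
    if (e.refs?.contractEvidence) report.artifacts.contracts += 1;
  }
  return report;
}
```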

Acceptance criteria

  • A trajectory bundle can be enabled and disabled without changing default recording behavior.
  • events.jsonl is append-only and has strictly increasing seq values.
  • Sensitive values are redacted in event summaries and report output.
  • Screenshot capture failures never fail the original MCP tool call.
  • Contract evidence from oc_assert is linked into contracts/<seq>.json when a bundle is active.
  • oc_checkpoint save writes or references a checkpoint artifact when a bundle is active.
  • report.json is generated on stop and includes counts for tools, failures, progress, and contracts.
  • No new mandatory native dependency.
  • Documentation added at docs/observability/trajectory-bundles.md.
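
The append-only and never-fail criteria above could be met by a writer along the following lines. This is a sketch, not the proposed implementation: `TrajectoryWriter` and its constructor arguments are hypothetical, the file append is abstracted behind an injected `appendLine` callback (for example a thin wrapper over a synchronous file append), and going fully no-op after the first failure is one assumed reading of "log once and continue".

```typescript
// Hypothetical writer; class name, constructor shape, and the
// "disable after first failure" policy are assumptions.
type TrajectoryEventInput = { sessionId: string; event: string; [key: string]: unknown };

export class TrajectoryWriter {
  private seq = 0;
  private failed = false;
  private readonly trajectoryId: string;
  private readonly appendLine: (line: string) => void;
  private readonly log: (message: string) => void;

  constructor(
    trajectoryId: string,
    appendLine: (line: string) => void,
    log: (message: string) => void = (message) => console.warn(message),
  ) {
    this.trajectoryId = trajectoryId;
    this.appendLine = appendLine;
    this.log = log;
  }

  // Appends one JSONL row and returns its seq, or undefined if the bundle
  // is unavailable. Never throws, so the original tool call is unaffected.
  write(event: TrajectoryEventInput): number | undefined {
    if (this.failed) return undefined;
    const seq = this.seq + 1;
    try {
      // Writer-owned fields come last so callers cannot override them.
      const row = { ...event, version: 1, trajectory_id: this.trajectoryId, seq, ts: Date.now() };
      this.appendLine(JSON.stringify(row) + "\n");
      this.seq = seq; // advance only after a successful append
      return seq;
    } catch (err) {
      this.failed = true;
      this.log(`trajectory bundle disabled after write failure: ${String(err)}`);
      return undefined;
    }
  }
}
```

Because `seq` is assigned by a single writer and only advances after a successful append, rows in `events.jsonl` carry strictly increasing values under this sketch.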

Real verification after merge using OpenChrome

Setup

Start OpenChrome normally and connect a real Chrome session. Then run:

  1. Start recording with the trajectory bundle enabled ({ "trajectoryBundle": true }).
  2. Navigate to https://example.com.
  3. Read page state.
  4. Run one passing oc_assert for h1 containing Example Domain.
  5. Save an oc_checkpoint with one completed and one pending step.
  6. Run one intentionally failing oc_assert for text NotPresent.
  7. Stop recording, which finalizes the trajectory bundle.

Scenario 1 — bundle files exist

Pass: the returned trajectory_id directory contains meta.json, events.jsonl, contracts/, checkpoints/, and report.json.

Scenario 2 — event sequence is valid

jq -r '.seq' ~/.openchrome/trajectories/<id>/events.jsonl | awk 'NR>1 && $1<=prev { exit 1 } { prev=$1 }'

Pass: command exits 0.

Scenario 3 — contract evidence is linked

Pass: report.json.contracts.pass === 1, report.json.contracts.fail === 1, and contracts/*.json contains both assertion results with evidence details.

Scenario 4 — checkpoint is linked

Pass: checkpoints/*.json includes the saved task description, completed steps, pending steps, current URL, and tab state.

Scenario 5 — redaction

On a local fixture page, type a password value super-secret-fixture-password, then stop the bundle.

! grep -R "super-secret-fixture-password" ~/.openchrome/trajectories/<id>

Pass: no matches.

Scenario 6 — default-off invariance

Run the same simple recording without enabling trajectory bundles.

Pass: no new ~/.openchrome/trajectories/<new-id> directory is created; existing recording report behavior remains unchanged.

Out of scope

Dependencies / references

Reviewer checklist for ambiguity

  • Is activation explicitly default-off?
  • Are event payload bounds concrete?
  • Are redaction rules testable?
  • Does the issue avoid changing existing tool semantics?

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: observability artifact packaging / replayable trajectory evidence.
  • Primary deliverable: a trajectory bundle that unifies existing journal, recording, checkpoint, contract, hint, and artifact references into one portable evidence directory/report.
  • Open PR: #1117, "feat(observability): add recording trajectory bundles (#1059)" (feat/1059-cua-adoption). Continue there; do not create a separate trajectory implementation.
  • Non-goal: changing action semantics, adding a new autonomous runner, requiring external storage, or turning every session into a heavy recording by default.

Overlap and conflict resolution

Implementation checklist

  • Define the trajectory bundle directory schema, including the manifest (meta.json in the layout above), events/journal references, recording timeline, checkpoints, contract results, hint/progress snapshots, screenshots/artifacts, and redaction metadata.
  • Add an explicit opt-in activation path, such as a recording/session option, so default runtime overhead remains low.
  • Reuse existing journal/recording/checkpoint data instead of duplicating full payloads unless a stable reference is not enough.
  • Apply existing secret/redaction policy consistently to manifest, JSONL, reports, and copied artifacts.
  • Make screenshots/artifact copying best-effort and clearly mark missing artifacts rather than failing the whole bundle.
  • Add tests for enabled bundle export, disabled/default behavior, redaction, missing artifact handling, and manifest stability.

Success criteria

  • A completed opt-in run produces a deterministic bundle with enough references to reconstruct what happened without reading scattered internal files manually.
  • Bundle generation does not alter browser behavior or action ordering.
  • Disabled/default runs avoid material overhead and do not produce unexpected bundle directories.
  • The bundle is safe to attach to issues/PRs after redaction checks.

Post-merge OpenChrome live verification checklist

  • Run OpenChrome against a local fixture with trajectory bundle recording enabled; verify a bundle directory with manifest/report/events references is created.
  • Run the same fixture with bundle recording disabled; verify no unexpected heavy bundle is produced.
  • Inspect the bundle for redacted secrets and stable relative artifact paths.
  • Attach the manifest/report path and a concise artifact listing to merge verification notes.

Labels: P1 (high), enhancement, observability, reliability
