
feat(obs): opt-in visual trajectory evidence bundles (OmniParser adoption D) #1055

@shaun0927

Description


Tier: core / observability (opt-in artifact capture; privacy-safe defaults)
PR target: develop
Series: OmniParser adoption D
Priority: P1 — required to debug visual grounding, reduce wandering, and verify regressions after merge

Related / Sequencing

Background

OmniTool's trajectory logging is valuable because each step can be inspected alongside its screenshot, parsed screen elements, model/action output, and timing. OpenChrome already has stronger runtime recovery, journaling, audit logging, and action recording, but it lacks a single opt-in visual evidence bundle that ties perception, action, outcome, and recovery together.

This issue adds privacy-aware trajectory/evidence bundles for visual/harness workflows. The goal is not to log everything by default; it is to make post-merge verification and failure analysis reproducible when explicitly enabled.

Proposed Implementation

Add an opt-in evidence bundle writer for browser action/perception steps.

Suggested module:

  • src/observability/visual-trajectory.ts
  • integration points in vision_find, Ralph visual strategy, interact, and optionally act
  • artifact root under ~/.openchrome/trajectories/ or configured path

Bundle shape

Each event should be written as a single JSONL line, with binary artifacts (screenshots, snapshots) referenced by file path rather than embedded:

interface VisualTrajectoryEntry {
  version: 1;
  traceId: string;
  sessionId: string;
  tabId: string;
  url: string;
  timestamp: number;
  toolName: string;
  action?: {
    kind: string;
    target?: string;
    selectedElementId?: string;
    strategy?: string;
  };
  perception?: {
    provider: string;
    snapshotPath?: string;
    elementCount: number;
    latencyMs?: number;
    warnings: string[];
  };
  screenshots?: {
    beforePath?: string;
    annotatedPath?: string;
    afterPath?: string;
    phash?: string;
  };
  outcome: 'success' | 'failure' | 'skipped' | 'blocked' | 'unknown';
  recovery?: {
    hintRule?: string;
    ralphStrategy?: string;
    nextSuggestedTool?: string;
  };
  durationsMs: Record<string, number>;
  redaction: {
    inlineImages: false;
    secretsRedacted: true;
  };
}
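A minimal sketch of the JSONL write path, assuming entries are plain objects shaped like VisualTrajectoryEntry. The helper names (findInlineImage, toJsonlLine) and the 512-character base64 threshold are hypothetical. The idea is that serialization itself rejects inline image payloads, so the "no inline base64" rule does not depend only on the self-reported redaction.inlineImages flag.

```typescript
// Hypothetical helpers; not an existing OpenChrome API.
// Matches data URLs such as "data:image/png;base64,...".
const DATA_URL = /^data:image\/[a-z0-9+.-]+;base64,/i;
// Assumed heuristic: a long unbroken base64 run is treated as inline binary.
const LONG_BASE64 = /^[A-Za-z0-9+\/]{512,}={0,2}$/;

// Returns the JSON path of the first string that looks like an inline image, or null.
function findInlineImage(value: unknown, path = "$"): string | null {
  if (typeof value === "string") {
    return DATA_URL.test(value) || LONG_BASE64.test(value) ? path : null;
  }
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      const hit = findInlineImage(value[i], `${path}[${i}]`);
      if (hit !== null) return hit;
    }
    return null;
  }
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      const hit = findInlineImage(child, `${path}.${key}`);
      if (hit !== null) return hit;
    }
  }
  return null;
}

// Serializes one entry as exactly one JSONL line. JSON.stringify escapes any
// newlines inside strings, so the only raw newline is the terminator.
function toJsonlLine(entry: object): string {
  const hit = findInlineImage(entry);
  if (hit !== null) {
    throw new Error(`inline image payload at ${hit}; write it to a file and record the path`);
  }
  return JSON.stringify(entry) + "\n";
}

// Usage with an illustrative entry (all values are made up).
const line = toJsonlLine({
  version: 1,
  traceId: "trace-1",
  sessionId: "session-1",
  tabId: "tab-1",
  url: "http://localhost:9991/perception.html",
  timestamp: 0,
  toolName: "vision_find",
  perception: { provider: "example-provider", elementCount: 3, warnings: [] },
  screenshots: { beforePath: "shots/0001-before.png" },
  outcome: "success",
  durationsMs: { perception: 42 },
  redaction: { inlineImages: false, secretsRedacted: true },
});
console.log(line.trimEnd());
```

Rejecting at serialization time means a bug elsewhere (for example, a provider returning a data URL in snapshotPath) surfaces as a capture warning instead of a silently bloated or privacy-leaking JSONL file.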

Controls

  • Disabled by default.
  • Enable through env/config and/or explicit tool arg:
    • OPENCHROME_VISUAL_TRAJECTORY=1
    • OPENCHROME_VISUAL_TRAJECTORY_DIR=...
    • optional tool arg recordTrajectory: true
  • Store images as files, never inline base64 in JSONL.
  • Reuse existing redaction rules and screenshot payload guards.
  • Apply retention limits:
    • max entries per trace/session
    • max bytes per trajectory directory
    • best-effort cleanup of oldest artifacts
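A sketch of how these controls could resolve into one settings object. The env var names and the recordTrajectory arg come from this issue; resolveTrajectoryConfig, the field names, the precedence rule, and the default limits (500 entries per trace, 200 MiB per directory) are assumptions chosen only for illustration.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical resolved settings; field names and default limits are assumptions.
interface TrajectoryConfig {
  enabled: boolean;
  dir: string;
  maxEntriesPerTrace: number;
  maxDirBytes: number;
}

const DEFAULT_MAX_ENTRIES_PER_TRACE = 500;
const DEFAULT_MAX_DIR_BYTES = 200 * 1024 * 1024; // 200 MiB

function resolveTrajectoryConfig(
  env: Record<string, string | undefined>,
  recordTrajectory?: boolean,
): TrajectoryConfig {
  // An explicit per-call recordTrajectory wins; otherwise only "1" in the env enables capture.
  const enabled = recordTrajectory ?? env.OPENCHROME_VISUAL_TRAJECTORY === "1";
  const configuredDir = env.OPENCHROME_VISUAL_TRAJECTORY_DIR?.trim();
  const dir = configuredDir ? configuredDir : join(homedir(), ".openchrome", "trajectories");
  return {
    enabled,
    dir,
    maxEntriesPerTrace: DEFAULT_MAX_ENTRIES_PER_TRACE,
    maxDirBytes: DEFAULT_MAX_DIR_BYTES,
  };
}

// Usage: resolve against the real environment for a call without the tool arg.
console.log(resolveTrajectoryConfig(process.env));
```

This sketch lets a per-call true enable capture even when the env flag is unset, matching the "and/or" wording above, and lets an explicit false opt a single call out. A stricter policy could require the env flag in addition to the tool arg.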

Non-goals

  • Do not record screenshots by default.
  • Do not store passwords, MFA codes, cookies, localStorage, or request bodies.
  • Do not add a replay engine in this issue.
  • Do not require external visual providers.
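The issue says to reuse OpenChrome's existing redaction rules, which are not shown here, so the following is only a stand-in that illustrates the intended behavior: fields belonging to the excluded data classes are dropped before anything is written, and known secret values (such as a fixture password) are masked wherever they appear inside strings. redact and SENSITIVE_KEYS are hypothetical names, and the key list is an assumption derived from the non-goals.

```typescript
// Stand-in for OpenChrome's existing redaction rules, which this issue says to reuse.
// The key list mirrors the non-goals above; matching is exact and case-insensitive.
const SENSITIVE_KEYS = new Set([
  "password",
  "passcode",
  "mfacode",
  "otp",
  "cookie",
  "cookies",
  "localstorage",
  "requestbody",
]);

// Returns a redacted deep copy: sensitive keys are dropped entirely and any
// occurrence of a known secret value inside strings is masked.
function redact(value: unknown, knownSecrets: readonly string[] = []): unknown {
  if (typeof value === "string") {
    let out = value;
    for (const secret of knownSecrets) {
      if (secret.length > 0) out = out.split(secret).join("[REDACTED]");
    }
    return out;
  }
  if (Array.isArray(value)) {
    return value.map((item) => redact(item, knownSecrets));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      if (SENSITIVE_KEYS.has(key.toLowerCase())) continue; // dropped rather than masked
      out[key] = redact(child, knownSecrets);
    }
    return out;
  }
  return value;
}

// Usage with the fixture secret from the verification steps below.
// prints {"action":{"kind":"type","target":"typed [REDACTED]"}}
console.log(
  JSON.stringify(
    redact(
      { action: { kind: "type", target: "typed super-secret-fixture-password" }, cookies: ["sid=1"] },
      ["super-secret-fixture-password"],
    ),
  ),
);
```

Dropping sensitive keys instead of masking them keeps even the presence or length of those values off disk; value-based masking covers the case where a secret leaks into an otherwise harmless field such as action.target.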

Acceptance Criteria

  • Visual trajectory capture is disabled by default.
  • When enabled, vision_find and Ralph visual fallback can emit JSONL entries with path-based screenshot/snapshot artifacts.
  • JSONL entries include provider, element count, timing, action/strategy, outcome, and warnings.
  • No inline base64 images are written to JSONL.
  • Secret fixture values are redacted/absent from JSONL and snapshot artifacts.
  • Retention limits are enforced or at least configurable with tested cleanup behavior.
  • Capture failures are non-fatal and reported as warnings/metrics.
  • Tests cover disabled default, enabled capture, redaction, path-only images, retention, and non-fatal write errors.
  • The targeted run npm run build && npm test -- --runInBand trajectory observability vision passes, and the full npm run build && npm test && npm run lint:tier passes before the PR is completed.
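For the non-fatal requirement, a minimal sketch assuming one JSONL file per trace inside the trajectory directory (the issue does not fix a file layout). appendTrajectoryLine and captureMetrics are hypothetical; the in-memory counter stands in for whatever metrics sink OpenChrome uses. Any filesystem error becomes a warning string the caller can attach to the tool response, so the browser action still reports its own result.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface CaptureResult {
  written: boolean;
  warnings: string[];
}

// Hypothetical in-process counters standing in for a real metrics sink.
const captureMetrics = { writes: 0, failures: 0 };

// Appends one already-serialized JSONL line; never throws.
function appendTrajectoryLine(dir: string, traceId: string, line: string): CaptureResult {
  try {
    mkdirSync(dir, { recursive: true });
    appendFileSync(join(dir, `${traceId}.jsonl`), line, "utf8");
    captureMetrics.writes += 1;
    return { written: true, warnings: [] };
  } catch (err) {
    captureMetrics.failures += 1;
    const message = err instanceof Error ? err.message : String(err);
    return { written: false, warnings: [`visual trajectory capture failed: ${message}`] };
  }
}

// Usage: a successful write into a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), "visual-trajectory-"));
console.log(appendTrajectoryLine(demoDir, "trace-1", '{"version":1}\n'));
```

A test for non-fatal write errors can point the writer at a path whose parent is a regular file, which makes mkdirSync fail without needing permission tricks.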

Verification (post-merge, via OpenChrome MCP)

Record artifacts under scripts/verify/omniparser-adoption-D-visual-trajectory/ and use an isolated trajectory directory.

Setup

npm ci
npm run build
mkdir -p scripts/verify/omniparser-adoption-D-visual-trajectory
TRAJ_DIR=$(pwd)/scripts/verify/omniparser-adoption-D-visual-trajectory/runtime-artifacts
rm -rf "$TRAJ_DIR" && mkdir -p "$TRAJ_DIR"
node tests/fixtures/sites/vision-perception/serve.mjs &
FIX_PID=$!
PORT=9895
OPENCHROME_VISUAL_TRAJECTORY=1 \
OPENCHROME_VISUAL_TRAJECTORY_DIR="$TRAJ_DIR" \
node dist/index.js --http "$PORT" > /tmp/openchrome-visual-trajectory.log 2>&1 &
OC_PID=$!
sleep 1
mcp() { curl -s -H 'content-type: application/json' -d "$1" "http://localhost:$PORT/mcp"; }

Scenario 1 — enabled capture writes bounded JSONL and files

mcp '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"navigate","arguments":{"url":"http://localhost:9991/perception.html"}}}' >/tmp/oc-nav-D.json
TAB=$(jq -r '.result.content[0].text | fromjson | .tabId' /tmp/oc-nav-D.json)
mcp "$(jq -nc --arg tab "$TAB" '{jsonrpc:"2.0",id:2,method:"tools/call",params:{name:"vision_find",arguments:{tabId:$tab,format:"both",recordTrajectory:true}}}')" \
  | tee scripts/verify/omniparser-adoption-D-visual-trajectory/vision-find-response.json
find "$TRAJ_DIR" -type f | sort | tee scripts/verify/omniparser-adoption-D-visual-trajectory/files.txt
grep -R '"version":1' "$TRAJ_DIR"/*.jsonl >/dev/null
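jq -e 'select(.toolName == "vision_find") | .perception.elementCount >= 1 and .redaction.inlineImages == false' "$TRAJ_DIR"/*.jsonl >/dev/null
# The redaction flag is self-reported; also check the JSONL content itself for inline image data.
! grep -q 'data:image/' "$TRAJ_DIR"/*.jsonl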

Pass: trajectory JSONL exists, references perception data, and does not inline images.

Scenario 2 — secrets are absent

! grep -R "super-secret-fixture-password" "$TRAJ_DIR" scripts/verify/omniparser-adoption-D-visual-trajectory/*.json

Pass: fixture secret is absent from all trajectory artifacts and recorded responses.

Scenario 3 — disabled default writes nothing

kill $OC_PID
wait $OC_PID 2>/dev/null || true
rm -rf "$TRAJ_DIR" && mkdir -p "$TRAJ_DIR"
PORT=9896
OPENCHROME_VISUAL_TRAJECTORY_DIR="$TRAJ_DIR" node dist/index.js --http "$PORT" > /tmp/openchrome-visual-trajectory-disabled.log 2>&1 &
OC_PID=$!
sleep 1
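# Re-navigate in the new server instance; the tab ID from the previous instance may not be valid here.
mcp '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"navigate","arguments":{"url":"http://localhost:9991/perception.html"}}}' >/tmp/oc-nav-D-disabled.json
TAB=$(jq -r '.result.content[0].text | fromjson | .tabId' /tmp/oc-nav-D-disabled.json)
mcp "$(jq -nc --arg tab "$TAB" '{jsonrpc:"2.0",id:4,method:"tools/call",params:{name:"vision_find",arguments:{tabId:$tab,format:"both"}}}')" >/dev/null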
COUNT=$(find "$TRAJ_DIR" -type f | wc -l | tr -d ' ')
[ "$COUNT" = "0" ]

Pass: no trajectory artifacts are written unless capture is enabled.

Cleanup

kill $FIX_PID $OC_PID
wait $FIX_PID $OC_PID 2>/dev/null || true

Directionality / Fit Check

This strengthens OpenChrome's harness and verification story without adding agent behavior. It is observability-only, opt-in, and explicitly privacy bounded.

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: visual observability bundles.
  • Primary deliverable: opt-in visual trajectory evidence bundles for grounding/debug verification.
  • Open PR: none currently linked; create a new PR only after checking for newer overlapping PRs.
  • Non-goal: privacy-invasive default screenshot capture, runtime decision changes, or replacing general trajectory bundles.

Overlap and conflict resolution

Implementation checklist

  • Define opt-in capture of screenshot, perception snapshot, selected target, confidence, action result, and redaction metadata.
  • Keep capture disabled by default and bounded by size/count.
  • Apply privacy/redaction rules, and record missing artifacts (for example, no screenshot or no provider data) explicitly instead of silently omitting them.
  • Add tests for enabled/disabled capture, redaction, missing screenshot/provider data, and manifest stability.
  • Document safe artifact handling for PR/issue debugging.
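Bounding the directory by size could be done with a pure selection step that is easy to unit test, kept separate from the actual file deletion. The sketch assumes the caller has already listed artifact paths with their sizes and modification times; selectForDeletion and ArtifactInfo are hypothetical names, and oldest-first is one possible policy. A per-trace entry cap can be an even simpler counter check before each append.

```typescript
// Hypothetical metadata the caller gathers (e.g. via fs.statSync) before pruning.
interface ArtifactInfo {
  path: string;
  sizeBytes: number;
  mtimeMs: number;
}

// Returns paths to delete, oldest first, until the remaining total fits in maxBytes.
function selectForDeletion(files: readonly ArtifactInfo[], maxBytes: number): string[] {
  let total = files.reduce((sum, f) => sum + f.sizeBytes, 0);
  // Oldest first; equal mtimes fall back to a plain path comparison for determinism.
  const byAge = [...files].sort(
    (a, b) => a.mtimeMs - b.mtimeMs || (a.path < b.path ? -1 : a.path > b.path ? 1 : 0),
  );
  const victims: string[] = [];
  for (const f of byAge) {
    if (total <= maxBytes) break;
    victims.push(f.path);
    total -= f.sizeBytes;
  }
  return victims;
}

// Usage: 120 bytes on disk, limit 60, so the two oldest files are selected.
console.log(
  selectForDeletion(
    [
      { path: "c.png", sizeBytes: 50, mtimeMs: 3 },
      { path: "a.png", sizeBytes: 40, mtimeMs: 1 },
      { path: "b.png", sizeBytes: 30, mtimeMs: 2 },
    ],
    60,
  ),
);
```

Because selection is pure, the deletion loop can stay best-effort: a failed unlink only leaves the directory temporarily over budget and can be reported through the same warning path as other capture failures.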

Success criteria

  • Visual debugging evidence is available only when explicitly enabled.
  • Artifacts link perception to action outcome and wandering evidence.
  • No sensitive visual data is captured by default.
  • Benchmark/harness can consume bundles deterministically.
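For deterministic consumption by a benchmark or harness, one approach is a byte-stable manifest: sort object keys recursively, sort the artifact list before serializing, and keep wall-clock values out of the fields used for comparison. stableStringify and buildManifest are hypothetical, and the manifest fields shown are illustrative rather than a defined format. The serializer assumes plain JSON data (no Date or other toJSON objects).

```typescript
// Serializes plain JSON data with object keys sorted at every level.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    // Like JSON.stringify, undefined array slots become null.
    return `[${value.map((v) => (v === undefined ? "null" : stableStringify(v))).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // JSON.stringify also drops undefined fields
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical manifest: artifact order from directory listing no longer matters.
function buildManifest(traceId: string, artifactPaths: readonly string[]): string {
  return stableStringify({ version: 1, traceId, artifacts: [...artifactPaths].sort() }) + "\n";
}

// Usage: two listings in different orders yield identical manifests.
console.log(buildManifest("trace-1", ["shots/0001-before.png", "trace-1.jsonl"]));
```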

Post-merge OpenChrome live verification checklist

  • Run a local visual fixture with evidence enabled and verify bundle contains screenshot/perception/action metadata.
  • Run with evidence disabled and verify no visual bundle.
  • Inspect bundle for redaction and bounded artifact count.
  • Attach sanitized manifest and artifact listing.

Metadata


Assignees

No one assigned

Labels

  • P1: P1 high
  • enhancement: New feature or request
  • harness: Execution harness, run lifecycle, recovery, and verification
  • live-verification: Requires live OpenChrome/browser validation after implementation
  • observability: Observability
  • reliability: Reliability and stability improvement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
