feat(handoff): secret-safe human takeover checkpoints for resume #1040

@shaun0927

Why

Bytebot's most useful recovery pattern is not its virtual desktop itself; it is the ability to pause automation, let a human intervene, capture enough context, and resume the task. OpenChrome already supports real Chrome profiles, headed fallback, CAPTCHA/2FA handoff guidance, Ralph HITL escalation, and session persistence. What is missing is a structured, secret-safe human handoff checkpoint that records browser state deltas after manual intervention so the LLM does not wander or retry stale login flows.

What

Add an opt-in human handoff/resume checkpoint toolset:

  • oc_handoff_start
  • oc_handoff_status
  • oc_handoff_finish
  • oc_handoff_cancel

The handoff records state deltas, not raw user input. It supports these bounded cases through the reason enum: login, 2FA, CAPTCHA, permission, manual recovery, and other.

Proposed contract

export interface HandoffStartArgs {
  sessionId?: string;
  tabId?: string;
  run_id?: string;              // optional TaskRun link from the TaskRun issue
  reason: 'login' | '2fa' | 'captcha' | 'permission' | 'manual_recovery' | 'other';
  instruction?: string;         // redacted, <= 1 KiB
  timeout_ms?: number;          // default 10 min, max 60 min
  capture_screenshot?: boolean; // opt-in; screenshot_ref is stored only when true
}

export interface HandoffSnapshot {
  url: string;
  title?: string;
  origin?: string;
  timestamp: number;
  screenshot_ref?: string;      // optional, stored only if capture_screenshot=true
  cookie_count?: number;
  local_storage_keys?: string[]; // key names only, values never stored
  dom_fingerprint?: string;
}

export interface HandoffResult {
  handoff_id: string;
  status: 'RUNNING' | 'FINISHED' | 'CANCELLED' | 'TIMED_OUT';
  before: HandoffSnapshot;
  after?: HandoffSnapshot;
  delta_summary?: string;
  resume_hint?: string;
}
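
The limits in the contract comments (10 min default, 60 min cap, 1 KiB instruction) could be enforced in one place before a handoff is created. The following is a minimal sketch, not part of the proposed contract: normalizeStartArgs and the constant names are hypothetical, and clamping an oversized timeout (rather than rejecting it) is an assumption. Redaction of the instruction text is not shown.

```typescript
type HandoffReason = 'login' | '2fa' | 'captcha' | 'permission' | 'manual_recovery' | 'other';

interface HandoffStartArgs {
  sessionId?: string;
  tabId?: string;
  run_id?: string;
  reason: HandoffReason;
  instruction?: string;
  timeout_ms?: number;
}

// Values taken from the contract comments; the names are hypothetical.
const DEFAULT_TIMEOUT_MS = 10 * 60 * 1000;
const MAX_TIMEOUT_MS = 60 * 60 * 1000;
const MAX_INSTRUCTION_BYTES = 1024;

const REASONS: HandoffReason[] = ['login', '2fa', 'captcha', 'permission', 'manual_recovery', 'other'];

// Hypothetical helper: validates the args and fills in the effective timeout.
function normalizeStartArgs(args: HandoffStartArgs): HandoffStartArgs & { timeout_ms: number } {
  // Callers may come from untyped JSON, so the enum is checked at runtime too.
  if (REASONS.indexOf(args.reason) === -1) {
    throw new Error('unknown reason: ' + String(args.reason));
  }
  const requested = args.timeout_ms ?? DEFAULT_TIMEOUT_MS;
  if (!Number.isFinite(requested) || requested <= 0) {
    throw new Error('timeout_ms must be a positive number');
  }
  // Assumption: values above the cap are clamped instead of rejected.
  const timeout_ms = Math.min(requested, MAX_TIMEOUT_MS);
  if (args.instruction !== undefined) {
    // The limit is measured in UTF-8 bytes, not characters.
    const bytes = new TextEncoder().encode(args.instruction).length;
    if (bytes > MAX_INSTRUCTION_BYTES) {
      throw new Error('instruction exceeds 1 KiB');
    }
  }
  return { ...args, timeout_ms };
}
```

Rejecting an oversized instruction rather than truncating it avoids silently cutting a message in the middle of a redaction marker, but either policy would satisfy the stated limit.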

Implementation notes

  • Do not log key presses, typed text, password values, cookie values, localStorage values, or request bodies.
  • Capture only this safe state: URL/title/origin, counts, key names, DOM fingerprint/hash, optional screenshot reference.
  • If run_id is supplied, oc_handoff_finish appends exactly one handoff evidence pointer to that TaskRun.
  • Integrate Ralph HITL output so S7_HITL can recommend starting a handoff instead of leaving the agent with unstructured text only.
  • Handoff state is stored under ~/.openchrome/handoffs/<handoff_id>/ with atomic metadata writes.
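
To illustrate the "safe state only" rule, here is a rough sketch of reducing raw page state to a HandoffSnapshot. RawPageState, fingerprint, and buildSafeSnapshot are hypothetical names, and the small non-cryptographic hash is only a dependency-free stand-in; a real implementation would more likely use a cryptographic hash such as SHA-256 for the DOM fingerprint.

```typescript
// Hypothetical raw input; real capture would come from the browser session.
interface RawPageState {
  url: string;
  title?: string;
  cookies: { name: string; value: string }[];
  localStorage: Record<string, string>;
  html: string;
}

interface HandoffSnapshot {
  url: string;
  title?: string;
  origin?: string;
  timestamp: number;
  screenshot_ref?: string;
  cookie_count?: number;
  local_storage_keys?: string[];
  dom_fingerprint?: string;
}

// Stand-in fingerprint: FNV-1a-style hash over UTF-16 code units, as 8 hex chars.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ('00000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Keeps counts, key names, and a hash; cookie and storage values are never copied.
function buildSafeSnapshot(raw: RawPageState, now: number): HandoffSnapshot {
  return {
    url: raw.url,
    title: raw.title,
    origin: new URL(raw.url).origin,
    timestamp: now,
    cookie_count: raw.cookies.length,
    local_storage_keys: Object.keys(raw.localStorage).sort(),
    dom_fingerprint: fingerprint(raw.html),
  };
}
```

Sorting the key names keeps the snapshot stable across captures, which matters when two snapshots are later compared.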

Acceptance criteria

  • Four MCP tools registered and documented.
  • oc_handoff_start records a before snapshot and returns a handoff id plus a one-line instruction to the host that includes the reason and timeout deadline.
  • oc_handoff_finish records an after snapshot and produces a deterministic delta summary: URL changed, title changed, origin changed, cookie count changed, storage key set changed, DOM fingerprint changed.
  • Secret-safety tests prove cookie values, localStorage values, typed text, and password-like strings never appear in meta.json, events.jsonl, MCP response text, logs, or screenshot metadata.
  • Timeout test: handoff transitions to TIMED_OUT after configured timeout and can no longer be finished.
  • TaskRun integration test: if run_id is supplied, finishing handoff appends an evidence pointer to the TaskRun.
  • Ralph integration test: a forced S7_HITL result includes a machine-readable suggestion to call oc_handoff_start with reason manual_recovery.
  • Existing headed fallback/login behavior remains unchanged when no handoff is active.
  • npm run build && npm test green.
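
The deterministic delta summary in the criteria above might look roughly like this. This is a sketch under assumptions: summarizeDelta and sameKeySet are hypothetical names, and the exact wording and separator of the summary string are not specified by this issue. Only the fixed field order (URL, title, origin, cookie count, storage key set, DOM fingerprint) comes from the criteria.

```typescript
interface HandoffSnapshot {
  url: string;
  title?: string;
  origin?: string;
  timestamp: number;
  screenshot_ref?: string;
  cookie_count?: number;
  local_storage_keys?: string[];
  dom_fingerprint?: string;
}

// Compares key names as sets, ignoring the order in which they were captured.
function sameKeySet(a: string[] = [], b: string[] = []): boolean {
  if (a.length !== b.length) return false;
  const sa = a.slice().sort();
  const sb = b.slice().sort();
  return sa.every((key, i) => key === sb[i]);
}

// Emits changed fields in a fixed order so equal inputs always give equal text.
function summarizeDelta(before: HandoffSnapshot, after: HandoffSnapshot): string {
  const changes: string[] = [];
  if (before.url !== after.url) changes.push('URL changed');
  if (before.title !== after.title) changes.push('title changed');
  if (before.origin !== after.origin) changes.push('origin changed');
  if (before.cookie_count !== after.cookie_count) changes.push('cookie count changed');
  if (!sameKeySet(before.local_storage_keys, after.local_storage_keys)) {
    changes.push('storage key set changed');
  }
  if (before.dom_fingerprint !== after.dom_fingerprint) changes.push('DOM fingerprint changed');
  return changes.length > 0 ? changes.join('; ') : 'no changes detected';
}
```

The timestamp is deliberately excluded from the comparison, since it differs between any two snapshots and would make every summary report a change.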

Real verification after merge using OpenChrome

  1. Launch OpenChrome headed with a persistent temporary profile.
  2. Navigate to https://the-internet.herokuapp.com/login.
  3. Call oc_task_run_start for a login/manual recovery verification run, or omit run_id if TaskRun is not merged yet.
  4. Call oc_handoff_start({reason:'login', instruction:'Manually complete the demo login', capture_screenshot:true}).
  5. Manually enter the demo credentials in Chrome and submit.
  6. Call oc_handoff_finish({handoff_id}).
  7. Verify the result reports URL/title/DOM fingerprint changed and contains no typed credential values.
  8. Call read_page and verify the page is authenticated/advanced enough for the next automated step.
  9. Restart OpenChrome and call oc_handoff_status({handoff_id}); verify the finished handoff metadata persists.
  10. If TaskRun is available, call oc_task_run_get and verify the handoff evidence pointer is attached.

Out of scope

  • Full user input recording or replay.
  • Password manager automation.
  • Solving CAPTCHA automatically.
  • New desktop/noVNC UI.
  • Storing cookie/storage values.

Dependencies

Success definition

Merge is successful when a manual login or recovery step can be represented as a safe, durable checkpoint that helps the agent resume without exposing secrets or changing normal OpenChrome tool behavior.

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: human handoff checkpointing / secret-safe resume evidence.
  • Primary deliverable: opt-in oc_handoff_start/status/finish/cancel flow that records state deltas and resume context after human intervention.
  • Open PR: #1110, feat(handoff): add secret-safe resume checkpoints (#1040), on branch feat/1040-handoff-resume. Continue there; do not split off a competing handoff toolset.
  • Non-goal: recording raw human input, storing credentials, automating login/2FA/CAPTCHA, or changing headed fallback behavior by default.

Overlap and conflict resolution

Implementation checklist

  • Implement opt-in handoff lifecycle tools with explicit reason enum, run/session/tab linkage, and cancel semantics.
  • Record state deltas and resume hints without raw keystrokes, passwords, OTPs, cookies, or screenshots that bypass the redaction policy.
  • Add timeout/stale-state handling so unfinished handoffs are visible and cannot be mistaken for successful recovery.
  • Add unit/integration tests for start/status/finish/cancel, stale handoff, redaction, and TaskRun link behavior.
  • Document safe human takeover flow and what is intentionally not captured.
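
The lifecycle and timeout items in this checklist can be sketched as a small state check. HandoffRecord, effectiveStatus, finishHandoff, and cancelHandoff are hypothetical names, and deriving TIMED_OUT lazily from the start time (instead of using a timer) is an assumption. The status values are those of HandoffResult.

```typescript
type HandoffStatus = 'RUNNING' | 'FINISHED' | 'CANCELLED' | 'TIMED_OUT';

// Hypothetical stored record; only the fields needed for lifecycle checks.
interface HandoffRecord {
  handoff_id: string;
  status: HandoffStatus;
  started_at: number; // epoch ms
  timeout_ms: number;
}

// A RUNNING handoff past its deadline is reported as TIMED_OUT.
function effectiveStatus(rec: HandoffRecord, now: number): HandoffStatus {
  if (rec.status === 'RUNNING' && now >= rec.started_at + rec.timeout_ms) {
    return 'TIMED_OUT';
  }
  return rec.status;
}

// Shared guard: only a still-running handoff may move to a terminal state.
function transition(rec: HandoffRecord, now: number, next: 'FINISHED' | 'CANCELLED'): HandoffRecord {
  const current = effectiveStatus(rec, now);
  if (current !== 'RUNNING') {
    throw new Error('handoff ' + rec.handoff_id + ' is ' + current + ' and cannot become ' + next);
  }
  return { ...rec, status: next };
}

function finishHandoff(rec: HandoffRecord, now: number): HandoffRecord {
  return transition(rec, now, 'FINISHED');
}

function cancelHandoff(rec: HandoffRecord, now: number): HandoffRecord {
  return transition(rec, now, 'CANCELLED');
}
```

Because every terminal state rejects further transitions, a cancelled or stale handoff cannot later be rewritten into something that looks like a successful checkpoint.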

Success criteria

  • A human can pause automation, intervene, finish handoff, and provide enough safe context for the host to resume.
  • Secret material is not persisted in handoff records or verification artifacts.
  • Cancelled/stale handoffs are explicit and do not look like successful resume checkpoints.
  • Existing non-handoff sessions behave unchanged.

Post-merge OpenChrome live verification checklist

  • Start a local browser task, call oc_handoff_start with a non-secret reason, manually change a fixture page state, then call oc_handoff_finish.
  • Verify oc_handoff_status reports reason, state delta/resume hint, and linked run/session/tab without raw typed input.
  • Repeat with oc_handoff_cancel and verify no resume checkpoint is marked successful.
  • Inspect stored artifacts for secret redaction and include sanitized paths/results in merge notes.
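
For the artifact inspection step, a simple scan for known fixture secrets may be enough. The sketch below is illustrative only: findLeaks and the Leak shape are hypothetical, artifacts are assumed to be already read into strings keyed by file name, and a plain substring match will miss encoded or transformed values.

```typescript
// Hypothetical result shape: the index identifies the secret without repeating it.
interface Leak {
  artifact: string;
  secretIndex: number;
}

// Scans each artifact's text for each known secret using substring matching.
function findLeaks(artifacts: Record<string, string>, secrets: string[]): Leak[] {
  const leaks: Leak[] = [];
  // Sorted artifact names keep the report order stable between runs.
  for (const artifact of Object.keys(artifacts).sort()) {
    secrets.forEach((secret, secretIndex) => {
      // Empty strings would match everything, so they are skipped.
      if (secret.length > 0 && artifacts[artifact].indexOf(secret) !== -1) {
        leaks.push({ artifact, secretIndex });
      }
    });
  }
  return leaks;
}

// Example with made-up artifact contents and fixture secrets.
const exampleLeaks = findLeaks(
  {
    'meta.json': '{"cookie_count":1,"local_storage_keys":["authToken"]}',
    'events.jsonl': '{"event":"finish","status":"FINISHED"}',
  },
  ['fixture-password', 'fixture-cookie-value'],
);
console.log(exampleLeaks.length === 0 ? 'no leaks found' : JSON.stringify(exampleLeaks));
```

Reporting the index of a matched secret instead of its value keeps the scan output itself safe to paste into merge notes.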

Metadata

Assignees

No one assigned

    Labels

    P1, enhancement, observability, reliability, security
