feat(task-run): bulk progress contract to prevent premature completion #1041

@shaun0927

Why

The Bytebot analysis highlighted a simple but important long-running-task discipline: for repetitive work, track total items, completed items, failed items, cursor, and explicit stop conditions. OpenChrome already detects low-level stalling via ProgressTracker and workflow stale updates, but it can still let an LLM prematurely declare success after processing only a subset of a requested list. This issue adds an opt-in bulk progress contract that reduces wandering and premature completion for crawl, pagination, multi-site, and workflow tasks.

What

Add a contract object and helper APIs that TaskRun/workflow/batch tools can use to track repetitive progress:

  • expected item count or unknown-count cursor
  • completed item ids
  • failed item ids and reasons
  • stop condition
  • minimum completion threshold
  • finalization guard that rejects completion if criteria are unmet

This is a generic progress guard only; it must not add scheduling, browser control, worker orchestration, or another workflow engine.

Proposed contract

export interface BulkProgressContract {
  contract_id: string;
  run_id?: string;              // set when the contract belongs to a TaskRun
  scope: 'task_run' | 'workflow' | 'batch' | 'crawl';
  expected_total?: number;      // omit when the total is unknown and progress is tracked by cursor
  min_completed?: number;       // minimum completion threshold
  stop_condition: string;       // e.g. "no next page", "processed all input urls"
  item_key: string;             // e.g. "url", "row_id", "profile_id"
  cursor?: string;              // current position, used to resume
  completed: string[];          // completed item ids
  failed: Array<{ item: string; reason: string; retryable?: boolean }>;  // failed item ids and reasons
  last_progress_at: number;
  created_at: number;
  updated_at: number;
}

export interface CompletionGuardResult {
  allowed: boolean;
  reason?: string;
  missing_count?: number;
  failed_count?: number;
  suggested_next_action?: string;
}
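
For illustration, a contract for the three-URL scenario used in the verification steps might be populated as shown here. This is a minimal sketch: the `contract_id` value and the use of epoch milliseconds for the timestamps are assumptions, since the issue fixes neither an id format nor a time unit.

```typescript
// Proposed contract shape, copied from the issue (without `export` so the
// snippet runs as a standalone script).
interface BulkProgressContract {
  contract_id: string;
  run_id?: string;
  scope: 'task_run' | 'workflow' | 'batch' | 'crawl';
  expected_total?: number;
  min_completed?: number;
  stop_condition: string;
  item_key: string;
  cursor?: string;
  completed: string[];
  failed: Array<{ item: string; reason: string; retryable?: boolean }>;
  last_progress_at: number;
  created_at: number;
  updated_at: number;
}

// Assumption: timestamps are epoch milliseconds.
const createdAt = Date.now();

const exampleContract: BulkProgressContract = {
  contract_id: 'bulk-example-1', // hypothetical id
  scope: 'task_run',
  expected_total: 3,
  stop_condition: 'processed all input urls',
  item_key: 'url',
  completed: [],
  failed: [],
  last_progress_at: createdAt,
  created_at: createdAt,
  updated_at: createdAt,
};

console.log(JSON.stringify(exampleContract, null, 2));
```

The proposed interface has no field for the list of pending items, so the sketch leaves the three URLs out of the contract itself.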

Implementation notes

  • Store the contract as part of TaskRun metadata when run_id exists, or as a standalone JSON file under ~/.openchrome/progress-contracts/.
  • Provide internal helper functions for recordCompleted, recordFailed, updateCursor, and checkCompletionGuard.
  • Wire initially into TaskRun completion and one representative long-running tool path (crawl or workflow_collect) to keep scope bounded.
  • HintEngine consumes CompletionGuardResult from TaskRun completion attempts and emits a warning when completion is attempted too early.
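
The three recording helpers named in the notes above could look roughly like this. It is a sketch under stated assumptions, not the implementation in the linked PR: the function signatures, the deduplication behaviour, and the rule that only completions and cursor moves advance `last_progress_at` are all choices made for illustration.

```typescript
// Trimmed copy of the proposed contract: only the fields the helpers touch.
interface BulkProgressContract {
  cursor?: string;
  completed: string[];
  failed: Array<{ item: string; reason: string; retryable?: boolean }>;
  last_progress_at: number;
  updated_at: number;
}

// Assumption: `now` is epoch milliseconds; it is injectable to keep tests deterministic.
function recordCompleted(c: BulkProgressContract, item: string, now: number = Date.now()): void {
  // A retry that succeeds clears the earlier failure entry for the same item.
  c.failed = c.failed.filter((f) => f.item !== item);
  // Recording the same item twice must not inflate the completed count.
  if (!c.completed.includes(item)) c.completed.push(item);
  c.last_progress_at = now;
  c.updated_at = now;
}

function recordFailed(
  c: BulkProgressContract,
  item: string,
  reason: string,
  retryable: boolean = false,
  now: number = Date.now(),
): void {
  // Assumption: an item that already completed is never downgraded to failed.
  if (c.completed.includes(item)) return;
  const existing = c.failed.find((f) => f.item === item);
  if (existing) {
    existing.reason = reason;
    existing.retryable = retryable;
  } else {
    c.failed.push({ item, reason, retryable });
  }
  // Assumption: a failure is a state change but not forward progress.
  c.updated_at = now;
}

function updateCursor(c: BulkProgressContract, cursor: string, now: number = Date.now()): void {
  c.cursor = cursor;
  c.last_progress_at = now;
  c.updated_at = now;
}
```

Keeping the helpers as plain functions over a mutable contract object keeps them free of scheduling or orchestration concerns, in line with the "generic progress guard only" constraint.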

Acceptance criteria

  • BulkProgressContract type and storage/helper module implemented with unit tests.
  • Completion guard blocks completion when expected_total is known and completed.length + failed.length < expected_total.
  • Completion guard blocks completion when min_completed is unmet.
  • Unknown-total mode allows completion only when a non-empty stop_condition is explicitly marked satisfied.
  • Failed items are retained in the final result and do not count as completed.
  • Bounded storage: completed/failed arrays are capped with truncation metadata for very large runs.
  • TaskRun integration: oc_task_run_complete returns a typed guard failure if the active bulk contract is incomplete, unless force:true and a reason are supplied.
  • HintEngine integration: an attempted premature completion produces a warning-level hint with the missing count and suggested next action.
  • Regression: normal short tasks without a bulk contract are unaffected.
  • npm run build && npm test green.
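
The guard rules in the criteria above can be sketched as a single pure function. `GuardOptions` and its field names are hypothetical: the issue only says that a stop condition must be "explicitly marked satisfied" and that `force:true` needs a reason, without naming where those values live. The sketch also assumes that a run whose items are all accounted for may complete with failures present unless `min_completed` says otherwise.

```typescript
// Trimmed copies of the proposed types: only the fields the guard reads or returns.
interface BulkProgressContract {
  expected_total?: number;
  min_completed?: number;
  stop_condition: string;
  completed: string[];
  failed: Array<{ item: string; reason: string; retryable?: boolean }>;
}

interface CompletionGuardResult {
  allowed: boolean;
  reason?: string;
  missing_count?: number;
  failed_count?: number;
  suggested_next_action?: string;
}

// Hypothetical option bag; none of these names come from the issue.
interface GuardOptions {
  stop_condition_satisfied?: boolean;
  force?: boolean;
  force_reason?: string;
}

function checkCompletionGuard(c: BulkProgressContract, opts: GuardOptions = {}): CompletionGuardResult {
  const failed_count = c.failed.length;

  // force:true only overrides the guard when a non-empty reason accompanies it.
  if (opts.force === true && (opts.force_reason ?? '').trim() !== '') {
    return { allowed: true, reason: `forced: ${opts.force_reason}`, failed_count };
  }

  if (c.expected_total !== undefined) {
    // Known total: every item must be accounted for as completed or failed.
    const accounted = c.completed.length + failed_count;
    if (accounted < c.expected_total) {
      const missing_count = c.expected_total - accounted;
      return {
        allowed: false,
        reason: `${missing_count} of ${c.expected_total} items not yet processed`,
        missing_count,
        failed_count,
        suggested_next_action: 'process the remaining items, then retry completion',
      };
    }
  } else if (c.stop_condition.trim() === '' || opts.stop_condition_satisfied !== true) {
    // Unknown total: only an explicitly satisfied, non-empty stop condition ends the run.
    return {
      allowed: false,
      reason: 'stop condition not marked satisfied',
      failed_count,
      suggested_next_action: `continue until: ${c.stop_condition}`,
    };
  }

  // Failed items never count toward the minimum completion threshold.
  if (c.min_completed !== undefined && c.completed.length < c.min_completed) {
    const missing_count = c.min_completed - c.completed.length;
    return {
      allowed: false,
      reason: `completed ${c.completed.length}, below min_completed ${c.min_completed}`,
      missing_count,
      failed_count,
      suggested_next_action: 'retry failed items or process more items',
    };
  }

  return { allowed: true, failed_count };
}
```

With `expected_total: 3` and one completed item, this sketch returns `allowed: false` with `missing_count: 2`, which is the rejection the verification scenario expects.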

Real verification after merge using OpenChrome

  1. Start a TaskRun with the goal "Visit three URLs and collect their titles" and a bulk contract with expected_total: 3, item_key: 'url', and the three URLs: https://example.com, https://news.ycombinator.com/, https://www.iana.org/domains/reserved.
  2. Visit only the first URL using navigate and read_page.
  3. Record one completed item through oc_task_run_update or the new progress helper.
  4. Attempt oc_task_run_complete.
  5. Verify completion is rejected with missing_count: 2 and a suggested next action.
  6. Visit the remaining two URLs, record both completed items, and call oc_task_run_complete again.
  7. Verify completion succeeds and the final result includes all three completed item ids.
  8. Repeat with one URL deliberately failed; verify completion succeeds only when the failed item is recorded with a reason and either the success criteria allow partial completion or force:true is supplied with a reason.
  9. Confirm oc_task_run_get after restart still shows completed/failed items and cursor.
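
Step 9 depends on the contract surviving a restart. For the standalone-file case from the implementation notes, a save/load round trip might be sketched as below. The per-contract file name (`<contract_id>.json`), the id character check, and the write-then-rename approach are assumptions; the issue only specifies the directory.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Only the fields step 9 checks; the full contract carries more.
interface PersistedContract {
  contract_id: string;
  cursor?: string;
  completed: string[];
  failed: Array<{ item: string; reason: string; retryable?: boolean }>;
}

// Directory from the implementation notes.
const DEFAULT_DIR = path.join(os.homedir(), '.openchrome', 'progress-contracts');

// Assumption: one JSON file per contract, named after its id.
function contractPath(contractId: string, dir: string = DEFAULT_DIR): string {
  // Reject ids containing separators or other characters that could escape `dir`.
  if (!/^[A-Za-z0-9._-]+$/.test(contractId)) {
    throw new Error(`unsafe contract_id: ${contractId}`);
  }
  return path.join(dir, `${contractId}.json`);
}

function saveContract(c: PersistedContract, dir: string = DEFAULT_DIR): void {
  fs.mkdirSync(dir, { recursive: true });
  const target = contractPath(c.contract_id, dir);
  const tmp = `${target}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(c, null, 2), 'utf8');
  // Write to a temp file first, then rename, so readers never see a partial file.
  fs.renameSync(tmp, target);
}

function loadContract(contractId: string, dir: string = DEFAULT_DIR): PersistedContract | undefined {
  const target = contractPath(contractId, dir);
  if (!fs.existsSync(target)) return undefined;
  return JSON.parse(fs.readFileSync(target, 'utf8')) as PersistedContract;
}
```

Passing `dir` explicitly lets a test point the helpers at a temporary directory instead of the user's home directory.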

Out of scope

  • Automatic discovery of all items on arbitrary pages.
  • LLM-based evaluation of whether an item is complete.
  • Changing default behavior for tasks that do not opt into a bulk contract.
  • Replacing workflow stale circuit breaker.

Dependencies

Success definition

Merge is successful when OpenChrome can prevent an opt-in repetitive task from being marked complete before its declared item-level progress criteria are satisfied, with machine-readable recovery guidance for the LLM.

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: opt-in bulk progress completion guard for repetitive TaskRun work.
  • Primary deliverable: a bulk progress contract that tracks total/completed/failed/skipped/cursor state and prevents premature completion when required items remain.
  • Open PR: feat(task-run): add bulk progress completion guard (#1041) #1116 (feat/1041-bulk-progress-contract). Amend that PR first.
  • Non-goal: general workflow engine, scheduler, benchmark runner, or new browser automation strategy.

Overlap and conflict resolution

Implementation checklist

  • Define an opt-in bulk progress contract with expected total or item list, current cursor, completed count, failed/skipped count, and explicit completion condition.
  • Integrate with TaskRun/outcome-contract completion so success cannot be reported while required items remain incomplete.
  • Provide clear diagnostics showing which counts/items prevent completion.
  • Add tests for all-complete, partial-complete blocked, failed-item policy, cursor resume, and malformed contract input.
  • Keep output compact and avoid dumping full item lists unless explicitly requested or necessary for diagnostics.
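
One way to keep diagnostics compact, as the last two checklist items ask, is to report counts plus a small sample of failures rather than the full item lists. The sketch below is illustrative only: `ProgressSummary`, `summarizeProgress`, and the default sample size of 3 are hypothetical names and values, not part of the proposed contract.

```typescript
// Trimmed view of the contract: only what the summary needs.
interface ProgressSnapshot {
  expected_total?: number;
  cursor?: string;
  completed: string[];
  failed: Array<{ item: string; reason: string }>;
}

// Hypothetical compact diagnostic shape.
interface ProgressSummary {
  completed_count: number;
  failed_count: number;
  missing_count?: number;
  cursor?: string;
  failed_sample: Array<{ item: string; reason: string }>;
  failed_truncated: boolean;
}

function summarizeProgress(c: ProgressSnapshot, maxFailedShown: number = 3): ProgressSummary {
  const summary: ProgressSummary = {
    completed_count: c.completed.length,
    failed_count: c.failed.length,
    // Show at most `maxFailedShown` failures; the flag says whether more exist.
    failed_sample: c.failed.slice(0, maxFailedShown),
    failed_truncated: c.failed.length > maxFailedShown,
  };
  if (c.expected_total !== undefined) {
    // Items neither completed nor failed; clamped so over-reporting never goes negative.
    summary.missing_count = Math.max(0, c.expected_total - c.completed.length - c.failed.length);
  }
  if (c.cursor !== undefined) summary.cursor = c.cursor;
  return summary;
}
```

Completed item ids are reported only as a count here, on the assumption that the failures are what a caller needs to see in order to recover.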

Success criteria

  • A task with expected_total=3 cannot pass after processing only 1 or 2 required items.
  • A fully completed task can pass with concise evidence of counts and stop condition.
  • Failure/skipped policy is explicit and tested rather than inferred from prompt wording.
  • Existing non-bulk tasks continue to behave as before unless they opt into the contract.

Post-merge OpenChrome live verification checklist

  • Run a local fixture task with expected_total=3 and intentionally complete only one item; verify completion is blocked with a clear diagnostic.
  • Complete all three items and verify the task can finish with counts recorded.
  • Restart/resume from a mid-list cursor if supported by the implementation and verify counts remain consistent.
  • Record the command/tool calls, blocked diagnostic, final success evidence, and any TaskRun artifact path in merge verification notes.

Labels

  • P1: P1 high
  • enhancement: New feature or request
  • outcome-contracts: Verifiable execution via pre/post-condition contracts (Q2)
  • reliability: Reliability and stability improvement
