feat(task-run): bulk progress contract to prevent premature completion

## Why

The Bytebot analysis highlighted a simple but important long-running-task discipline: for repetitive work, track total items, completed items, failed items, cursor, and explicit stop conditions. OpenChrome already detects low-level stalling via ProgressTracker and workflow stale updates, but it can still let an LLM prematurely declare success after processing only a subset of a requested list. This issue adds an opt-in **bulk progress contract** that reduces wandering and premature completion for crawl, pagination, multi-site, and workflow tasks.

## What

Add a contract object and helper APIs that TaskRun/workflow/batch tools can use to track repetitive progress:

- expected item count or unknown-count cursor
- completed item ids
- failed item ids and reasons
- stop condition
- minimum completion threshold
- finalization guard that rejects completion if criteria are unmet

This is a generic progress guard only; it must not add scheduling, browser control, worker orchestration, or another workflow engine.

## Proposed contract

```ts
export interface BulkProgressContract {
  contract_id: string;
  run_id?: string;
  scope: 'task_run' | 'workflow' | 'batch' | 'crawl';
  expected_total?: number;
  min_completed?: number;
  stop_condition: string;       // e.g. "no next page", "processed all input urls"
  item_key: string;             // e.g. "url", "row_id", "profile_id"
  cursor?: string;
  completed: string[];
  failed: Array<{ item: string; reason: string; retryable?: boolean }>;
  last_progress_at: number;
  created_at: number;
  updated_at: number;
}

export interface CompletionGuardResult {
  allowed: boolean;
  reason?: string;
  missing_count?: number;
  failed_count?: number;
  suggested_next_action?: string;
}
```

## Implementation notes

- Store the contract as part of TaskRun metadata when `run_id` exists, or as a standalone JSON file under `~/.openchrome/progress-contracts/`.
- Provide internal helper functions for `recordCompleted`, `recordFailed`, `updateCursor`, and `checkCompletionGuard`.
- Wire initially into TaskRun completion and one representative long-running tool path (`crawl` or `workflow_collect`) to keep scope bounded.
- HintEngine consumes `CompletionGuardResult` from TaskRun completion attempts and emits a warning when completion is attempted too early.
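
The recording helpers named above could look roughly like the following. This is a minimal sketch, not the implementation: `ProgressState` is a hypothetical subset of `BulkProgressContract`, and the immutable-update style, the idempotent handling of duplicates, and the rule that a later success supersedes an earlier failure are all assumptions.

```typescript
// Hypothetical subset of BulkProgressContract, repeated here so the sketch is self-contained.
interface FailedItem { item: string; reason: string; retryable?: boolean }
interface ProgressState {
  cursor?: string;
  completed: string[];
  failed: FailedItem[];
  last_progress_at: number;
  updated_at: number;
}

// Assumption: helpers return a new state object instead of mutating the input.
function recordCompleted(state: ProgressState, item: string, now: number = Date.now()): ProgressState {
  // Idempotent: recording the same item twice must not inflate the completed count.
  if (state.completed.indexOf(item) !== -1) return state;
  return {
    ...state,
    completed: [...state.completed, item],
    // Assumption: a later success supersedes an earlier failure for the same item.
    failed: state.failed.filter((f) => f.item !== item),
    last_progress_at: now,
    updated_at: now,
  };
}

function recordFailed(
  state: ProgressState,
  item: string,
  reason: string,
  retryable: boolean = false,
  now: number = Date.now(),
): ProgressState {
  // Assumption: a failure reported after a success for the same item is ignored.
  if (state.completed.indexOf(item) !== -1) return state;
  const others = state.failed.filter((f) => f.item !== item);
  // Assumption: a failure touches updated_at but does not count as progress.
  return { ...state, failed: [...others, { item, reason, retryable }], updated_at: now };
}

function updateCursor(state: ProgressState, cursor: string, now: number = Date.now()): ProgressState {
  return { ...state, cursor, updated_at: now };
}
```

Keeping the helpers pure makes them easy to unit test and leaves the choice of storage (TaskRun metadata or a standalone JSON file) to the caller.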

## Acceptance criteria

- [ ] `BulkProgressContract` type and storage/helper module implemented with unit tests.
- [ ] Completion guard blocks completion when `expected_total` is known and `completed.length + failed.length < expected_total`.
- [ ] Completion guard blocks completion when `min_completed` is unmet.
- [ ] Unknown-total mode allows completion only when a non-empty `stop_condition` is explicitly marked satisfied.
- [ ] Failed items are retained in the final result and do not count as completed.
- [ ] Bounded storage: completed/failed arrays are capped with truncation metadata for very large runs.
- [ ] TaskRun integration: `oc_task_run_complete` returns a typed guard failure if the active bulk contract is incomplete, unless `force:true` and a reason are supplied.
- [ ] HintEngine integration: an attempted premature completion produces a warning-level hint with the missing count and suggested next action.
- [ ] Regression: normal short tasks without a bulk contract are unaffected.
- [ ] `npm run build && npm test` green.
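
The guard criteria above can be sketched as a single pure function. This is an illustration under stated assumptions: `GuardInput` and `GuardOptions` are hypothetical shapes, and the option names `stop_condition_satisfied` and `force_reason` are invented here, since the issue only says that `force:true` and a reason must be supplied.

```typescript
// Hypothetical subset of BulkProgressContract needed by the guard.
interface GuardInput {
  expected_total?: number;
  min_completed?: number;
  stop_condition: string;
  completed: string[];
  failed: Array<{ item: string; reason: string; retryable?: boolean }>;
}

interface CompletionGuardResult {
  allowed: boolean;
  reason?: string;
  missing_count?: number;
  failed_count?: number;
  suggested_next_action?: string;
}

// Hypothetical options; the field names are assumptions, not part of the proposal.
interface GuardOptions {
  stop_condition_satisfied?: boolean;
  force?: boolean;
  force_reason?: string;
}

function checkCompletionGuard(c: GuardInput, opts: GuardOptions = {}): CompletionGuardResult {
  const failed_count = c.failed.length;
  const processed = c.completed.length + failed_count;

  // A forced completion is honored only when a non-empty reason accompanies it.
  const forceReason = (opts.force_reason ?? '').trim();
  if (opts.force === true && forceReason !== '') {
    return { allowed: true, reason: `forced: ${forceReason}`, failed_count };
  }

  if (c.expected_total !== undefined) {
    // Known total: every expected item must be either completed or recorded as failed.
    if (processed < c.expected_total) {
      const missing = c.expected_total - processed;
      return {
        allowed: false,
        reason: `${missing} of ${c.expected_total} items not yet processed`,
        missing_count: missing,
        failed_count,
        suggested_next_action: 'process the remaining items or record them as failed with a reason',
      };
    }
  } else if (c.stop_condition.trim() === '' || opts.stop_condition_satisfied !== true) {
    // Unknown total: only an explicitly satisfied, non-empty stop condition allows completion.
    return {
      allowed: false,
      reason: 'unknown total and stop condition not marked satisfied',
      failed_count,
      suggested_next_action: `continue until "${c.stop_condition}" holds, then mark it satisfied`,
    };
  }

  // Failed items never count toward min_completed.
  if (c.min_completed !== undefined && c.completed.length < c.min_completed) {
    const missing = c.min_completed - c.completed.length;
    return {
      allowed: false,
      reason: `min_completed not met: ${c.completed.length} of ${c.min_completed}`,
      missing_count: missing,
      failed_count,
      suggested_next_action: 'retry failed items that are marked retryable',
    };
  }

  return { allowed: true, failed_count };
}
```

In this sketch a failed item counts as processed for the `expected_total` check but never as completed for the `min_completed` check, which is one way to make the partial-completion policy explicit.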

## Real verification after merge using OpenChrome

1. Start a TaskRun with goal `Visit three URLs and collect their titles` and a bulk contract with `expected_total: 3`, `item_key: 'url'`, and the three URLs: `https://example.com`, `https://news.ycombinator.com/`, `https://www.iana.org/domains/reserved`.
2. Visit only the first URL using `navigate` and `read_page`.
3. Record one completed item through `oc_task_run_update` or the new progress helper.
4. Attempt `oc_task_run_complete`.
5. Verify completion is rejected with `missing_count: 2` and a suggested next action.
6. Visit the remaining two URLs, record both completed items, and call `oc_task_run_complete` again.
7. Verify completion succeeds and the final result includes all three completed item ids.
8. Repeat with one URL deliberately failed; verify that completion succeeds only when the failed item is recorded with a reason and either the success criteria allow partial completion or `force: true` is supplied with a reason.
9. Confirm `oc_task_run_get` after restart still shows completed/failed items and cursor.
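
For step 5, the rejection has to reach the LLM as a warning with the missing count and next action. A rough sketch of that mapping follows; the `Hint` shape and the function name `toPrematureCompletionHint` are hypothetical and do not reflect the actual HintEngine API.

```typescript
interface CompletionGuardResult {
  allowed: boolean;
  reason?: string;
  missing_count?: number;
  failed_count?: number;
  suggested_next_action?: string;
}

// Hypothetical hint shape; the real HintEngine type may differ.
interface Hint {
  severity: 'info' | 'warning';
  message: string;
}

// Returns null when completion is allowed, so callers emit nothing in the normal case.
function toPrematureCompletionHint(result: CompletionGuardResult): Hint | null {
  if (result.allowed) return null;

  const details: string[] = [];
  if (result.missing_count !== undefined) details.push(`missing_count=${result.missing_count}`);
  if (result.failed_count !== undefined && result.failed_count > 0) {
    details.push(`failed_count=${result.failed_count}`);
  }

  let message = 'Bulk contract incomplete';
  if (result.reason) message += `: ${result.reason}`;
  if (details.length > 0) message += ` (${details.join(', ')})`;
  if (result.suggested_next_action) message += `. Next: ${result.suggested_next_action}`;

  return { severity: 'warning', message };
}
```

The message carries counts only, not item ids, which keeps the hint compact for large runs.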

## Out of scope

- Automatic discovery of all items on arbitrary pages.
- LLM-based evaluation of whether an item is complete.
- Changing default behavior for tasks that do not opt into a bulk contract.
- Replacing the workflow stale circuit breaker.

## Dependencies

- Strongly benefits from #1039.
- Can be implemented standalone for workflow/crawl if TaskRun is not yet merged.

## Success definition

Merge is successful when OpenChrome can prevent an opt-in repetitive task from being marked complete before its declared item-level progress criteria are satisfied, with machine-readable recovery guidance for the LLM.



## Curated scope, overlap handling, and verification checklist

### Scope classification
- **Canonical lane:** opt-in bulk progress completion guard for repetitive TaskRun work.
- **Primary deliverable:** a bulk progress contract that tracks total/completed/failed/skipped/cursor state and prevents premature completion when required items remain.
- **Open PR:** #1116 (`feat/1041-bulk-progress-contract`). Amend that PR first.
- **Non-goal:** general workflow engine, scheduler, benchmark runner, or new browser automation strategy.

### Overlap and conflict resolution
- [ ] Keep separate from #1049: `BrowserTaskSignature` may describe bulk progress requirements, but this issue implements the runtime progress guard.
- [ ] Keep separate from #1060: progress diagnostics can report bulk status, but this issue owns completion blocking for incomplete bulk work.
- [ ] Keep separate from #1058/#1047: benchmark/certification can exercise the guard, but scoring and scenario running are out of scope.
- [ ] Keep separate from #855/#1039 unless shared TaskRun fields must be reused; avoid duplicating task-ledger state.

### Implementation checklist
- [ ] Define an opt-in bulk progress contract with expected total or item list, current cursor, completed count, failed/skipped count, and explicit completion condition.
- [ ] Integrate with TaskRun/outcome-contract completion so success cannot be reported while required items remain incomplete.
- [ ] Provide clear diagnostics showing which counts/items prevent completion.
- [ ] Add tests for all-complete, partial-complete blocked, failed-item policy, cursor resume, and malformed contract input.
- [ ] Keep output compact and avoid dumping full item lists unless explicitly requested or necessary for diagnostics.
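
For the malformed-contract test case in the checklist above, input validation might be sketched as below. The function name `validateContractInput`, the `ValidationResult` shape, and the specific rules (for example, rejecting `min_completed` greater than `expected_total`) are assumptions for illustration; only a subset of the contract fields is checked.

```typescript
type ValidationResult = { ok: true } | { ok: false; errors: string[] };

function isNonNegativeInt(v: unknown): v is number {
  return typeof v === 'number' && isFinite(v) && v >= 0 && Math.floor(v) === v;
}

// Validates untrusted contract input before it is stored. Hypothetical helper.
function validateContractInput(input: unknown): ValidationResult {
  if (typeof input !== 'object' || input === null || Array.isArray(input)) {
    return { ok: false, errors: ['contract must be an object'] };
  }
  const c = input as Record<string, unknown>;
  const id = c['contract_id'];
  const itemKey = c['item_key'];
  const stop = c['stop_condition'];
  const total = c['expected_total'];
  const min = c['min_completed'];
  const errors: string[] = [];

  if (typeof id !== 'string' || id === '') errors.push('contract_id must be a non-empty string');
  if (typeof itemKey !== 'string' || itemKey === '') errors.push('item_key must be a non-empty string');
  if (typeof stop !== 'string') errors.push('stop_condition must be a string');

  if (total !== undefined && !isNonNegativeInt(total)) {
    errors.push('expected_total must be a non-negative integer');
  }
  if (min !== undefined && !isNonNegativeInt(min)) {
    errors.push('min_completed must be a non-negative integer');
  }
  // Assumption: a threshold above the declared total can never be met, so reject it early.
  if (isNonNegativeInt(total) && isNonNegativeInt(min) && min > total) {
    errors.push('min_completed cannot exceed expected_total');
  }
  // Unknown-total mode relies on the stop condition, so it must not be empty.
  if (total === undefined && typeof stop === 'string' && stop.trim() === '') {
    errors.push('unknown-total contracts need a non-empty stop_condition');
  }

  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```

Collecting all errors instead of stopping at the first one gives the caller a single diagnostic to return to the LLM.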

### Success criteria
- [ ] A task with expected_total=3 cannot pass after processing only 1 or 2 required items.
- [ ] A fully completed task can pass with concise evidence of counts and stop condition.
- [ ] Failure/skipped policy is explicit and tested rather than inferred from prompt wording.
- [ ] Existing non-bulk tasks continue to behave as before unless they opt into the contract.

### Post-merge OpenChrome live verification checklist
- [ ] Run a local fixture task with `expected_total=3` and intentionally complete only one item; verify completion is blocked with a clear diagnostic.
- [ ] Complete all three items and verify the task can finish with counts recorded.
- [ ] Restart/resume from a mid-list cursor if supported by the implementation and verify counts remain consistent.
- [ ] Record the command/tool calls, blocked diagnostic, final success evidence, and any TaskRun artifact path in merge verification notes.
