feat(harness): task envelope budgets to bound browser-agent wandering

## Why

The Goose comparison identified a gap that is not covered by OpenChrome's existing browser-level resilience: OpenChrome can keep Chrome/CDP healthy, but it does not yet expose a **task-level harness envelope** that bounds a host agent's browser work across many tool calls.

This is intentionally **not** an agent loop inside OpenChrome. The server must continue to satisfy the portability-harness contract:

- no server-side LLM calls
- no autonomous task planning
- no change to existing tool behavior unless the caller opts into a task envelope
- facts and guardrails only; the host agent still decides what to do next

Related work:
- Builds on #855 (`oc_task_ledger`) for durable task records.
- Complements #869 (`notifications/progress`) for long-running status.
- Complements #893 (state headers) for easier task-state interpretation.

## Scope

Add an opt-in task envelope that lets a host agent declare objective, budgets, and policy constraints for a browser task, then lets OpenChrome record per-tool progress and return deterministic budget/wandering signals.

### New/updated tool surface

Add or extend task tools from #855 with these fields. If #855 has not landed yet, implement behind the same task-ledger storage module rather than adding a second ledger.

```ts
export interface TaskEnvelopePolicy {
  maxToolCalls?: number;              // default: unset; hard upper bound when set
  maxWallMs?: number;                  // default: unset
  maxConsecutiveSameTool?: number;     // default: 5
  maxObservationStreak?: number;       // default: 6; counts observation calls only (read_page/find/tabs_context/page_screenshot, computer screenshot action)
  maxFailureStreak?: number;           // default: 4
  maxSameUrlNavigations?: number;      // default: 3 per URL within task
  allowedDomains?: string[];           // optional additive narrowing over existing domain guard
  checkpointEveryCalls?: number;       // optional; used by the follow-up checkpoint issue
}

export interface TaskEnvelope {
  task_id: string;
  objective: string;
  phase: 'explore' | 'act' | 'verify' | 'recover' | 'done';
  policy: TaskEnvelopePolicy;
}
```
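The defaults in the comments above could be applied once, when the envelope is created. The following is a minimal sketch under that assumption; `POLICY_DEFAULTS`, `ResolvedPolicy`, and `resolvePolicy` are hypothetical names that do not appear in this issue, and the interface is repeated only to keep the block self-contained.

```typescript
// Repeated from the spec above so this sketch compiles on its own.
interface TaskEnvelopePolicy {
  maxToolCalls?: number;
  maxWallMs?: number;
  maxConsecutiveSameTool?: number;
  maxObservationStreak?: number;
  maxFailureStreak?: number;
  maxSameUrlNavigations?: number;
  allowedDomains?: string[];
  checkpointEveryCalls?: number;
}

// Hypothetical constant: the default values documented in the interface comments.
const POLICY_DEFAULTS = {
  maxConsecutiveSameTool: 5,
  maxObservationStreak: 6,
  maxFailureStreak: 4,
  maxSameUrlNavigations: 3,
} as const;

// Hypothetical type: a policy whose defaulted fields are guaranteed to be present.
type ResolvedPolicy = TaskEnvelopePolicy &
  Required<Pick<TaskEnvelopePolicy, keyof typeof POLICY_DEFAULTS>>;

// Hypothetical helper: caller values win; missing or undefined values fall back
// to the documented defaults. Fields without a default stay unset.
function resolvePolicy(policy: TaskEnvelopePolicy = {}): ResolvedPolicy {
  return {
    ...policy,
    maxConsecutiveSameTool:
      policy.maxConsecutiveSameTool ?? POLICY_DEFAULTS.maxConsecutiveSameTool,
    maxObservationStreak:
      policy.maxObservationStreak ?? POLICY_DEFAULTS.maxObservationStreak,
    maxFailureStreak: policy.maxFailureStreak ?? POLICY_DEFAULTS.maxFailureStreak,
    maxSameUrlNavigations:
      policy.maxSameUrlNavigations ?? POLICY_DEFAULTS.maxSameUrlNavigations,
  };
}
```

Resolving once at `oc_task_start` would keep later budget checks free of per-field fallback logic, while `maxToolCalls` and `maxWallMs` remain unbounded unless the caller sets them.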

Minimum tool operations:

1. `oc_task_start` accepts `objective`, optional `policy`, optional initial `phase`.
2. Every OpenChrome tool call may accept optional `taskId`. When present, the call is recorded against that task.
3. `oc_task_get` returns:
   - current counters
   - latest phase
   - budget status
   - last 10 meaningful events
   - deterministic `recommended_next` when a budget is near/exceeded
4. `oc_task_update` updates `phase` and optional notes without executing browser actions.
5. `oc_task_finish` closes the envelope with `completed | failed | cancelled` and final note.
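As an illustration of item 3, the `oc_task_get` result could be assembled from stored task state roughly as follows. This is a sketch, not the specified response format: `budget_status`, `recommended_next`, and the values `'exceeded'` and `'change_strategy_or_verify'` come from the acceptance criteria, while `TaskState`, `TaskSnapshot`, `buildSnapshot`, the counter names, and the `'ok'` / `'near'` statuses are assumptions. The sketch keeps every event and does not attempt to decide which events count as "meaningful".

```typescript
type Phase = 'explore' | 'act' | 'verify' | 'recover' | 'done';
// 'exceeded' is named in the acceptance criteria; 'ok' and 'near' are assumed.
type BudgetStatus = 'ok' | 'near' | 'exceeded';

interface TaskEvent {
  tool: string;
  ok: boolean;
  at: number; // epoch milliseconds
}

// Hypothetical stored state for one task envelope.
interface TaskState {
  task_id: string;
  phase: Phase;
  actionCalls: number;
  observationCalls: number;
  budget_status: BudgetStatus;
  events: TaskEvent[];
}

// Hypothetical shape of the oc_task_get result.
interface TaskSnapshot {
  task_id: string;
  phase: Phase;
  counters: { actionCalls: number; observationCalls: number; totalCalls: number };
  budget_status: BudgetStatus;
  recent_events: TaskEvent[];
  recommended_next?: string;
}

const RECENT_EVENT_LIMIT = 10;

function buildSnapshot(state: TaskState): TaskSnapshot {
  const snapshot: TaskSnapshot = {
    task_id: state.task_id,
    phase: state.phase,
    counters: {
      actionCalls: state.actionCalls,
      observationCalls: state.observationCalls,
      totalCalls: state.actionCalls + state.observationCalls,
    },
    budget_status: state.budget_status,
    // Keep only the tail of the event log.
    recent_events: state.events.slice(-RECENT_EVENT_LIMIT),
  };
  // A fixed lookup with no model call: the recommendation is present only
  // when a budget is near or exceeded.
  if (state.budget_status !== 'ok') {
    snapshot.recommended_next = 'change_strategy_or_verify';
  }
  return snapshot;
}
```

Because the snapshot is a pure function of stored state, two `oc_task_get` calls with no tool call in between would return the same result, which is what "deterministic" asks for here.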

## Non-goals

- Do not add an LLM planner, prompt generator, or automatic browser action executor.
- Do not block normal tools when `taskId` is absent.
- Do not replace `HintEngine`, `ProgressTracker`, Ralph, or circuit breakers; aggregate their signals at task level.
- Do not introduce new native dependencies or network services.

## Implementation notes

- Store task envelopes under the task ledger root from #855, or `~/.openchrome/tasks/` if implemented first.
- Use atomic JSON/JSONL writes and existing `writeFileAtomicSafe` / lock helpers.
- Add a small shared helper, for example `src/core/task-envelope/budget.ts`, that receives a normalized `ToolCallEvent` and returns budget transitions.
- Observation tool classification must be table-driven and tested. Start with `read_page`, `find`, `tabs_context`, `page_screenshot`, `computer` with screenshot action only.
- Budget exceedance should return a structured warning/error in the task state, not kill the MCP server or Chrome.
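One possible shape for the shared budget helper and the table-driven observation classification is sketched below. Everything here is an assumption for illustration: the `ToolCallEvent` fields, `Counters`, `Limits`, `Transition`, `applyEvent`, and the choice to emit a transition only on the call that first crosses a limit are not defined by this issue. Only the tool names, the `'exceeded'` status for observation streaks, and the warning for repeated URLs are taken from the text above.

```typescript
// Hypothetical normalized event; field names are assumptions.
interface ToolCallEvent {
  tool: string;
  action?: string; // e.g. 'screenshot' for the `computer` tool
  ok: boolean;
  url?: string; // set for `navigate`
}

// Table-driven classification. A Map avoids accidental hits on Object.prototype keys.
const OBSERVATION_RULES = new Map<string, (e: ToolCallEvent) => boolean>([
  ['read_page', () => true],
  ['find', () => true],
  ['tabs_context', () => true],
  ['page_screenshot', () => true],
  ['computer', (e) => e.action === 'screenshot'],
]);

function isObservation(e: ToolCallEvent): boolean {
  const rule = OBSERVATION_RULES.get(e.tool);
  return rule !== undefined && rule(e);
}

interface Counters {
  totalCalls: number;
  observationStreak: number;
  failureStreak: number;
  sameToolStreak: number;
  lastTool?: string;
  urlNavigations: Map<string, number>;
}

interface Limits {
  maxToolCalls?: number;
  maxConsecutiveSameTool: number;
  maxObservationStreak: number;
  maxFailureStreak: number;
  maxSameUrlNavigations: number;
}

interface Transition {
  budget: keyof Limits;
  level: 'warning' | 'exceeded';
}

// True only on the call that moves a counter from within the limit to beyond it.
function crossed(before: number, after: number, limit?: number): boolean {
  return limit !== undefined && before <= limit && after > limit;
}

// Pure function: returns new counters and transitions, never throws or kills anything.
function applyEvent(prev: Counters, e: ToolCallEvent, limits: Limits) {
  const next: Counters = {
    totalCalls: prev.totalCalls + 1,
    observationStreak: isObservation(e) ? prev.observationStreak + 1 : 0,
    failureStreak: e.ok ? 0 : prev.failureStreak + 1,
    sameToolStreak: prev.lastTool === e.tool ? prev.sameToolStreak + 1 : 1,
    lastTool: e.tool,
    urlNavigations: new Map(prev.urlNavigations),
  };
  const transitions: Transition[] = [];
  const check = (budget: keyof Limits, before: number, after: number, level: Transition['level']) => {
    if (crossed(before, after, limits[budget])) transitions.push({ budget, level });
  };
  check('maxToolCalls', prev.totalCalls, next.totalCalls, 'exceeded');
  check('maxObservationStreak', prev.observationStreak, next.observationStreak, 'exceeded');
  check('maxFailureStreak', prev.failureStreak, next.failureStreak, 'exceeded');
  check('maxConsecutiveSameTool', prev.sameToolStreak, next.sameToolStreak, 'exceeded');
  if (e.tool === 'navigate' && e.url !== undefined) {
    const before = prev.urlNavigations.get(e.url) ?? 0;
    next.urlNavigations.set(e.url, before + 1);
    check('maxSameUrlNavigations', before, before + 1, 'warning');
  }
  return { counters: next, transitions };
}
```

Keeping the helper pure means the caller decides how to persist counters and how to surface transitions in task state, and the unit tests for each budget type need no browser or ledger mocks.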

## Acceptance criteria

- [x] `oc_task_start/get/update/finish` are available and documented.
- [x] At least `navigate`, `read_page`, `find`, `interact`, `act`, `javascript_tool`, `page_screenshot`, and `tabs_context` record to the envelope when `taskId` is provided.
- [x] Counters distinguish action calls from observation calls.
- [x] Repeating the same observation tool beyond `maxObservationStreak` produces `budget_status: 'exceeded'` and `recommended_next: 'change_strategy_or_verify'`.
- [x] Repeating the same navigation URL beyond `maxSameUrlNavigations` produces a task-level warning.
- [x] With no `taskId`, existing tool outputs remain byte-compatible except for already-approved global metadata changes from other issues.
- [x] Unit tests cover each budget type and the absent-`taskId` no-op path.
- [x] `npm run build`, targeted Jest tests, and `npm run lint:tier` pass.


## Self-review checklist for implementer

- [x] The feature is deterministic and does not make LLM/provider calls.
- [x] The feature narrows wandering; it does not create a second autonomous orchestrator.
- [x] The task envelope composes with #855 rather than creating duplicate task storage.
- [x] The live verification proves actual OpenChrome behavior, not only unit-test mocks.



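## OpenChrome live verification checklist

> Re-verified on 2026-05-14 after applying the latest merged version. Only items that can be proven directly from OpenChrome responses, the local fixture, or build/test artifacts are kept as pass criteria. Conditions such as human review, external site stability, and unconfirmed PR status are excluded from the pass criteria.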

### Verification target
- **Issue:** #1034, feat(harness): task envelope budgets to bound browser-agent wandering
- **Applied version:** origin/develop @ f1facb8f (f1facb8f6a7b84756fba1dcdb8fa9b7e9a85293a), package 1.11.0
- **Local fixture:** http://127.0.0.1:18765/smoke.html
- **Primary OpenChrome surface:** (no surface)
- **Verdict:** VERIFIED. Implementation, test, and execution evidence were all confirmed on the latest develop, so the issue can be closed.

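### Latest version / common runtime verification
- [x] Applied the latest develop source and confirmed that `npm run build` passes.
- [x] Confirmed that `npm run lint:tier` passes.
- [x] Confirmed the `npm test -- --runInBand` results: 504/507 suites passed with 3 skipped, and 6429/6525 tests passed with 96 skipped. The Jest open-handle warning was recorded as a separate runtime risk.
- [x] `oc_connection_health` returned a connected state.
- [x] Observed DOM state changes on the local fixture through the OpenChrome `navigate/read_page/interact/javascript_tool` path.
- [x] Confirmed that the key results are reproducible with the same fixture and the same settings.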

### Per-issue resolution evidence
- [x] Implementation PR linked to the latest develop: 1082
- [x] Related test/source evidence exists in the latest tree:
  - src/core/task-ledger/types.ts
  - src/mcp-server.ts
  - src/tools/index.ts
  - src/tools/orchestration.ts
  - src/pilot/dynamic-skills/attachment-defaults.ts
  - src/pilot/dynamic-skills/replay.ts
- [x] The checklist contains no pass criteria that cannot be reproduced from OpenChrome responses, the fixture, or local artifacts.

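### Failure / hold criteria
- If any check is unmet, do not close the issue.
- If a failure reproduces as a defect in the latest code, record the failed OpenChrome call, a response excerpt, and the fixture state as evidence, then open a separate fix PR.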
