The Goose comparison identified a gap that is not covered by OpenChrome's existing browser-level resilience: OpenChrome can keep Chrome/CDP healthy, but it does not yet expose a task-level harness envelope that bounds a host agent's browser work across many tool calls.
This is intentionally not an agent loop inside OpenChrome. The server must continue to satisfy the portability-harness contract:
no server-side LLM calls
no autonomous task planning
no change to existing tool behavior unless the caller opts into a task envelope
facts and guardrails only; the host agent still decides what to do next
Add an opt-in task envelope that lets a host agent declare objective, budgets, and policy constraints for a browser task, then lets OpenChrome record per-tool progress and return deterministic budget/wandering signals.
New/updated tool surface
Add or extend task tools from #855 with these fields. If #855 has not landed yet, implement behind the same task-ledger storage module rather than adding a second ledger.
```typescript
export interface TaskEnvelopePolicy {
  maxToolCalls?: number; // default: unset; hard upper bound when set
  maxWallMs?: number; // default: unset
  maxConsecutiveSameTool?: number; // default: 5
  maxObservationStreak?: number; // default: 6; read_page/find/tabs_context/screenshot only
  maxFailureStreak?: number; // default: 4
  maxSameUrlNavigations?: number; // default: 3 per URL within task
  allowedDomains?: string[]; // optional additive narrowing over existing domain guard
  checkpointEveryCalls?: number; // optional; used by the follow-up checkpoint issue
}

export interface TaskEnvelope {
  task_id: string;
  objective: string;
  phase: 'explore' | 'act' | 'verify' | 'recover' | 'done';
  policy: TaskEnvelopePolicy;
}
```
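The defaults noted in the policy comments could be applied once, when a task starts, so later budget checks never have to deal with missing values. The following is a minimal sketch under that assumption; `resolvePolicy` and `ResolvedPolicy` are hypothetical names that do not appear in the issue, and the interface is repeated only to keep the example self-contained.

```typescript
// Repeated from the issue so this sketch compiles on its own.
interface TaskEnvelopePolicy {
  maxToolCalls?: number;
  maxWallMs?: number;
  maxConsecutiveSameTool?: number;
  maxObservationStreak?: number;
  maxFailureStreak?: number;
  maxSameUrlNavigations?: number;
  allowedDomains?: string[];
  checkpointEveryCalls?: number;
}

// Hypothetical type: the four streak budgets always carry a value after
// resolution; maxToolCalls and maxWallMs stay optional because their
// documented default is "unset".
type ResolvedPolicy = TaskEnvelopePolicy & {
  maxConsecutiveSameTool: number;
  maxObservationStreak: number;
  maxFailureStreak: number;
  maxSameUrlNavigations: number;
};

// Hypothetical helper: fills in the defaults listed in the interface comments.
function resolvePolicy(policy: TaskEnvelopePolicy = {}): ResolvedPolicy {
  return {
    ...policy,
    maxConsecutiveSameTool: policy.maxConsecutiveSameTool ?? 5,
    maxObservationStreak: policy.maxObservationStreak ?? 6,
    maxFailureStreak: policy.maxFailureStreak ?? 4,
    maxSameUrlNavigations: policy.maxSameUrlNavigations ?? 3,
  };
}
```

Resolving defaults in one place keeps the budget logic deterministic: every later check compares a counter against a concrete number instead of branching on whether the caller supplied one.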
Use atomic JSON/JSONL writes and existing writeFileAtomicSafe / lock helpers.
Add a small shared helper, for example src/core/task-envelope/budget.ts, that receives a normalized ToolCallEvent and returns budget transitions.
Observation tool classification must be table-driven and tested. Start with read_page, find, tabs_context, page_screenshot, and computer with the screenshot action only.
Budget exceedance should return a structured warning/error in the task state, not kill the MCP server or Chrome.
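The notes above (a shared budget helper, table-driven observation classification, and exceedance reported as data rather than as a crash) might be sketched as follows. This is an illustration only: the `ToolCallEvent` fields, `BudgetState`, `BudgetTransition`, `classify`, and `applyEvent` are assumed shapes and names, not the contents of the actual `src/core/task-envelope/budget.ts`.

```typescript
// Assumed normalized event shape; the issue does not specify the fields.
interface ToolCallEvent {
  tool: string;
  action?: string; // sub-action, e.g. for the `computer` tool
  ok: boolean;     // whether the tool call succeeded
}

// Classification table: `null` means every call of the tool is an observation,
// an array restricts observation status to the listed sub-actions.
const OBSERVATION_TOOLS: Record<string, readonly string[] | null> = {
  read_page: null,
  find: null,
  tabs_context: null,
  page_screenshot: null,
  computer: ['screenshot'],
};

function classify(event: ToolCallEvent): 'observation' | 'action' {
  if (!Object.prototype.hasOwnProperty.call(OBSERVATION_TOOLS, event.tool)) {
    return 'action';
  }
  const actions = OBSERVATION_TOOLS[event.tool];
  if (actions === null) return 'observation';
  const listed = event.action !== undefined && actions.indexOf(event.action) !== -1;
  return listed ? 'observation' : 'action';
}

interface BudgetState {
  observationStreak: number;
  failureStreak: number;
  sameToolStreak: number;
  lastTool?: string;
}

interface BudgetLimits {
  maxObservationStreak: number;
  maxFailureStreak: number;
  maxConsecutiveSameTool: number;
}

interface BudgetTransition {
  budget: keyof BudgetLimits;
  budget_status: 'exceeded';
  recommended_next?: string;
}

// Pure function: returns the next state plus any transitions. It never throws,
// so an exceeded budget cannot take down the MCP server or Chrome.
function applyEvent(
  state: BudgetState,
  event: ToolCallEvent,
  limits: BudgetLimits,
): { state: BudgetState; transitions: BudgetTransition[] } {
  const next: BudgetState = {
    observationStreak: classify(event) === 'observation' ? state.observationStreak + 1 : 0,
    failureStreak: event.ok ? 0 : state.failureStreak + 1,
    sameToolStreak: state.lastTool === event.tool ? state.sameToolStreak + 1 : 1,
    lastTool: event.tool,
  };
  const transitions: BudgetTransition[] = [];
  if (next.observationStreak > limits.maxObservationStreak) {
    transitions.push({
      budget: 'maxObservationStreak',
      budget_status: 'exceeded',
      recommended_next: 'change_strategy_or_verify',
    });
  }
  if (next.failureStreak > limits.maxFailureStreak) {
    transitions.push({ budget: 'maxFailureStreak', budget_status: 'exceeded' });
  }
  if (next.sameToolStreak > limits.maxConsecutiveSameTool) {
    transitions.push({ budget: 'maxConsecutiveSameTool', budget_status: 'exceeded' });
  }
  return { state: next, transitions };
}
```

Keeping `applyEvent` pure makes each budget type straightforward to unit test, and leaves the decision about what to do with a transition to the host agent.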
Acceptance criteria
oc_task_start/get/update/finish are available and documented.
At least navigate, read_page, find, interact, act, javascript_tool, page_screenshot, and tabs_context record to the envelope when taskId is provided.
Counters distinguish action calls from observation calls.
Repeating the same observation tool beyond maxObservationStreak produces budget_status: 'exceeded' and recommended_next: 'change_strategy_or_verify'.
Repeating the same navigation URL beyond maxSameUrlNavigations produces a task-level warning.
With no taskId, existing tool outputs remain byte-compatible except for already-approved global metadata changes from other issues.
Unit tests cover each budget type and the absent-taskId no-op path.
npm run build, targeted Jest tests, and npm run lint:tier pass.
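Two of the criteria above, the same-URL navigation warning and the absent-taskId no-op path, can be illustrated with a small in-memory sketch. Everything here is assumed for illustration: `TaskRecord`, `startTask`, `recordNavigation`, and the warning text are hypothetical, and a real implementation would persist through the task-ledger storage module rather than a module-level object.

```typescript
// Hypothetical in-memory record; not the ledger format from the issue.
interface TaskRecord {
  task_id: string;
  objective: string;
  navigations: Record<string, number>; // per-URL counts within this task
  warnings: string[];
}

const tasks: Record<string, TaskRecord> = {};

function startTask(task_id: string, objective: string): TaskRecord {
  const record: TaskRecord = { task_id, objective, navigations: {}, warnings: [] };
  tasks[task_id] = record;
  return record;
}

// Records one navigation against a task. With no taskId the function does
// nothing and returns undefined, so callers that never opt in see no change.
function recordNavigation(
  taskId: string | undefined,
  url: string,
  maxSameUrlNavigations = 3, // default taken from the policy comments
): TaskRecord | undefined {
  if (taskId === undefined) return undefined;
  const record = tasks[taskId];
  if (record === undefined) return undefined; // unknown task: treated as a no-op in this sketch
  const count = (record.navigations[url] ?? 0) + 1;
  record.navigations[url] = count;
  if (count > maxSameUrlNavigations) {
    // Task-level warning only; the navigation itself is not blocked.
    record.warnings.push(
      `same_url_navigation: ${url} visited ${count} times (limit ${maxSameUrlNavigations})`,
    );
  }
  return record;
}
```

The warning is appended to task state and the call still returns normally, which matches the requirement that budget signals are facts for the host agent rather than enforcement by the server.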
Self-review checklist for implementer
The feature is deterministic and does not make LLM/provider calls.
The feature narrows wandering; it does not create a second autonomous orchestrator.
Related work:
oc_task_ledger) for durable task records.
notifications/progress) for long-running status.
Scope
Minimum tool operations:
oc_task_start accepts objective, optional policy, optional initial phase.
taskId. When present, the call is recorded against that task.
oc_task_get returns:
recommended_next when a budget is near/exceeded
oc_task_update updates phase and optional notes without executing browser actions.
oc_task_finish closes the envelope with completed | failed | cancelled and final note.
Non-goals
taskId is absent.
HintEngine, ProgressTracker, Ralph, or circuit breakers; aggregate their signals at task level.
Implementation notes
~/.openchrome/tasks/ if implemented first.
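OpenChrome live verification checklist
Verification targets
Latest version / common runtime verification
Confirmed that npm run build passes.
Confirmed that npm run lint:tier passes.
Confirmed the npm test -- --runInBand results: 504/507 suites passed (3 skipped) and 6429/6525 tests passed (96 skipped). The Jest open-handle warning was recorded as a separate runtime risk.
oc_connection_health returned a connected status.
Observed DOM state changes through the navigate / read_page / interact / javascript_tool path.
Per-issue resolution evidence
Failure/hold criteria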