feat(obs): payload budget telemetry for browser tool outputs (browser-use adoption G4)

## Background
browser-use exposes token/cost tracking for LLM calls. OpenChrome usually cannot observe the MCP host's LLM token accounting, but it can measure the payloads it emits and the browser/CDP work required to produce them. That is enough to prevent regressions in speed and context size.

Related existing work: #846 (performance insights), #869/#870 (notifications/progress/logging), #897 (byte-aware console buffer). This issue is narrower: **payload budget telemetry for browser tool responses**.

## Goal
Add lightweight payload and latency telemetry to high-frequency tools so optimization claims can be verified after merge.

## Scope
For at least `read_page`, `inspect`, `query_dom`, `extract_data`, `crawl`, and screenshot-producing tools, expose debug/metadata fields or logs for:

- `outputChars`
- `estimatedOutputTokens` using a documented heuristic such as `ceil(chars / 4)`
- `toolLatencyMs`
- `browserLatencyMs` or CDP/evaluate timing when available
- `domNodesVisited`
- `domNodesEmitted`
- `compressionRatio` where applicable
- `deltaSavedChars` where snapshot delta applies
- `screenshotBytes` / `screenshotFormat` where applicable
- `artifactBytes` if output handles are used

Required behavior:
- Keep telemetry compact by default.
- Expose detailed timing only with `debug`/diagnostic mode or through logs/metrics.
- Do not include sensitive page content in metrics.
- Add a small benchmark or test fixture that fails if compact output regresses dramatically without an explicit snapshot update.

## Non-goals
- Do not claim exact LLM billing unless the MCP host provides usage data.
- Do not fetch external pricing data in core server code.
- Do not add a heavy observability dependency.

## Implementation notes
- Reuse existing metrics collector where possible.
- Prefer per-tool `meta` or `metrics` object over embedding prose.
- If MCP `structuredContent` work (#871) lands first, place metrics in structured content.

## Success criteria
- Tool responses or logs expose enough data to compare compact vs normal output size.
- Tests assert metrics are present for at least two high-frequency tools.
- No sensitive text is copied into telemetry fields.
- Documentation explains that `estimatedOutputTokens` is an approximation, not provider billing.

## Real OpenChrome validation after implementation
Use real Chrome through OpenChrome.

1. Start OpenChrome:
   ```bash
   npm run build
   node dist/cli/index.js serve
   ```
2. In an MCP client:
   - `navigate` to `https://github.com/browser-use/browser-use`
   - run `read_page` in normal/full mode
   - run `read_page` in compact mode if available
   - run `inspect` with a focused query
   - run `extract_data` for repo metadata
3. Capture evidence:
   - `outputChars` and `estimatedOutputTokens` for each call
   - `toolLatencyMs`
   - `domNodesVisited` and `domNodesEmitted` for DOM tools
   - screenshot bytes if screenshot is part of validation
4. Pass condition:
   - metrics are present and internally consistent (`estimatedOutputTokens ~= ceil(outputChars / 4)`).
   - compact/focused tools show lower output size than full read on the same page.
   - metrics contain no obvious raw page secrets from a local fixture with password/token fields.

## Review checklist
- [ ] Measures payload, not speculative provider billing.
- [ ] Works without network pricing lookup.
- [ ] Does not add sensitive content to logs.
- [ ] Enables future benchmark gates.

## Self-review clarifications (added before implementation)
- Metrics should live under a structured `metrics` object where the response format permits it; avoid prose-only metrics.
- `estimatedOutputTokens` must be computed from the serialized response body that the MCP host receives, not from internal pre-redaction content.
- When a metric is not applicable, omit it or set it to `null`; do not use misleading zero values.
- The first PR should cover `read_page` plus one of `inspect`/`extract_data`; screenshots/crawl can follow if needed.
- Any benchmark threshold must be fixture-based and updateable through an intentional snapshot process, not dependent on a changing public website.



## Curated scope, overlap handling, and verification checklist

### Scope classification
- **Canonical lane:** browser output telemetry.
- **Primary deliverable:** payload and latency telemetry for high-frequency browser tool responses.
- **Open PR:** #1100 (`feat/981-inspect-metrics`). Continue there; do not duplicate the PR.
- **Non-goal:** MCP host LLM cost accounting, exact tokenizer billing, changing default content, or replacing notification/progress systems.

### Overlap and conflict resolution
- [ ] Coordinate with #990 token/compression metrics; this issue is broader payload budget telemetry across tools.
- [ ] Complement #846/#869/#870/#897 without duplicating their insights/notifications/logging buffers.
- [ ] Keep telemetry metadata bounded and opt-in/debug where needed.

### Implementation checklist
- [ ] Expose fields such as outputChars, estimatedOutputTokens, latencyMs, screenshot bytes/path, truncation, compression, cache status, and warning thresholds for target tools.
- [ ] Instrument at least `read_page`, `inspect`, `query_dom`, `extract_data`, `crawl`, and screenshot-producing tools as scoped or document staged coverage.
- [ ] Use local estimates only and avoid external tokenizer/API calls.
- [ ] Add tests for metric presence, bounds, latency field shape, disabled/default behavior, and no payload-content regression.
- [ ] Document telemetry interpretation and regression workflow.

### Success criteria
- [ ] Maintainers can verify payload/latency regression claims after merge.
- [ ] Telemetry does not materially increase default response size.
- [ ] Metrics are deterministic enough for tests where applicable.
- [ ] No external token/cost service is required.

### Post-merge OpenChrome live verification checklist
- [ ] Run representative local calls for read_page/inspect/extract/crawl/screenshot and verify telemetry fields.
- [ ] Compare enabled vs disabled/default response size.
- [ ] Verify threshold warnings on a large fixture output.
- [ ] Include sanitized telemetry JSON and fixture commands in merge notes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(obs): payload budget telemetry for browser tool outputs (browser-use adoption G4) #981

Background

Goal

Scope

Non-goals

Implementation notes

Success criteria

Real OpenChrome validation after implementation

Review checklist

Self-review clarifications (added before implementation)

Curated scope, overlap handling, and verification checklist

Scope classification

Overlap and conflict resolution

Implementation checklist

Success criteria

Post-merge OpenChrome live verification checklist

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat(obs): payload budget telemetry for browser tool outputs (browser-use adoption G4) #981

Description

Background

Goal

Scope

Non-goals

Implementation notes

Success criteria

Real OpenChrome validation after implementation

Review checklist

Self-review clarifications (added before implementation)

Curated scope, overlap handling, and verification checklist

Scope classification

Overlap and conflict resolution

Implementation checklist

Success criteria

Post-merge OpenChrome live verification checklist

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions