feat(obs): payload budget telemetry for browser tool outputs (browser-use adoption G4) #981

@shaun0927

Background

browser-use exposes token/cost tracking for LLM calls. OpenChrome usually cannot observe the MCP host's LLM token accounting, but it can measure the payloads it emits and the browser/CDP work required to produce them. That is enough to detect regressions in speed and context size.

Related existing work: #846 (performance insights), #869/#870 (notifications/progress/logging), #897 (byte-aware console buffer). This issue is narrower: payload budget telemetry for browser tool responses.

Goal

Add lightweight payload and latency telemetry to high-frequency tools so optimization claims can be verified after merge.

Scope

For at least read_page, inspect, query_dom, extract_data, crawl, and screenshot-producing tools, expose debug/metadata fields or logs for:

  • outputChars
  • estimatedOutputTokens using a documented heuristic such as ceil(chars / 4)
  • toolLatencyMs
  • browserLatencyMs or CDP/evaluate timing when available
  • domNodesVisited
  • domNodesEmitted
  • compressionRatio where applicable
  • deltaSavedChars where snapshot delta applies
  • screenshotBytes / screenshotFormat where applicable
  • artifactBytes if output handles are used
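One way the fields above could be typed, as a minimal sketch rather than the OpenChrome implementation. The `PayloadMetrics` interface and the `estimateOutputTokens` and `measurePayload` helpers are hypothetical names. Only the field names and the `ceil(chars / 4)` heuristic come from this issue.

```typescript
// Hypothetical shape for the fields listed above. Optional fields are simply
// left out when a tool has nothing meaningful to report for them.
interface PayloadMetrics {
  outputChars: number;
  estimatedOutputTokens: number;
  toolLatencyMs: number;
  browserLatencyMs?: number;
  domNodesVisited?: number;
  domNodesEmitted?: number;
  compressionRatio?: number;
  deltaSavedChars?: number;
  screenshotBytes?: number;
  screenshotFormat?: string;
  artifactBytes?: number;
}

// Heuristic named in this issue: roughly four characters per token.
// This is an approximation, not provider billing.
function estimateOutputTokens(outputChars: number): number {
  return Math.ceil(outputChars / 4);
}

// Builds the always-present fields from the serialized tool output.
// Note: String.length counts UTF-16 code units, which is assumed to be
// an acceptable definition of "chars" for this estimate.
function measurePayload(serializedOutput: string, toolLatencyMs: number): PayloadMetrics {
  const outputChars = serializedOutput.length;
  return {
    outputChars,
    estimatedOutputTokens: estimateOutputTokens(outputChars),
    toolLatencyMs,
  };
}
```

A tool handler could call `measurePayload` once, just before returning, and then fill in whichever optional fields it actually measured.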

Required behavior:

  • Keep telemetry compact by default.
  • Expose detailed timing only with debug/diagnostic mode or through logs/metrics.
  • Do not include sensitive page content in metrics.
  • Add a small benchmark or test fixture that fails if compact output regresses dramatically without an explicit snapshot update.
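The regression fixture in the last bullet could take roughly the following form. This is a sketch under assumptions: `BudgetSnapshot`, `checkBudget`, and the 20% default tolerance are illustrative choices, not values taken from the OpenChrome codebase.

```typescript
// Hypothetical snapshot entry, recorded once for a local fixture page and
// checked into the repository alongside the test.
interface BudgetSnapshot {
  fixture: string;
  tool: string;
  outputChars: number;
}

// Throws when the measured output exceeds the snapshot by more than
// `tolerancePercent`. Integer percent keeps the limit free of float rounding.
function checkBudget(
  snapshot: BudgetSnapshot,
  measuredChars: number,
  tolerancePercent = 20,
): void {
  const limit = Math.floor((snapshot.outputChars * (100 + tolerancePercent)) / 100);
  if (measuredChars > limit) {
    throw new Error(
      `${snapshot.tool} on ${snapshot.fixture}: ${measuredChars} chars exceeds budget ${limit}; ` +
        `update the snapshot intentionally if this growth is expected`,
    );
  }
}
```

Because the snapshot is a checked-in value, raising the budget requires an explicit edit that shows up in review, which is the "explicit snapshot update" the bullet asks for.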

Non-goals

  • Do not claim exact LLM billing unless the MCP host provides usage data.
  • Do not fetch external pricing data in core server code.
  • Do not add a heavy observability dependency.

Implementation notes

Success criteria

  • Tool responses or logs expose enough data to compare compact vs normal output size.
  • Tests assert metrics are present for at least two high-frequency tools.
  • No sensitive text is copied into telemetry fields.
  • Documentation explains that estimatedOutputTokens is an approximation, not provider billing.

Real OpenChrome validation after implementation

Use real Chrome through OpenChrome.

  1. Start OpenChrome:
    npm run build
    node dist/cli/index.js serve
  2. In an MCP client:
    • navigate to https://github.com/browser-use/browser-use
    • run read_page in normal/full mode
    • run read_page in compact mode if available
    • run inspect with a focused query
    • run extract_data for repo metadata
  3. Capture evidence:
    • outputChars and estimatedOutputTokens for each call
    • toolLatencyMs
    • domNodesVisited and domNodesEmitted for DOM tools
    • screenshotBytes if a screenshot is part of validation
  4. Pass condition:
    • metrics are present and internally consistent (estimatedOutputTokens ~= ceil(outputChars / 4)).
    • compact/focused tools show lower output size than full read on the same page.
    • metrics contain no raw page secrets when run against a local fixture with password/token fields.
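The three pass conditions can be checked mechanically against captured evidence. The sketch below assumes the heuristic is applied exactly as `ceil(outputChars / 4)`; `CallEvidence` and the helper names are hypothetical, and the secret check is a plain substring match rather than a full redaction audit.

```typescript
// Hypothetical record of the evidence captured for one tool call.
interface CallEvidence {
  outputChars: number;
  estimatedOutputTokens: number;
}

// Condition 1: the token estimate matches the documented heuristic.
function isInternallyConsistent(e: CallEvidence): boolean {
  return e.estimatedOutputTokens === Math.ceil(e.outputChars / 4);
}

// Condition 2: the compact or focused call is smaller than the full read.
function isSmaller(compact: CallEvidence, full: CallEvidence): boolean {
  return compact.outputChars < full.outputChars;
}

// Condition 3: returns the known fixture secrets that appear anywhere in the
// serialized metrics. An empty array means none of them were found.
function leakedSecrets(metrics: unknown, fixtureSecrets: string[]): string[] {
  const serialized = JSON.stringify(metrics) ?? "";
  return fixtureSecrets.filter((secret) => serialized.includes(secret));
}
```

The secret list would come from the local fixture itself, for example the literal values placed in its password and token fields.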

Review checklist

  • Measures payload, not speculative provider billing.
  • Works without network pricing lookup.
  • Does not add sensitive content to logs.
  • Enables future benchmark gates.

Self-review clarifications (added before implementation)

  • Metrics should live under a structured metrics object where the response format permits it; avoid prose-only metrics.
  • estimatedOutputTokens must be computed from the serialized response body that the MCP host receives, not from internal pre-redaction content.
  • When a metric is not applicable, omit it or set it to null; do not use misleading zero values.
  • The first PR should cover read_page plus one of inspect/extract_data; screenshots/crawl can follow if needed.
  • Any benchmark threshold must be fixture-based and updateable through an intentional snapshot process, not dependent on a changing public website.
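The first three clarifications could be combined as in the following sketch. It assumes a hypothetical response envelope with a `content` string and a `metrics` object, and a caller-supplied `redact` function. The real OpenChrome response types and redaction path may differ, and the size of the metrics object itself is deliberately not counted here.

```typescript
// Hypothetical response envelope; not the actual OpenChrome type.
interface ToolResponse {
  content: string; // post-redaction text that the MCP host receives
  metrics: Record<string, number | string>;
}

// Hypothetical subset of measurements that only some tools can provide.
interface OptionalMeasurements {
  domNodesVisited?: number;
  domNodesEmitted?: number;
  screenshotBytes?: number;
}

function buildResponse(
  raw: string,
  redact: (text: string) => string,
  toolLatencyMs: number,
  measured: OptionalMeasurements = {},
): ToolResponse {
  // Redact first, then measure, so the counts describe what the host sees.
  const content = redact(raw);
  const outputChars = content.length;
  const metrics: Record<string, number | string> = {
    outputChars,
    estimatedOutputTokens: Math.ceil(outputChars / 4),
    toolLatencyMs,
  };
  // Copy only values that were actually measured. Missing metrics are
  // omitted instead of being reported as a misleading 0.
  for (const [key, value] of Object.entries(measured)) {
    if (value !== undefined) metrics[key] = value;
  }
  return { content, metrics };
}
```

Omitting the key, rather than emitting 0, lets a consumer distinguish "no DOM nodes visited" from "this tool does not visit DOM nodes".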

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: browser output telemetry.
  • Primary deliverable: payload and latency telemetry for high-frequency browser tool responses.
  • Open PR: #1100, "feat(inspect): add opt-in output token metrics (#981)", on branch feat/981-inspect-metrics. Continue there; do not duplicate the PR.
  • Non-goal: MCP host LLM cost accounting, exact tokenizer billing, changing default content, or replacing notification/progress systems.

Overlap and conflict resolution

Implementation checklist

  • Expose fields such as outputChars, estimatedOutputTokens, toolLatencyMs, screenshotBytes or path, truncation, compression, cache status, and warning thresholds for target tools.
  • Instrument at least read_page, inspect, query_dom, extract_data, crawl, and screenshot-producing tools as scoped, or document the staged coverage.
  • Use local estimates only and avoid external tokenizer/API calls.
  • Add tests for metric presence, bounds, latency field shape, disabled/default behavior, and no payload-content regression.
  • Document telemetry interpretation and regression workflow.

Success criteria

  • Maintainers can verify payload/latency regression claims after merge.
  • Telemetry does not materially increase default response size.
  • Metrics are deterministic enough for tests where applicable.
  • No external token/cost service is required.

Post-merge OpenChrome live verification checklist

  • Run representative local calls for read_page, inspect, extract_data, crawl, and screenshot tools, and verify the telemetry fields.
  • Compare enabled vs disabled/default response size.
  • Verify threshold warnings on a large fixture output.
  • Include sanitized telemetry JSON and fixture commands in merge notes.

Labels: P1 (high), enhancement, observability, performance
