
feat(core): token and compression metrics for high-volume read tools #990

@shaun0927

Description


Summary

Add lightweight output-size and token-estimate metrics to high-volume read/crawl tools so OpenChrome can prove and regress-test token optimization claims at the tool-response level.

Why

OpenChrome already claims strong token compression through DOM mode, snapshot delta, adaptive screenshot, and compact summaries. However, most tool responses do not expose consistent per-call metrics such as raw chars, returned chars, estimated tokens, truncation, compression mode, cache status, or filter reduction. Crawl4AI exposes token usage concepts in extraction flows; OpenChrome should expose deterministic local metrics for its own optimization layer.

This helps agents choose narrower tools and helps maintainers catch regressions after merge.

Non-goals

  • Do not call tokenizer services or LLM APIs.
  • Do not make exact provider-token guarantees.
  • Do not change default response content.
  • Do not add metrics to every tool in the first PR.

Scope

Initial tools:

  • read_page
  • crawl
  • crawl_sitemap
  • validate_page

extract_data is deliberately deferred unless its current JSON response path can expose metrics without changing legacy output. This keeps the first PR focused on high-volume page/content reads.

Add a shared helper:

estimateTokens(text: string): number

Use a documented approximation such as Math.ceil(chars / 4) for English/ASCII-heavy content and label it estimated_tokens, not exact tokens.
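A minimal sketch of the shared helper, assuming the documented `Math.ceil(chars / 4)` approximation; the separate CJK count is an illustrative assumption (CJK text tokenizes closer to one token per character), not a committed design:

```typescript
// Hypothetical sketch of estimateTokens; heuristic constants are assumptions.
export function estimateTokens(text: string): number {
  if (text.length === 0) return 0;
  // CJK codepoints tokenize closer to ~1 token/char; counting them
  // separately avoids a large underestimate on CJK-heavy pages.
  const cjk = (text.match(/[\u3000-\u9fff\uac00-\ud7af]/g) ?? []).length;
  const ascii = text.length - cjk;
  // ~4 chars per token for English/ASCII-heavy content.
  return Math.ceil(ascii / 4) + cjk;
}
```

Because the result is a pure function of the input string, it is deterministic and safe to assert against in regression fixtures.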

Proposed response metadata

Where response shape allows JSON metadata, add:

{
  "metrics": {
    "raw_chars": 180000,
    "returned_chars": 12000,
    "estimated_tokens": 3000,
    "raw_estimated_tokens": 45000,
    "compression_ratio": 15.0,
    "truncated": false,
    "mode": "dom",
    "filter": "prune",
    "cache_status": "hit"
  }
}
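An illustrative TypeScript shape for that object; field names mirror the JSON example above, while the choice of which fields are optional is an assumption:

```typescript
// Hypothetical type for the proposed per-call metrics; not a committed API.
export interface ToolCallMetrics {
  raw_chars: number;
  returned_chars: number;
  estimated_tokens: number;
  raw_estimated_tokens?: number;
  compression_ratio?: number; // raw_chars / returned_chars, rounded
  truncated: boolean;
  mode?: string;              // e.g. "dom" or "ax"
  filter?: string;            // e.g. "prune" when fit_markdown is active
  cache_status?: "hit" | "miss";
}
```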

For plain text outputs, append a compact machine-readable footer only when an explicit option is set:

{ "include_metrics": true }

Default behavior should avoid surprising legacy clients unless the tool already returns JSON.
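A sketch of the opt-in footer path, assuming an HTML-comment-style footer format (the exact marker is hypothetical); the key property is that omitting the option returns the input unchanged:

```typescript
// Hypothetical helper: append a machine-readable metrics footer to a
// plain-text response only when include_metrics is explicitly set.
function withMetricsFooter(
  text: string,
  metrics: Record<string, unknown>,
  includeMetrics: boolean
): string {
  if (!includeMetrics) return text; // legacy output stays byte-identical
  return `${text}\n<!-- metrics: ${JSON.stringify(metrics)} -->`;
}
```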

Acceptance criteria

  • Shared metrics helper is unit-tested for empty strings, ASCII text, CJK text, and large strings.
  • read_page(mode: "dom") reports returned_chars, estimated_tokens, mode, and truncated when include_metrics: true, or when structuredContent is available via #871 (feat(core): outputSchema + structuredContent for Tier-1 read tools, mcp-servers adoption 5).
  • crawl and crawl_sitemap report page-level and summary-level metrics for output chars and truncation when include_metrics: true.
  • If #980 (feat(core): fit_markdown content filters for markdown read/crawl outputs) has landed, filter metrics are included and are consistent with fit_markdown.length and raw_markdown.length.
  • Metrics are never computed from unsanitized, secret-bearing intermediate strings when the sanitizer would remove content; returned metrics always describe the returned, sanitized payload.
  • Existing clients that read only content[0].text see byte-identical output unless include_metrics: true is passed.
  • Benchmark or test fixture asserts DOM mode compression ratio does not regress below a documented threshold for at least one committed fixture.
  • npm run build && npm test && npm run lint && npm run lint:tier pass.

OpenChrome real verification after merge

  1. Navigate to a medium page:
    navigate({ "url": "https://example.com" }) -> tabId
  2. Read DOM with metrics:
    read_page({ "tabId": tabId, "mode": "dom", "include_metrics": true })
    Pass if metrics include returned_chars > 0, estimated_tokens > 0, mode: "dom", and truncated: false.
  3. Compare AX vs DOM:
    read_page({ "tabId": tabId, "mode": "ax", "include_metrics": true })
    read_page({ "tabId": tabId, "mode": "dom", "include_metrics": true })
    Pass if both include metrics. Compression ratio comparisons are gated only by committed fixture tests, not this live smoke, because public pages can change.
  4. Crawl metrics:
    crawl({ "url": "https://example.com", "max_pages": 1, "output_format": "text", "include_metrics": true })
    Pass if summary includes total returned chars/tokens and page includes per-page metrics.
  5. Regression check:
    Run a legacy read_page without include_metrics. Pass if content remains byte-identical to the pre-change fixture snapshot.

Self-review tightening

  • Exact tokenizer compatibility is out of scope; the field is named estimated_tokens to avoid overclaiming.
  • extract_data was deferred from the initial scope unless it can be made strictly additive.
  • Legacy byte parity is mandatory when include_metrics is omitted.
  • Any strict compression-ratio gate must use committed fixtures, not public live pages.

Fit with OpenChrome direction

This directly supports OpenChrome's token-efficiency positioning and gives future optimization PRs measurable acceptance gates. It is low-risk because it is mostly additive and opt-in for legacy text responses.

Related

Curated scope, overlap handling, and verification checklist

Scope classification

  • Canonical lane: token/compression observability metrics.
  • Primary deliverable: lightweight per-call output-size and token-estimate metrics for high-volume read/crawl tools.
  • Open PR: #1077 (feat/990-token-metrics), implementing this issue (feat(core): token and compression metrics for high-volume read tools, #990). Continue there; do not create duplicate implementation work.
  • Non-goal: calling tokenizer services/LLM APIs, exact provider-token guarantees, changing default response content, or instrumenting every tool in v1.

Overlap and conflict resolution

Implementation checklist

  • Add deterministic metrics fields such as raw chars, returned chars, estimated tokens, truncation, compression mode, cache status, and filter reduction for initial high-volume tools like read_page/crawl.
  • Use local approximate token estimates only; no network or model tokenizer calls.
  • Expose metrics in a compact metadata field or optional verbose mode without changing primary content.
  • Add tests for metrics presence, estimate determinism, truncation reporting, cache/compression labels, and default-content compatibility.
  • Document how maintainers should use metrics to detect token regressions.
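The checklist above can be sketched as a single metrics-assembly step; every name here is illustrative, and the estimate is the documented Math.ceil(chars / 4) approximation:

```typescript
// Hypothetical builder: derive the metrics object from the sanitized,
// returned payload only, never from secret-bearing intermediates.
function buildMetrics(opts: {
  rawChars: number;
  returned: string; // sanitized, returned payload
  truncated: boolean;
  mode: string;
  cacheHit?: boolean;
}) {
  const estimate = (s: string) => Math.ceil(s.length / 4);
  return {
    raw_chars: opts.rawChars,
    returned_chars: opts.returned.length,
    estimated_tokens: estimate(opts.returned),
    compression_ratio: opts.returned.length
      ? Math.round((opts.rawChars / opts.returned.length) * 10) / 10
      : 0,
    truncated: opts.truncated,
    mode: opts.mode,
    cache_status:
      opts.cacheHit === undefined ? undefined : opts.cacheHit ? "hit" : "miss",
  };
}
```

Keeping the builder pure makes the determinism and default-content-compatibility tests in the checklist straightforward.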

Success criteria

  • High-volume read tool responses include consistent local size/token metrics when enabled or in metadata.
  • Metrics are deterministic enough for regression tests but do not claim provider-exact billing.
  • Default response content remains compatible.
  • Maintainers can compare compression modes before/after changes.

Post-merge OpenChrome live verification checklist

  • Run read_page/crawl on a local large fixture and verify metrics include raw/returned chars, estimated tokens, truncation, and compression mode.
  • Run a cached/repeated call if supported and verify cache status is reported.
  • Compare a compact vs full mode response and record metric deltas.
  • Include sanitized tool response metadata and fixture path in merge notes.

Metadata


Labels

P1 (P1 high) · enhancement (new feature or request) · observability · performance (latency, throughput, or resource-use improvement)
