
feat(inspect): add opt-in output token metrics (#981) #1100

Merged
shaun0927 merged 8 commits into feat/990-token-metrics from feat/981-inspect-metrics
May 13, 2026

Conversation

Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch feat/981-inspect-metrics → feat/990-token-metrics
Draft no
CI
Mergeable ❌ CONFLICTING
Review decision
Codex (latest)
Other reviewers (latest)
Head 33d0204 — Expose focused inspect metrics without changing defaults
Commits 2

Owner comment cleanup: 0 issue comments and 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Summary

Refs #981. Stacked on #1077; this PR should merge after #1077.

Direction / non-overlap review

Verification

  • npm test -- --runInBand tests/tools/inspect-metrics.test.ts
  • npm run build
  • npm run lint:changed
  • git diff --check

Post-merge OpenChrome validation checklist

After #1077 and this PR merge:

  1. Build and start OpenChrome:
    npm run build
    node dist/cli/index.js serve
  2. In an MCP client connected to a real Chrome session:
    • navigate to https://github.com/browser-use/browser-use
    • run inspect with { "query": "repository title, stars, latest visible release link" }
    • run the same inspect call with { "include_metrics": true }
  3. Pass criteria:
    • default inspect output has no [openchrome_metrics] footer.
    • metrics-enabled output includes a single [openchrome_metrics] JSON footer.
    • returned_chars equals the body length before the footer.
    • estimated_tokens === Math.ceil(returned_chars / 4).
    • no raw secret/password fixture text appears inside the metrics JSON.
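The footer arithmetic in the pass criteria can be sketched as a small standalone check. This is a hypothetical validation helper, not project code: it assumes the `[openchrome_metrics]` marker directly precedes the JSON footer on the same output string, as the criteria above describe.

```typescript
// Hypothetical checker for the pass criteria above. Only the footer marker
// and the two field names come from this PR; everything else is illustrative.
interface OutputMetrics {
  returned_chars: number;
  estimated_tokens: number;
}

const FOOTER_MARKER = "[openchrome_metrics]";

function checkMetricsFooter(output: string): boolean {
  const idx = output.lastIndexOf(FOOTER_MARKER);
  if (idx === -1) return false; // no footer at all
  const body = output.slice(0, idx);
  const metrics: OutputMetrics = JSON.parse(
    output.slice(idx + FOOTER_MARKER.length).trim()
  );
  // returned_chars must equal the body length before the footer, and
  // estimated_tokens must follow the ceil(chars / 4) heuristic.
  return (
    metrics.returned_chars === body.length &&
    metrics.estimated_tokens === Math.ceil(metrics.returned_chars / 4)
  );
}
```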

shaun0927 added 2 commits May 13, 2026 00:05
Adds provider-neutral estimated token and output-size metrics for crawl, crawl_sitemap, read_page, and validate_page so optimization claims can be verified without changing legacy responses by default.

Constraint: Metrics must be opt-in for text surfaces and approximate rather than provider-specific.
Rejected: Exact tokenizer integration | would add dependency and overstate provider compatibility.
Confidence: high
Scope-risk: narrow
Directive: Keep strict compression-ratio gates on committed fixtures, not public live pages.
Tested: npm run build -- --pretty false; npm test -- --runInBand tests/core/metrics/token-estimate.test.ts tests/core/tools/crawl.engine.test.ts; npm run lint:changed
Not-tested: Full npm test suite; live OpenChrome smoke.
Constraint: #1077 already covers read_page/crawl/validate metrics, so this stacked slice only adds inspect support on top of that open metrics branch.
Rejected: Reimplementing #981 read_page response telemetry on develop | would duplicate open PRs #1063 and #1077.
Confidence: high
Scope-risk: narrow
Directive: Keep include_metrics opt-in and provider-neutral; do not imply exact LLM billing.
Tested: npm test -- --runInBand tests/tools/inspect-metrics.test.ts; npm run build; npm run lint:changed; git diff --check
Not-tested: Live OpenChrome MCP smoke; dependency npm audit still reports existing 6 vulnerabilities.
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@qodo-code-review

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0093361c3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts
Comment on lines +567 to 569
} catch {
// DOM serialization failed — fall through to AX mode as fallback
}

P1 Badge Return DOM-mode errors instead of silently falling back

This catch swallows DOM serialization failures and then falls through to the AX code path, so a request with mode: "dom" can return AX output instead of an error. That changes the contract for explicit DOM callers (they can no longer distinguish DOM failure from successful DOM output) and can break downstream parsers that rely on DOM-mode formatting.
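The suggested fix can be sketched as gating the fallback on the requested mode. This is a minimal illustration, assuming a simplified tool-result shape; the serializer callbacks are placeholders, not the PR's actual functions.

```typescript
// Sketch: only fall back to AX when the caller did not explicitly ask for
// DOM mode. `serializeDom` / `serializeAx` are hypothetical placeholders.
type Mode = "ax" | "dom";

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

async function readPage(
  mode: Mode,
  serializeDom: () => Promise<string>,
  serializeAx: () => Promise<string>
): Promise<ToolResult> {
  try {
    return { content: [{ type: "text", text: await serializeDom() }] };
  } catch (err) {
    if (mode === "dom") {
      // Explicit DOM callers get a real error, not silently swapped AX output.
      const msg = err instanceof Error ? err.message : String(err);
      return {
        content: [{ type: "text", text: `Error: DOM serialization failed: ${msg}` }],
        isError: true,
      };
    }
    return { content: [{ type: "text", text: await serializeAx() }] };
  }
}
```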

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +872 to 873
if (config.security?.sanitize_content === false || result.isError || !result.content) {
return result;

P1 Badge Honor include_metrics when sanitization is disabled

With this early return, include_metrics: true is ignored whenever sanitize_content is disabled, because metrics are appended only later in the sanitization branch. In environments that run with --no-sanitize-content, callers now get no metrics despite explicitly requesting them.
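One way to address this is to hoist metrics appending out of the sanitization branch so the two options compose independently. A minimal sketch, with all identifiers illustrative rather than the PR's actual names:

```typescript
// Sketch: metrics are appended after the (optional) sanitization step, so
// include_metrics works even when sanitization is disabled.
interface Result { text: string }

function estimateTokens(chars: number): number {
  return Math.ceil(chars / 4);
}

function finalize(
  result: Result,
  opts: { sanitize: boolean; includeMetrics: boolean },
  sanitize: (t: string) => string
): Result {
  let text = opts.sanitize ? sanitize(result.text) : result.text;
  if (opts.includeMetrics) {
    const metrics = {
      returned_chars: text.length,
      estimated_tokens: estimateTokens(text.length),
    };
    text += `\n[openchrome_metrics] ${JSON.stringify(metrics)}`;
  }
  return { text };
}
```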

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
const maxDepth = filter === 'interactive'
? Math.min(requestedDepth ?? defaultDepth, defaultDepth)
: requestedDepth ?? defaultDepth;
const maxDepth = (args.depth as number) || defaultDepth;

P2 Badge Preserve interactive depth cap in AX tree requests

Using (args.depth as number) || defaultDepth removes the prior interactive cap and also treats depth: 0 as unset. For filter: "interactive", callers can now request arbitrarily deep AX trees (e.g., depth 10+) instead of being bounded to the documented interactive default cap, which increases output size and runtime risk.
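The capped expression shown in the removed lines can be kept as a small helper. The `??` operator preserves `depth: 0` as an explicit request, and the interactive filter stays bounded by its default cap; the function wrapper itself is illustrative.

```typescript
// Sketch of the depth resolution the review asks to preserve: nullish
// coalescing keeps depth 0 meaningful, and the interactive filter is capped.
function resolveMaxDepth(
  requestedDepth: number | undefined,
  defaultDepth: number,
  filter: string
): number {
  return filter === "interactive"
    ? Math.min(requestedDepth ?? defaultDepth, defaultDepth)
    : requestedDepth ?? defaultDepth;
}
```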

Useful? React with 👍 / 👎.

Comment thread src/tools/crawl.ts Outdated
}
: { summary, pages: minimalPages, note: 'Content omitted due to size constraints' };
outputJson = JSON.stringify(minimalOutput, null, 2);
outputJson = JSON.stringify({ summary, pages: minimalPages, note: 'Content omitted due to size constraints' }, null, 2);

P2 Badge Keep include_metrics data in crawl size-fallback output

When crawl output exceeds MAX_OUTPUT_CHARS twice, this branch emits a minimal payload that omits all metrics even if include_metrics was requested. That makes include_metrics behavior inconsistent specifically on large crawls, where size/token estimates are most valuable and expected by clients.
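A sketch of how the fallback payload could keep metrics when requested. The `summary`/`pages`/`note` fields come from the snippet above; the `_metrics` shape is an assumption, and measuring the pre-metrics body is a deliberate approximation rather than an exact final-length count.

```typescript
// Sketch: the minimal size-fallback payload still carries _metrics when the
// caller asked for them. returned_chars measures the body before _metrics is
// added (an approximation, noted here on purpose).
interface MinimalOutput {
  summary: string;
  pages: unknown[];
  note: string;
  _metrics?: { returned_chars: number; estimated_tokens: number };
}

function buildFallback(summary: string, minimalPages: unknown[], includeMetrics: boolean): string {
  const minimal: MinimalOutput = {
    summary,
    pages: minimalPages,
    note: "Content omitted due to size constraints",
  };
  let json = JSON.stringify(minimal, null, 2);
  if (includeMetrics) {
    minimal._metrics = {
      returned_chars: json.length,
      estimated_tokens: Math.ceil(json.length / 4),
    };
    json = JSON.stringify(minimal, null, 2);
  }
  return json;
}
```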

Useful? React with 👍 / 👎.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5ee0324c2


Comment thread src/tools/crawl.ts
Comment on lines 282 to 284
async function fetchPage(
sessionId: string,
url: string,

P0 Badge Restore fetchOnePage exports consumed by crawl runner

This change removes the exported fetchOnePage, FetchOnePageOptions, and FetchOnePageResult API from src/tools/crawl.ts, but src/core/crawl/runner.ts still imports those symbols (fetchOnePage as defaultFetchOnePage plus the two types). In this commit state, TypeScript compilation fails with missing-export errors, which blocks build/deploy for crawl job tooling.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +940 to +943
if (includeMetrics) {
cleaned['_metrics'] = buildTextMetrics(JSON.stringify(cleaned), { mode: modeForMetrics });
}
return { ...block, text: JSON.stringify(cleaned) };

P1 Badge Compute semantic metrics from the final serialized JSON

When mode: "semantic" and include_metrics: true, metrics are calculated from JSON.stringify(cleaned) before _metrics is inserted, then the payload is stringified again with _metrics present. This guarantees returned_chars/estimated_tokens can be lower than the emitted text length, so callers receive inconsistent metrics specifically for semantic responses.
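One illustrative way to make the metrics agree with the emitted text is to measure a serialization that already contains a same-sized placeholder, then fill it in without changing the length. Note the deviation from numeric fields: zero-padded string values keep the byte count fixed. This is a sketch of the technique, not the PR's implementation.

```typescript
// Sketch: serialize once with fixed-width placeholder metrics, measure that
// draft, then substitute equally wide values so the final length is identical
// to the measured one. Values are zero-padded strings by design.
function withConsistentMetrics(payload: Record<string, unknown>): string {
  const PAD = 12; // fixed digit width so filling in values cannot change length
  const placeholder = {
    returned_chars: "0".repeat(PAD),
    estimated_tokens: "0".repeat(PAD),
  };
  const draft = JSON.stringify({ ...payload, _metrics: placeholder });
  const metrics = {
    returned_chars: String(draft.length).padStart(PAD, "0"),
    estimated_tokens: String(Math.ceil(draft.length / 4)).padStart(PAD, "0"),
  };
  return JSON.stringify({ ...payload, _metrics: metrics });
}
```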

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines 846 to 847
return {
content: [{ type: 'text', text: pageStatsLine + output + axPaginationSection }],

P1 Badge Preserve AX refs map in read_page responses

The AX-mode return path now emits only content and drops the refs object entirely, even though the AX walk still mints ref IDs and downstream consumers/tests rely on result.refs metadata (role/name/TTL) to resolve stable element references. This is a response-shape regression for read_page(mode='ax') callers that expect the structured refs payload.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +784 to 786
const output = lines.join('\n');
const includePaginationAx = args.includePagination !== false;
const axPaginationSection = includePaginationAx ? formatPaginationSection(await detectPagination(page, tabId)) : '';

P2 Badge Honor AX delta compression requests

This path now always joins and returns the full AX snapshot without checking args.compression === 'delta', so clients requesting AX delta mode receive full trees on every call. That regresses the existing compression contract and can significantly increase output size and latency on repeated reads of large pages.

Useful? React with 👍 / 👎.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 961c01b563


Comment thread src/tools/read-page.ts
Comment on lines 966 to 968
export function registerReadPageTool(server: MCPServer): void {
server.registerTool('read_page', sanitizedHandler, definition);
}

P0 Badge Re-export readPageHandlerForReuse

This file no longer exports readPageHandlerForReuse, but src/tools/_shared/return-after-state.ts still imports and calls that symbol (import at line 16, call at line 96). In this commit state, TypeScript compilation fails with a missing-export error, which blocks builds and breaks every tool path that includes return-after-state chaining.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +142 to 145
if (mode !== 'ax' && mode !== 'dom' && mode !== 'css' && mode !== 'semantic') {
return {
content: [{ type: 'text', text: `Error: Invalid mode "${mode}". Must be "ax", "dom", "css", "semantic", or "markdown".` }],
content: [{ type: 'text', text: `Error: Invalid mode "${mode}". Must be "ax", "dom", "css", or "semantic".` }],
isError: true,

P1 Badge Keep markdown mode accepted in read_page

The mode gate now rejects mode: "markdown", so existing callers receive an error instead of markdown output. This is a behavior regression for an already-supported mode (including onlyMainContent/includeLinks options), and it directly conflicts with current markdown contract coverage in tests/tools/read-page.test (Markdown Mode section), so integrations relying on markdown extraction will break.
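The preserved gate can be sketched as a type-predicate check over the full mode list. The five mode names come from the error message in the snippet above; the helper itself is hypothetical.

```typescript
// Sketch: "markdown" stays a valid mode alongside the structural modes, and
// the predicate narrows the string type for downstream handling.
const VALID_MODES = ["ax", "dom", "css", "semantic", "markdown"] as const;
type ReadPageMode = (typeof VALID_MODES)[number];

function isValidMode(mode: string): mode is ReadPageMode {
  return (VALID_MODES as readonly string[]).includes(mode);
}
```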

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +551 to +553
return {
content: [{ type: 'text', text: statsLine + delta.content + domPaginationSection }],
};

P1 Badge Restore _compression metadata on DOM delta responses

When compression === "delta" and a delta is returned, this branch now emits only content and drops _compression fields (level, originalChars, compressedChars). Downstream observability and consumers that distinguish delta/full outputs from _compression lose that signal, and the existing read_page compression-savings expectations (see tests/metrics/tool-output-observability.test) no longer hold.

Useful? React with 👍 / 👎.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3522a9148


Comment thread src/tools/crawl.ts
Comment on lines 571 to 574
const batchResults = await Promise.all(
batch.map((item) =>
limiter(async () => {
const runFetch = async () => {
// Check robots.txt before fetching

P2 Badge Restore adaptive dispatcher handling in crawl

This batch execution path now always uses the fixed limiter and never routes through the adaptive dispatcher, so dispatcher: "adaptive" / dispatcher_options requests are effectively ignored. That regresses crawl behavior on rate-limited or memory-pressured targets (the exact cases adaptive mode was designed for in src/core/crawl/dispatcher.ts) and removes the expected adaptive throttling semantics for existing callers.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
const output = outputLines.join('\n');
const outputCharCount = output.length;
diagnostics.formatMs = mark() - formatStart;
const output = lines.join('\n');

P2 Badge Honor compact AX output requests

AX responses are now always built from the full lines array, so compact: true no longer changes output. Callers that rely on compact mode to suppress non-actionable leaves now receive much larger trees, which increases truncation risk and latency on complex pages.
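The compact behavior the review describes can be illustrated as a filter over actionable lines. This assumes each rendered AX line carries an actionability flag, which is a simplification of whatever the project actually tracks.

```typescript
// Sketch: compact mode suppresses non-actionable leaves before joining.
// The AxLine shape and `actionable` flag are assumptions for illustration.
interface AxLine {
  text: string;
  actionable: boolean;
}

function renderAx(lines: AxLine[], compact: boolean): string {
  const kept = compact ? lines.filter((l) => l.actionable) : lines;
  return kept.map((l) => l.text).join("\n");
}
```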

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +846 to +848
return {
content: [{ type: 'text', text: pageStatsLine + output + axPaginationSection }],
refs: refsMap,
});
};

P2 Badge Keep read_page diagnostics output when requested

The handler now returns plain content here without attaching any _diagnostics, and the function no longer tracks timing fields at all, so diagnostics: true callers cannot get mode/timing metadata. This is a regression for tooling that depends on those fields to debug DOM vs AX fallback behavior and sanitization overhead.

Useful? React with 👍 / 👎.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 73adb34f96


Comment thread src/tools/crawl.ts Outdated
Comment on lines +531 to +533
// Seed the BFS queue with the start URL
const normalizedStart = normalizeUrl(url);
enqueueItems([{ url: normalizedStart, depth: 0 }]);
tracker.enqueue([{ url: normalizedStart, depth: 0 }]);

P1 Badge Restore best_first crawl strategy handling

This path hard-codes BFS queue seeding/processing and no longer honors strategy: "best_first", so callers that previously depended on URL scoring now get FIFO traversal instead. Under max_pages limits this can materially change which pages are returned (high-priority URLs may never be visited), which is a behavior regression for existing crawl integrations.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +523 to +527
const result = await serializeDOM(page, cdpClient, {
maxDepth: depth ?? -1,
filter: filter,
interactiveOnly: filter === 'interactive',
planningProfile,
}));
diagnostics.formatMs = diagnostics.domGetDocumentMs;
});

P2 Badge Pass planningProfile through to DOM serialization

The DOM path now calls serializeDOM without forwarding args.planningProfile, which forces the serializer default profile even when callers request planningProfile: "stable". That silently disables the stable/noise-reduced output mode and can increase output churn and token usage for workflows that rely on deterministic stable snapshots.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts
Comment on lines +564 to +566
return {
content: [{ type: 'text', text: outputText + domPaginationSection }],
};

P2 Badge Reattach node_refs block to DOM responses

This return now emits only DOM text plus pagination, dropping the [node_refs] mapping that was previously attached from emitted backend node IDs. Clients using DOM-mode output to recover stable node references lose that mapping and can no longer resolve those refs from the same response.

Useful? React with 👍 / 👎.

@shaun0927
Owner Author

Deferred: the PR description specifies "this PR should merge after #1077". #1077 has fix commits pushed (370cf98 — markdown-clean restored) but is still blocked on develop CI signal (PR #1105 in flight). Once #1077 merges to develop, this PR can either be re-targeted to develop directly or merged forward into the consolidated feat/990-token-metrics base. The PR is content-clean (5 files, +168/-696 — mostly trimming, 0 Codex P0/P1/P2 outstanding).

@shaun0927
Owner Author

Merge rationale (stack consolidation)

Intent. Refs #981 — extends the open #1077 include_metrics pattern to inspect, the focused high-frequency page-state tool.

Why this is correct.

CI. Targets feat/990-token-metrics (the #1077 head); workflow only runs on main/develop PRs. CI will run on the consolidated stack root when #1077 lands on develop.

@shaun0927 shaun0927 merged commit acafdb6 into feat/990-token-metrics May 13, 2026
shaun0927 pushed a commit that referenced this pull request May 13, 2026
The merge of PR #1100 (inspect metrics) into feat/990-token-metrics
dropped the readPageHandlerForReuse export from read-page.ts, which
src/tools/_shared/return-after-state.ts (#845 plumbing on develop)
imports. Restoring the export fixes the TS2305 build break that
followed the develop merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>