feat(observability): measure response tokens and cache status#1063
feat(observability): measure response tokens and cache status#1063shaun0927 wants to merge 8 commits into
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d20405b6f3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const metrics = getMetricsCollector(); | ||
| const bytes = estimateResultPayloadBytes(result); | ||
| metrics.observe('openchrome_tool_output_bytes', withTenantLabel({ tool: toolName }), bytes); | ||
| metrics.observe('openchrome_tool_estimated_tokens', withTenantLabel({ tool: toolName }), estimateOutputTokensFromChars(bytes)); |
There was a problem hiding this comment.
Compute token estimate from character count
recordToolOutputObservability passes the UTF-8 byte length into estimateOutputTokensFromChars, but that heuristic is documented and named as chars/4. For non-ASCII responses (e.g., CJK or emoji-heavy outputs), byte length is larger than character count, so openchrome_tool_estimated_tokens is systematically inflated and no longer comparable across languages.
Useful? React with 👍 / 👎.
| const status = typeof obj.status === 'string' ? obj.status : typeof obj.cacheStatus === 'string' ? obj.cacheStatus : null; | ||
| if (!status) return null; | ||
| const keyVersion = obj.keyVersion ?? obj.version ?? 'unknown'; | ||
| return { status: status.toUpperCase(), keyVersion: String(keyVersion) }; |
There was a problem hiding this comment.
Normalize cache key version metric label
extractCacheStatus forwards keyVersion as an arbitrary string, and that value is later emitted as the key_version label in openchrome_cache_status_total without any bounding/normalization. If a tool reports per-request versions (hashes/timestamps), this introduces unbounded label cardinality and can degrade Prometheus memory/query performance.
Useful? React with 👍 / 👎.
d20405b to
a6abe02
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6abe02838
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (cache) { | ||
| metrics.inc('openchrome_cache_status_total', withTenantLabel({ | ||
| tool: toolName, | ||
| status: cache.status, |
There was a problem hiding this comment.
Constrain cache status labels before emitting metrics
recordToolOutputObservability emits openchrome_cache_status_total with status taken directly from tool output via extractCacheStatus (only uppercased), so any tool that includes request-specific detail in cache status (for example reason strings or embedded IDs) will create unbounded label cardinality and degrade Prometheus memory/query performance. Please normalize this to a small allowlist (e.g., HIT/MISS/BYPASS/ERROR) or map unknown values to a fixed bucket before incrementing the counter.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d8bb37afb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (raw && typeof raw === 'object') { | ||
| const obj = raw as Record<string, unknown>; | ||
| const status = typeof obj.status === 'string' ? obj.status : typeof obj.cacheStatus === 'string' ? obj.cacheStatus : null; | ||
| if (!status) return null; |
There was a problem hiding this comment.
Continue cache-status fallback instead of returning early
When raw is an object but does not contain a string status/cacheStatus, this early return null prevents the later structuredContent.cacheStatus fallback from running. In mixed-shape results (for example, cache: { keyVersion: "v2" } plus structuredContent.cacheStatus: "miss"), openchrome_cache_status_total is silently dropped even though a valid cache status is present.
Useful? React with 👍 / 👎.
Constraint: OpenChrome must expose token/output savings without adding tokenizer or billing dependencies. Rejected: Provider-specific tokenizers | they add dependency and billing semantics outside this MCP server boundary. Confidence: medium Scope-risk: narrow Directive: Keep metric labels bounded; never add URL, selector, instruction, or domain labels for these metrics. Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts Not-tested: Live /metrics smoke against a running OpenChrome instance
5d8bb37 to
94f4324
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94f4324d1b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (typeof originalChars === 'number' && typeof compressedChars === 'number' && originalChars > compressedChars) { | ||
| metrics.observe( | ||
| 'openchrome_tool_compression_saved_bytes', | ||
| withTenantLabel({ tool: toolName, mode: level }), |
There was a problem hiding this comment.
Normalize compression mode labels before emitting metrics
recordToolOutputObservability forwards _compression.level directly into the mode label for openchrome_tool_compression_saved_bytes. Because MCPResult is free-form, any tool can return request-specific strings in _compression.level, which creates unbounded label cardinality and can degrade Prometheus memory/query performance once compression stats are emitted. Please map this label to a small allowlist (and bucket unknowns) before calling observe.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94f4324d1b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ], | ||
| isError: true, | ||
| }; | ||
| this.recordToolOutputObservability(toolName, forbiddenResult); |
There was a problem hiding this comment.
Bound tool labels before recording denied-call output metrics
recordToolOutputObservability is called on the scope-denied path before verifying that toolName is a registered tool, so a read-only API key can send arbitrary params.name values and create unbounded new time series in openchrome_tool_output_bytes / openchrome_tool_estimated_tokens (and potentially cache-status metrics). This introduces a cardinality-amplification vector from user input and can degrade Prometheus memory/query performance; normalize unknown names to a fixed bucket (e.g. unknown) or only emit these metrics after successful tool lookup.
Useful? React with 👍 / 👎.
Merges the latest develop test-infra repairs so PR #1063 is evaluated against the branch it targets without changing the bounded metrics feature itself. Constraint: Remote CI failed on stale develop test expectations unrelated to token observability. Rejected: Editing unrelated doctor/tab tests in this PR | the fixes already landed on develop and should be consumed as base updates. Confidence: medium Scope-risk: narrow Directive: Keep output observability labels bounded; do not add URL, selector, instruction, or domain labels. Tested: pending rerun after dependency install; remote CI required for full matrix. Not-tested: Full CI matrix locally. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5f9adab7f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const originalChars = (compression as Record<string, unknown>).originalChars; | ||
| const compressedChars = (compression as Record<string, unknown>).compressedChars; | ||
| const level = String((compression as Record<string, unknown>).level ?? 'unknown'); | ||
| if (typeof originalChars === 'number' && typeof compressedChars === 'number' && originalChars > compressedChars) { |
There was a problem hiding this comment.
Preserve compression stats before observing saved bytes
recordToolOutputObservability expects _compression.originalChars and _compression.compressedChars, but the success path currently rewrites _compression to { level, verbosity } before this method runs when compression.trackSavings is enabled. In that configuration, any tool-provided compression stats are discarded, so openchrome_tool_compression_saved_bytes never records even when a tool returns valid savings data. Please merge/retain existing _compression fields instead of replacing the object.
Useful? React with 👍 / 👎.
Records original and delta response sizes on real read_page DOM delta responses so compression savings metrics are populated from actual tool output instead of inert placeholder metadata. Constraint: PR #1063 promised compression savings observability without adding tokenizer or billing dependencies. Rejected: Provider-specific token accounting | this remains a bounded payload-size heuristic within the MCP server boundary. Confidence: high Scope-risk: narrow Directive: Keep output metric labels bounded and never include URL, selector, instruction, or domain labels. Tested: npm test -- --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts; ./node_modules/.bin/tsc -p tsconfig.json --pretty false Not-tested: Live /metrics smoke against a running OpenChrome instance. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d45a2def9a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| originalChars: outputText.length, | ||
| compressedChars: compressedText.length, |
There was a problem hiding this comment.
Count full uncompressed response when reporting delta savings
_compression.originalChars is set from outputText.length, but the non-delta DOM response for the same call is outputText + domPaginationSection. When includePagination is left at its default (true) and pagination metadata is non-empty, this undercounts the baseline and can make originalChars <= compressedChars, so openchrome_tool_compression_saved_bytes is underreported or dropped even when delta compression actually saved bytes.
Useful? React with 👍 / 👎.
Cleans up module mocks used by the read_page compression-savings regression test so full-suite CI workers do not inherit mocked session or DOM modules. Constraint: CI runs many Jest files in shared workers, so test-local doMock calls must not leak beyond this suite. Rejected: Removing the real read_page delta-path test | it is needed to prove the compression-savings metric is populated. Confidence: high Scope-risk: narrow Directive: Keep observability tests hermetic when mocking runtime modules. Tested: npm test -- --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts Not-tested: Full Jest suite locally after this test-isolation fix. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53347f80b3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| metrics.observe( | ||
| 'openchrome_tool_compression_saved_bytes', | ||
| withTenantLabel({ tool: toolName, mode: level }), | ||
| originalChars - compressedChars, |
There was a problem hiding this comment.
Convert compression savings metric to true byte deltas
openchrome_tool_compression_saved_bytes is recorded from originalChars - compressedChars, but those _compression fields are character counts (for example read_page sets them from .length). For non-ASCII output (CJK, emoji, etc.), character deltas can differ substantially from UTF-8 byte deltas, so this metric reports incorrect units and can mislead dashboards/alerts that depend on byte savings. Compute the delta from UTF-8 byte lengths (or rename the metric to chars) before observing it.
Useful? React with 👍 / 👎.
Constraint: PR #1063 inherits the expanded Tier 1 tool surface from develop. Rejected: Loosening the cursor verification assertion | it would stop catching accidental tool-surface drift. Confidence: high Scope-risk: narrow Directive: Keep the exact count aligned whenever Tier 1 tool registration intentionally changes. Tested: CI=true npx jest --config jest.ci.config.js --runInBand tests/cross-env/cursor-verification.test.ts Not-tested: Full CI matrix pending on GitHub runners. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
@codex review |
Constraint: PR #1063 must pass Windows CI where Git can check out JSON fixtures with CRLF line endings. Rejected: Changing the v1.11.0 fixture payload | the response shape did not change. Confidence: high Scope-risk: narrow Directive: Preserve strict JSON shape assertions while normalizing platform line endings at comparison time. Tested: CI=true npx jest --config jest.ci.config.js --runInBand tests/tools/console-capture-regression.test.ts tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.ts Not-tested: Full CI matrix pending on GitHub runners. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
@codex review |
Summary
Closes #996.
Alignment / scope review
This stays aligned with OpenChrome's direction: no server-side LLM, no tokenizer dependency, no billing integration, and no persisted response bodies. It measures only payload sizes/status after response shaping and redaction.
Verification
/Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty falsenpx jest --config jest.config.js --runInBand tests/metrics/tool-output-observability.test.ts tests/metrics/cookie-scan-metrics.test.tsPost-merge smoke
Run the issue's OpenChrome MCP verification script/manual flow against
/metricsto confirmopenchrome_tool_output_bytes,openchrome_tool_estimated_tokens, and cache-status counters appear with bounded labels.