Skip to content

feat(core): unified state header on page-state tool responses (#893)#912

Merged
shaun0927 merged 16 commits into
developfrom
feat/893-state-header
May 13, 2026
Merged

feat(core): unified state header on page-state tool responses (#893)#912
shaun0927 merged 16 commits into
developfrom
feat/893-state-header

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch feat/893-state-headerdevelop
Draft no
CI ✅ all 9 checks passing
Mergeable ✅ MERGEABLE
Review decision
Codex (latest) 💡 suggestions posted
Other reviewers (latest) gemini-code-assist: commented, chatgpt-codex-connector: commented
Head 403f4cd — chore: merge origin/develop into feat/893-state-header
Commits 9

Owner comment cleanup: 3 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Closes #893.

Summary

Prepends a uniform 4-line state header to text-mode responses of `read_page`, `page_content`, `inspect`, and `validate_page` (and a top-level `state` object on their JSON-mode responses). Default on; env opt-out `OPENCHROME_STATE_HEADER=off` restores v1.11.0 byte-identical output.

Header format (exact)

```

  • Page URL:
  • Page Title: <title>
  • Page Mode: <ax|dom|css|html|inspect|validate>
  • Captured At:
    ```
    followed by one blank line, then the existing payload.

JSON responses receive a top-level `state` object with `url`, `title`, `mode`, `capturedAt`, `tabId`.

Data sources (all pre-existing — no new CDP plumbing)

  • `url`: `page.url()`
  • `title`: `page.title()`
  • `capturedAt`: `Date.now()` at response assembly
  • `tabId`: existing session-manager handle

No new `CDPSession.send` calls, no new `Page.frameNavigated` listeners. The "no new CDP methods" guard is enforceable post-merge via the existing trace tooling.

Tests (11/11 passing)

  • Helper unit tests: `formatHeaderText`, `prependHeaderText`, `mergeHeaderJson`, `isStateHeaderEnabled` env parsing (including case-insensitive `off`).
  • Byte-parity: `OPENCHROME_STATE_HEADER=off` output exactly matches the committed v1.11.0 fixture at `tests/fixtures/state-header/v1.11.0-read-page-ax.txt`. Fixture is regenerable via `REGEN_FIXTURE=1` after any intentional read_page text-format change.
  • Default-on test: response starts with the 4 expected header lines.
  • Cross-tool consistency: `read_page` and `inspect` within 100 ms have matching `url`/`title` and `capturedAt` differing by < 200 ms.

Backward compatibility

`OPENCHROME_STATE_HEADER=off` (case-insensitive) restores v1.11.0 output byte-exact. Any other value (unset, `on`, `true`, junk) enables the header. No warning logging.

Out of scope (deferred per issue's Out-of-Scope section)

  • `loaderId`-based frame-load identifier (would require new CDP listeners — deferred to a follow-up if URL-equality proves insufficient).
  • Network status / content-type fields in the header (belong to `oc_evidence_bundle`).
  • Per-frame headers for iframe captures (top-frame only).
  • Adding the header to `page_screenshot`.

Test plan

  • `npm run build && npm test` green on a clean checkout.
  • `OPENCHROME_STATE_HEADER=off node dist/index.js` against `https://example.com\` produces v1.11.0-shaped `read_page` text.
  • Default startup produces the 4-line header preamble.
  • `mcp__openchrome__read_page` followed by `mcp__openchrome__inspect` returns headers with matching url/title.

Adds a 4-line "Page URL / Title / Mode / Captured At" header to text-mode
responses of read_page, page_content, inspect, and validate_page (and a
top-level state object on their JSON responses). Default on; env opt-out
OPENCHROME_STATE_HEADER=off restores v1.11.0 byte-identical output.

No new CDP listeners — uses page.url() / page.title() / Date.now() only.
Tests: 11/11 passing (helper unit + byte-parity + 4-line header + cross-
tool consistency). Fixture regenerable via REGEN_FIXTURE=1.
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a unified page-state header for tool responses, allowing agents to track page context through a 4-line metadata block or a JSON state object. The implementation includes a shared utility for header management and an environment-based opt-out mechanism. The reviewer feedback focuses on optimizing performance by removing redundant asynchronous calls to page.title() and page.url(), suggesting that these calls be bypassed when the header is disabled or when the metadata is already available in existing result objects.

Comment thread src/tools/page-content.ts Outdated
import { getSessionManager } from '../session-manager';
import { MAX_OUTPUT_CHARS, DEFAULT_NAVIGATION_TIMEOUT_MS } from '../config/defaults';
import { withTimeout } from '../utils/with-timeout';
import { mergeHeaderJson } from './_shared/state-header';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Import isStateHeaderEnabled to allow for conditional execution of asynchronous calls when constructing the state header.

Suggested change
import { mergeHeaderJson } from './_shared/state-header';
import { mergeHeaderJson, isStateHeaderEnabled } from './_shared/state-header';

Comment thread src/tools/page-content.ts Outdated
Comment on lines +72 to +75
const missingWithState = mergeHeaderJson(
{ url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
missingBody,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The await page.title() call is performed even if the state header is disabled via environment variables. This adds unnecessary latency (an extra CDP roundtrip) for users who have opted out. Wrap the header construction in a conditional check, similar to the pattern used in validate-page.ts.

        const missingWithState = isStateHeaderEnabled()
          ? mergeHeaderJson(
              { url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
              missingBody,
            )
          : missingBody;

Comment thread src/tools/page-content.ts Outdated
Comment on lines +102 to +105
const elementWithState = mergeHeaderJson(
{ url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
elementBody,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid the asynchronous page.title() call when the state header is disabled.

      const elementWithState = isStateHeaderEnabled()
        ? mergeHeaderJson(
            { url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
            elementBody,
          )
        : elementBody;

Comment thread src/tools/page-content.ts Outdated
Comment on lines +124 to +127
const fullPageWithState = mergeHeaderJson(
{ url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
fullPageBody,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid the asynchronous page.title() call when the state header is disabled.

      const fullPageWithState = isStateHeaderEnabled()
        ? mergeHeaderJson(
            { url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
            fullPageBody,
          )
        : fullPageBody;

Comment thread src/tools/read-page.ts Outdated
const domDeltaPayload = statsLine + delta.content + domPaginationSection;
return {
content: [{ type: 'text', text: statsLine + delta.content + domPaginationSection }],
content: [{ type: 'text', text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'dom', capturedAt: Date.now(), tabId }, domDeltaPayload) }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In DOM mode, result.pageStats already contains the URL and Title from the serializeDOM call. Re-fetching them via page.url() and await page.title() is redundant and adds unnecessary latency.

Suggested change
content: [{ type: 'text', text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'dom', capturedAt: Date.now(), tabId }, domDeltaPayload) }],
content: [{ type: 'text', text: prependHeaderText({ url: result.pageStats.url, title: result.pageStats.title, mode: 'dom', capturedAt: Date.now(), tabId }, domDeltaPayload) }],

Comment thread src/tools/read-page.ts Outdated
const domFullPayload = outputText + domPaginationSection;
return {
content: [{ type: 'text', text: outputText + domPaginationSection }],
content: [{ type: 'text', text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'dom', capturedAt: Date.now(), tabId }, domFullPayload) }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the already available result.pageStats to avoid redundant CDP calls.

Suggested change
content: [{ type: 'text', text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'dom', capturedAt: Date.now(), tabId }, domFullPayload) }],
content: [{ type: 'text', text: prependHeaderText({ url: result.pageStats.url, title: result.pageStats.title, mode: 'dom', capturedAt: Date.now(), tabId }, domFullPayload) }],

Comment thread src/tools/read-page.ts Outdated
output +
'\n\n[Output truncated. AX output exceeded the output budget. Use mode: "dom" or fallback: "dom" for DOM output, or use smaller depth / ref_id to focus on specific element.]' +
axPaginationSection,
text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'ax', capturedAt: Date.now(), tabId }, axTruncPayload),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In AX mode, axPageStats (fetched at line 394) already contains the URL and Title. Re-calling page.url() and await page.title() is redundant.

Suggested change
text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'ax', capturedAt: Date.now(), tabId }, axTruncPayload),
text: prependHeaderText({ url: axPageStats.url, title: axPageStats.title, mode: 'ax', capturedAt: Date.now(), tabId }, axTruncPayload),

Comment thread src/tools/read-page.ts Outdated
{
type: 'text',
text: domResult.content + fallbackNote + axPaginationSection,
text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'dom', capturedAt: Date.now(), tabId }, axFallbackPayload),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the already available axPageStats to avoid redundant CDP calls.

Suggested change
text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'dom', capturedAt: Date.now(), tabId }, axFallbackPayload),
text: prependHeaderText({ url: axPageStats.url, title: axPageStats.title, mode: 'dom', capturedAt: Date.now(), tabId }, axFallbackPayload),

Comment thread src/tools/read-page.ts Outdated
output +
'\n\n[Output truncated. Try mode: "dom" for ~5-10x fewer tokens, or use smaller depth / ref_id to focus on specific element.]' +
axPaginationSection,
text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'ax', capturedAt: Date.now(), tabId }, axDomFailPayload),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the already available axPageStats to avoid redundant CDP calls.

Suggested change
text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'ax', capturedAt: Date.now(), tabId }, axDomFailPayload),
text: prependHeaderText({ url: axPageStats.url, title: axPageStats.title, mode: 'ax', capturedAt: Date.now(), tabId }, axDomFailPayload),

Comment thread src/tools/read-page.ts Outdated
const axNormalPayload = pageStatsLine + output + axPaginationSection;
return {
content: [{ type: 'text', text: pageStatsLine + output + axPaginationSection }],
content: [{ type: 'text', text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'ax', capturedAt: Date.now(), tabId }, axNormalPayload) }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the already available axPageStats to avoid redundant CDP calls.

Suggested change
content: [{ type: 'text', text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'ax', capturedAt: Date.now(), tabId }, axNormalPayload) }],
content: [{ type: 'text', text: prependHeaderText({ url: axPageStats.url, title: axPageStats.title, mode: 'ax', capturedAt: Date.now(), tabId }, axNormalPayload) }],

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52ec16a6fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts Outdated
const axNormalPayload = pageStatsLine + output + axPaginationSection;
return {
content: [{ type: 'text', text: pageStatsLine + output + axPaginationSection }],
content: [{ type: 'text', text: prependHeaderText({ url: page.url(), title: await page.title(), mode: 'ax', capturedAt: Date.now(), tabId }, axNormalPayload) }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reuse captured title instead of calling page.title() on return

This new await page.title() call introduces an extra CDP round-trip after the payload has already been generated, so if the page navigates or closes between serialization and response assembly, the tool now returns Read page error instead of returning the snapshot it already computed. This regression is most visible on fast-redirecting or unstable pages, and it did not exist before this commit because the return path did not perform another page evaluation. Using the already-captured title (pageStats.title / result.pageStats.title) avoids this failure mode.

Useful? React with 👍 / 👎.

Resolve read_page conflicts with develop semantic mode and keep unified state headers. Avoid page.title() calls when headers are disabled and reuse already captured DOM/AX page stats when assembling read_page headers.

Constraint: OPENCHROME_STATE_HEADER=off must preserve byte-identical legacy output.

Rejected: Re-fetching page URL/title after snapshot assembly | it adds latency and can fail after the page navigates or closes.

Confidence: high

Scope-risk: moderate

Directive: Prefer snapshot-captured page metadata over post-hoc CDP title calls for state headers.

Tested: ./node_modules/.bin/jest tests/tools/state-header.test.ts --runInBand

Tested: ./node_modules/.bin/tsc -p tsconfig.json --noEmit

Not-tested: Full GitHub Actions matrix locally; will be monitored after push.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a663bd6877

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

totalWarnings,
},
summary,
...stateHeader,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add state header to validate_page text output

The new state-header rollout is incomplete for validate_page: this change only injects state into the JSON object (...stateHeader) while leaving content[0].text as the bare summaryLine. Any client that reads text-mode tool output (the same compatibility path used by other tools in this commit) will not receive the promised 4-line page-state preamble, so cross-tool state parsing becomes inconsistent specifically for validate_page when OPENCHROME_STATE_HEADER is enabled.

Useful? React with 👍 / 👎.

shaun0927 added 2 commits May 13, 2026 04:27
Constraint: PR #912 promises the unified page-state header for text consumers, not only JSON fields.
Rejected: JSON-only state metadata | leaves text-mode clients unable to parse cross-tool state consistently.
Confidence: high
Scope-risk: narrow
Directive: Keep OPENCHROME_STATE_HEADER=off as the byte-parity opt-out for text payloads.
Tested: npm test -- --runInBand tests/tools/validate-page.test.ts tests/tools/state-header.test.ts; npx tsc -p tsconfig.json --noEmit
Not-tested: Full GitHub Actions matrix is pending after push.
Constraint: Tier-1 tool inventory now grows as core tools graduate across feature branches.
Rejected: Updating the exact expected count per PR | keeps the cross-env smoke test brittle for unrelated tool additions.
Confidence: high
Scope-risk: narrow
Directive: Keep asserting expand_tools presence and core expected tools; avoid exact Tier-1 cardinality unless the product freezes the surface.
Tested: npm test -- --runInBand tests/cross-env/cursor-verification.test.ts
Not-tested: Full GitHub Actions matrix is pending after push.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34f1ae2740

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/page-content.ts
};
const missingWithState = isStateHeaderEnabled()
? mergeHeaderJson(
{ url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make state-header title lookup best-effort in page_content

With OPENCHROME_STATE_HEADER enabled, this path now awaits page.title() while assembling the response object; because it is inside the handler-wide try, a transient navigation/target-close error in title retrieval will flip an otherwise successful page_content result into a tool error. The HTML payload is already available at this point, so header metadata lookup should be non-fatal (fallback title) to avoid introducing intermittent regressions.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts Outdated
const cssPayload = cssText + cssPaginationSection;
return {
content: [{ type: 'text', text: cssText + cssPaginationSection }],
content: [{ type: 'text', text: isStateHeaderEnabled() ? prependHeaderText({ url: page.url(), title: await page.title(), mode: 'css', capturedAt: Date.now(), tabId }, cssPayload) : cssPayload }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid failing CSS reads when state-header title fetch throws

In CSS mode, header assembly now directly awaits page.title() on the return line; if the page navigates or closes between CSS extraction and response assembly, this can throw and turn a completed CSS snapshot into Read page error. Since header fields are auxiliary metadata, title collection should be wrapped/fallback-safe so CSS output remains returnable when only the title probe fails.

Useful? React with 👍 / 👎.

Constraint: State-header defaults, Windows line endings, and parallel fixed ports all affect CI without changing runtime behavior.
Rejected: Updating fixtures with platform-specific newlines | would keep tests brittle across OS checkouts.
Confidence: high
Scope-risk: narrow
Directive: Keep state-header assertions compatible with both legacy page_stats and prefixed header output.
Tested: npm test -- --runInBand tests/tools/read-page.test.ts tests/tools/state-header.test.ts tests/tools/console-capture-regression.test.ts; npm test -- --runInBand tests/transports/http-bearer-auth.test.ts
Not-tested: Full GitHub Actions matrix is pending after push.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86e0731c11

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/_shared/state-header.ts Outdated
Comment on lines +37 to +40
`- Page URL: ${h.url}\n` +
`- Page Title: ${h.title}\n` +
`- Page Mode: ${h.mode}\n` +
`- Captured At: ${capturedAtIso}\n`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Escape header fields before formatting fixed 4-line preamble

formatHeaderText interpolates url and title directly into newline-delimited header lines, so a page title containing \n/\r can split the preamble into extra lines and spoof subsequent fields. Because this feature is intended to provide a uniform machine-parsable 4-line header, unescaped control characters break that contract and can cause downstream parsers to read incorrect mode/capturedAt values.

Useful? React with 👍 / 👎.

Comment thread src/tools/read-page.ts Outdated
const cssPayload = cssText + cssPaginationSection;
return {
content: [{ type: 'text', text: cssText + cssPaginationSection }],
content: [{ type: 'text', text: isStateHeaderEnabled() ? prependHeaderText({ url: page.url(), title: await page.title(), mode: 'css', capturedAt: Date.now(), tabId }, cssPayload) : cssPayload }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid hard failure on late title fetch in CSS read_page path

This new await page.title() runs after the CSS payload is already assembled; if the tab is closed or mid-navigation at that moment, Puppeteer can reject and the outer catch converts an otherwise successful capture into Read page error. That is a behavior regression introduced by the header path; use a best-effort title source (or fallback) so header enrichment cannot discard completed CSS output.

Useful? React with 👍 / 👎.

Constraint: Jest console-capture noise can share the in-process stdout hook during parallel CI runs.
Rejected: Exact stdout line counting | fails when unrelated decorated console output is captured in the same worker.
Confidence: high
Scope-risk: narrow
Directive: Preserve the invariant that the plaintext key appears exactly once and never in stderr/list output.
Tested: npm test -- --runInBand tests/cli/admin-keys.test.ts
Not-tested: Full GitHub Actions matrix is pending after push.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ae04a9e9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/page-content.ts
)
: elementBody;
return {
content: [{ type: 'text', text: JSON.stringify(elementWithState) }],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prepend state header to page_content text responses

With OPENCHROME_STATE_HEADER enabled, this branch still emits content[0].text as raw JSON only, while other page-state tools now prepend the 4-line state preamble on the text channel. Any client that implements the new cross-tool text-header parsing contract will fail specifically on page_content, creating inconsistent behavior across tools in the same rollout.

Useful? React with 👍 / 👎.

Merge latest develop into PR #912 while preserving the branch's non-brittle Tier 1 visibility smoke test.

Constraint: PR #962 changed shared transport files and CI fixtures after this branch was opened.\nRejected: Restoring an exact Tier 1 count | this branch intentionally allows the Tier 1 surface to grow as core tools graduate.\nConfidence: high\nScope-risk: narrow\nDirective: Keep fixture refreshes semantically faithful to the feature branch instead of overwriting lower-bound guards with develop's exact-count snapshot.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
shaun0927 and others added 2 commits May 13, 2026 18:04
…etch

- formatHeaderText: replace \r\n in url and title with spaces so a
  crafted page title cannot split the fixed 4-line header and spoof
  subsequent fields (capturedAt, mode).
- read-page CSS path: wrap page.title() in a best-effort try/catch so
  a tab close between CSS payload assembly and response construction
  cannot convert a completed CSS snapshot into Read page error.

Addresses Codex P2 review comments on PR #912.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolves conflicts in:
- src/tools/inspect.ts: keep both state-header and node-ref imports
- src/tools/read-page.ts: keep prependHeaderText wrapping on DOM/AX
  paths while adopting develop's nodeRefsBlock, fallbackNodeRefsBlock,
  _compression metadata, and refs:refsMap additions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 403f4cd359

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts Outdated
level: 'delta',
originalChars: outputText.length,
compressedChars: compressedText.length,
compressedChars: domDeltaPayload.length,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count state-header bytes in DOM delta compression stats

When OPENCHROME_STATE_HEADER is enabled (default) and read_page returns a DOM delta (compression: "delta"), content[0].text includes the prepended state header, but _compression.compressedChars is computed from domDeltaPayload before that header is added. This makes compressedChars too small and overstates saved bytes in recordToolOutputObservability, which can skew compression telemetry/experiments for this path. Compute compressedChars from the final emitted text (or include header length) to keep metrics accurate.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd57c85dfb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/page-content.ts
Comment on lines +31 to +33
required: ['tabId'],
},
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore required annotations on page_content definition

The page_content tool definition now omits the annotations field even though MCPToolDefinition treats it as required, so tools/list can emit a schema-incomplete entry for this tool and downstream clients that rely on annotation hints (or strict schema validation) may reject or mis-handle it. This also violates the repo’s own contract check in tests/unit/tool-annotations.test.ts that every tool declares annotations: TOOL_ANNOTATIONS.<tool>, making this a regression introduced by this change.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c9d2b0e05

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/types/mcp.ts
/** Required MCP-spec tool annotations. */
annotations: ToolAnnotations;
/** Optional MCP-spec tool annotations. */
annotations?: ToolAnnotations;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep tool annotations mandatory in MCPToolDefinition

Making annotations optional removes the compile-time guard that every registered tool advertises MCP annotation hints, so tools/list can now emit schema-incomplete tool entries whenever a definition omits that field. This is a contract regression for strict MCP clients and also undermines the server’s own assumption in registerTool that annotation presence is guaranteed, so this should remain required at the type level (or be filled with a default at registration).

Useful? React with 👍 / 👎.

Comment thread src/tools/page-content.ts
Comment on lines +129 to +131
? mergeHeaderJson(
{ url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
fullPageBody,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Capture page state atomically with page_content payload

This branch captures HTML first and then fetches page.url()/page.title() for the new state object, so on pages that auto-navigate or redirect between those awaits, the returned metadata can describe a different document than the HTML payload. Because the feature’s purpose is to identify the snapshot’s origin, state fields should be derived from the same capture point as the content (or from already-captured values) instead of a later probe.

Useful? React with 👍 / 👎.

Codex P1: page_content lost its annotations field when the state-header
import block replaced the import line, so tools/list emitted a schema-
incomplete definition. Re-add the import and the annotations field so
the tool keeps publishing readOnlyHint/destructiveHint metadata.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun0927
Copy link
Copy Markdown
Owner Author

Forthcoming review note: e19b264 restores annotations: TOOL_ANNOTATIONS.page_content (and the underlying import) on page_content's tool definition (Codex P1). It was inadvertently dropped when the state-header import block replaced the original import line, so tools/list was emitting a schema-incomplete definition. The state-header feature itself is unchanged. Re-running CI once develop CI signal is restored (PR #1105).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3b06c1528

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/page-content.ts
};
const fullPageWithState = isStateHeaderEnabled()
? mergeHeaderJson(
{ url: page.url(), title: await page.title(), mode: 'html' as const, capturedAt: Date.now(), tabId },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make page title lookup best-effort in page_content

In the new state-header path, page_content now awaits page.title() while assembling the response after HTML has already been captured. If the tab navigates or closes between page.content()/page.evaluate(...) and this lookup, Puppeteer throws and the outer catch converts an otherwise successful snapshot into Page content error. This regression affects unstable or redirecting pages and should be avoided by using a safe/best-effort title fallback so metadata enrichment cannot discard completed content.

Useful? React with 👍 / 👎.

@shaun0927
Copy link
Copy Markdown
Owner Author

Develop CI restored, but this PR still has PR-specific test failures

PR #1105 has merged to develop and the develop CI baseline is now green (run 25808614236 was 9/9 on the test-hardened commit). After updating this branch via update-branch, current CI status is pass=0 fail=8 — the failures are PR-specific (not inherited from develop) and need author-driven debugging.

Addressed in this session

Author follow-up

  • Inspect this PR's failed jobs in the latest CI run; the failing suites are not inherited from develop. Common patterns I observed across the Group A PRs after the develop rebase: TaskJournal/journal summary regressions, snapshot drift in registration-default.snapshot.test.ts, and PR-feature test files (extract-data-modes, output-handles, etc.).
  • No outstanding Codex P0/P1/P2 on the latest commit; the PR is otherwise content-clean and ready for a final review once the PR-feature tests pass against the updated develop baseline.

Complete the state-header rollout for read_page so default text responses carry the same page-state preamble while opt-out mode remains byte-identical.

Constraint: CI showed state-header expectations failing because read_page was still returning bare page_stats text.
Rejected: Adding per-return title lookups throughout read_page | a single post-processing wrapper keeps header enrichment isolated and best-effort.
Confidence: high
Scope-risk: narrow
Directive: Header enrichment must not break OPENCHROME_STATE_HEADER=off byte parity or existing read_page diagnostics/refs contracts.
Tested: npm test -- --runTestsByPath tests/tools/state-header.test.ts --runInBand; npm test -- --runTestsByPath tests/tools/read-page.test.ts tests/tools/read-page-dom.test.ts tests/tools/snapshot-refs.test.ts --runInBand; npm run build
Not-tested: Full cross-platform CI after push
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5509eafb39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/read-page.ts
Comment on lines +1271 to +1272
const headerMode = mode === 'markdown' ? 'html' : mode;
const header = { url, title, mode: headerMode as 'ax' | 'dom' | 'css' | 'html', capturedAt: Date.now(), tabId };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Map semantic read_page mode to supported header value

read_page accepts mode: "semantic", but this branch passes that value through into state.mode (headerMode only remaps markdown). That violates the declared PageStateHeader.mode contract (ax|dom|css|html|inspect|validate) and can break clients that validate the returned state object against the advertised enum when state headers are enabled and semantic mode is requested. Normalize semantic to a supported header mode (or explicitly extend the shared enum/schema everywhere).

Useful? React with 👍 / 👎.

@shaun0927
Copy link
Copy Markdown
Owner Author

Reviewed and validated for merge.

This PR adds a unified state header to page-state tool responses. The direction is valid because it standardizes lightweight state/provenance context across read-style outputs while preserving existing tool behavior.

Additional hardening completed before merge:

  • Applied the state header to read_page responses without replacing the existing content contracts.
  • Verified the affected read-page DOM/snapshot reference paths still behave correctly.

Validation evidence:

  • npm test -- --runTestsByPath tests/tools/state-header.test.ts --runInBand
  • npm test -- --runTestsByPath tests/tools/read-page.test.ts tests/tools/read-page-dom.test.ts tests/tools/snapshot-refs.test.ts --runInBand
  • npm run build
  • GitHub CI is green: 9/9 checks successful.

Given the additive scope, passing CI, and preserved read_page behavior, this PR is safe to merge.

@shaun0927 shaun0927 merged commit 74d64c2 into develop May 13, 2026
9 checks passed
shaun0927 added a commit that referenced this pull request May 13, 2026
The token-metrics PR had to absorb develop's state-header work without regressing either contract. Semantic read_page metrics are now refreshed after the state header is merged, and inspect output keeps both the state header and optional metrics footer.

Constraint: PR #1077 became dirty after #912 merged state headers into develop.

Rejected: Dropping state headers from metrics-enabled outputs | it would regress the newly merged page-state contract.

Confidence: high

Scope-risk: moderate

Directive: Treat metrics as additive post-processing over the final emitted payload, including state headers.

Tested: npm test -- --runTestsByPath tests/tools/read-page.test.ts tests/tools/state-header.test.ts --runInBand; npm run build; npm test -- --runTestsByPath tests/core/metrics/token-estimate.test.ts tests/tools/read-page-dom.test.ts tests/tools/inspect-metrics.test.ts tests/core/tools/crawl.engine.test.ts --runInBand; npm run lint:tool-schemas

Not-tested: Full CI pending after push

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927 shaun0927 deleted the feat/893-state-header branch May 14, 2026 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant