Skip to content

feat(core): deterministic selector-chain replay layer for skill memory (closes #875)#925

Merged
shaun0927 merged 21 commits into
developfrom
feat/875-replay-layer
May 13, 2026
Merged

feat(core): deterministic selector-chain replay layer for skill memory (closes #875)#925
shaun0927 merged 21 commits into
developfrom
feat/875-replay-layer

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch feat/875-replay-layerdevelop
Draft no
CI ⏳ 8/9 passing — 1 pending
Mergeable ❌ CONFLICTING
Review decision
Codex (latest) 💡 suggestions posted
Other reviewers (latest) gemini-code-assist: commented, chatgpt-codex-connector: commented
Head db819fe — Carry replay layer onto output metrics baseline
Commits 16

Owner comment cleanup: 10 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Partial close of #875. DRAFT — branch is 108 commits behind develop; needs rebase before review. Recorder hooks for the 4 action tools are NOT part of this PR — they are tracked in follow-up issue #988 and will land as a separate PR once this one merges (PR-B in the split below).

Why the split

The 2026-05-12 review identified that this PR would be cleaner as two stacked PRs:

  • PR-A (this one) — primitives + new oc_skill_replay tool + store v1→v2 migration. The replay layer is complete and unit-tested; when called against a v2 record it works end-to-end. Self-contained.
  • PR-B (feat(tools): recorder hooks for skill-memory replay (4 action tools — follow-up to #925) #988) — recorder hooks for interact/fill-form/form-input/navigate that populate the per-CDP-target buffer when callers pass capture_artifact: true. Closes the pipeline at the source.

Until PR-B lands, oc_skill_replay works against records that callers populate manually with a hand-crafted replay_artifact (via oc_skill_record). The new ARTIFACT_RESOLUTION_FAILED failure code returns control to the host without throwing when no artifact is present, so the missing recorder hooks do not break any existing skill-memory workflow — they only mean replay is opt-in by record-shape until PR-B closes the loop.

Summary (PR-A scope)

  • New module src/core/skill-memory/replay-artifact.tsReplayArtifactStep + ReplayArtifact types + JSON-Schema validator.
  • New module src/core/skill-memory/recorder-buffer.ts — per-CDP-target FIFO buffer (cap 100, destructive flush on oc_skill_record). Note: buffer is defined but no action tool feeds it in this PR — that lands in PR-B (feat(tools): recorder hooks for skill-memory replay (4 action tools — follow-up to #925) #988).
  • Skill-memory store bumped v1 → v2; v1 records remain read-compatible.
  • oc_skill_record accepts optional replay_artifact per step; idempotent.
  • New tool oc_skill_replay — never throws; structured {ok} or {ok:false, failure} envelope. Reuses src/utils/ralph/ralph-engine.ts for selector resolution.
  • Registered in src/tools/index.ts; tools/list parity holds in both flag positions (P2). OPENCHROME_SKILL_REPLAY default ON.
  • Inline isCoreFeatureEnabled helper in src/harness/flags.ts (per 2026-05-12 spec correction; feat(core): stable backend-node uid contract across snapshots (chrome-devtools-mcp adoption A.1) #844 not yet merged — TODO removal marker present).
  • Trace redactor allow-list extended for replay telemetry fields.

Outstanding work before ready-for-review

  • Rebase onto current develop — branch is 108 commits behind. Expected to be mechanical (this PR adds new files plus light touches to a few existing ones; no overlap with the heavy churn on develop has been observed in a dry-run).
  • After rebase: full npm test clean run. The current targeted suite (replay-artifact|oc-skill-replay|skill-memory.*no-network) passes 11/11 against the stale base.

The originally-listed "Recorder hooks" checkbox has been moved to follow-up #988 — it is no longer a gate for this PR.

Files

 15 files changed, 2006 insertions(+), 22 deletions(-)

Test plan

  • npm run build — exit 0
  • npm run lint — 0 errors (4 warnings: 3 pre-existing in image-features, 1 removed in this PR)
  • npm run lint:tier — no dependency violations
  • npm test -- --testPathPattern="replay-artifact|oc-skill-replay|skill-memory.*no-network" — 11 passed, 0 failed
  • npm test full suite — blocked on rebase

Manual verification (post-merge, via openchrome MCP)

scripts/verify/skill-replay.mjs walks the 6 scenarios from issue #875. With PR-A merged but PR-B (#988) still pending, scenarios 1 (record-then-replay happy path) needs callers to pass replay_artifact explicitly; once PR-B lands the recorder buffer is filled automatically and scenario 1 becomes hands-off:

  1. record-then-replay happy path (zero read_page/find between recall and replay)
  2. artifact-resolution failure returns control to host with structured ARTIFACT_RESOLUTION_FAILED code (no exception)
  3. P3 compliance (zero outbound) — block egress at OS level, replay still works
  4. P2 byte-parity when capture_artifact omitted (deferred to PR-B; tested there with the recorder hooks)
  5. Kill-switch — OPENCHROME_SKILL_REPLAY=0 returns DISABLED fact
  6. Schema migration v1 → v2 records load cleanly without synthesis

References

Portability-harness contract alignment

  • P1 (tool server identity): replay is a single tool call, no orchestration. ✓
  • P2 (zero-impact when off): tools/list includes oc_skill_replay regardless; OPENCHROME_SKILL_REPLAY=0 returns DISABLED fact; no action-tool surface is modified in this PR, so byte-parity holds by construction. ✓
  • P3 (anywhere-compatible): zero outbound HTTP, zero LLM API — pinned by tests/core/skill-memory/no-network.test.ts. ✓
  • P4 (facts vs decisions): structured failures returned to host; host LLM decides. ✓
  • P5 (native deps): zero new dependencies. ✓

closes #875)

Adds an opt-in replay layer over the existing skill-memory store so recurring
tasks can be re-executed without a host-LLM round-trip per step.

- New types/validator in src/core/skill-memory/replay-artifact.ts:
  ReplayArtifactStep (kind + ordered selector strategies + optional
  post_assert) and ReplayArtifact (schema_version=1 envelope).
- New per-CDP-target buffer in src/core/skill-memory/recorder-buffer.ts:
  cap 100 entries, FIFO eviction, destructive flush by oc_skill_record.
- Skill-memory store bumped v1 -> v2 with additive replay_artifact per step;
  v1 records remain read-compatible (no synthesis).
- oc_skill_record extended to accept replay_artifact per step; idempotent
  re-record updates artifact while preserving skill_id.
- New tool oc_skill_replay (src/tools/oc-skill-replay.ts): never throws,
  returns {ok, ...} or {ok:false, failure:{code,...}} envelope. Reuses
  src/utils/ralph/ralph-engine.ts for selector resolution.
- Tool registered in src/tools/index.ts; tools/list parity holds in both
  flag positions (P2). OPENCHROME_SKILL_REPLAY default ON, opt-out via =0.
- Inline isCoreFeatureEnabled helper in src/harness/flags.ts with
  TODO(#844) marker pending #844 merge.
- Trace redactor allow-list extended for replay step telemetry.
- Tests: replay-artifact validator, oc_skill_replay envelope/artifact
  handling, oc_skill_record plumb-through, P3 compliance (no outbound)
  with defensive spy installation for environments where
  jest.spyOn cannot patch http/https namespace imports.
- Fixture and reproducer: tests/fixtures/skill-replay/index.html and
  scripts/verify/skill-replay.mjs for the 6 post-merge MCP scenarios.

Honors the 2026-05-12 spec correction on the issue (isCoreFeatureEnabled
inlined since #844 has not landed yet).

P1 tool-server identity: replay is a single tool call, no orchestration.
P2 zero-impact: tools/list stable, OPENCHROME_SKILL_REPLAY=0 returns
DISABLED fact, capture_artifact opt-in defaults false.
P3 anywhere-compatible: zero outbound HTTP, zero LLM API.
P4 facts vs decisions: structured failures returned to host.
P5 native deps: zero new dependencies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a deterministic skill replay system (#875), including a new oc_skill_replay tool, a recorder buffer for capturing steps, and an upgrade to the skill memory schema (v2) to store replay artifacts. Key feedback highlights that the current oc_skill_replay implementation only resolves selectors without executing actions and identifies performance bottlenecks in DOM traversal. Additionally, reviewers noted a potential memory leak in the recorder buffer, missing validation for specific action arguments, and the need to use constants for schema versioning to ensure consistency.

Comment on lines +369 to +466
let executed = 0;
for (let i = from; i < to; i++) {
const artifact = artifacts[i];
if (!artifact) {
return jsonResult(
failure(
'ARTIFACT_MISSING',
i,
`step ${i} has no replay_artifact (heterogeneous v1/v2 record)`,
totalSteps,
stepResults,
executed,
),
);
}
const v = validateReplayArtifact(artifact);
if (!v.ok) {
return jsonResult(
failure(
'ARTIFACT_MISSING',
i,
`step ${i} artifact failed validation: ${v.error ?? 'unknown'}`,
totalSteps,
stepResults,
executed,
),
);
}
const step = artifact.steps[0]; // each per-step artifact carries one step
if (!step) {
return jsonResult(
failure(
'ARTIFACT_MISSING',
i,
`step ${i} artifact has no embedded step entry`,
totalSteps,
stepResults,
executed,
),
);
}

const started = Date.now();
let resolution: { resolvedVia: ResolvedVia | null; attempts: number };
try {
resolution = await withDeadline(resolveStep(step, page), stepTimeoutMs);
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
stepResults.push({
index: i,
resolved_via: null,
selector_attempts: step.selectors.length,
elapsed_ms: Date.now() - started,
ok: false,
});
return jsonResult(
failure('STEP_TIMEOUT', i, message, totalSteps, stepResults, executed),
);
}

if (step.kind !== 'navigate' && resolution.resolvedVia === null) {
stepResults.push({
index: i,
resolved_via: null,
selector_attempts: resolution.attempts,
elapsed_ms: Date.now() - started,
ok: false,
});
return jsonResult(
failure(
'ARTIFACT_RESOLUTION_FAILED',
i,
`no selector strategy resolved (tried ${resolution.attempts}/${step.selectors.length})`,
totalSteps,
stepResults,
executed,
),
);
}

stepResults.push({
index: i,
resolved_via: resolution.resolvedVia,
selector_attempts: resolution.attempts,
elapsed_ms: Date.now() - started,
ok: true,
});
executed++;

// post_assert hook — for v1.11 we surface the contract id in the failure
// detail but defer evaluation to the host (oc_assert is snapshot-driven
// and needs caller-supplied evidence). When the host wants enforcement
// it can chain oc_skill_replay → oc_assert deterministically.
if (step.post_assert && stopOnContractFailure) {
// Intentionally not failing here — the host owns the assert call.
// We attach the contract id in step_results downstream via index.
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The oc_skill_replay tool implementation appears to be incomplete. While the documentation and tool name suggest it "re-issues each action against the live page", the current loop only calls resolveStep, which in turn only calls resolveSelector. There is no code that actually performs the actions (e.g., clicking, filling inputs, or navigating). As it stands, this tool only validates that the persisted selectors still resolve in the current DOM, but it does not execute the recorded skill.

Comment thread src/tools/oc-skill-replay.ts Outdated
Comment on lines +211 to +220
const els = Array.from(document.querySelectorAll<HTMLElement>('*'));
return els.some((el) => {
const elRole = el.getAttribute('role') || el.tagName.toLowerCase();
if (elRole !== r) return false;
const elName =
el.getAttribute('aria-label') ||
el.textContent?.trim() ||
'';
return n === '' || elName === n;
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Iterating over all elements (*) and accessing textContent for each is a significant performance bottleneck. textContent triggers a full subtree traversal, making this an O(N^2) operation relative to the number of DOM nodes in the worst case. This will cause severe lag or timeouts on complex pages. Consider using more specific CSS selectors based on the role to narrow down the search space before inspecting text content or attributes.

capturedAt: number;
}

const buffers = new Map<string, BufferedStep[]>();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The buffers Map is a module-level variable that stores step data per targetId. However, there is no mechanism in this module to prune entries for targets that are closed without being flushed (e.g., if a recording is cancelled or a tab is closed). This will lead to a memory leak over time in long-running processes. Ensure that target lifecycle events trigger a cleanup in this buffer.

Comment thread src/tools/oc-skill-record.ts Outdated
*/
function wrapStepAsArtifact(step: ReplayArtifactStep): ReplayArtifact {
return {
schema_version: 1,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The schema_version is hardcoded as 1. It should use the REPLAY_ARTIFACT_SCHEMA_VERSION constant defined in the skill-memory module to ensure consistency and facilitate future schema migrations.

return { ok: false, error: 'fill step requires args.value string' };
}
}
return { ok: true };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The validation for args in validateReplayArtifactStep is currently limited to navigate and fill kinds. Other supported kinds like press, select, and scroll likely require specific arguments (e.g., key for press, value for select) to be functional during replay. Consider extending the validation to cover these requirements to prevent persisting invalid artifacts.

async function withDeadline<T>(p: Promise<T>, ms: number): Promise<T> {
let timer: NodeJS.Timeout | null = null;
try {
return await Promise.race([
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Promise.race does not cancel the underlying promise p when the timeout occurs. If p (e.g., a page.evaluate call) hangs, it will continue to consume resources in the background. While Puppeteer has its own internal timeouts, relying on Promise.race for a "hard" deadline can lead to unexpected side effects if the task is not truly cancellable.

…/ Buffer type

Merge of develop (108 commits) into the replay-layer branch. Two import-
only conflicts resolved to the union of both sides:

- src/tools/index.ts: keep both the new oc_skill_replay registration
  and develop's doctor / performance-insights / proxy-hook / observe /
  devtools-url / context tool registrations.
- src/tools/oc-skill-record.ts: keep both this branch's replay-artifact
  flush + flag check AND develop's secrets-redactor wiring.

Test-side adjustments to match develop's new APIs:

- tests/tools/tabs.test.ts: 4-arg → 5-arg createTarget signature
  (develop's #848 added isolatedContext).
- tests/utils/mock-session.ts: add getTargetContextName mock (#848).
- src/tools/interact.ts: trim description from 532 → 371 chars to fit
  develop's #841 budget (≤400).
- tests/cli/doctor/checks/chrome-binary.test.ts: cast string literals
  to Buffer via Buffer.from(...) (TS 5.x non-shared-buffer typing).
- tests/cli/doctor/checks/network-local.test.ts: drop durationMs
  assertion — CheckFn returns Omit<CheckResult,'durationMs'>; the
  runner injects durationMs after the check resolves.
- tests/core/skill-memory/replay-artifact.test.ts: replace unused
  @ts-expect-error directive with an explicit `as unknown as` cast.

Three doctor-check failures (chrome-binary: ok/fail/version-command
paths) remain on this tip but they ALSO fail on a fresh origin/develop
checkout — pre-existing regression unrelated to this branch (verified
against develop HEAD 1370b66).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun0927 shaun0927 marked this pull request as ready for review May 12, 2026 18:26
@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72d267ab8b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/oc-skill-replay.ts Outdated
Comment on lines +362 to +363
if (sm && typeof sm.getPage === 'function' && typeof tabId === 'string' && tabId.length > 0) {
page = await sm.getPage(sessionId, tabId, undefined, 'oc_skill_replay');

This comment was marked as outdated.

Comment on lines +449 to +456
stepResults.push({
index: i,
resolved_via: resolution.resolvedVia,
selector_attempts: resolution.attempts,
elapsed_ms: Date.now() - started,
ok: true,
});
executed++;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Execute replay actions before marking step success

After selector resolution, the loop records the step as ok: true and increments executed, but there is no action dispatch for click/fill/press/navigate at all. This makes oc_skill_replay effectively a selector-check pass that can return full success without mutating page state, which breaks the core promise of deterministic replay.

Useful? React with 👍 / 👎.

Close the PR #925 P1 gap by dispatching resolved replay steps against the live page, resolving omitted tabId through the active session target, and pruning recorder buffers on target close.

Constraint: oc_skill_replay must remain deterministic, no-LLM, and structured-failure only while still performing recorded actions.

Rejected: treating replay as selector validation only | Codex and Gemini both identified that as breaking the tool contract.

Confidence: medium

Scope-risk: moderate

Directive: Keep recorder hooks in follow-up #988, but do not weaken oc_skill_replay action semantics.

Tested: npx jest tests/tools/oc-skill-replay.test.ts tests/core/skill-memory/replay-artifact.test.ts tests/cli/doctor/checks/chrome-binary.test.ts tests/cli/doctor/checks/disk-space.test.ts tests/cli/doctor/checks/home-writable.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; git diff --check

Not-tested: full npm test locally; CI is expected to cover the complete matrix.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfe4bf64a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +30 to +33
{ tool: 'oc_skill_record', args: { domain: DOMAIN, name: SKILL_NAME, contract_id: 'demo' } },
{ tool: 'page_reload', args: {} },
{ tool: 'oc_skill_recall', args: { domain: DOMAIN, name: SKILL_NAME } },
{ tool: 'oc_skill_replay', args: { domain: DOMAIN, name: SKILL_NAME } },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align verifier steps with current tool contracts

This verifier script prints a "verbatim" replay flow, but the emitted calls no longer match the registered tool schemas: oc_skill_record is missing required steps, and oc_skill_replay is called with name instead of required skill_id (plus earlier interact calls use unsupported fill/target/capture_artifact arguments). Running these steps as written now produces INVALID_ARGS, so the acceptance scenarios in this script cannot be executed as intended.

Useful? React with 👍 / 👎.

Comment thread src/tools/oc-skill-replay.ts Outdated
Comment on lines +220 to +221
} else if (sel.type === 'node_ref') {
return false;

This comment was marked as outdated.

let attempts = 0;
for (const sel of step.selectors) {
attempts++;
const r = await resolveSelector(sel, page);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve selectors in recorded frame, not always top frame

Replay artifacts include frameOrdinal on each step, but replay resolution ignores it and always evaluates selectors on the top-level page document. For skills recorded inside iframes, this resolves against the wrong DOM and causes false ARTIFACT_RESOLUTION_FAILED results even when the original target still exists in its frame.

Useful? React with 👍 / 👎.

Merge the current develop surface into the replay-layer PR and resolve conflicts in shared interaction, doctor, tab, and mock-test files. The replay implementation remains focused on deterministic artifact execution, selector validation, and recorder cleanup.

Constraint: GitHub reported PR #925 as CONFLICTING against develop.

Rejected: Refactoring replay dispatch while merging develop | latest Codex review on the pushed replay fixes had no actionable P1/P2 feedback.

Confidence: high

Scope-risk: narrow

Directive: Keep future replay edits constrained to artifact execution correctness and validation gaps.

Tested: npm run build; npx jest tests/tools/oc-skill-replay.test.ts tests/core/skill-memory/replay-artifact.test.ts tests/cli/doctor/checks/chrome-binary.test.ts tests/cli/doctor/checks/home-writable.test.ts tests/cli/doctor/checks/network-local.test.ts tests/tools/tabs.test.ts tests/cross-env/cursor-verification.test.ts --runInBand; npm run lint -- --quiet; npm run lint:tier; git diff --check

Not-tested: full CI matrix pending on GitHub.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 042f4ff42c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/oc-skill-record.ts Outdated
}
replayArtifacts = padded;
} else if (targetId) {
const flushed = flushRecorderBuffer(targetId);

This comment was marked as outdated.

Comment thread src/tools/oc-skill-replay.ts Outdated
Comment on lines +220 to +222
} else if (sel.type === 'node_ref') {
return false;
}

This comment was marked as outdated.

Keep the doctor disk-space boundary test aligned with the current source contract after the replay branch merged develop. The replay implementation is unchanged; this fixes a stale CI expectation for exactly 100 MB free.

Constraint: GitHub CI for PR #925 failed only on tests/cli/doctor/checks/disk-space.test.ts.

Rejected: Changing the doctor threshold semantics inside the replay PR | the source already treats <=100 MB as fail.

Confidence: high

Scope-risk: narrow

Directive: Keep replay PR follow-ups scoped to replay behavior unless shared develop tests block mergeability.

Tested: npx jest tests/cli/doctor/checks/disk-space.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; git diff --check

Not-tested: full CI matrix pending on GitHub.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fabb36524d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

),
);
}
const step = artifact.steps[0]; // each per-step artifact carries one step
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce single-step artifacts before replaying

Replay currently executes only artifact.steps[0], but artifact validation accepts any non-empty steps array. When a stored artifact contains multiple embedded steps, replay silently ignores every step after the first and can still return success, so the executed flow diverges from recorded data. Either reject multi-step artifacts at validation/record time or iterate all embedded steps here.

Useful? React with 👍 / 👎.

Comment on lines +596 to +599
if (step.post_assert && stopOnContractFailure) {
// Intentionally not failing here — the host owns the assert call.
// We attach the contract id in step_results downstream via index.
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor post_assert when stop_on_contract_failure is enabled

The post_assert branch is currently a no-op, so even when stop_on_contract_failure is true and a step carries post_assert, replay never checks the contract and still reports step success. In practice, this allows oc_skill_replay to return ok: true for flows whose required assertions fail, and the documented option has no effect. This should call the assertion path and emit CONTRACT_FAILED (or remove the option until implemented).

Useful? React with 👍 / 👎.

Normalize CRLF line endings when comparing the committed console-capture baseline fixture so Windows CI does not fail on checkout conversion. The replay-layer implementation remains unchanged.

Constraint: PR #925 shares the same current develop console fixture regression test surface.

Rejected: Regenerating the fixture | the mismatch is platform line endings, not response shape drift.

Confidence: high

Scope-risk: narrow

Directive: Normalize text fixtures before comparing them to generated JSON strings across OSes.

Tested: npx jest tests/tools/console-capture-regression.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; git diff --check

Not-tested: full CI matrix pending on GitHub.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09343615bc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +241 to +245
if (step.kind === 'press' && p.keyboard) {
const key = step.args?.key;
if (typeof key !== 'string' || key.length === 0) return { ok: false, detail: 'press step requires args.key' };
await withDeadline(Promise.resolve(p.keyboard.press(key)), timeoutMs);
return { ok: true };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Focus resolved element before dispatching press steps

press steps require selectors, but the replay path resolves the selector and then ignores it by sending page.keyboard.press directly. If focus has moved since recording (e.g., another control is active), the key is delivered to the wrong element even though selector resolution succeeded, so replay can report success while mutating the wrong state.

Useful? React with 👍 / 👎.

Comment on lines +177 to +178
if (Array.isArray(explicitArtifacts)) {
// Pad / truncate to match steps.length, then validate each non-null
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Flush target buffer when explicit artifacts are supplied

When replay_artifacts is provided, this branch bypasses flushRecorderBuffer(targetId), so previously captured steps for that target remain queued. A later oc_skill_record call that relies on buffered capture can then ingest stale steps from an earlier skill, causing cross-skill artifact contamination and nondeterministic replay data.

Useful? React with 👍 / 👎.

Ensure DomainMemory does not drop mutations that occur while an async persistence write is already in flight. The replay PR exposed this on Windows Node 22 when confidence updates raced with the initial save; queued saves now flush the latest in-memory state after the active write completes.

Constraint: PR #925 Windows Node 22 CI failed domain-memory persistence after validate() calls were not reliably flushed.

Rejected: Extending test sleeps | the production coalescing logic could still drop in-flight mutations under real use.

Confidence: high

Scope-risk: narrow

Directive: Preserve fire-and-forget persistence semantics while ensuring every dirty mutation eventually schedules a write.

Tested: npx jest tests/e2e/scenarios/domain-memory-persistence.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; git diff --check

Not-tested: full CI matrix pending on GitHub.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70ac9271b1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/oc-skill-replay.ts Outdated
Comment on lines +220 to +221
} else if (sel.type === 'node_ref') {
return false;

This comment was marked as outdated.

Comment on lines +30 to +33
{ tool: 'oc_skill_record', args: { domain: DOMAIN, name: SKILL_NAME, contract_id: 'demo' } },
{ tool: 'page_reload', args: {} },
{ tool: 'oc_skill_recall', args: { domain: DOMAIN, name: SKILL_NAME } },
{ tool: 'oc_skill_replay', args: { domain: DOMAIN, name: SKILL_NAME } },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use required oc_skill_record/oc_skill_replay arguments

The verification script emits invocations that do not satisfy the tool contracts: oc_skill_record is printed without the required steps field, and oc_skill_replay is printed with name instead of required skill_id. Following this script as written causes INVALID_ARGS responses, so the listed scenarios cannot be executed as intended.

Useful? React with 👍 / 👎.

Merge latest develop into PR #925 while preserving the branch-specific cursor and console fixture expectations.

Constraint: PR #962 changed shared transport files and CI fixtures after this branch was opened.\nRejected: Replacing branch-specific fixture semantics with develop's exact snapshot | these PRs intentionally alter tool visibility or fixture normalization expectations.\nConfidence: high\nScope-risk: narrow\nDirective: Treat fixture-only conflict refreshes as semantic preservation work; do not rewrite feature behavior while unblocking mergeability.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 872843e252

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/oc-skill-replay.ts Outdated
Comment on lines +237 to +238
if (p?.goto) await withDeadline(Promise.resolve(p.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs })), timeoutMs);
return { ok: true };

This comment was marked as outdated.

Comment on lines +195 to +197
this.dirty = false;
this.savePromise = this.saveAsync().finally(() => {
this.savePromise = null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep dirty flag set until save write completes

The new coalescing path clears dirty before the filesystem write starts, but saveAsync now swallows write errors and never restores dirty. If writeFile fails (for example transient ENOSPC/permission issues), pending mutations are considered clean and no retry is scheduled unless another mutation happens later, so the latest in-memory state can be silently dropped on process exit.

Useful? React with 👍 / 👎.

Replay now performs real page mutations only when a live target and executable artifact shape are available, while preserving develop's pilot replay additions and current tool-schema changes.

Constraint: PR #925 must not report deterministic replay success for skipped navigation, unresolved selectors, unevaluated post-asserts, or stale recorder buffers.\nRejected: silently accepting node_ref selectors | the current core replay path cannot turn registry ids into live element handles without a CDP resolver.\nConfidence: medium\nScope-risk: broad\nDirective: Add live node-ref element resolution before re-allowing node_ref artifacts in core replay.\nTested: npm test -- --runTestsByPath tests/tools/oc-skill-replay.test.ts tests/core/skill-memory/no-network.test.ts tests/pilot/skill/replay.test.ts --runInBand; npx tsc --noEmit\nNot-tested: full cross-platform CI matrix and live-browser replay verification pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6f7f0a9ad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +641 to +647
if (step.post_assert && stopOnContractFailure) {
return jsonResult(
failure(
'CONTRACT_FAILED',
i,
`post_assert ${step.post_assert.contract_id} requires an assertion evaluator before replay can report success`,
totalSteps,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Evaluate post_assert instead of always failing it

With the default stop_on_contract_failure=true, any step that merely includes post_assert is forced into CONTRACT_FAILED without attempting assertion evaluation, so replay cannot succeed for artifacts that carry post-conditions even when the page state is correct. Fresh evidence in this revision is the unconditional failure return inside the post_assert block, which now deterministically fails every such step.

Useful? React with 👍 / 👎.

Comment on lines +607 to +619
const action = await dispatchStep(step, resolution.selector, page, stepTimeoutMs);
if (!action.ok) {
stepResults.push({
index: i,
resolved_via: resolution.resolvedVia,
selector_attempts: resolution.attempts,
elapsed_ms: Date.now() - started,
ok: false,
});
return jsonResult(
failure(
step.kind === 'navigate' ? 'TARGET_NAVIGATED_AWAY' : 'ARTIFACT_RESOLUTION_FAILED',
i,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return STEP_TIMEOUT when dispatch exceeds step timeout

The per-step timeout is documented as covering both resolution and dispatch, but dispatch-time deadline errors are caught and converted into generic action failures that are reported as ARTIFACT_RESOLUTION_FAILED (or TARGET_NAVIGATED_AWAY for navigate). This misclassifies timeout failures, so callers cannot distinguish a transient timeout from a selector mismatch and may take the wrong fallback path.

Useful? React with 👍 / 👎.

shaun0927 and others added 3 commits May 13, 2026 10:21
The CI schema linter pipes introspection stdout into the validator, so the CLI must return naturally after writing JSON and the newly merged crawl job schemas must satisfy the required-field description rule.

Constraint: build-and-test runs npm run lint:tool-schemas in a shell pipeline on every PR matrix job.
Rejected: keeping process.exit(0) after stdout.write | it can terminate before piped stdout flushes and leaves the validator with empty input.
Confidence: high
Scope-risk: narrow
Directive: Introspection paths must avoid hard process exits after asynchronous stdout writes.
Tested: npm run build; npm run lint:tool-schemas; npm run lint
Not-tested: full GitHub Actions matrix pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
Skill replay now satisfies the repository schema linter while preserving the fail-closed selector replay behavior and quote escaping needed for accessible-name selector fallback.

Constraint: PR #925 must pass build-and-test after the replay layer adds new MCP schemas.
Rejected: baselining new schema violations | existing linter rules are intended to keep required fields discoverable.
Confidence: high
Scope-risk: narrow
Directive: New replay tool fields must keep required descriptions explicit and concise enough for lint-tool-schemas.
Tested: npm run build; npm run lint; npm run lint:tool-schemas; npm test -- --runTestsByPath tests/tools/oc-skill-replay.test.ts tests/core/skill-memory/no-network.test.ts tests/pilot/skill/replay.test.ts --runInBand
Not-tested: full GitHub Actions matrix pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
The in-process CLI harness can capture log lines from unrelated async work before the admin JSON payload, so the JSON-array extractor now scans for the first slice that actually parses as an array instead of trusting the first '[' byte.

Constraint: full CI runs many suites in shared workers, and Windows/Linux logs showed [WorkflowEngine] noise reaching this harness stdout.
Rejected: weakening admin CLI assertions | the test still parses and validates the JSON payload and plaintext absence.
Confidence: high
Scope-risk: narrow
Directive: CLI JSON tests should parse the intended payload, not incidental bracketed log prefixes.
Tested: npm test -- --runTestsByPath tests/cli/admin-keys.test.ts --runInBand
Not-tested: full GitHub Actions matrix pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc8c9af837

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +273 to +275
if (replayEnabled && targetId) {
flushRecorderBuffer(targetId);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Flush recorder buffer only when it was consumed

The buffer is flushed whenever target_id is provided, even if replay_artifacts was explicitly supplied and the buffer path was not used. In that case, buffered captures are discarded without being persisted, so a caller that always sends target_id can silently lose recorded steps between oc_skill_record calls. Gate the flush to the peekRecorderBuffer path (or track whether buffered artifacts were actually consumed) to avoid data loss.

Useful? React with 👍 / 👎.

Comment on lines +243 to +244
if (s.type === 'node_ref') {
return { ok: false, error: 'node_ref selectors are not replayable until the node-ref resolver can return a live element handle' };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept node_ref selectors in artifact validation

Rejecting node_ref at validation time causes oc_skill_record to fail for selector chains that include node_ref plus fallback selectors (for example css/xpath), even though replay can still succeed by trying later strategies. This blocks storing otherwise replayable artifacts and makes mixed-strategy chains unusable. Validation should allow node_ref entries and let runtime resolution fall through when that strategy cannot be resolved.

Useful? React with 👍 / 👎.

shaun0927 and others added 2 commits May 13, 2026 10:49
The replay branch exposes oc_skill_replay in the default Cursor smoke surface and intentionally rejects node_ref selectors until a live node-ref resolver exists, so the branch tests now assert those intended contracts.

Constraint: PR #925 chooses fail-closed node_ref validation rather than unsafe best-effort dispatch.
Rejected: treating node_ref as accepted without live resolution | that was the Codex P1 risk this branch fixes.
Confidence: high
Scope-risk: narrow
Directive: Re-enable node_ref acceptance only with a tested live element-handle resolver.
Tested: npm test -- --runTestsByPath tests/core/skill-memory/replay-artifact.test.ts --runInBand
Not-tested: cross-env Cursor smoke locally skipped on macOS Node <22; full CI pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
The curator lock loop now retries EPERM from stat just like ENOENT because Windows can report EPERM while another writer process is removing the lock directory during the concurrent-write test.

Constraint: PR #923 CI failed only on the Windows concurrent curator lock test.
Rejected: increasing test timeouts | the failure is a transient lock-directory race, not slow work.
Confidence: high
Scope-risk: narrow
Directive: Keep the lock fail-closed for unexpected errors; only transient disappearance/removal races should retry.
Tested: npm test -- --runTestsByPath tests/pilot/curator/extractor.test.ts --runInBand
Not-tested: Windows filesystem behavior locally; GitHub Actions matrix pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9b0624812

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/index.ts Outdated
}

// Skill memory tools (#875) — deterministic replay
registerOcSkillReplayTool(server);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove duplicate oc_skill_replay registration

registerAllTools registers oc_skill_replay inside the isSkillReplayEnabled() branch and then immediately registers it again unconditionally. Because MCPServer.registerTool overwrites by name but still increments manifestVersion, this second call adds version churn with no functional benefit and defeats the intended single registration path behind the conditional block.

Useful? React with 👍 / 👎.

shaun0927 and others added 2 commits May 13, 2026 11:01
The replay branch now includes the latest develop snapshot-ref baseline while preserving fail-closed replay dispatch, schema lint compliance, and node_ref rejection until live resolution exists.

Constraint: develop advanced during PR #925 review and branch must stay mergeable without relaxing selector safety.
Rejected: accepting node_ref without a live resolver | that reintroduces the unsafe replay gap the branch fixes.
Confidence: high
Scope-risk: moderate
Directive: Re-enable node_ref only with a tested live element-handle resolver; keep replay schema descriptions lint-clean.
Tested: npm run build; npm run lint:tool-schemas; npm test -- --runTestsByPath tests/tools/oc-skill-replay.test.ts tests/core/skill-memory/no-network.test.ts tests/pilot/skill/replay.test.ts tests/core/skill-memory/replay-artifact.test.ts tests/pilot/curator/extractor.test.ts tests/transports/http-bearer-auth.test.ts --runInBand
Not-tested: full GitHub Actions matrix pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
The replay branch now includes bounded tool-output observability metrics while preserving fail-closed replay dispatch, schema lint compliance, and node_ref rejection until live resolution exists.

Constraint: develop advanced during PR #925 review and branch must stay mergeable without relaxing selector safety.
Rejected: leaving the autogenerated merge message | project history requires Lore trailers and OmX co-author attribution.
Confidence: high
Scope-risk: moderate
Directive: Keep replay result safety independent from output-size metrics; re-enable node_ref only with a tested live element-handle resolver.
Tested: npm run build; npm run lint:tool-schemas; npm test -- --runTestsByPath tests/metrics/tool-output-observability.test.ts --runInBand
Not-tested: full GitHub Actions matrix pending after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: db819fe1ff

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

successCount: 0,
lastUsedAt: 0,
frozenSnapshotPath: snapshotPath ?? null,
replayArtifacts,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve replay artifacts when replay is disabled

When OPENCHROME_SKILL_REPLAY=0, replayArtifacts is still initialized to an all-null array and always passed into store.record, so any re-record of an existing skill overwrites previously captured artifacts with null. This makes the kill-switch destructive: operators who temporarily disable replay lose artifact data and cannot replay those skills after re-enabling the flag.

Useful? React with 👍 / 👎.

* - non-array `steps` (we leave the field undefined; callers handle).
*/
function normaliseRecordForRead(rec: SkillRecord): SkillRecord {
if (!Array.isArray(rec.steps)) return { ...rec };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle null records during read normalization

normaliseRecordForRead assumes each skills value is an object and immediately reads rec.steps; if a persisted file contains a null entry (e.g., partial/manual corruption), this throws and breaks get()/list() for the whole domain. Before this change, reads were tolerant of unknown record shapes, so this introduces a new crash path in the loader.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d5e08b123

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +198 to +201
if (this.saveAgain || this.dirty) {
this.saveAgain = false;
this.dirty = true;
this.save();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prevent infinite save retries on persistent write failures

The new coalescing logic now retries immediately whenever saveAsync fails (catch sets dirty = true, then finally calls save() again when dirty is true). If the storage path is persistently unwritable (for example EACCES or ENOSPC), every mutation triggers an unbounded retry loop that continuously reissues writeFile calls, causing sustained event-loop churn and repeated I/O with no recovery path. This should stop after a bounded number of retries or require a new mutation/backoff trigger before retrying.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08c2a5f2f4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

successCount: 0,
lastUsedAt: 0,
frozenSnapshotPath: snapshotPath ?? null,
replayArtifacts,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve artifacts when replay_artifacts is omitted

oc_skill_record now always passes a replayArtifacts array into store.record, even when the caller did not supply replay_artifacts and no target_id buffer was consumed. Because that array is initialized as all-null, re-recording an existing skill in this common path silently wipes previously stored replay artifacts instead of preserving them, so later oc_skill_replay calls regress to ARTIFACT_MISSING.

Useful? React with 👍 / 👎.


// Mark the skill used so the recall layer reflects the replay activity.
try {
await store.markUsed(skillId, Date.now(), true);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Persist replay outcome timestamps after each replay

The new replay handler only calls store.markUsed(..., true) on full success and never writes lastReplayPassedAt/lastReplayFailedAt via recordReplayResult; all failure exits return before any store update. oc_skill_recall ranks by those replay timestamps (computeReplaySignal), so replay health never affects ordering anymore and failed skills cannot be demoted as intended.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ce739a3e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +193 to +195
const role = el.getAttribute('role') || el.tagName.toLowerCase();
if (role !== sel.role && !(sel.role === 'button' && ['button', 'input'].includes(el.tagName.toLowerCase())) && !(sel.role === 'link' && el.tagName.toLowerCase() === 'a')) return false;
const name = el.getAttribute('aria-label') || textOf(el);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat native textbox controls as role_name matches

resolveSelector advertises native textbox support in roleCandidates but the role check only has native fallbacks for button/link. For a role_name selector with role: "textbox", plain <input>/<textarea> elements are rejected because their computed role is input/textarea, so replay can fail with ARTIFACT_RESOLUTION_FAILED on unchanged forms whenever no non-role fallback selector exists.

Useful? React with 👍 / 👎.

Comment on lines +263 to +266
el = Array.from(document.querySelectorAll(`[role="${cssEscape(sel.role)}"],button,a[href],input,textarea`)).find((candidate) => {
const text = (candidate.textContent || '').replace(/\s+/g, ' ').trim();
const name = candidate.getAttribute('aria-label') || text;
return sel.name === '' || name === sel.name;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce sel.role when focusing role_name press targets

In the press dispatch path, the role_name branch queries [role=...] plus button/a/input/textarea and then filters only by accessible name, without checking that the candidate actually matches sel.role. If multiple controls share the same name, replay can focus a different element type than intended and send the key to the wrong target while still reporting success.

Useful? React with 👍 / 👎.

@shaun0927 shaun0927 merged commit e5fabbd into develop May 13, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b336fc7de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +122 to +123
const explicitArtifacts = args.replay_artifacts as Array<ReplayArtifact | null> | undefined;
const targetId = typeof args.target_id === 'string' ? (args.target_id as string) : undefined;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Redact replay_artifacts before persisting skill records

oc_skill_record redacts steps and frozen_snapshot, but replay_artifacts is taken directly from user input and later stored unchanged, so sensitive literals in artifact args (for example fill values such as passwords/tokens) can be written to skills.json in plaintext. This is a new secret-leak path introduced with replay artifacts and bypasses the module’s documented redaction behavior for skill recording.

Useful? React with 👍 / 👎.

return {
content: [{ type: 'text', text: JSON.stringify({ error: message }) }],
isError: true,
const action = await dispatchStep(step, resolution.selector, page, stepTimeoutMs);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce a single timeout budget across resolve and dispatch

The tool contract says step_timeout_ms covers both selector resolution and action dispatch, but the loop gives each phase its own full deadline: resolveStep gets stepTimeoutMs, then dispatchStep is called with the same full value again. A step can therefore run nearly 2 * step_timeout_ms (or more with overhead), so callers cannot rely on the configured per-step cap.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant