feat(hints): rank safe recovery candidates (#1018) by shaun0927 · Pull Request #1088 · shaun0927/openchrome

shaun0927 · 2026-05-12T15:18:36Z

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

| Field | Value |
| --- | --- |
| Branch | `feat/1018-recovery-candidate-ranking` → `feat/1019-recovery-reward-scorer` |
| Draft | no |
| CI | — |
| Mergeable | ✅ MERGEABLE |
| Review decision | — |
| Codex (latest) | 💡 suggestions posted |
| Other reviewers (latest) | chatgpt-codex-connector: commented |
| Head | `1ba7470` — Resolve recovery candidate ranking base conflict |
| Commits | 2 |

_Owner comment cleanup: 2 issue comments and 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged._

## Summary
- Adds advisory recovery candidate ranking for stuck/stalling progress hints.
- Surfaces ranked candidates in hint metadata and hint text while preserving existing rule priority.
- Filters blocking/auth scenarios away from blind interaction retries.

Closes #1018.

Stacked on #1078 because candidate scores reuse the deterministic recovery scorer.

## Validation
- npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/hints/hint-engine.test.ts
- npm run build

## Live OpenChrome verification
- Not run in this PR session; #1073 adds the deterministic stale-ref fixture for follow-up live transcript verification.

## Safety
- Advisory only: no recovery candidate is auto-executed.
- Blocking/auth pages include a blocked blind-retry candidate and prefer read-only state checks.

Add advisory recovery candidate ranking so progress-tracker stuck/stalling hints provide concrete read-only next steps without auto-executing browser actions.

Constraint: Candidate ranking must stay advisory and must not bypass existing hint priority or safety behavior.
Rejected: Automatic recovery execution in HintEngine | belongs in a later opt-in bounded recovery search lane.
Confidence: high
Scope-risk: narrow
Directive: Keep high-risk candidates blocked or advisory-only; do not turn hint candidates into tool execution without #1020-style gates.
Tested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/hints/hint-engine.test.ts; npm run build
Not-tested: Full live stale-ref/browser hint transcript.
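
As a rough illustration of what "advisory ranking" could look like, the sketch below sorts candidates by score and caps the list, returning data only and executing nothing. The names `RankedCandidate` and `rankCandidates` are hypothetical and not taken from the PR; `maxCandidates` and `blockedReason` are borrowed from the review comments, and the real ranker may order or filter differently.

```typescript
// Hypothetical candidate shape; the PR's actual type may carry more fields.
interface RankedCandidate {
  tool: string;
  reason: string;
  score: number;
  blockedReason?: string;
}

// Returns the top candidates by descending score. Purely advisory:
// the function only produces a list and never invokes any tool.
function rankCandidates(
  candidates: RankedCandidate[],
  maxCandidates: number,
): RankedCandidate[] {
  return [...candidates]
    .sort((a, b) => b.score - a.score)
    .slice(0, Math.max(0, maxCandidates));
}
```

Because the list is capped, a low-value blocked entry only survives when there is room for it, which is why the review comments further down care about which blocked entries are added in the first place.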

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19b84a5818

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-12T15:24:43Z

+    add({
+      tool: input.toolName,
+      reason: 'Blind retry is unsafe on auth, CAPTCHA, or blocking pages.',
+      risk: 'side_effect_possible',
+      baseScore: -0.2,
+      blockedReason: 'blocking/auth signal present',


Gate blocked retries to blind interaction tools

The blocking/auth branch always adds a blocked retry candidate for input.toolName, even when the current tool is read-only (for example find or tabs_context). In those cases, the metadata incorrectly reports a safe observation tool as "unsafe," and that blocked entry can displace more useful candidates under maxCandidates. This path is triggered whenever blocking text is present, so it should only add the blocked-retry candidate when input.toolName is actually in the blind-interaction set.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-12T15:24:43Z

+      resultText: input.resultText,
+      freshRefsDiscovered: candidate.tool === 'read_page' && isStaleOrElementFailure(text),
+      observationOnly: READ_TOOLS.has(candidate.tool),
+      repeatedFailureCount: repeatedToolCount(input.recentCalls, input.toolName),


Use failure-only count for repeatedFailureCount

repeatedFailureCount is populated with repeatedToolCount(...), which counts every prior invocation of the tool (including successes). scoreRecoveryOutcome treats this field as failure count and adds a failure-repeat penalty, so a single error after many successful calls is over-penalized and can skew candidate ordering. This should be derived from recent calls with failing results only.
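
The difference the comment describes can be sketched as follows. The `RecentCall` shape and its `isError` field are assumptions for illustration; the real `recentCalls` entries may record failure differently.

```typescript
// Hypothetical call record; only the tool name and a failure flag are modeled.
interface RecentCall {
  toolName: string;
  isError: boolean;
}

// Counts every prior invocation of the tool, successes included
// (the behavior the review comment attributes to repeatedToolCount).
function repeatedToolCount(calls: RecentCall[], toolName: string): number {
  return calls.filter((c) => c.toolName === toolName).length;
}

// Counts only failing invocations, which is what a failure-repeat
// penalty should be based on.
function repeatedFailureCount(calls: RecentCall[], toolName: string): number {
  return calls.filter((c) => c.toolName === toolName && c.isError).length;
}
```

With three successful calls followed by one error, the first helper reports four repeats while the second reports one, so only the second avoids over-penalizing a single failure.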

Add bounded read-only recovery attempts for compiled plans so safe state-refresh candidates can repair stale or blocked steps under explicit opt-in budgets.

Constraint: Live browser recovery must be opt-in, budgeted, and limited to read-only candidates.
Rejected: General MCTS/live browser branching | unsafe for authenticated sessions and unnecessary for the current harness goal.
Confidence: medium
Scope-risk: moderate
Directive: Do not widen candidate execution beyond read-only tools without an explicit destructive-action gate and live verification.
Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts; npm run build
Not-tested: Full execute_plan MCP transcript against Chrome fixture.

Add a bounded recovery policy learner so repeated evidence-backed recoveries can bias future candidate ranking without bypassing safety gates or executing actions.

Constraint: Learned recovery policy must be advisory only and must not store sensitive page content or credentials.
Rejected: A separate autonomous recovery memory/executor | duplicates PatternLearner direction and risks unsafe automatic action.
Confidence: high
Scope-risk: narrow
Directive: Learned policies may rank candidates but must never override risk gates or opt-in execution controls.
Tested: npm test -- --runTestsByPath tests/recovery/policy-learner.test.ts tests/recovery/candidate-ranker.test.ts; npm run build
Not-tested: Two-run live OpenChrome policy promotion transcript.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19b84a5818

chatgpt-codex-connector · 2026-05-13T00:49:41Z

+    add({
+      tool: input.toolName,
+      reason: 'Blind retry is unsafe on auth, CAPTCHA, or blocking pages.',
+      risk: 'side_effect_possible',
+      baseScore: -0.2,


Limit blocking-page retry suppression to blind tools

In the blocking/auth branch this always adds a blocked retry candidate for input.toolName, even when the current tool is read-only (for example find/tabs_context). That marks safe diagnostic retries as side_effect_possible with blockedReason, which conflicts with the intended “filter blind retries” behavior and can mislead agents away from valid low-risk recovery steps whenever the stuck/stalling state is reached on a non-interaction tool. Gate this blocked candidate behind BLIND_INTERACTION_TOOLS.has(input.toolName) (or adjust risk/reason by tool class) so only true blind interactions are suppressed.
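
One possible shape for the suggested gate is sketched below. The candidate fields mirror the quoted diff, but the `Candidate` type, the helper name `blockedRetryCandidates`, and the membership of `BLIND_INTERACTION_TOOLS` (`click`, `type`) are placeholders assumed for illustration, not the PR's actual definitions.

```typescript
type Risk = 'read_only' | 'side_effect_possible';

// Fields taken from the quoted diff; the type itself is hypothetical.
interface Candidate {
  tool: string;
  reason: string;
  risk: Risk;
  baseScore: number;
  blockedReason?: string;
}

// Assumed membership; the real set in the repository may differ.
const BLIND_INTERACTION_TOOLS = new Set<string>(['click', 'type']);

// Emits the blocked-retry candidate only when the current tool is a
// blind interaction; read-only tools such as find or tabs_context
// produce no blocked entry and stay available as recovery steps.
function blockedRetryCandidates(toolName: string, hasBlockingSignal: boolean): Candidate[] {
  if (!hasBlockingSignal || !BLIND_INTERACTION_TOOLS.has(toolName)) {
    return [];
  }
  return [
    {
      tool: toolName,
      reason: 'Blind retry is unsafe on auth, CAPTCHA, or blocking pages.',
      risk: 'side_effect_possible',
      baseScore: -0.2,
      blockedReason: 'blocking/auth signal present',
    },
  ];
}
```

Returning an empty list for read-only tools also keeps the blocked entry from taking a slot under `maxCandidates` when it would carry no useful signal.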

Preserve the candidate-ranker exports while accepting the reward-scorer index exports already present in the updated base branch.

Constraint: PR #1088 was merge-conflicting against feat/1019-recovery-reward-scorer.
Rejected: Re-exporting reward-scorer symbols twice | duplicate exports add noise and risk lint/type conflicts.
Confidence: high
Scope-risk: narrow
Directive: Keep src/recovery/index.ts as the single barrel for reward scoring and candidate ranking.
Tested: npm test -- --runInBand tests/recovery/candidate-ranker.test.ts tests/hints/recovery-candidates.test.ts
Tested: npm run build && npm run lint:changed && npm run lint:tier && npm run lint:tool-schemas
Not-tested: Full repository test suite.
Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T01:27:44Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

shaun0927 · 2026-05-13T15:13:27Z

Merge rationale (stack consolidation)

Intent. Closes #1018 — adds advisory recovery candidate ranking for stuck/stalling progress hints, surfaces ranked candidates in hint metadata and hint text while preserving existing rule priority, and filters blocking/auth scenarios away from blind interaction retries.

Why this is correct.

- Advisory ranking: existing rule priority is preserved; ranks only inform hint text and metadata, with no new authority over execution.
- Safety filter: blocking/auth scenarios are explicitly excluded from blind interaction retries, which prevents "helpful" recovery from spamming an auth wall.
- Stacked on #1078 (feat(recovery): score trajectory outcomes deterministically (#1019)), the deterministic recovery scorer: candidate scores reuse the scorer rather than duplicating it.
- Validation: tests/recovery/candidate-ranker.test.ts + tests/hints/hint-engine.test.ts + npm run build.
- Scope contained: 5 files, +249/-2. No Codex P0/P1/P2 outstanding.

CI. Targets the recovery-reward-scorer feature branch; CI workflow only runs on main/develop PRs.

shaun0927 · 2026-05-13T15:13:45Z

Hit merge conflicts after upstream stack changes. Recommend rebase to absorb the latest scorer changes:

git checkout feat/1018-recovery-candidate-ranking
git fetch origin && git rebase origin/feat/1019-recovery-reward-scorer
git push --force-with-lease

Content review already cleared this PR (see the merge rationale comment above).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee60444baa

chatgpt-codex-connector · 2026-05-13T15:16:04Z

+export { scoreRecoveryOutcome, scoreFromToolResult } from './reward-scorer';
+export type { RecoveryRewardClassification, RecoveryRewardInput, RecoveryRewardScore } from './reward-scorer';


Remove duplicate reward-scorer re-exports in recovery index

src/recovery/index.ts now exports scoreRecoveryOutcome, scoreFromToolResult, and their types twice (first in the existing block and again in the new block), which creates duplicate export names in the same module. This can fail module parsing/type-checking (and in ESM emits a duplicate-export syntax error) as soon as anything imports the recovery barrel, so this should be deduplicated to a single re-export block.

shaun0927 · 2026-05-13T15:18:58Z

Attempted programmatic rebase + merge of feat/1019-recovery-reward-scorer into this branch resulted in cascading conflicts across src/recovery/index.ts, src/types/plan-cache.ts, and src/orchestration/plan-executor.ts — the scorer base has moved significantly since this PR was opened (the trajectory-ledger + plan-executor cross-cutting changes overlap with this PR's candidate-ranker integration). Recommend manual rebase by the original author since the resolution needs expert judgment about which side wins on each conflict. Content review pass already cleared this PR (see the merge rationale comment above).

Constraint: PR #1088 review found read-only tools were incorrectly blocked and repeat-failure scoring counted prior successes.
Rejected: Treating all blocking-page retries as unsafe | Only blind interaction retries have side-effect risk.
Confidence: high
Scope-risk: narrow
Directive: Candidate ranking must distinguish safe observation from blind mutation and keep barrel exports unique.
Tested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: Full CI matrix pending on GitHub

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 722ed1a74b

chatgpt-codex-connector · 2026-05-13T17:25:10Z

+}
+
+function countExecutedRecoveryAttempts(attempts: PlanRecoveryAttempt[]): number {
+  return attempts.filter((attempt) => attempt.status !== 'blocked').length;


Exclude non-executed recovery attempts from tool-call budget

countExecutedRecoveryAttempts treats every non-blocked attempt as a consumed tool call, but tryBoundedRecovery records status: 'failed' for tool handler not found before any tool invocation. In that case, bounded recovery can exhaust maxToolCalls and stop before trying later viable candidates, even though no recovery tool actually ran.
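
A minimal sketch of one way to keep non-executed attempts out of the budget is shown below. The `invoked` flag is a hypothetical addition, not a field confirmed by the PR; the status values `success`, `failed`, and `blocked` come from the quoted code and comments.

```typescript
type AttemptStatus = 'success' | 'failed' | 'blocked';

interface PlanRecoveryAttempt {
  tool: string;
  status: AttemptStatus;
  reason: string;
  // Hypothetical flag: true only when a tool handler was actually called.
  invoked: boolean;
}

// Counts only attempts that really consumed a tool call, so a
// "tool handler not found" failure does not eat into maxToolCalls.
function countExecutedRecoveryAttempts(attempts: PlanRecoveryAttempt[]): number {
  return attempts.filter((attempt) => attempt.invoked).length;
}
```

An alternative with the same effect would be a distinct status for attempts that fail before invocation; the flag is used here only because it leaves the existing status values untouched.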

chatgpt-codex-connector · 2026-05-13T17:25:10Z

+  switch (tool) {
+    case 'read_page':
+    case 'tabs_context':
+      return tabId ? { tabId } : {};


Block read_page recovery when tabId is unavailable

buildSafeRecoveryArgs returns {} for read_page/tabs_context when params.tabId is missing, so bounded recovery proceeds with an argument set that is invalid for read_page (its tool schema requires tabId in src/tools/read-page.ts). This turns an unsatisfied precondition into a guaranteed failing call that consumes recovery budget instead of marking the candidate as blocked.
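
The precondition check could look roughly like the sketch below. The signature and the `SafeArgs` result type are assumptions for illustration (the real `buildSafeRecoveryArgs` takes different inputs), and treating `tabs_context` as callable without a `tabId` is also an assumption; only the `read_page` requirement comes from the comment.

```typescript
// Hypothetical result type: either arguments ready to send, or a
// blocked marker that should not consume recovery budget.
type SafeArgs =
  | { kind: 'ready'; args: Record<string, unknown> }
  | { kind: 'blocked'; blockedReason: string };

function buildSafeRecoveryArgs(tool: string, tabId?: string): SafeArgs {
  switch (tool) {
    case 'read_page':
      // read_page requires tabId per its tool schema, so a missing
      // tabId is reported as blocked instead of producing a failing call.
      return tabId
        ? { kind: 'ready', args: { tabId } }
        : { kind: 'blocked', blockedReason: 'read_page requires tabId' };
    case 'tabs_context':
      // Assumed here: tabs_context can run with or without a tabId.
      return { kind: 'ready', args: tabId ? { tabId } : {} };
    default:
      return { kind: 'blocked', blockedReason: `unsupported recovery tool: ${tool}` };
  }
}
```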

Constraint: Preserve the existing PlanExecutor compatibility path where non-error empty MCP payloads are parsed and judged by success criteria unless an explicit empty-result handler recovers first.
Rejected: Treating empty payloads as implicit failures | that would change legacy plan execution behavior and bypass plan-defined success criteria.
Confidence: high
Scope-risk: narrow
Directive: Do not add silent recovery bypasses; route empty-result policy through explicit handlers or successCriteria.
Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: full CI matrix pending stacked PR base merge
Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b32f471181

chatgpt-codex-connector · 2026-05-13T18:12:32Z

+        if (!result.isError && !isEmptyResult(result)) {
+          recoveryAttempts.push({ tool: candidate.tool, status: 'success', reason: candidate.reason });
+          return { recovered: true, stepsExecuted: executed };


Retry failed step before declaring bounded recovery success

Any non-empty response from a read-only recovery candidate is treated as full recovery (recovered: true), and the executor then skips the original failed step. In plans where that failed step was supposed to produce side effects or populate values consumed by later ${...} substitutions, execution can continue with missing state and still report success if downstream criteria happen to pass. This is triggered whenever a step fails, read_page/tabs_context returns non-empty output, and bounded recovery is enabled.
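
The suggested flow, where a successful probe is treated as evidence and the failed step is retried before recovery is declared, might be sketched as follows. All names here (`ToolResult`, `RecoveryOutcome`, `recoverWithRetry`) are hypothetical, and the functions are synchronous for brevity, whereas the real executor is presumably asynchronous.

```typescript
interface ToolResult {
  isError: boolean;
  text: string;
}

interface RecoveryOutcome {
  recovered: boolean;
  probeSucceeded: boolean;
  retried: boolean;
}

// Runs a read-only probe, then retries the originally failed step.
// The step counts as recovered only if the retry itself passes.
function recoverWithRetry(
  runProbe: () => ToolResult,
  retryFailedStep: () => ToolResult,
): RecoveryOutcome {
  const probe = runProbe();
  const probeSucceeded = !probe.isError && probe.text.trim().length > 0;
  if (!probeSucceeded) {
    // No fresh evidence, so the failed step is not retried.
    return { recovered: false, probeSucceeded, retried: false };
  }
  const retry = retryFailedStep();
  return { recovered: !retry.isError, probeSucceeded, retried: true };
}
```

Under this shape a non-empty `read_page` response alone can never yield `recovered: true`, which addresses the case where later `${...}` substitutions depend on the skipped step.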

Constraint: A read-only recovery probe can refresh evidence but does not prove the originally failed step succeeded.
Rejected: Skipping the failed step after a successful recovery probe | it can report plan success without executing the requested action.
Confidence: high
Scope-risk: moderate
Directive: Treat bounded recovery probes as evidence until the failed action is explicitly retried and passes.
Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: full CI matrix pending stacked base merge
Co-authored-by: OmX <omx@oh-my-codex.dev>

Constraint: The bounded-recovery safety change should remain reviewable without whole-file line-ending churn.
Rejected: Leaving the prior LF-normalized diff | it obscures the actual recovery semantic change.
Confidence: high
Scope-risk: narrow
Directive: Preserve existing file line endings when making targeted PR review fixes.
Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: full CI matrix pending stacked base merge
Co-authored-by: OmX <omx@oh-my-codex.dev>

Merged the stacked recovery candidate-ranking branch onto develop after the recovery scorer base landed, preserving advisory candidate ranking, bounded read-only recovery, policy learning, and current plan verification contracts.

Constraint: PR #1088 was stacked on the now-merged #1078 branch and had to be retargeted to develop with conflicts resolved.
Rejected: Dropping bounded recovery or current plan-verification evidence | either choice would regress already-reviewed stack behavior.
Confidence: high
Scope-risk: broad
Directive: Keep recovery candidate ranking advisory and keep bounded recovery limited to safe read-only candidates.
Tested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/recovery/policy-learner.test.ts tests/hints/hint-engine.test.ts tests/orchestration/plan-executor-bounded-recovery.test.ts tests/orchestration/plan-cache.test.ts tests/run-harness/tools-budget.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; npm run lint:tool-schemas; node scripts/lint-tools-capabilities.js; git diff --check
Not-tested: Full repository test suite and live browser recovery campaign.
Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector · 2026-05-13T20:40:20Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

shaun0927 · 2026-05-13T21:13:47Z

Closed as implemented by merged PR #1108 (feat(orchestration): opt-in bounded recovery search (#1020)). Verification evidence: PR #1108 is merged into develop.

chatgpt-codex-connector Bot reviewed May 12, 2026

shaun0927 mentioned this pull request May 12, 2026

feat(orchestration): opt-in bounded recovery search (#1020) #1108

Merged

shaun0927 mentioned this pull request May 12, 2026

feat(memory): learn evidence-backed recovery policies (#1022) #1109

Merged

chatgpt-codex-connector Bot reviewed May 13, 2026

shaun0927 mentioned this pull request May 13, 2026

feat(hints): rank evidence-scored recovery candidates in stuck hints #1018

Closed

9 tasks

shaun0927 added 3 commits May 13, 2026 19:28

Merge feat/1018-recovery-candidate-ranking into feat/1022-recovery-policy-learning (dafd548)

Merge pull request #1108 from shaun0927/feat/1020-bounded-recovery-search (775c9ee)
feat(orchestration): opt-in bounded recovery search (#1020)

Merge pull request #1109 from shaun0927/feat/1022-recovery-policy-learning (ee60444)
feat(memory): learn evidence-backed recovery policies (#1022)

chatgpt-codex-connector Bot reviewed May 13, 2026

shaun0927 and others added 3 commits May 14, 2026 03:22

shaun0927 changed the base branch from feat/1019-recovery-reward-scorer to develop May 13, 2026 20:40

shaun0927 closed this May 13, 2026
