Skip to content

feat(hints): rank safe recovery candidates (#1018)#1088

Closed
shaun0927 wants to merge 12 commits into
developfrom
feat/1018-recovery-candidate-ranking
Closed

feat(hints): rank safe recovery candidates (#1018)#1088
shaun0927 wants to merge 12 commits into
developfrom
feat/1018-recovery-candidate-ranking

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch feat/1018-recovery-candidate-rankingfeat/1019-recovery-reward-scorer
Draft no
CI
Mergeable ✅ MERGEABLE
Review decision
Codex (latest) 💡 suggestions posted
Other reviewers (latest) chatgpt-codex-connector: commented
Head 1ba7470 — Resolve recovery candidate ranking base conflict
Commits 2

Owner comment cleanup: 2 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Summary\n- Adds advisory recovery candidate ranking for stuck/stalling progress hints.\n- Surfaces ranked candidates in hint metadata and hint text while preserving existing rule priority.\n- Filters blocking/auth scenarios away from blind interaction retries.\n\nCloses #1018.\n\nStacked on #1078 because candidate scores reuse the deterministic recovery scorer.\n\n## Validation\n- npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/hints/hint-engine.test.ts\n- npm run build\n\n## Live OpenChrome verification\n- Not run in this PR session; #1073 adds the deterministic stale-ref fixture for follow-up live transcript verification.\n\n## Safety\n- Advisory only: no recovery candidate is auto-executed.\n- Blocking/auth pages include a blocked blind-retry candidate and prefer read-only state checks.

Add advisory recovery candidate ranking so progress-tracker stuck/stalling hints provide concrete read-only next steps without auto-executing browser actions.\n\nConstraint: Candidate ranking must stay advisory and must not bypass existing hint priority or safety behavior.\nRejected: Automatic recovery execution in HintEngine | belongs in a later opt-in bounded recovery search lane.\nConfidence: high\nScope-risk: narrow\nDirective: Keep high-risk candidates blocked or advisory-only; do not turn hint candidates into tool execution without #1020-style gates.\nTested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/hints/hint-engine.test.ts; npm run build\nNot-tested: Full live stale-ref/browser hint transcript.
@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19b84a5818

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/recovery/candidate-ranker.ts Outdated
Comment on lines +70 to +75
add({
tool: input.toolName,
reason: 'Blind retry is unsafe on auth, CAPTCHA, or blocking pages.',
risk: 'side_effect_possible',
baseScore: -0.2,
blockedReason: 'blocking/auth signal present',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate blocked retries to blind interaction tools

The blocking/auth branch always adds a blocked retry candidate for input.toolName, even when the current tool is read-only (for example find or tabs_context). In those cases, the metadata incorrectly reports a safe observation tool as "unsafe," and that blocked entry can displace more useful candidates under maxCandidates. This path is triggered whenever blocking text is present, so it should only add the blocked-retry candidate when input.toolName is actually in the blind-interaction set.

Useful? React with 👍 / 👎.

Comment thread src/recovery/candidate-ranker.ts Outdated
resultText: input.resultText,
freshRefsDiscovered: candidate.tool === 'read_page' && isStaleOrElementFailure(text),
observationOnly: READ_TOOLS.has(candidate.tool),
repeatedFailureCount: repeatedToolCount(input.recentCalls, input.toolName),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use failure-only count for repeatedFailureCount

repeatedFailureCount is populated with repeatedToolCount(...), which counts every prior invocation of the tool (including successes). scoreRecoveryOutcome treats this field as failure count and adds a failure-repeat penalty, so a single error after many successful calls is over-penalized and can skew candidate ordering. This should be derived from recent calls with failing results only.

Useful? React with 👍 / 👎.

Add bounded read-only recovery attempts for compiled plans so safe state-refresh candidates can repair stale or blocked steps under explicit opt-in budgets.\n\nConstraint: Live browser recovery must be opt-in, budgeted, and limited to read-only candidates.\nRejected: General MCTS/live browser branching | unsafe for authenticated sessions and unnecessary for the current harness goal.\nConfidence: medium\nScope-risk: moderate\nDirective: Do not widen candidate execution beyond read-only tools without an explicit destructive-action gate and live verification.\nTested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts; npm run build\nNot-tested: Full execute_plan MCP transcript against Chrome fixture.
Add a bounded recovery policy learner so repeated evidence-backed recoveries can bias future candidate ranking without bypassing safety gates or executing actions.\n\nConstraint: Learned recovery policy must be advisory only and must not store sensitive page content or credentials.\nRejected: A separate autonomous recovery memory/executor | duplicates PatternLearner direction and risks unsafe automatic action.\nConfidence: high\nScope-risk: narrow\nDirective: Learned policies may rank candidates but must never override risk gates or opt-in execution controls.\nTested: npm test -- --runTestsByPath tests/recovery/policy-learner.test.ts tests/recovery/candidate-ranker.test.ts; npm run build\nNot-tested: Two-run live OpenChrome policy promotion transcript.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19b84a5818

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/recovery/candidate-ranker.ts Outdated
Comment on lines +70 to +74
add({
tool: input.toolName,
reason: 'Blind retry is unsafe on auth, CAPTCHA, or blocking pages.',
risk: 'side_effect_possible',
baseScore: -0.2,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit blocking-page retry suppression to blind tools

In the blocking/auth branch this always adds a blocked retry candidate for input.toolName, even when the current tool is read-only (for example find/tabs_context). That marks safe diagnostic retries as side_effect_possible with blockedReason, which conflicts with the intended “filter blind retries” behavior and can mislead agents away from valid low-risk recovery steps whenever the stuck/stalling state is reached on a non-interaction tool. Gate this blocked candidate behind BLIND_INTERACTION_TOOLS.has(input.toolName) (or adjust risk/reason by tool class) so only true blind interactions are suppressed.

Useful? React with 👍 / 👎.

Preserve the candidate-ranker exports while accepting the reward-scorer index exports already present in the updated base branch.

Constraint: PR #1088 was merge-conflicting against feat/1019-recovery-reward-scorer.

Rejected: Re-exporting reward-scorer symbols twice | duplicate exports add noise and risk lint/type conflicts.

Confidence: high

Scope-risk: narrow

Directive: Keep src/recovery/index.ts as the single barrel for reward scoring and candidate ranking.

Tested: npm test -- --runInBand tests/recovery/candidate-ranker.test.ts tests/hints/recovery-candidates.test.ts

Tested: npm run build && npm run lint:changed && npm run lint:tier && npm run lint:tool-schemas

Not-tested: Full repository test suite.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

shaun0927 added 3 commits May 13, 2026 19:28
…arch

feat(orchestration): opt-in bounded recovery search (#1020)
…rning

feat(memory): learn evidence-backed recovery policies (#1022)
@shaun0927
Copy link
Copy Markdown
Owner Author

Merge rationale (stack consolidation)

Intent. Closes #1018 — adds advisory recovery candidate ranking for stuck/stalling progress hints, surfaces ranked candidates in hint metadata and hint text while preserving existing rule priority, and filters blocking/auth scenarios away from blind interaction retries.

Why this is correct.

  • Advisory ranking: existing rule priority is preserved, ranks only inform hint text/metadata. No new authority over execution.
  • Safety filter: blocking/auth scenarios are explicitly excluded from blind interaction retries — prevents "helpful" recovery from spamming an auth wall.
  • Stacked on feat(recovery): score trajectory outcomes deterministically (#1019) #1078 (deterministic recovery scorer) — candidate scores reuse the scorer, not duplicate it.
  • Validation: tests/recovery/candidate-ranker.test.ts + tests/hints/hint-engine.test.ts + npm run build.
  • Scope contained: 5 files, +249/-2. No Codex P0/P1/P2 outstanding.

CI. Targets the recovery-reward-scorer feature branch; CI workflow only runs on main/develop PRs.

@shaun0927
Copy link
Copy Markdown
Owner Author

Hit merge conflicts after upstream stack changes. Recommend rebase to absorb the latest scorer changes:

git checkout feat/1018-recovery-candidate-ranking
git fetch origin && git rebase origin/feat/1019-recovery-reward-scorer
git push --force-with-lease

Content review already cleared this PR (see the merge rationale comment above).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee60444baa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/recovery/index.ts Outdated
Comment on lines +21 to +22
export { scoreRecoveryOutcome, scoreFromToolResult } from './reward-scorer';
export type { RecoveryRewardClassification, RecoveryRewardInput, RecoveryRewardScore } from './reward-scorer';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove duplicate reward-scorer re-exports in recovery index

src/recovery/index.ts now exports scoreRecoveryOutcome, scoreFromToolResult, and their types twice (first in the existing block and again in the new block), which creates duplicate export names in the same module. This can fail module parsing/type-checking (and in ESM emits a duplicate-export syntax error) as soon as anything imports the recovery barrel, so this should be deduplicated to a single re-export block.

Useful? React with 👍 / 👎.

@shaun0927
Copy link
Copy Markdown
Owner Author

Attempted programmatic rebase + merge of feat/1019-recovery-reward-scorer into this branch resulted in cascading conflicts across src/recovery/index.ts, src/types/plan-cache.ts, and src/orchestration/plan-executor.ts — the scorer base has moved significantly since this PR was opened (the trajectory-ledger + plan-executor cross-cutting changes overlap with this PR's candidate-ranker integration). Recommend manual rebase by the original author since the resolution needs expert judgment about which side wins on each conflict. Content review pass already cleared this PR (see the merge rationale comment above).

Constraint: PR #1088 review found read-only tools were incorrectly blocked and repeat-failure scoring counted prior successes.
Rejected: Treating all blocking-page retries as unsafe | Only blind interaction retries have side-effect risk.
Confidence: high
Scope-risk: narrow
Directive: Candidate ranking must distinguish safe observation from blind mutation and keep barrel exports unique.
Tested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: Full CI matrix pending on GitHub
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 722ed1a74b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/orchestration/plan-executor.ts Outdated
}

function countExecutedRecoveryAttempts(attempts: PlanRecoveryAttempt[]): number {
return attempts.filter((attempt) => attempt.status !== 'blocked').length;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exclude non-executed recovery attempts from tool-call budget

countExecutedRecoveryAttempts treats every non-blocked attempt as a consumed tool call, but tryBoundedRecovery records status: 'failed' for tool handler not found before any tool invocation. In that case, bounded recovery can exhaust maxToolCalls and stop before trying later viable candidates, even though no recovery tool actually ran.

Useful? React with 👍 / 👎.

Comment thread src/orchestration/plan-executor.ts Outdated
switch (tool) {
case 'read_page':
case 'tabs_context':
return tabId ? { tabId } : {};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Block read_page recovery when tabId is unavailable

buildSafeRecoveryArgs returns {} for read_page/tabs_context when params.tabId is missing, so bounded recovery proceeds with an argument set that is invalid for read_page (its tool schema requires tabId in src/tools/read-page.ts). This turns an unsatisfied precondition into a guaranteed failing call that consumes recovery budget instead of marking the candidate as blocked.

Useful? React with 👍 / 👎.

Constraint: Preserve the existing PlanExecutor compatibility path where non-error empty MCP payloads are parsed and judged by success criteria unless an explicit empty-result handler recovers first.
Rejected: Treating empty payloads as implicit failures | that would change legacy plan execution behavior and bypass plan-defined success criteria.
Confidence: high
Scope-risk: narrow
Directive: Do not add silent recovery bypasses; route empty-result policy through explicit handlers or successCriteria.
Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: full CI matrix pending stacked PR base merge

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b32f471181

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/orchestration/plan-executor.ts Outdated
Comment on lines +377 to +379
if (!result.isError && !isEmptyResult(result)) {
recoveryAttempts.push({ tool: candidate.tool, status: 'success', reason: candidate.reason });
return { recovered: true, stepsExecuted: executed };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Retry failed step before declaring bounded recovery success

Any non-empty response from a read-only recovery candidate is treated as full recovery (recovered: true), and the executor then skips the original failed step. In plans where that failed step was supposed to produce side effects or populate values consumed by later ${...} substitutions, execution can continue with missing state and still report success if downstream criteria happen to pass. This is triggered whenever a step fails, read_page/tabs_context returns non-empty output, and bounded recovery is enabled.

Useful? React with 👍 / 👎.

shaun0927 and others added 3 commits May 14, 2026 03:22
Constraint: A read-only recovery probe can refresh evidence but does not prove the originally failed step succeeded.
Rejected: Skipping the failed step after a successful recovery probe | it can report plan success without executing the requested action.
Confidence: high
Scope-risk: moderate
Directive: Treat bounded recovery probes as evidence until the failed action is explicitly retried and passes.
Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: full CI matrix pending stacked base merge

Co-authored-by: OmX <omx@oh-my-codex.dev>
Constraint: The bounded-recovery safety change should remain reviewable without whole-file line-ending churn.
Rejected: Leaving the prior LF-normalized diff | it obscures the actual recovery semantic change.
Confidence: high
Scope-risk: narrow
Directive: Preserve existing file line endings when making targeted PR review fixes.
Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build
Not-tested: full CI matrix pending stacked base merge

Co-authored-by: OmX <omx@oh-my-codex.dev>
Merged the stacked recovery candidate-ranking branch onto develop after the recovery scorer base landed, preserving advisory candidate ranking, bounded read-only recovery, policy learning, and current plan verification contracts.

Constraint: PR #1088 was stacked on the now-merged #1078 branch and had to be retargeted to develop with conflicts resolved.
Rejected: Dropping bounded recovery or current plan-verification evidence | either choice would regress already-reviewed stack behavior.
Confidence: high
Scope-risk: broad
Directive: Keep recovery candidate ranking advisory and keep bounded recovery limited to safe read-only candidates.
Tested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/recovery/policy-learner.test.ts tests/hints/hint-engine.test.ts tests/orchestration/plan-executor-bounded-recovery.test.ts tests/orchestration/plan-cache.test.ts tests/run-harness/tools-budget.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; npm run lint:tool-schemas; node scripts/lint-tools-capabilities.js; git diff --check
Not-tested: Full repository test suite and live browser recovery campaign.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927 shaun0927 changed the base branch from feat/1019-recovery-reward-scorer to develop May 13, 2026 20:40
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

1 similar comment
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

@shaun0927
Copy link
Copy Markdown
Owner Author

Closed as implemented by merged PR #1108 (feat(orchestration): opt-in bounded recovery search (#1020)). Verification evidence: PR #1108 is merged into develop.

@shaun0927 shaun0927 closed this May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant