feat(hints): rank safe recovery candidates (#1018)#1088
Conversation
Add advisory recovery candidate ranking so progress-tracker stuck/stalling hints provide concrete read-only next steps without auto-executing browser actions.\n\nConstraint: Candidate ranking must stay advisory and must not bypass existing hint priority or safety behavior.\nRejected: Automatic recovery execution in HintEngine | belongs in a later opt-in bounded recovery search lane.\nConfidence: high\nScope-risk: narrow\nDirective: Keep high-risk candidates blocked or advisory-only; do not turn hint candidates into tool execution without #1020-style gates.\nTested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/hints/hint-engine.test.ts; npm run build\nNot-tested: Full live stale-ref/browser hint transcript.
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19b84a5818
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| add({ | ||
| tool: input.toolName, | ||
| reason: 'Blind retry is unsafe on auth, CAPTCHA, or blocking pages.', | ||
| risk: 'side_effect_possible', | ||
| baseScore: -0.2, | ||
| blockedReason: 'blocking/auth signal present', |
There was a problem hiding this comment.
Gate blocked retries to blind interaction tools
The blocking/auth branch always adds a blocked retry candidate for input.toolName, even when the current tool is read-only (for example find or tabs_context). In those cases, the metadata incorrectly reports a safe observation tool as "unsafe," and that blocked entry can displace more useful candidates under maxCandidates. This path is triggered whenever blocking text is present, so it should only add the blocked-retry candidate when input.toolName is actually in the blind-interaction set.
Useful? React with 👍 / 👎.
| resultText: input.resultText, | ||
| freshRefsDiscovered: candidate.tool === 'read_page' && isStaleOrElementFailure(text), | ||
| observationOnly: READ_TOOLS.has(candidate.tool), | ||
| repeatedFailureCount: repeatedToolCount(input.recentCalls, input.toolName), |
There was a problem hiding this comment.
Use failure-only count for repeatedFailureCount
repeatedFailureCount is populated with repeatedToolCount(...), which counts every prior invocation of the tool (including successes). scoreRecoveryOutcome treats this field as failure count and adds a failure-repeat penalty, so a single error after many successful calls is over-penalized and can skew candidate ordering. This should be derived from recent calls with failing results only.
Useful? React with 👍 / 👎.
Add bounded read-only recovery attempts for compiled plans so safe state-refresh candidates can repair stale or blocked steps under explicit opt-in budgets.\n\nConstraint: Live browser recovery must be opt-in, budgeted, and limited to read-only candidates.\nRejected: General MCTS/live browser branching | unsafe for authenticated sessions and unnecessary for the current harness goal.\nConfidence: medium\nScope-risk: moderate\nDirective: Do not widen candidate execution beyond read-only tools without an explicit destructive-action gate and live verification.\nTested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts; npm run build\nNot-tested: Full execute_plan MCP transcript against Chrome fixture.
Add a bounded recovery policy learner so repeated evidence-backed recoveries can bias future candidate ranking without bypassing safety gates or executing actions.\n\nConstraint: Learned recovery policy must be advisory only and must not store sensitive page content or credentials.\nRejected: A separate autonomous recovery memory/executor | duplicates PatternLearner direction and risks unsafe automatic action.\nConfidence: high\nScope-risk: narrow\nDirective: Learned policies may rank candidates but must never override risk gates or opt-in execution controls.\nTested: npm test -- --runTestsByPath tests/recovery/policy-learner.test.ts tests/recovery/candidate-ranker.test.ts; npm run build\nNot-tested: Two-run live OpenChrome policy promotion transcript.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19b84a5818
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| add({ | ||
| tool: input.toolName, | ||
| reason: 'Blind retry is unsafe on auth, CAPTCHA, or blocking pages.', | ||
| risk: 'side_effect_possible', | ||
| baseScore: -0.2, |
There was a problem hiding this comment.
Limit blocking-page retry suppression to blind tools
In the blocking/auth branch this always adds a blocked retry candidate for input.toolName, even when the current tool is read-only (for example find/tabs_context). That marks safe diagnostic retries as side_effect_possible with blockedReason, which conflicts with the intended “filter blind retries” behavior and can mislead agents away from valid low-risk recovery steps whenever the stuck/stalling state is reached on a non-interaction tool. Gate this blocked candidate behind BLIND_INTERACTION_TOOLS.has(input.toolName) (or adjust risk/reason by tool class) so only true blind interactions are suppressed.
Useful? React with 👍 / 👎.
Preserve the candidate-ranker exports while accepting the reward-scorer index exports already present in the updated base branch. Constraint: PR #1088 was merge-conflicting against feat/1019-recovery-reward-scorer. Rejected: Re-exporting reward-scorer symbols twice | duplicate exports add noise and risk lint/type conflicts. Confidence: high Scope-risk: narrow Directive: Keep src/recovery/index.ts as the single barrel for reward scoring and candidate ranking. Tested: npm test -- --runInBand tests/recovery/candidate-ranker.test.ts tests/hints/recovery-candidates.test.ts Tested: npm run build && npm run lint:changed && npm run lint:tier && npm run lint:tool-schemas Not-tested: Full repository test suite. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Merge rationale (stack consolidation)Intent. Closes #1018 — adds advisory recovery candidate ranking for stuck/stalling progress hints, surfaces ranked candidates in hint metadata and hint text while preserving existing rule priority, and filters blocking/auth scenarios away from blind interaction retries. Why this is correct.
CI. Targets the recovery-reward-scorer feature branch; CI workflow only runs on |
|
Hit merge conflicts after upstream stack changes. Recommend rebase to absorb the latest scorer changes: git checkout feat/1018-recovery-candidate-ranking
git fetch origin && git rebase origin/feat/1019-recovery-reward-scorer
git push --force-with-leaseContent review already cleared this PR (see the merge rationale comment above). |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee60444baa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| export { scoreRecoveryOutcome, scoreFromToolResult } from './reward-scorer'; | ||
| export type { RecoveryRewardClassification, RecoveryRewardInput, RecoveryRewardScore } from './reward-scorer'; |
There was a problem hiding this comment.
Remove duplicate reward-scorer re-exports in recovery index
src/recovery/index.ts now exports scoreRecoveryOutcome, scoreFromToolResult, and their types twice (first in the existing block and again in the new block), which creates duplicate export names in the same module. This can fail module parsing/type-checking (and in ESM emits a duplicate-export syntax error) as soon as anything imports the recovery barrel, so this should be deduplicated to a single re-export block.
Useful? React with 👍 / 👎.
|
Attempted programmatic rebase + merge of |
Constraint: PR #1088 review found read-only tools were incorrectly blocked and repeat-failure scoring counted prior successes. Rejected: Treating all blocking-page retries as unsafe | Only blind interaction retries have side-effect risk. Confidence: high Scope-risk: narrow Directive: Candidate ranking must distinguish safe observation from blind mutation and keep barrel exports unique. Tested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build Not-tested: Full CI matrix pending on GitHub
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 722ed1a74b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| function countExecutedRecoveryAttempts(attempts: PlanRecoveryAttempt[]): number { | ||
| return attempts.filter((attempt) => attempt.status !== 'blocked').length; |
There was a problem hiding this comment.
Exclude non-executed recovery attempts from tool-call budget
countExecutedRecoveryAttempts treats every non-blocked attempt as a consumed tool call, but tryBoundedRecovery records status: 'failed' for tool handler not found before any tool invocation. In that case, bounded recovery can exhaust maxToolCalls and stop before trying later viable candidates, even though no recovery tool actually ran.
Useful? React with 👍 / 👎.
| switch (tool) { | ||
| case 'read_page': | ||
| case 'tabs_context': | ||
| return tabId ? { tabId } : {}; |
There was a problem hiding this comment.
Block read_page recovery when tabId is unavailable
buildSafeRecoveryArgs returns {} for read_page/tabs_context when params.tabId is missing, so bounded recovery proceeds with an argument set that is invalid for read_page (its tool schema requires tabId in src/tools/read-page.ts). This turns an unsatisfied precondition into a guaranteed failing call that consumes recovery budget instead of marking the candidate as blocked.
Useful? React with 👍 / 👎.
Constraint: Preserve the existing PlanExecutor compatibility path where non-error empty MCP payloads are parsed and judged by success criteria unless an explicit empty-result handler recovers first. Rejected: Treating empty payloads as implicit failures | that would change legacy plan execution behavior and bypass plan-defined success criteria. Confidence: high Scope-risk: narrow Directive: Do not add silent recovery bypasses; route empty-result policy through explicit handlers or successCriteria. Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build Not-tested: full CI matrix pending stacked PR base merge Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b32f471181
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!result.isError && !isEmptyResult(result)) { | ||
| recoveryAttempts.push({ tool: candidate.tool, status: 'success', reason: candidate.reason }); | ||
| return { recovered: true, stepsExecuted: executed }; |
There was a problem hiding this comment.
Retry failed step before declaring bounded recovery success
Any non-empty response from a read-only recovery candidate is treated as full recovery (recovered: true), and the executor then skips the original failed step. In plans where that failed step was supposed to produce side effects or populate values consumed by later ${...} substitutions, execution can continue with missing state and still report success if downstream criteria happen to pass. This is triggered whenever a step fails, read_page/tabs_context returns non-empty output, and bounded recovery is enabled.
Useful? React with 👍 / 👎.
Constraint: A read-only recovery probe can refresh evidence but does not prove the originally failed step succeeded. Rejected: Skipping the failed step after a successful recovery probe | it can report plan success without executing the requested action. Confidence: high Scope-risk: moderate Directive: Treat bounded recovery probes as evidence until the failed action is explicitly retried and passes. Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build Not-tested: full CI matrix pending stacked base merge Co-authored-by: OmX <omx@oh-my-codex.dev>
Constraint: The bounded-recovery safety change should remain reviewable without whole-file line-ending churn. Rejected: Leaving the prior LF-normalized diff | it obscures the actual recovery semantic change. Confidence: high Scope-risk: narrow Directive: Preserve existing file line endings when making targeted PR review fixes. Tested: npm test -- --runTestsByPath tests/orchestration/plan-executor-bounded-recovery.test.ts tests/recovery/candidate-ranker.test.ts tests/recovery/reward-scorer.test.ts --runInBand; npm run build Not-tested: full CI matrix pending stacked base merge Co-authored-by: OmX <omx@oh-my-codex.dev>
Merged the stacked recovery candidate-ranking branch onto develop after the recovery scorer base landed, preserving advisory candidate ranking, bounded read-only recovery, policy learning, and current plan verification contracts. Constraint: PR #1088 was stacked on the now-merged #1078 branch and had to be retargeted to develop with conflicts resolved. Rejected: Dropping bounded recovery or current plan-verification evidence | either choice would regress already-reviewed stack behavior. Confidence: high Scope-risk: broad Directive: Keep recovery candidate ranking advisory and keep bounded recovery limited to safe read-only candidates. Tested: npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/recovery/policy-learner.test.ts tests/hints/hint-engine.test.ts tests/orchestration/plan-executor-bounded-recovery.test.ts tests/orchestration/plan-cache.test.ts tests/run-harness/tools-budget.test.ts --runInBand; npm run build; npm run lint -- --quiet; npm run lint:tier; npm run lint:tool-schemas; node scripts/lint-tools-capabilities.js; git diff --check Not-tested: Full repository test suite and live browser recovery campaign. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
1 similar comment
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Progress / Review status
Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.
feat/1018-recovery-candidate-ranking→feat/1019-recovery-reward-scorer1ba7470— Resolve recovery candidate ranking base conflictOwner comment cleanup: 2 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.
Summary\n- Adds advisory recovery candidate ranking for stuck/stalling progress hints.\n- Surfaces ranked candidates in hint metadata and hint text while preserving existing rule priority.\n- Filters blocking/auth scenarios away from blind interaction retries.\n\nCloses #1018.\n\nStacked on #1078 because candidate scores reuse the deterministic recovery scorer.\n\n## Validation\n- npm test -- --runTestsByPath tests/recovery/candidate-ranker.test.ts tests/hints/hint-engine.test.ts\n- npm run build\n\n## Live OpenChrome verification\n- Not run in this PR session; #1073 adds the deterministic stale-ref fixture for follow-up live transcript verification.\n\n## Safety\n- Advisory only: no recovery candidate is auto-executed.\n- Blocking/auth pages include a blocked blind-retry candidate and prefer read-only state checks.