feat(recovery): record passive trajectory ledger (#1017)#1073
Conversation
⚙️ superseded / quota notice (minimized)
|
⚙️ superseded / quota notice (minimized)ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06782d8e10
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const recovered = | ||
| resultStatus === 'success' && | ||
| previous !== undefined && | ||
| previous.result === 'error' && |
There was a problem hiding this comment.
Detect recovery after no_progress attempts
The recovery classifier only treats a previous call as recoverable when previous.result === 'error', but many failed MCP tool attempts are recorded as resultStatus: 'no_progress' while the activity tracker result stays 'success' (see the success-path call to recordRecoveryTrajectory(..., result.isError ? 'no_progress' : 'success', ...)). In that common path, a follow-up successful tool will never be labeled recovered, so the new ledger undercounts recoveries and misclassifies trajectories. This is observable whenever a tool returns isError: true instead of throwing, then a different tool resolves the issue.
Useful? React with 👍 / 👎.
⚙️ superseded / quota notice (minimized)@codex review |
796924a to
27ea971
Compare
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27ea971a98
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fs.mkdirSync(this.dirPath, { recursive: true }); | ||
| const node = this.buildNode(input); | ||
| const serialized = this.serializeBounded(node); | ||
| fs.appendFileSync(this.filePath, serialized + '\n', 'utf8'); |
There was a problem hiding this comment.
Avoid blocking tool calls with synchronous ledger I/O
record() performs mkdirSync, appendFileSync, and later sync reads/writes (enforceBounds/readRecent), and this method is invoked for every tool completion from handleToolsCall (success and error paths). Node’s fs docs explicitly note that synchronous APIs block the event loop, so this telemetry path can add request latency and stall concurrent sessions under load instead of staying passive as intended (https://nodejs.org/api/fs.html#synchronous-api).
Useful? React with 👍 / 👎.
| const node = this.buildNode(input); | ||
| const serialized = this.serializeBounded(node); | ||
| fs.appendFileSync(this.filePath, serialized + '\n', 'utf8'); | ||
| this.lastNodeBySession.set(input.sessionId, node.nodeId); |
There was a problem hiding this comment.
Bound session index growth in recovery ledger
lastNodeBySession is written on every record but never pruned, so a long-lived server that sees many distinct sessionId values will keep growing this map indefinitely even though on-disk nodes are bounded. This introduces a process-level memory leak over time for high-churn or multi-tenant workloads.
Useful? React with 👍 / 👎.
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5e9af6bc2d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const recent = this.activityTracker.getRecentCalls(3, sessionId); | ||
| const current = recent.find((call) => call.id === callId); | ||
| const previous = recent.find((call) => call.id !== callId); | ||
| const previousTrajectory = this.recoveryLedger | ||
| .readRecent(1, sessionId) |
There was a problem hiding this comment.
Restrict recovery detection to the same tab context
Recovery classification is currently session-wide, so interleaved multi-tab activity can be mislabeled as a recovery: a failed call on tab A followed by a successful different tool on tab B will satisfy previousFailed/previous.toolName !== toolName and be recorded as recovered even though no recovery occurred for tab B. Because tabId is already available in args and persisted in nodes, the previous-call lookup should be scoped to matching tab context before marking recovery.
Useful? React with 👍 / 👎.
| toolName, | ||
| args: toolArgs, | ||
| resultStatus: recovered ? 'recovered' : resultStatus, | ||
| progressStatus: current?.result === 'error' ? 'stuck' : 'unknown', |
There was a problem hiding this comment.
Mark no_progress outcomes as stuck in progressStatus
progressStatus is derived from current?.result === 'error', but non-throwing tool failures (MCPResult.isError) are mapped to resultStatus: 'no_progress' while the activity tracker still records them as success, so these failed attempts are always stored as unknown instead of stuck. This loses the primary signal this ledger is meant to capture for stale-ref/no-progress flows.
Useful? React with 👍 / 👎.
5e9af6b to
b1402e3
Compare
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b1402e3236
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const REDACTED = '[REDACTED]'; | ||
| const HASHED_PREFIX = 'sha256:'; | ||
|
|
||
| const SENSITIVE_KEY_RE = /(^|[_-])(password|passwd|pass|pwd|secret|token|api[_-]?key|authorization|auth|cookie|set-cookie|session|credential|private[_-]?key|otp|totp|pin)($|[_-])/i; |
There was a problem hiding this comment.
Redact camelCase secret keys in argument summaries
The sensitive-key pattern only matches when terms like token, session, or auth are standalone or separated by _/-, so common camelCase fields such as accessToken, sessionId, and authHeader are not treated as sensitive. In summarizeArgs, those values are then stored verbatim (unless longer than 200 chars), which can write live credentials into trajectory.jsonl and violates the ledger’s redaction guarantees.
Useful? React with 👍 / 👎.
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2093ff063b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const REDACTED = '[REDACTED]'; | ||
| const HASHED_PREFIX = 'sha256:'; | ||
|
|
||
| const SENSITIVE_KEY_RE = /(^|[_-])(password|passwd|pass|pwd|secret|token|api[_-]?key|authorization|auth|cookie|set-cookie|session|credential|private[_-]?key|otp|totp|pin)($|[_-])/i; |
There was a problem hiding this comment.
Redact camelCase token/session fields in arg sanitizer
The sensitive-key regex only matches whole words delimited by start/end, _, or -, so common keys like accessToken, idToken, refreshToken, and sessionId are not treated as secrets and will be written in cleartext when their values are short. Since summarizeArgs is used for ledger persistence, this leaks auth material into .openchrome/recovery/trajectory.jsonl in normal tool-call payloads, which violates the ledger’s redaction guarantees.
Useful? React with 👍 / 👎.
2093ff0 to
3dd32a7
Compare
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3dd32a73a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| private buildNode(input: RecoveryTrajectoryNodeInput): RecoveryTrajectoryNode { | ||
| const parentNodeId = input.parentNodeId ?? this.lastNodeBySession.get(input.sessionId); |
There was a problem hiding this comment.
Derive parent node from the same tab context
buildNode falls back to lastNodeBySession, so in a session with multiple tabs the parentNodeId chain will hop across unrelated tabs whenever calls interleave. This corrupts trajectory topology (e.g., tab B nodes can become children of tab A failures) and makes downstream recovery analysis inaccurate even though tabId is tracked separately. Parent lookup should use the same session+tab context key used by getLastNode when no explicit parent is provided.
Useful? React with 👍 / 👎.
| private pruneSessionIndex(): void { | ||
| while (this.lastNodeBySession.size > this.maxSessionIndexEntries) { | ||
| const oldestSessionId = this.lastNodeBySession.keys().next().value; |
There was a problem hiding this comment.
Bound per-tab context index growth
lastNodeByContext is appended on every new sessionId+tabId pair, but pruning is only triggered by lastNodeBySession.size; a long-lived single session that churns through many tab IDs will never hit this condition. In that case the context map grows without bound, increasing memory usage and keeping stale context entries indefinitely. Add an explicit cap/prune path for lastNodeByContext (or per-session tab contexts) independent of session count.
Useful? React with 👍 / 👎.
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eee5722f55
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| // End activity tracking (success) | ||
| this.activityTracker!.endCall(callId, 'success'); | ||
| this.recordRecoveryTrajectory(callId, toolName, sessionId, toolArgs, result.isError ? 'no_progress' : 'success', result); |
There was a problem hiding this comment.
Redact tool result before recording recovery trajectory
recordRecoveryTrajectory is called with the raw result before the existing redactSecrets(result) pass runs, so any tool response that echoes a substituted secret can be persisted to .openchrome/recovery/trajectory.jsonl via observationSummary. This only occurs when a tool result includes literal secret text (for example after ${SECRET:...} substitution), but in that case it creates a disk-level secret leak even though normal MCP responses are redacted.
Useful? React with 👍 / 👎.
⚙️ superseded / quota notice (minimized)@codex review |
⚙️ superseded / quota notice (minimized)You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c5a66247c5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (typeof value === 'string') { | ||
| if (LARGE_VALUE_KEY_RE.test(key) || value.length > 200) return hashValue(value); | ||
| return value; |
There was a problem hiding this comment.
Redact secret-like substrings in plain string arguments
The argument sanitizer currently keeps any short string value verbatim unless its key matches LARGE_VALUE_KEY_RE, so inputs like url, query, or value can persist credentials in cleartext (for example OAuth codes or token= query params) into trajectory.jsonl. This violates the ledger’s redaction guarantees because sensitive data often appears in values even when the field name is non-sensitive; apply content-level redaction/hashing to all string values, not only key-based cases.
Useful? React with 👍 / 👎.
| const previousFailed = | ||
| previousTrajectory?.resultStatus === 'error' || | ||
| previousTrajectory?.resultStatus === 'no_progress' || | ||
| previousTrajectory?.resultStatus === 'aborted'; |
There was a problem hiding this comment.
Exclude aborted attempts from recovery detection
Recovery detection treats aborted as a prior failure, so a tool call aborted due client disconnect can cause the next successful different tool in the same context to be labeled recovered. That inflates recovery telemetry with non-recovery events (cancellation/network interruptions), making downstream analysis inaccurate; previousFailed should only include actual failure modes such as error/no_progress.
Useful? React with 👍 / 👎.
c5a6624 to
1b85b91
Compare
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b85b915bc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (typeof value === 'string') { | ||
| if (LARGE_VALUE_KEY_RE.test(key) || value.length > 200) return hashValue(value); | ||
| return value; |
There was a problem hiding this comment.
Redact secret-like values in non-sensitive arg fields
The new ledger can still persist raw credentials when they are passed under ordinary keys (for example, note, url, or query) because short strings are only redacted by key name heuristics and length. In sanitizeObject, any string on a non-matching key and ≤200 chars is written verbatim, so a token embedded in a normal field will be stored in argsSummary, which contradicts the “no raw secrets” safety goal for this telemetry sink.
Useful? React with 👍 / 👎.
| const nodes = content.trim().length === 0 | ||
| ? [] | ||
| : content.trim().split('\n').map((line) => JSON.parse(line) as RecoveryTrajectoryNode); |
There was a problem hiding this comment.
Skip malformed JSONL entries instead of dropping all history
readRecent parses the entire file with a single map(JSON.parse), so one malformed/truncated line (e.g., after interruption during append) throws and causes the catch path to return only in-memory pending nodes. That effectively hides all persisted recovery history until manual cleanup, which can invalidate recovery telemetry analysis from otherwise valid earlier lines.
Useful? React with 👍 / 👎.
⚙️ superseded / quota notice (minimized)@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da8cab8ea4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (let offset = 0; offset < output.length; offset += 16_384) { | ||
| const chunk = output.slice(offset, offset + 16_384); |
There was a problem hiding this comment.
Chunk stdout by bytes rather than UTF-16 code units
Slicing output with String.prototype.slice can split a UTF-16 surrogate pair when a non-BMP character falls on a 16,384 boundary, so the two halves are encoded separately and emitted as replacement characters; that corrupts the JSON stream and can make --introspect-tools-list output unparsable for clients when tool metadata includes astral Unicode (e.g., emoji). Chunking should be done on a Buffer (byte boundaries) or via a stream writer that preserves code points.
Useful? React with 👍 / 👎.
⚙️ superseded / quota notice (minimized)@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Add a passive, bounded trajectory ledger so failed and recovered tool-call attempts survive as redacted harness telemetry while preserving existing execution semantics.\n\nConstraint: LATS-inspired adoption must avoid live-browser tree search and destructive auto-replay.\nRejected: Full MCTS/browser branching | unsafe for real authenticated Chrome sessions and out of scope for #1017.\nConfidence: high\nScope-risk: narrow\nDirective: Keep this ledger passive; future ranking/search work must remain safety-gated and should consume these nodes rather than replay them.\nTested: npm test -- --runTestsByPath tests/recovery/trajectory-ledger.test.ts; npm run build\nNot-tested: Full browser E2E transcript; fixture route was added for follow-up live verification.
Constraint: Codex review found no-progress outcomes were not treated as recoverable/stuck and multi-tab sessions could be cross-labeled.\nRejected: Reading the ledger from disk on every tool call | that would keep passive telemetry on the request hot path and reintroduce blocking I/O risk.\nConfidence: high\nScope-risk: narrow\nDirective: Recovery telemetry must remain best-effort, non-blocking, and scoped to the same session/tab context.\nTested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/recovery/trajectory-ledger.test.ts\nNot-tested: full CI matrix
Recovery telemetry is persisted to disk, so argument summarization must recognize common camelCase credential names such as accessToken, sessionId, and authHeader instead of relying only on separator-delimited keys. Constraint: Codex review flagged this as P1 secret exposure in trajectory ledger persistence. Rejected: Hashing all short strings | it would remove useful non-sensitive telemetry and reduce ledger diagnostic value. Confidence: high Scope-risk: narrow Directive: Keep recovery ledger summaries useful, but prefer false-positive redaction over credential leakage. Tested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/recovery/trajectory-ledger.test.ts. Not-tested: Full GitHub Actions matrix before push.
Recovery trajectory summaries are persisted before the response envelope leaves the server, so they must receive the same secret-redacted MCP result that the client receives. Constraint: Codex review flagged raw result summaries in recovery ledger persistence as P1 secret exposure. Rejected: Depending on summarizeResult pattern redaction | literal secret substitutions require the process secret store redactor. Confidence: high Scope-risk: narrow Directive: Persisted telemetry should consume redacted result/error text, never pre-redaction handler output. Tested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/recovery/trajectory-ledger.test.ts. Not-tested: Full GitHub Actions matrix before push.
Constraint: Codex review found recovery ledger telemetry could leak ordinary-field secrets and drop all persisted history after one torn JSONL line.\nRejected: Hashing every short ordinary string | would destroy useful low-risk recovery context.\nConfidence: high\nScope-risk: narrow\nDirective: Keep recovery telemetry best-effort and non-disruptive; malformed lines must not hide valid entries.\nTested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/recovery/trajectory-ledger.test.ts; npm run lint:tool-schemas\nNot-tested: full CI matrix\n\nCo-authored-by: OmX <omx@oh-my-codex.dev>
c1b18b7 to
0a68816
Compare
|
@codex review Rebased on develop, fixed the Windows 22 admin-keys JSON-extraction flake (extractJsonArray now scans line-by-line, skipping bracketed worker noise like [WorkflowEngine...]), and tightened ledger redaction for camelCase secret keys + pre-record sanitization. All 17 targeted tests pass locally. |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary\n- Adds a passive, bounded RecoveryTrajectoryLedger for redacted tool-attempt telemetry.\n- Wires MCP tool success/error completions into the ledger without changing browser behavior.\n- Adds a local stale-ref fixture route for follow-up live OpenChrome verification.\n\nCloses #1017.\n\n## Validation\n- npm test -- --runTestsByPath tests/recovery/trajectory-ledger.test.ts\n- npm run build\n\n## Live OpenChrome verification\n- Not run in this PR session; the PR adds /recovery/stale-ref fixture support so merge validation can run the issue's required MCP transcript deterministically.\n\n## Safety\n- Ledger is best-effort and passive only.\n- Secrets and large payloads are redacted/hashed.\n- Set OPENCHROME_RECOVERY_LEDGER=0 to disable.