Refine runtime skill selection and scaffold skill docs#269
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
📝 WalkthroughWalkthroughAdds skill selection with context injection, tightens Daytona sandbox ownership filtering and session manifest paths, updates env/config keys, expands VFS roots, introduces long-context chunking tools and diagnostics CLI, and comprehensively rewrites skills/runtime documentation and links. ChangesRuntime features, sandbox access, and skills
Architecture, skills, and reference documentation
Phase tracking, audit, and changelog
Server config and dependencies cleanup
Frontend tests and mocks alignment
Backend tests updates
Miscellaneous config and comments
Sequence Diagram(s)(none) Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
|
There was a problem hiding this comment.
Code Review
This pull request transitions the runtime architecture to AgentRuntime and EscalatingFleetModule, introduces a proactive SkillSelectionModule to inject relevant skill instructions into context, and updates the bundled agent skills and volume layout to support durable phase roots. The review feedback highlights several critical issues: the need to restore common stopwords in rank_chunks.py to maintain search relevance, ensuring regex_chunks preserves leading content and correctly detects markdown files starting with a heading, robustly parsing stringified Python lists in _parse_skill_names, and loading local .env variables in the diagnostics script to prevent false negatives.
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (8)
src/fleet_rlm/runtime/modules/skill_selection.py (1)
196-208: ⚡ Quick winFall back to keyword candidates when the LLM returns no valid skills.
If
self.select(...)succeeds but_parse_skill_namesfilters everything out (LLM returned unknown names or empty output),selectedbecomes[]and the request injects no skill context — even though_keyword_matchhad produced viable candidates. Theexceptpath already degrades to keyword candidates; the valid-but-empty path should do the same for consistent graceful degradation.♻️ Proposed fix
prediction = self.select( context=context, available_skills=_format_available_skills(), ) selected = self._parse_skill_names(getattr(prediction, "skills", [])) + if not selected: + selected = candidates[: self._max_skills] reasoning = str(getattr(prediction, "reasoning", "") or "")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/runtime/modules/skill_selection.py` around lines 196 - 208, If the LLM selection succeeds but _parse_skill_names returns an empty list, treat that the same as an exception by falling back to the keyword candidates: after calling self.select(...) and computing selected = self._parse_skill_names(...), check if selected is empty and if so set selected = candidates[: self._max_skills] and reasoning = "" (and optionally log a warning) so the module degrades gracefully when the LLM returns no valid skills.src/fleet_rlm/scaffold/skills/rlm/references/architecture.md (1)
3-20: 💤 Low valueConsider adding a path prefix note for clarity.
The module map table uses relative paths without the
src/fleet_rlm/prefix (e.g.,runtime/agent/runtime.pyinstead ofsrc/fleet_rlm/runtime/agent/runtime.py). Consider adding a brief note above the table to clarify that all paths are relative tosrc/fleet_rlm/for improved readability.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/scaffold/skills/rlm/references/architecture.md` around lines 3 - 20, Add a one-line note above the module map table in architecture.md clarifying that all listed paths are relative to src/fleet_rlm/ (e.g., "Paths are relative to src/fleet_rlm/"); update the header area near the table that lists runtime/agent/runtime.py, runtime/modules/escalating.py, etc., so readers understand the base path without changing the listed entries.plan/README.md (1)
21-21: 💤 Low valueMinor grammar refinement: use hyphenated compound adjective.
When "closed phase" modifies "claims" as a compound adjective, it should be hyphenated: "closed-phase claims".
📝 Proposed fix
-This audit verifies that closed phase claims match current code, tests, docs, and bundled +This audit verifies that closed-phase claims match current code, tests, docs, and bundled🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plan/README.md` at line 21, Update the phrasing "closed phase claims" to the hyphenated compound adjective "closed-phase claims" in the README text (look for the exact phrase "closed phase claims") so the compound modifier is grammatically correct.src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py (1)
29-33: 💤 Low valueTest invokes the real
daytonaCLI.
check_daytona()runssubprocess.run(["daytona", "version"], ...), so this test shells out to the actual binary. The assertion still passes (return value depends only on the env vars), but the call makes the test non-hermetic and, on a host where the CLI hangs, susceptible to the unhandledTimeoutExpiredflagged indiagnose.py. Consider mockingsubprocess.runto keep it isolated.♻️ Example
def test_check_daytona_reports_missing_required_env(monkeypatch) -> None: monkeypatch.delenv("DAYTONA_API_KEY", raising=False) monkeypatch.delenv("DAYTONA_API_URL", raising=False) + monkeypatch.setattr(diagnose.subprocess, "run", lambda *a, **k: (_ for _ in ()).throw(FileNotFoundError)) assert diagnose.check_daytona() is False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py` around lines 29 - 33, The test test_check_daytona_reports_missing_required_env calls diagnose.check_daytona which currently invokes subprocess.run(["daytona","version"],...) and thus shells out; modify the test to mock subprocess.run (and simulate raising subprocess.TimeoutExpired or returning a completed process as appropriate) so the test is hermetic and doesn't call the real daytona binary—patch subprocess.run in the test (or use monkeypatch to replace diagnose.subprocess.run) and assert check_daytona() is False while ensuring the mock matches the behavior expected inside diagnose.check_daytona (including handling TimeoutExpired).src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md (2)
19-21: 💤 Low valueAdd language specifier to fenced code block.
The fenced code block lacks a language specifier. Since this is a WebSocket endpoint path, consider using
```textor converting to an indented code block for clarity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md` around lines 19 - 21, The fenced code block containing the WebSocket endpoint `/api/v1/ws/execution` lacks a language specifier; update that fenced block in contract-surfaces.md by adding a language like ```text (or change it to an indented code block) so the endpoint is rendered clearly—locate the block showing `/api/v1/ws/execution` and add the language specifier or convert to an indented block.
27-29: 💤 Low valueAdd language specifier to fenced code block.
The fenced code block lacks a language specifier. Since this is a WebSocket endpoint path, consider using
```textor converting to an indented code block for clarity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md` around lines 27 - 29, The fenced code block containing the WebSocket endpoint "/api/v1/ws/execution/events" is missing a language specifier; update the block to include a language (e.g., use ```text before the line) or convert it to an indented code block so the endpoint is rendered clearly and not treated as an unspecified code fence.src/fleet_rlm/scaffold/skills/delegation/SKILL.md (1)
107-107: ⚡ Quick winAdd rationale for the
parallel_semantic_mapprohibition.The strong prohibition lacks context. Consider briefly explaining why
parallel_semantic_mapshould not be used or what to use instead, so users understand the reason behind this constraint.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/scaffold/skills/delegation/SKILL.md` at line 107, Add a short rationale sentence after the line forbidding parallel_semantic_map that explains why it's prohibited and what to use instead; specifically mention the symbol parallel_semantic_map and state the reason (e.g., it causes nondeterministic/unsafe parallel state updates or conflicts with Daytona orchestration) and recommend the approved alternative (e.g., use serial_semantic_map or the provided task-queue/delegation API). Keep it one concise sentence or clause so readers immediately understand the constraint and the replacement.src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md (1)
11-11: 💤 Low valueConsider adding language specifier to fenced code blocks.
The ASCII art diagrams at lines 11-17 and 58-74 use fenced code blocks without a language specifier, which triggers markdown linting warnings. Consider adding
textas the language:♻️ Proposed fix
-``` +```text Dataset → Split (train / dev)-``` +```text 1. Initial prompt ↓Also applies to: 58-74
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md` at line 11, Add a language specifier to the fenced code blocks containing ASCII diagrams so markdown lint stops warning: update the two blocks referenced (the ASCII art diagram around lines 11-17 and the other at 58-74 in the file) to use ```text instead of plain ```; ensure both diagram blocks (the "Dataset → Split (train / dev)" block and the "Initial prompt ↓" block) are updated consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.py`:
- Around line 32-43: The try block calling subprocess.run(["daytona",
"version"], ...) currently only catches FileNotFoundError so a
subprocess.TimeoutExpired would crash the diagnostics; update the exception
handling around that subprocess.run (the call to subprocess.run and subsequent
status("daytona cli", ...)) to also catch subprocess.TimeoutExpired and in that
except path call status("daytona cli", False, "timeout") or status("daytona
cli", True, "optional (timeout)"), so the diagnostics degrades gracefully
instead of raising; ensure you import subprocess (already used) and reference
the existing subprocess.run invocation and the status("daytona cli", ...) call
when adding the new except clause.
In `@src/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.py`:
- Around line 44-57: The regex_chunks function drops any content before the
first match because bounds is built from starts (which begins at the first
match); change bounds to include the document start so leading content is
preserved (e.g., make bounds start with 0 like bounds = [0, *starts,
len(content)]), then continue using the same loop and calls to size_chunks
(preserving the offset=start logic) so the initial segment prior to the first
pattern match is emitted as a chunk.
In
`@src/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.md`:
- Around line 48-156: Update the docs to use the real evaluation APIs and trace
schema: replace calls to evaluate_module with evaluate_program or
evaluate_program_from_dataset (from fleet_rlm.quality.dspy_evaluation), swap the
LLMJudgeScorer example to show how to use build_rlm_scorers or the provided
reasoning_quality_scorer (from fleet_rlm.quality.scorers) instead of a
non-existent LLMJudgeScorer, and change the TraceAwareScorer example to read
prediction.trace via trace.search_spans() and inspect span fields like
span.name, span.inputs and span.span_type (or span.type) rather than
trace.steps/step.* so the example matches current trace usage.
In
`@src/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.md`:
- Around line 9-25: The fenced code block showing the directory tree lacks a
language specifier; update the opening fence (the triple backticks that precede
"/home/daytona/memory/") to include a language token such as text or plaintext
(e.g., change "```" to "```text") so the block is declared and satisfies MD040;
keep the rest of the directory listing (the tree content) unchanged.
In `@src/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.md`:
- Around line 23-41: The fenced code block containing the directory tree in
SKILL.md is missing a language specifier and triggers MD040; update the opening
triple backticks that start the tree block to include a language token (e.g.,
```text or ```plaintext) so the block becomes a plain text fenced code block,
leaving the rest of the tree content unchanged.
---
Nitpick comments:
In `@plan/README.md`:
- Line 21: Update the phrasing "closed phase claims" to the hyphenated compound
adjective "closed-phase claims" in the README text (look for the exact phrase
"closed phase claims") so the compound modifier is grammatically correct.
In `@src/fleet_rlm/runtime/modules/skill_selection.py`:
- Around line 196-208: If the LLM selection succeeds but _parse_skill_names
returns an empty list, treat that the same as an exception by falling back to
the keyword candidates: after calling self.select(...) and computing selected =
self._parse_skill_names(...), check if selected is empty and if so set selected
= candidates[: self._max_skills] and reasoning = "" (and optionally log a
warning) so the module degrades gracefully when the LLM returns no valid skills.
In `@src/fleet_rlm/scaffold/skills/delegation/SKILL.md`:
- Line 107: Add a short rationale sentence after the line forbidding
parallel_semantic_map that explains why it's prohibited and what to use instead;
specifically mention the symbol parallel_semantic_map and state the reason
(e.g., it causes nondeterministic/unsafe parallel state updates or conflicts
with Daytona orchestration) and recommend the approved alternative (e.g., use
serial_semantic_map or the provided task-queue/delegation API). Keep it one
concise sentence or clause so readers immediately understand the constraint and
the replacement.
In `@src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md`:
- Around line 19-21: The fenced code block containing the WebSocket endpoint
`/api/v1/ws/execution` lacks a language specifier; update that fenced block in
contract-surfaces.md by adding a language like ```text (or change it to an
indented code block) so the endpoint is rendered clearly—locate the block
showing `/api/v1/ws/execution` and add the language specifier or convert to an
indented block.
- Around line 27-29: The fenced code block containing the WebSocket endpoint
"/api/v1/ws/execution/events" is missing a language specifier; update the block
to include a language (e.g., use ```text before the line) or convert it to an
indented code block so the endpoint is rendered clearly and not treated as an
unspecified code fence.
In `@src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py`:
- Around line 29-33: The test test_check_daytona_reports_missing_required_env
calls diagnose.check_daytona which currently invokes
subprocess.run(["daytona","version"],...) and thus shells out; modify the test
to mock subprocess.run (and simulate raising subprocess.TimeoutExpired or
returning a completed process as appropriate) so the test is hermetic and
doesn't call the real daytona binary—patch subprocess.run in the test (or use
monkeypatch to replace diagnose.subprocess.run) and assert check_daytona() is
False while ensuring the mock matches the behavior expected inside
diagnose.check_daytona (including handling TimeoutExpired).
In `@src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md`:
- Line 11: Add a language specifier to the fenced code blocks containing ASCII
diagrams so markdown lint stops warning: update the two blocks referenced (the
ASCII art diagram around lines 11-17 and the other at 58-74 in the file) to use
```text instead of plain ```; ensure both diagram blocks (the "Dataset → Split
(train / dev)" block and the "Initial prompt ↓" block) are updated consistently.
In `@src/fleet_rlm/scaffold/skills/rlm/references/architecture.md`:
- Around line 3-20: Add a one-line note above the module map table in
architecture.md clarifying that all listed paths are relative to src/fleet_rlm/
(e.g., "Paths are relative to src/fleet_rlm/"); update the header area near the
table that lists runtime/agent/runtime.py, runtime/modules/escalating.py, etc.,
so readers understand the base path without changing the listed entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 66ac6d82-a1ef-4818-8413-a4b3f5669f5c
📒 Files selected for processing (64)
.env.exampledocs/how-to-guides/frontend-development.mddocs/reference/adr/001-rlm-runtime-architecture.mddocs/reference/daytona-runtime-architecture.mddocs/reference/frontend-backend-integration.mdplan/PHASE-3-PROGRESS.mdplan/PHASE-4-PROGRESS.mdplan/README.mdsrc/fleet_rlm/integrations/daytona/volumes.pysrc/fleet_rlm/runtime/agent/signatures.pysrc/fleet_rlm/runtime/modules/__init__.pysrc/fleet_rlm/runtime/modules/escalating.pysrc/fleet_rlm/runtime/modules/skill_selection.pysrc/fleet_rlm/scaffold/skills/README.mdsrc/fleet_rlm/scaffold/skills/__init__.pysrc/fleet_rlm/scaffold/skills/daytona-runtime/SKILL.mdsrc/fleet_rlm/scaffold/skills/daytona-sandbox/SKILL.mdsrc/fleet_rlm/scaffold/skills/delegation/SKILL.mdsrc/fleet_rlm/scaffold/skills/delegation/references/budget-model.mdsrc/fleet_rlm/scaffold/skills/delegation/references/isolation-modes.mdsrc/fleet_rlm/scaffold/skills/diagnostics/SKILL.mdsrc/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.mdsrc/fleet_rlm/scaffold/skills/diagnostics/references/error-catalog.mdsrc/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.pysrc/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.pysrc/fleet_rlm/scaffold/skills/dspy-programs/SKILL.mdsrc/fleet_rlm/scaffold/skills/dspy-programs/references/module-registry.mdsrc/fleet_rlm/scaffold/skills/dspy-programs/references/signature-catalog.mdsrc/fleet_rlm/scaffold/skills/dspy-signature/SKILL.mdsrc/fleet_rlm/scaffold/skills/long-context/SKILL.mdsrc/fleet_rlm/scaffold/skills/long-context/references/chunking-strategies.mdsrc/fleet_rlm/scaffold/skills/long-context/scripts/rank_chunks.pysrc/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.pysrc/fleet_rlm/scaffold/skills/long-context/tests/test_rank_chunks.pysrc/fleet_rlm/scaffold/skills/long-context/tests/test_semantic_chunk.pysrc/fleet_rlm/scaffold/skills/optimization/SKILL.mdsrc/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.mdsrc/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.mdsrc/fleet_rlm/scaffold/skills/rlm-batch/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm-debug/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm-debug/scripts/diagnose.pysrc/fleet_rlm/scaffold/skills/rlm-execute/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm-long-context/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm-long-context/references/advanced-techniques.mdsrc/fleet_rlm/scaffold/skills/rlm-long-context/references/codebase-processing.mdsrc/fleet_rlm/scaffold/skills/rlm-long-context/scripts/cache_manager.pysrc/fleet_rlm/scaffold/skills/rlm-long-context/scripts/codebase_concat.pysrc/fleet_rlm/scaffold/skills/rlm-long-context/scripts/defaults.pysrc/fleet_rlm/scaffold/skills/rlm-long-context/scripts/orchestrate.pysrc/fleet_rlm/scaffold/skills/rlm-long-context/scripts/semantic_chunk.pysrc/fleet_rlm/scaffold/skills/rlm-memory/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm-run/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm-test-suite/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm/SKILL.mdsrc/fleet_rlm/scaffold/skills/rlm/references/api-reference.mdsrc/fleet_rlm/scaffold/skills/rlm/references/architecture.mdsrc/fleet_rlm/scaffold/skills/sandbox-execution/SKILL.mdsrc/fleet_rlm/scaffold/skills/sandbox-execution/references/env-config.mdsrc/fleet_rlm/scaffold/skills/sandbox-execution/references/volume-layout.mdsrc/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.mdsrc/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.mdsrc/fleet_rlm/scaffold/skills/volume-bootstrap/references/memory-db-schema.mdtests/unit/integrations/test_daytona_runtime.pytests/unit/runtime/test_skill_selection.py
💤 Files with no reviewable changes (19)
- src/fleet_rlm/scaffold/skills/rlm-debug/SKILL.md
- src/fleet_rlm/scaffold/skills/daytona-sandbox/SKILL.md
- src/fleet_rlm/scaffold/skills/daytona-runtime/SKILL.md
- src/fleet_rlm/scaffold/skills/rlm-run/SKILL.md
- src/fleet_rlm/scaffold/skills/rlm/references/api-reference.md
- src/fleet_rlm/scaffold/skills/rlm-long-context/references/advanced-techniques.md
- src/fleet_rlm/scaffold/skills/rlm-batch/SKILL.md
- src/fleet_rlm/scaffold/skills/dspy-signature/SKILL.md
- src/fleet_rlm/scaffold/skills/rlm-test-suite/SKILL.md
- src/fleet_rlm/scaffold/skills/rlm-execute/SKILL.md
- src/fleet_rlm/scaffold/skills/rlm-debug/scripts/diagnose.py
- src/fleet_rlm/scaffold/skills/rlm-long-context/references/codebase-processing.md
- src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/semantic_chunk.py
- src/fleet_rlm/scaffold/skills/rlm-memory/SKILL.md
- src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/defaults.py
- src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/codebase_concat.py
- src/fleet_rlm/scaffold/skills/rlm-long-context/SKILL.md
- src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/cache_manager.py
- src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/orchestrate.py
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/fleet_rlm/api/routers/ws/session.py (1)
234-245:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't drop read compatibility for legacy manifest locations yet.
After this change, reconnect only attempts
sessions/<session_id>/conversation.json. Older Daytona volumes that still store the manifest in the legacy location will now restore as an empty session, which effectively discards prior conversation state until something rewrites the new file. Please keep a read fallback or migrate the legacy file into the new location before removing the old path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/routers/ws/session.py` around lines 234 - 245, The new logic drops compatibility with legacy manifest locations causing restores to be empty; update the restore path in the reconnect branch so that after attempting _restore_manifest_from_local_store(persistence=persistence, sess_id=sess_id) you also try the legacy location and/or migrate it into the new location before returning: locate the reconnect code that uses interpreter, manifest_path, load_manifest_from_volume, and _restore_manifest_from_local_store and implement a fallback that checks the legacy manifest path (for the old Daytona volume layout) and if found either load it and set manifest or atomically move/copy it into the new sessions/<session_id>/conversation.json location so older volumes retain conversation history.src/fleet_rlm/api/dependencies.py (1)
213-217:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep the
app.state.server_statecompatibility path until every initializer is migrated.
get_server_state*()still accepts legacyapp.state.server_state, but_require_dep()now ignores it and raises instead. That makes callers in the same module disagree on whether startup completed, and it breaks any request/websocket path that only seeds the legacy container.💡 Minimal compatibility patch
def _require_dep(app: Any, attr: str) -> Any: - candidate = getattr(getattr(app, "state", None), attr, None) + state = getattr(app, "state", None) + candidate = getattr(state, attr, None) if candidate is not None: return candidate + server_state = getattr(state, "server_state", None) + if isinstance(server_state, ServerState): + legacy_candidate = getattr(server_state, attr, None) + if legacy_candidate is not None: + return legacy_candidate raise RuntimeError(f"Server dependency '{attr}' is not initialized. Ensure FastAPI lifespan startup has completed.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/dependencies.py` around lines 213 - 217, The _require_dep function currently only checks getattr(app.state, attr) and throws if missing, but callers (get_server_state* functions) still accept the legacy app.state.server_state; update _require_dep to also look for the legacy container by checking getattr(app.state, "server_state", None) and, if present, try to read the desired attribute from that legacy object before raising. Specifically, in _require_dep use candidate = getattr(getattr(app, "state", None), attr, None) and if candidate is None, check legacy = getattr(getattr(app, "state", None), "server_state", None) and return getattr(legacy, attr, None) if found, otherwise raise the same RuntimeError to preserve behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/reference/daytona-architecture.md`:
- Around line 175-176: Update the Persistent Memory Model description to match
the new manifest location: replace the reference to "session manifests and
workspace provenance belong under `meta/workspaces/...`" with the new durable
path `sessions/<session_id>/conversation.json` (or state that workspace
provenance remains under `meta/workspaces/...` if only session manifests moved),
and remove any mention of the legacy `meta/workspaces/.../react-session-*.json`
fallback; ensure the text consistently uses
`sessions/<session_id>/conversation.json` where session manifest is discussed so
the lines about session manifest location (previously around "Session manifests
on durable storage live under `sessions/<session_id>/conversation.json`") and
the Persistent Memory Model do not conflict.
In `@src/fleet_rlm/api/runtime_services/sandboxes.py`:
- Around line 168-175: The current _list_labels_filter (and the duplicate logic
around lines 354-375) excludes sandboxes that lack SANDBOX_OWNER_LABEL,
stranding unlabeled legacy sandboxes; change the logic to preserve a controlled
legacy path by returning a filter that matches either the requested owner label
OR the absence of the owner label (i.e., include unlabeled sandboxes) when
SANDBOX_OWNER_LABEL is missing in owner_labels, or alternatively trigger a
migration/relabel step before enforcing owner-only filtering; update
_list_labels_filter and the same duplicated block to implement the “include
unlabeled (legacy)” branch or call a relabel/migration helper so old sandboxes
remain visible to detail/delete/archive operations.
---
Outside diff comments:
In `@src/fleet_rlm/api/dependencies.py`:
- Around line 213-217: The _require_dep function currently only checks
getattr(app.state, attr) and throws if missing, but callers (get_server_state*
functions) still accept the legacy app.state.server_state; update _require_dep
to also look for the legacy container by checking getattr(app.state,
"server_state", None) and, if present, try to read the desired attribute from
that legacy object before raising. Specifically, in _require_dep use candidate =
getattr(getattr(app, "state", None), attr, None) and if candidate is None, check
legacy = getattr(getattr(app, "state", None), "server_state", None) and return
getattr(legacy, attr, None) if found, otherwise raise the same RuntimeError to
preserve behavior.
In `@src/fleet_rlm/api/routers/ws/session.py`:
- Around line 234-245: The new logic drops compatibility with legacy manifest
locations causing restores to be empty; update the restore path in the reconnect
branch so that after attempting
_restore_manifest_from_local_store(persistence=persistence, sess_id=sess_id) you
also try the legacy location and/or migrate it into the new location before
returning: locate the reconnect code that uses interpreter, manifest_path,
load_manifest_from_volume, and _restore_manifest_from_local_store and implement
a fallback that checks the legacy manifest path (for the old Daytona volume
layout) and if found either load it and set manifest or atomically move/copy it
into the new sessions/<session_id>/conversation.json location so older volumes
retain conversation history.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 06c69f32-9341-4781-b8ce-5388a9285e2a
⛔ Files ignored due to path filters (1)
src/frontend/src/lib/rlm-api/generated/openapi.tsis excluded by!**/generated/**
📒 Files selected for processing (115)
.codex/agents/sandbox-runtime.toml.codex/config.toml.env.example.factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-agent-files.json.factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-tool-files.json.factory/validation/runtime-api-cleanup/scrutiny/reviews/runtime-consolidate-execution.json.factory/validation/runtime-api-cleanup/scrutiny/synthesis.json.ignoreCHANGELOG.mdCONTRIBUTING.mdREADME.mddocs/SUMMARY.mddocs/how-to-guides/codex-environment.mddocs/how-to-guides/deploying-server.mddocs/how-to-guides/developer-setup.mddocs/how-to-guides/dspy-integration.mddocs/how-to-guides/installation.mddocs/how-to-guides/releasing.mddocs/reference/auth.mddocs/reference/database.mddocs/reference/daytona-architecture.mddocs/reference/frontend-feature-spec.mddocs/reference/index.mddocs/reference/python-api.mddocs/reference/release-notes/0.4.94.mddocs/reference/release-notes/0.4.99.mddocs/reference/release-notes/0.5.0.mddocs/reference/sandbox-api.mddocs/tutorials/01-basic-usage.mdopenapi.yamloutput/longcot-eval/README.mdoutput/longcot-eval/final/comparison_report.mdoutput/longcot-eval/final/comparison_summary.mdoutput/longcot-eval/final/direct_100_tasks.jsonloutput/longcot-eval/final/direct_eval.jsonoutput/longcot-eval/final/rlm_100_tasks.jsonloutput/longcot-eval/final/rlm_eval.jsonoutput/longcot-eval/longcot_gepa_dataset.jsonloutput/longcot-eval/partial-runs/chess-math/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_095814.jsonloutput/longcot-eval/partial-runs/cs-chem/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_095754.jsonloutput/longcot-eval/partial-runs/longcot_all_longcot-mini_20260429_162551.jsonloutput/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_165416.jsonloutput/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_171028.jsonloutput/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_171619.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_072531.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_081152.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_114407.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_120033.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_143848.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_153040.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_162545.jsonloutput/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_170108.jsonloutput/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_165305.jsonloutput/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_204150.jsonloutput/longcot-eval/pilot/direct_20_tasks.jsonloutput/longcot-eval/pilot/rlm_20_tasks.jsonlplan/README.mdscripts/check_docs_quality.pysrc/fleet_rlm/api/config.pysrc/fleet_rlm/api/dependencies.pysrc/fleet_rlm/api/routers/sandboxes.pysrc/fleet_rlm/api/routers/ws/session.pysrc/fleet_rlm/api/routers/ws/stream.pysrc/fleet_rlm/api/runtime_services/diagnostics.pysrc/fleet_rlm/api/runtime_services/sandbox_service.pysrc/fleet_rlm/api/runtime_services/sandboxes.pysrc/fleet_rlm/api/runtime_services/session_paths.pysrc/fleet_rlm/api/schemas/optimization.pysrc/fleet_rlm/cli/runners.pysrc/fleet_rlm/integrations/config/config_full.yamlsrc/fleet_rlm/integrations/config/env.pysrc/fleet_rlm/integrations/config/runtime_settings.pysrc/fleet_rlm/integrations/daytona/bridge.pysrc/fleet_rlm/integrations/daytona/config.pysrc/fleet_rlm/integrations/daytona/sandbox_executor.pysrc/fleet_rlm/integrations/local_store.pysrc/fleet_rlm/integrations/observability/mlflow_runtime.pysrc/fleet_rlm/quality/datasets.pysrc/fleet_rlm/runtime/agent/runtime.pysrc/fleet_rlm/runtime/modules/skill_selection.pysrc/fleet_rlm/runtime/schemas.pysrc/fleet_rlm/scaffold/skills/delegation/SKILL.mdsrc/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.mdsrc/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.pysrc/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.pysrc/fleet_rlm/scaffold/skills/long-context/scripts/rank_chunks.pysrc/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.pysrc/fleet_rlm/scaffold/skills/long-context/tests/test_rank_chunks.pysrc/fleet_rlm/scaffold/skills/long-context/tests/test_semantic_chunk.pysrc/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.mdsrc/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.mdsrc/fleet_rlm/scaffold/skills/rlm/references/architecture.mdsrc/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.mdsrc/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.mdsrc/frontend/.env.examplesrc/frontend/openapi/fleet-rlm.openapi.yamlsrc/frontend/src/features/workspace/workbench/__tests__/run-workbench.test.tsxsrc/frontend/src/lib/auth/__tests__/token-store.test.tssrc/frontend/src/lib/data/mock/runtime.tssrc/frontend/src/lib/rlm-api/__tests__/backend-contract.test.tssrc/frontend/src/lib/rlm-api/__tests__/config.test.tssrc/frontend/src/lib/rlm-api/__tests__/ws-frame-parser.test.tssrc/frontend/src/router.tsxtests/unit/api/test_chat_persistence.pytests/unit/api/test_dependencies.pytests/unit/api/test_sandboxes.pytests/unit/api/test_session_paths.pytests/unit/cli/test_mlflow_cli.pytests/unit/integrations/test_daytona_config.pytests/unit/integrations/test_env_config.pytests/unit/integrations/test_memory_db.pytests/unit/integrations/test_observability.pytests/unit/runtime/test_execution.pytests/unit/runtime/test_knowledge_tools.pytests/unit/runtime/test_skill_selection.py
💤 Files with no reviewable changes (21)
- output/longcot-eval/final/comparison_summary.md
- output/longcot-eval/README.md
- output/longcot-eval/pilot/rlm_20_tasks.jsonl
- .factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-tool-files.json
- output/longcot-eval/final/comparison_report.md
- output/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_171619.jsonl
- .factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-agent-files.json
- output/longcot-eval/final/direct_eval.json
- output/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_204150.jsonl
- output/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_165416.jsonl
- tests/unit/api/test_sandboxes.py
- output/longcot-eval/final/rlm_eval.json
- tests/unit/api/test_dependencies.py
- src/fleet_rlm/api/runtime_services/diagnostics.py
- output/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_165305.jsonl
- src/fleet_rlm/integrations/daytona/config.py
- src/fleet_rlm/api/runtime_services/session_paths.py
- src/fleet_rlm/api/runtime_services/sandbox_service.py
- src/fleet_rlm/integrations/daytona/bridge.py
- tests/unit/integrations/test_daytona_config.py
- src/fleet_rlm/api/config.py
✅ Files skipped from review due to trivial changes (44)
- src/frontend/src/router.tsx
- docs/reference/release-notes/0.4.99.md
- docs/how-to-guides/deploying-server.md
- src/frontend/src/lib/rlm-api/tests/ws-frame-parser.test.ts
- docs/reference/release-notes/0.4.94.md
- docs/SUMMARY.md
- .ignore
- src/frontend/.env.example
- docs/reference/auth.md
- src/frontend/src/features/workspace/workbench/tests/run-workbench.test.tsx
- CONTRIBUTING.md
- .factory/validation/runtime-api-cleanup/scrutiny/reviews/runtime-consolidate-execution.json
- docs/tutorials/01-basic-usage.md
- docs/reference/frontend-feature-spec.md
- docs/reference/python-api.md
- docs/reference/index.md
- docs/how-to-guides/developer-setup.md
- .factory/validation/runtime-api-cleanup/scrutiny/synthesis.json
- src/fleet_rlm/runtime/schemas.py
- docs/reference/sandbox-api.md
- tests/unit/integrations/test_observability.py
- src/fleet_rlm/api/routers/ws/stream.py
- docs/how-to-guides/dspy-integration.md
- src/frontend/src/lib/rlm-api/tests/config.test.ts
- src/fleet_rlm/cli/runners.py
- src/fleet_rlm/runtime/agent/runtime.py
- src/fleet_rlm/scaffold/skills/rlm/references/architecture.md
- docs/reference/database.md
- src/fleet_rlm/api/schemas/optimization.py
- docs/how-to-guides/installation.md
- src/fleet_rlm/quality/datasets.py
- src/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.md
- src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md
- tests/unit/runtime/test_execution.py
- docs/reference/release-notes/0.5.0.md
- src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md
- src/fleet_rlm/integrations/local_store.py
- src/frontend/openapi/fleet-rlm.openapi.yaml
- openapi.yaml
- src/fleet_rlm/integrations/config/runtime_settings.py
- src/fleet_rlm/integrations/observability/mlflow_runtime.py
- src/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.md
- plan/README.md
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (4)
- src/fleet_rlm/scaffold/skills/delegation/SKILL.md
- src/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.md
- src/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.py
- src/fleet_rlm/scaffold/skills/long-context/scripts/rank_chunks.py
| - Session manifests on durable storage live under `sessions/<session_id>/conversation.json`. | ||
| - Manifest readers use the current session conversation path only. |
There was a problem hiding this comment.
Manifest path contradicts the Persistent Memory Model section.
Lines 175–176 establish that session manifests now live at sessions/<session_id>/conversation.json, but Line 200 still states that "session manifests and workspace provenance belong under meta/workspaces/...". Since the PR removes the legacy meta/workspaces/.../react-session-*.json fallback, Line 200 should be reconciled to avoid documenting two conflicting durable manifest locations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/reference/daytona-architecture.md` around lines 175 - 176, Update the
Persistent Memory Model description to match the new manifest location: replace
the reference to "session manifests and workspace provenance belong under
`meta/workspaces/...`" with the new durable path
`sessions/<session_id>/conversation.json` (or state that workspace provenance
remains under `meta/workspaces/...` if only session manifests moved), and remove
any mention of the legacy `meta/workspaces/.../react-session-*.json` fallback;
ensure the text consistently uses `sessions/<session_id>/conversation.json`
where session manifest is discussed so the lines about session manifest location
(previously around "Session manifests on durable storage live under
`sessions/<session_id>/conversation.json`") and the Persistent Memory Model do
not conflict.
| def _list_labels_filter( | ||
| *, | ||
| owner_labels: dict[str, str] | None, | ||
| allow_unlabeled_legacy: bool, | ||
| ) -> dict[str, str] | None: | ||
| if allow_unlabeled_legacy or not owner_labels: | ||
| if not owner_labels: | ||
| return None | ||
| owner_value = owner_labels.get(SANDBOX_OWNER_LABEL) | ||
| return {SANDBOX_OWNER_LABEL: owner_value} if owner_value else None |
There was a problem hiding this comment.
This strands legacy unlabeled sandboxes.
Returning False whenever the owner label is missing means pre-label sandboxes now 404 on detail/delete/archive, and load_sandbox_list() silently returns an empty page if a caller forgets to pass owner_labels. That removes the only cleanup path for older resources. Please keep a controlled legacy path or add a relabel/migration step before making owner labels mandatory.
Also applies to: 354-375
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/fleet_rlm/api/runtime_services/sandboxes.py` around lines 168 - 175, The
current _list_labels_filter (and the duplicate logic around lines 354-375)
excludes sandboxes that lack SANDBOX_OWNER_LABEL, stranding unlabeled legacy
sandboxes; change the logic to preserve a controlled legacy path by returning a
filter that matches either the requested owner label OR the absence of the owner
label (i.e., include unlabeled sandboxes) when SANDBOX_OWNER_LABEL is missing in
owner_labels, or alternatively trigger a migration/relabel step before enforcing
owner-only filtering; update _list_labels_filter and the same duplicated block
to implement the “include unlabeled (legacy)” branch or call a relabel/migration
helper so old sandboxes remain visible to detail/delete/archive operations.
Summary
rlm,long-context,diagnostics,delegation,dspy-programs,optimization, and related helpers, including lighter helper scripts and focused tests.Testing
uv run pytest src/fleet_rlm/scaffold/skills/long-context/tests src/fleet_rlm/scaffold/skills/diagnostics/tests -quv run ruff check src/fleet_rlm/scaffold/skills/long-context src/fleet_rlm/scaffold/skills/diagnosticsgit diff --checkmake check-docsSummary by CodeRabbit
Release Notes
New Features
BRAVE_SEARCH_API_KEYconfiguration.Documentation
Improvements