Refine runtime skill selection and scaffold skill docs by Zochory · Pull Request #269 · Qredence/fleet-rlm

Zochory · 2026-05-30T04:01:18Z

Summary

Remove the extra selector LLM call on no-keyword turns and keep returned skill metadata aligned with the injected context budget.
Update the volume-bootstrap, diagnostics, and Brave env guidance to match the current runtime and filesystem contracts.
Refresh the bundled scaffold skills for rlm, long-context, diagnostics, delegation, dspy-programs, optimization, and related helpers, including lighter helper scripts and focused tests.

Testing

uv run pytest src/fleet_rlm/scaffold/skills/long-context/tests src/fleet_rlm/scaffold/skills/diagnostics/tests -q
uv run ruff check src/fleet_rlm/scaffold/skills/long-context src/fleet_rlm/scaffold/skills/diagnostics
git diff --check
make check-docs

Summary by CodeRabbit

Release Notes

New Features
- Added dynamic skill selection system for agents to intelligently select relevant capabilities.
- Added delegation skill for recursive RLM processing with budget management and isolation control.
- Added long-context processing skill with semantic chunking strategies.
- Added Brave Search API integration support via BRAVE_SEARCH_API_KEY configuration.
- Added comprehensive diagnostics skill for runtime troubleshooting.
Documentation
- Reorganized skill documentation with new references for delegation, sandbox execution, and volume management.
- Added architecture and API reference guides for skill usage and integration patterns.
- Updated runtime contract documentation for workspace isolation and session manifests.
Improvements
- Expanded volume filesystem support with additional canonical root paths for persistence.
- Enhanced environment configuration and Daytona integration guidance.
- Retired legacy skill modules and compatibility layers.

chatgpt-codex-connector · 2026-05-30T04:01:24Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

coderabbitai · 2026-05-30T04:01:32Z

📝 Walkthrough

Walkthrough

Adds skill selection with context injection, tightens Daytona sandbox ownership filtering and session manifest paths, updates env/config keys, expands VFS roots, introduces long-context chunking tools and diagnostics CLI, and comprehensively rewrites skills/runtime documentation and links.

Changes

Runtime features, sandbox access, and skills

Layer / File(s)	Summary
Skill selection and enrichment before escalation `src/fleet_rlm/runtime/...`	Defines SkillSelectionSignature/Module, re-exports, and enriches core_memory in EscalatingFleetModule.
Durable VFS roots and Daytona bridge API cleanup `src/fleet_rlm/integrations/daytona/*`	Expands VFS roots and replaces removed bridge aliases with direct methods.
Sandbox ownership filtering and session paths `src/fleet_rlm/api/...`	Removes unlabeled legacy access, requires owner labels, and drops legacy manifest fallbacks.
Long-context scripts and tests `src/fleet_rlm/scaffold/skills/long-context/...`	Adds semantic_chunk, updates rank_chunks, and adds tests.
Diagnostics CLI and tests `src/fleet_rlm/scaffold/skills/diagnostics/...`	Adds environment diagnostics CLI and coverage.
Env examples and config schema updates `.env.example`, `src/fleet_rlm/integrations/config/*`	Updates env guidance, Brave keys, alt key names, and Daytona config resolution.

Architecture, skills, and reference documentation

Layer / File(s)	Summary
Skills README and package intro `src/fleet_rlm/scaffold/skills/*`	Generalizes skills overview and consumption guidance.
Delegation skill and references `src/fleet_rlm/scaffold/skills/delegation/*`	Documents delegation APIs, budget model, isolation modes.
Diagnostics skill and references `src/fleet_rlm/scaffold/skills/diagnostics/*`	Adds diagnostics skill, error catalog, and contract surfaces.
Sandbox execution and references `src/fleet_rlm/scaffold/skills/sandbox-execution/*`	Adds execution how-to and environment/volume references.
Volume bootstrap and memory DB schema `src/fleet_rlm/scaffold/skills/volume-bootstrap/*`	Documents filesystem contract and memory DB schema.
DSPy programs and registry `src/fleet_rlm/scaffold/skills/dspy-programs/*`	Adds DSPy programs skill, registry, and signature catalog updates.
Long-context skill and strategies `src/fleet_rlm/scaffold/skills/long-context/*`	Introduces long-context skill and chunking strategies.
RLM skill overview rewrite `src/fleet_rlm/scaffold/skills/rlm/*`	Condenses RLM skill guide.
ADR/runtime and Daytona architecture docs `docs/reference/*`	Updates ADR and Daytona architecture with new runtime shape.
Guides, TOCs, and references refresh `docs/*`	Renames links and clarifies contracts across guides.
OpenAPI and schema wording alignment `openapi.yaml`, `src/frontend/openapi/*`	Aligns GEPA availability description.

Phase tracking, audit, and changelog

Layer / File(s)	Summary
Phase 3/4 progress and audit plan `plan/*`	Marks phases closed, adds audit plan and results.
Changelog and contributing updates `CHANGELOG.md`, `CONTRIBUTING.md`	Refreshes release notes and doc paths.

Server config and dependencies cleanup

Layer / File(s)	Summary
ServerRuntimeConfig and dependency accessors `src/fleet_rlm/api/*`	Removes legacy Entra issuer and server_state fallbacks.
Docs quality archived-path checks `scripts/check_docs_quality.py`	Replaces legacy checks with archived checks.

Frontend tests and mocks alignment

Layer / File(s)	Summary
Tests for tokens, ws URLs, frames; mock runtime values `src/frontend/*`	Aligns tests/mocks with retired routes/tokens and masked values.

Backend tests updates

Layer / File(s)	Summary
Updated unit tests for new contracts `tests/unit/*`	Updates tests for paths, deps, sandbox filters, VFS, scorers, storage, and knowledge tools.

Miscellaneous config and comments

Layer / File(s)	Summary
Small wording and ignore updates `README.md`, `.ignore`, various comments	Minor wording, ignore, and docstring adjustments.

Sequence Diagram(s)

(none)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Qredence/fleet-rlm#235 — Also updates isolation-mode env guidance overlapping RLM child isolation behavior.
Qredence/fleet-rlm#268 — Related Daytona persistence/runtime area touched by VFS root allowlist expansions.

Poem

A rabbit selects the skills to bring,
Packs them snug in core-memory’s ring.
Sandboxes labeled, paths made clean,
Long texts chunked in tidy sheen.
Daytona whispers: “API URL!”
While docs and tests in chorus swell.
Hop—submit—results that tell. 🐇✨

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

gemini-code-assist

Code Review

This pull request transitions the runtime architecture to AgentRuntime and EscalatingFleetModule, introduces a proactive SkillSelectionModule to inject relevant skill instructions into context, and updates the bundled agent skills and volume layout to support durable phase roots. The review feedback highlights several critical issues: the need to restore common stopwords in rank_chunks.py to maintain search relevance, ensuring regex_chunks preserves leading content and correctly detects markdown files starting with a heading, robustly parsing stringified Python lists in _parse_skill_names, and loading local .env variables in the diagnostics script to prevent false negatives.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (8)

src/fleet_rlm/runtime/modules/skill_selection.py (1)
196-208: ⚡ Quick win

Fall back to keyword candidates when the LLM returns no valid skills.

If self.select(...) succeeds but _parse_skill_names filters everything out (LLM returned unknown names or empty output), selected becomes [] and the request injects no skill context — even though _keyword_match had produced viable candidates. The except path already degrades to keyword candidates; the valid-but-empty path should do the same for consistent graceful degradation.
♻️ Proposed fix
                 prediction = self.select(
                     context=context,
                     available_skills=_format_available_skills(),
                 )
                 selected = self._parse_skill_names(getattr(prediction, "skills", []))
+                if not selected:
+                    selected = candidates[: self._max_skills]
                 reasoning = str(getattr(prediction, "reasoning", "") or "")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/runtime/modules/skill_selection.py` around lines 196 - 208, If
the LLM selection succeeds but _parse_skill_names returns an empty list, treat
that the same as an exception by falling back to the keyword candidates: after
calling self.select(...) and computing selected = self._parse_skill_names(...),
check if selected is empty and if so set selected = candidates[:
self._max_skills] and reasoning = "" (and optionally log a warning) so the
module degrades gracefully when the LLM returns no valid skills.
src/fleet_rlm/scaffold/skills/rlm/references/architecture.md (1)
3-20: 💤 Low value

Consider adding a path prefix note for clarity.

The module map table uses relative paths without the src/fleet_rlm/ prefix (e.g., runtime/agent/runtime.py instead of src/fleet_rlm/runtime/agent/runtime.py). Consider adding a brief note above the table to clarify that all paths are relative to src/fleet_rlm/ for improved readability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/scaffold/skills/rlm/references/architecture.md` around lines 3
- 20, Add a one-line note above the module map table in architecture.md
clarifying that all listed paths are relative to src/fleet_rlm/ (e.g., "Paths
are relative to src/fleet_rlm/"); update the header area near the table that
lists runtime/agent/runtime.py, runtime/modules/escalating.py, etc., so readers
understand the base path without changing the listed entries.
plan/README.md (1)
21-21: 💤 Low value

Minor grammar refinement: use hyphenated compound adjective.

When "closed phase" modifies "claims" as a compound adjective, it should be hyphenated: "closed-phase claims".
📝 Proposed fix
-This audit verifies that closed phase claims match current code, tests, docs, and bundled
+This audit verifies that closed-phase claims match current code, tests, docs, and bundled
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plan/README.md` at line 21, Update the phrasing "closed phase claims" to the
hyphenated compound adjective "closed-phase claims" in the README text (look for
the exact phrase "closed phase claims") so the compound modifier is
grammatically correct.
src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py (1)
29-33: 💤 Low value

Test invokes the real daytona CLI.

check_daytona() runs subprocess.run(["daytona", "version"], ...), so this test shells out to the actual binary. The assertion still passes (return value depends only on the env vars), but the call makes the test non-hermetic and, on a host where the CLI hangs, susceptible to the unhandled TimeoutExpired flagged in diagnose.py. Consider mocking subprocess.run to keep it isolated.
♻️ Example
 def test_check_daytona_reports_missing_required_env(monkeypatch) -> None:
     monkeypatch.delenv("DAYTONA_API_KEY", raising=False)
     monkeypatch.delenv("DAYTONA_API_URL", raising=False)
+    monkeypatch.setattr(diagnose.subprocess, "run", lambda *a, **k: (_ for _ in ()).throw(FileNotFoundError))
 
     assert diagnose.check_daytona() is False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py` around
lines 29 - 33, The test test_check_daytona_reports_missing_required_env calls
diagnose.check_daytona which currently invokes
subprocess.run(["daytona","version"],...) and thus shells out; modify the test
to mock subprocess.run (and simulate raising subprocess.TimeoutExpired or
returning a completed process as appropriate) so the test is hermetic and
doesn't call the real daytona binary—patch subprocess.run in the test (or use
monkeypatch to replace diagnose.subprocess.run) and assert check_daytona() is
False while ensuring the mock matches the behavior expected inside
diagnose.check_daytona (including handling TimeoutExpired).
src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md (2)
19-21: 💤 Low value

Add language specifier to fenced code block.

The fenced code block lacks a language specifier. Since this is a WebSocket endpoint path, consider using ```text or converting to an indented code block for clarity.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md`
around lines 19 - 21, The fenced code block containing the WebSocket endpoint
`/api/v1/ws/execution` lacks a language specifier; update that fenced block in
contract-surfaces.md by adding a language like ```text (or change it to an
indented code block) so the endpoint is rendered clearly—locate the block
showing `/api/v1/ws/execution` and add the language specifier or convert to an
indented block.
27-29: 💤 Low value

Add language specifier to fenced code block.

The fenced code block lacks a language specifier. Since this is a WebSocket endpoint path, consider using ```text or converting to an indented code block for clarity.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md`
around lines 27 - 29, The fenced code block containing the WebSocket endpoint
"/api/v1/ws/execution/events" is missing a language specifier; update the block
to include a language (e.g., use ```text before the line) or convert it to an
indented code block so the endpoint is rendered clearly and not treated as an
unspecified code fence.
src/fleet_rlm/scaffold/skills/delegation/SKILL.md (1)
107-107: ⚡ Quick win

Add rationale for the parallel_semantic_map prohibition.

The strong prohibition lacks context. Consider briefly explaining why parallel_semantic_map should not be used or what to use instead, so users understand the reason behind this constraint.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/scaffold/skills/delegation/SKILL.md` at line 107, Add a short
rationale sentence after the line forbidding parallel_semantic_map that explains
why it's prohibited and what to use instead; specifically mention the symbol
parallel_semantic_map and state the reason (e.g., it causes
nondeterministic/unsafe parallel state updates or conflicts with Daytona
orchestration) and recommend the approved alternative (e.g., use
serial_semantic_map or the provided task-queue/delegation API). Keep it one
concise sentence or clause so readers immediately understand the constraint and
the replacement.
src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md (1)
11-11: 💤 Low value

Consider adding language specifier to fenced code blocks.

The ASCII art diagrams at lines 11-17 and 58-74 use fenced code blocks without a language specifier, which triggers markdown linting warnings. Consider adding text as the language:
♻️ Proposed fix
-```
+```text
 Dataset
   → Split (train / dev)
-```
+```text
 1. Initial prompt
    ↓
Also applies to: 58-74
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md` at
line 11, Add a language specifier to the fenced code blocks containing ASCII
diagrams so markdown lint stops warning: update the two blocks referenced (the
ASCII art diagram around lines 11-17 and the other at 58-74 in the file) to use
```text instead of plain ```; ensure both diagram blocks (the "Dataset → Split
(train / dev)" block and the "Initial prompt ↓" block) are updated consistently.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.py`:
- Around line 32-43: The try block calling subprocess.run(["daytona",
"version"], ...) currently only catches FileNotFoundError so a
subprocess.TimeoutExpired would crash the diagnostics; update the exception
handling around that subprocess.run (the call to subprocess.run and subsequent
status("daytona cli", ...)) to also catch subprocess.TimeoutExpired and in that
except path call status("daytona cli", False, "timeout") or status("daytona
cli", True, "optional (timeout)"), so the diagnostics degrades gracefully
instead of raising; ensure you import subprocess (already used) and reference
the existing subprocess.run invocation and the status("daytona cli", ...) call
when adding the new except clause.

In `@src/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.py`:
- Around line 44-57: The regex_chunks function drops any content before the
first match because bounds is built from starts (which begins at the first
match); change bounds to include the document start so leading content is
preserved (e.g., make bounds start with 0 like bounds = [0, *starts,
len(content)]), then continue using the same loop and calls to size_chunks
(preserving the offset=start logic) so the initial segment prior to the first
pattern match is emitted as a chunk.

In
`@src/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.md`:
- Around line 48-156: Update the docs to use the real evaluation APIs and trace
schema: replace calls to evaluate_module with evaluate_program or
evaluate_program_from_dataset (from fleet_rlm.quality.dspy_evaluation), swap the
LLMJudgeScorer example to show how to use build_rlm_scorers or the provided
reasoning_quality_scorer (from fleet_rlm.quality.scorers) instead of a
non-existent LLMJudgeScorer, and change the TraceAwareScorer example to read
prediction.trace via trace.search_spans() and inspect span fields like
span.name, span.inputs and span.span_type (or span.type) rather than
trace.steps/step.* so the example matches current trace usage.

In
`@src/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.md`:
- Around line 9-25: The fenced code block showing the directory tree lacks a
language specifier; update the opening fence (the triple backticks that precede
"/home/daytona/memory/") to include a language token such as text or plaintext
(e.g., change "```" to "```text") so the block is declared and satisfies MD040;
keep the rest of the directory listing (the tree content) unchanged.

In `@src/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.md`:
- Around line 23-41: The fenced code block containing the directory tree in
SKILL.md is missing a language specifier and triggers MD040; update the opening
triple backticks that start the tree block to include a language token (e.g.,
```text or ```plaintext) so the block becomes a plain text fenced code block,
leaving the rest of the tree content unchanged.

---

Nitpick comments:
In `@plan/README.md`:
- Line 21: Update the phrasing "closed phase claims" to the hyphenated compound
adjective "closed-phase claims" in the README text (look for the exact phrase
"closed phase claims") so the compound modifier is grammatically correct.

In `@src/fleet_rlm/runtime/modules/skill_selection.py`:
- Around line 196-208: If the LLM selection succeeds but _parse_skill_names
returns an empty list, treat that the same as an exception by falling back to
the keyword candidates: after calling self.select(...) and computing selected =
self._parse_skill_names(...), check if selected is empty and if so set selected
= candidates[: self._max_skills] and reasoning = "" (and optionally log a
warning) so the module degrades gracefully when the LLM returns no valid skills.

In `@src/fleet_rlm/scaffold/skills/delegation/SKILL.md`:
- Line 107: Add a short rationale sentence after the line forbidding
parallel_semantic_map that explains why it's prohibited and what to use instead;
specifically mention the symbol parallel_semantic_map and state the reason
(e.g., it causes nondeterministic/unsafe parallel state updates or conflicts
with Daytona orchestration) and recommend the approved alternative (e.g., use
serial_semantic_map or the provided task-queue/delegation API). Keep it one
concise sentence or clause so readers immediately understand the constraint and
the replacement.

In `@src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md`:
- Around line 19-21: The fenced code block containing the WebSocket endpoint
`/api/v1/ws/execution` lacks a language specifier; update that fenced block in
contract-surfaces.md by adding a language like ```text (or change it to an
indented code block) so the endpoint is rendered clearly—locate the block
showing `/api/v1/ws/execution` and add the language specifier or convert to an
indented block.
- Around line 27-29: The fenced code block containing the WebSocket endpoint
"/api/v1/ws/execution/events" is missing a language specifier; update the block
to include a language (e.g., use ```text before the line) or convert it to an
indented code block so the endpoint is rendered clearly and not treated as an
unspecified code fence.

In `@src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py`:
- Around line 29-33: The test test_check_daytona_reports_missing_required_env
calls diagnose.check_daytona which currently invokes
subprocess.run(["daytona","version"],...) and thus shells out; modify the test
to mock subprocess.run (and simulate raising subprocess.TimeoutExpired or
returning a completed process as appropriate) so the test is hermetic and
doesn't call the real daytona binary—patch subprocess.run in the test (or use
monkeypatch to replace diagnose.subprocess.run) and assert check_daytona() is
False while ensuring the mock matches the behavior expected inside
diagnose.check_daytona (including handling TimeoutExpired).

In `@src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md`:
- Line 11: Add a language specifier to the fenced code blocks containing ASCII
diagrams so markdown lint stops warning: update the two blocks referenced (the
ASCII art diagram around lines 11-17 and the other at 58-74 in the file) to use
```text instead of plain ```; ensure both diagram blocks (the "Dataset → Split
(train / dev)" block and the "Initial prompt ↓" block) are updated consistently.

In `@src/fleet_rlm/scaffold/skills/rlm/references/architecture.md`:
- Around line 3-20: Add a one-line note above the module map table in
architecture.md clarifying that all listed paths are relative to src/fleet_rlm/
(e.g., "Paths are relative to src/fleet_rlm/"); update the header area near the
table that lists runtime/agent/runtime.py, runtime/modules/escalating.py, etc.,
so readers understand the base path without changing the listed entries.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 66ac6d82-a1ef-4818-8413-a4b3f5669f5c

📥 Commits

Reviewing files that changed from the base of the PR and between b3471d9 and 3b6e3ec.

📒 Files selected for processing (64)

.env.example
docs/how-to-guides/frontend-development.md
docs/reference/adr/001-rlm-runtime-architecture.md
docs/reference/daytona-runtime-architecture.md
docs/reference/frontend-backend-integration.md
plan/PHASE-3-PROGRESS.md
plan/PHASE-4-PROGRESS.md
plan/README.md
src/fleet_rlm/integrations/daytona/volumes.py
src/fleet_rlm/runtime/agent/signatures.py
src/fleet_rlm/runtime/modules/__init__.py
src/fleet_rlm/runtime/modules/escalating.py
src/fleet_rlm/runtime/modules/skill_selection.py
src/fleet_rlm/scaffold/skills/README.md
src/fleet_rlm/scaffold/skills/__init__.py
src/fleet_rlm/scaffold/skills/daytona-runtime/SKILL.md
src/fleet_rlm/scaffold/skills/daytona-sandbox/SKILL.md
src/fleet_rlm/scaffold/skills/delegation/SKILL.md
src/fleet_rlm/scaffold/skills/delegation/references/budget-model.md
src/fleet_rlm/scaffold/skills/delegation/references/isolation-modes.md
src/fleet_rlm/scaffold/skills/diagnostics/SKILL.md
src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md
src/fleet_rlm/scaffold/skills/diagnostics/references/error-catalog.md
src/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.py
src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py
src/fleet_rlm/scaffold/skills/dspy-programs/SKILL.md
src/fleet_rlm/scaffold/skills/dspy-programs/references/module-registry.md
src/fleet_rlm/scaffold/skills/dspy-programs/references/signature-catalog.md
src/fleet_rlm/scaffold/skills/dspy-signature/SKILL.md
src/fleet_rlm/scaffold/skills/long-context/SKILL.md
src/fleet_rlm/scaffold/skills/long-context/references/chunking-strategies.md
src/fleet_rlm/scaffold/skills/long-context/scripts/rank_chunks.py
src/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.py
src/fleet_rlm/scaffold/skills/long-context/tests/test_rank_chunks.py
src/fleet_rlm/scaffold/skills/long-context/tests/test_semantic_chunk.py
src/fleet_rlm/scaffold/skills/optimization/SKILL.md
src/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.md
src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md
src/fleet_rlm/scaffold/skills/rlm-batch/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-debug/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-debug/scripts/diagnose.py
src/fleet_rlm/scaffold/skills/rlm-execute/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-long-context/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-long-context/references/advanced-techniques.md
src/fleet_rlm/scaffold/skills/rlm-long-context/references/codebase-processing.md
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/cache_manager.py
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/codebase_concat.py
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/defaults.py
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/orchestrate.py
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/semantic_chunk.py
src/fleet_rlm/scaffold/skills/rlm-memory/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-run/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-test-suite/SKILL.md
src/fleet_rlm/scaffold/skills/rlm/SKILL.md
src/fleet_rlm/scaffold/skills/rlm/references/api-reference.md
src/fleet_rlm/scaffold/skills/rlm/references/architecture.md
src/fleet_rlm/scaffold/skills/sandbox-execution/SKILL.md
src/fleet_rlm/scaffold/skills/sandbox-execution/references/env-config.md
src/fleet_rlm/scaffold/skills/sandbox-execution/references/volume-layout.md
src/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.md
src/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.md
src/fleet_rlm/scaffold/skills/volume-bootstrap/references/memory-db-schema.md
tests/unit/integrations/test_daytona_runtime.py
tests/unit/runtime/test_skill_selection.py

💤 Files with no reviewable changes (19)

src/fleet_rlm/scaffold/skills/rlm-debug/SKILL.md
src/fleet_rlm/scaffold/skills/daytona-sandbox/SKILL.md
src/fleet_rlm/scaffold/skills/daytona-runtime/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-run/SKILL.md
src/fleet_rlm/scaffold/skills/rlm/references/api-reference.md
src/fleet_rlm/scaffold/skills/rlm-long-context/references/advanced-techniques.md
src/fleet_rlm/scaffold/skills/rlm-batch/SKILL.md
src/fleet_rlm/scaffold/skills/dspy-signature/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-test-suite/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-execute/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-debug/scripts/diagnose.py
src/fleet_rlm/scaffold/skills/rlm-long-context/references/codebase-processing.md
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/semantic_chunk.py
src/fleet_rlm/scaffold/skills/rlm-memory/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/defaults.py
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/codebase_concat.py
src/fleet_rlm/scaffold/skills/rlm-long-context/SKILL.md
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/cache_manager.py
src/fleet_rlm/scaffold/skills/rlm-long-context/scripts/orchestrate.py

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/fleet_rlm/api/routers/ws/session.py (1)

234-245: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't drop read compatibility for legacy manifest locations yet.

After this change, reconnect only attempts sessions/<session_id>/conversation.json. Older Daytona volumes that still store the manifest in the legacy location will now restore as an empty session, which effectively discards prior conversation state until something rewrites the new file. Please keep a read fallback or migrate the legacy file into the new location before removing the old path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/api/routers/ws/session.py` around lines 234 - 245, The new
logic drops compatibility with legacy manifest locations causing restores to be
empty; update the restore path in the reconnect branch so that after attempting
_restore_manifest_from_local_store(persistence=persistence, sess_id=sess_id) you
also try the legacy location and/or migrate it into the new location before
returning: locate the reconnect code that uses interpreter, manifest_path,
load_manifest_from_volume, and _restore_manifest_from_local_store and implement
a fallback that checks the legacy manifest path (for the old Daytona volume
layout) and if found either load it and set manifest or atomically move/copy it
into the new sessions/<session_id>/conversation.json location so older volumes
retain conversation history.

src/fleet_rlm/api/dependencies.py (1)

213-217: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the app.state.server_state compatibility path until every initializer is migrated.

get_server_state*() still accepts legacy app.state.server_state, but _require_dep() now ignores it and raises instead. That makes callers in the same module disagree on whether startup completed, and it breaks any request/websocket path that only seeds the legacy container.

💡 Minimal compatibility patch

 def _require_dep(app: Any, attr: str) -> Any:
-    candidate = getattr(getattr(app, "state", None), attr, None)
+    state = getattr(app, "state", None)
+    candidate = getattr(state, attr, None)
     if candidate is not None:
         return candidate
+    server_state = getattr(state, "server_state", None)
+    if isinstance(server_state, ServerState):
+        legacy_candidate = getattr(server_state, attr, None)
+        if legacy_candidate is not None:
+            return legacy_candidate
     raise RuntimeError(f"Server dependency '{attr}' is not initialized. Ensure FastAPI lifespan startup has completed.")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fleet_rlm/api/dependencies.py` around lines 213 - 217, The _require_dep
function currently only checks getattr(app.state, attr) and throws if missing,
but callers (get_server_state* functions) still accept the legacy
app.state.server_state; update _require_dep to also look for the legacy
container by checking getattr(app.state, "server_state", None) and, if present,
try to read the desired attribute from that legacy object before raising.
Specifically, in _require_dep use candidate = getattr(getattr(app, "state",
None), attr, None) and if candidate is None, check legacy = getattr(getattr(app,
"state", None), "server_state", None) and return getattr(legacy, attr, None) if
found, otherwise raise the same RuntimeError to preserve behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/reference/daytona-architecture.md`:
- Around line 175-176: Update the Persistent Memory Model description to match
the new manifest location: replace the reference to "session manifests and
workspace provenance belong under `meta/workspaces/...`" with the new durable
path `sessions/<session_id>/conversation.json` (or state that workspace
provenance remains under `meta/workspaces/...` if only session manifests moved),
and remove any mention of the legacy `meta/workspaces/.../react-session-*.json`
fallback; ensure the text consistently uses
`sessions/<session_id>/conversation.json` where session manifest is discussed so
the lines about session manifest location (previously around "Session manifests
on durable storage live under `sessions/<session_id>/conversation.json`") and
the Persistent Memory Model do not conflict.

In `@src/fleet_rlm/api/runtime_services/sandboxes.py`:
- Around line 168-175: The current _list_labels_filter (and the duplicate logic
around lines 354-375) excludes sandboxes that lack SANDBOX_OWNER_LABEL,
stranding unlabeled legacy sandboxes; change the logic to preserve a controlled
legacy path by returning a filter that matches either the requested owner label
OR the absence of the owner label (i.e., include unlabeled sandboxes) when
SANDBOX_OWNER_LABEL is missing in owner_labels, or alternatively trigger a
migration/relabel step before enforcing owner-only filtering; update
_list_labels_filter and the same duplicated block to implement the “include
unlabeled (legacy)” branch or call a relabel/migration helper so old sandboxes
remain visible to detail/delete/archive operations.

---

Outside diff comments:
In `@src/fleet_rlm/api/dependencies.py`:
- Around line 213-217: The _require_dep function currently only checks
getattr(app.state, attr) and throws if missing, but callers (get_server_state*
functions) still accept the legacy app.state.server_state; update _require_dep
to also look for the legacy container by checking getattr(app.state,
"server_state", None) and, if present, try to read the desired attribute from
that legacy object before raising. Specifically, in _require_dep use candidate =
getattr(getattr(app, "state", None), attr, None) and if candidate is None, check
legacy = getattr(getattr(app, "state", None), "server_state", None) and return
getattr(legacy, attr, None) if found, otherwise raise the same RuntimeError to
preserve behavior.

In `@src/fleet_rlm/api/routers/ws/session.py`:
- Around line 234-245: The new logic drops compatibility with legacy manifest
locations causing restores to be empty; update the restore path in the reconnect
branch so that after attempting
_restore_manifest_from_local_store(persistence=persistence, sess_id=sess_id) you
also try the legacy location and/or migrate it into the new location before
returning: locate the reconnect code that uses interpreter, manifest_path,
load_manifest_from_volume, and _restore_manifest_from_local_store and implement
a fallback that checks the legacy manifest path (for the old Daytona volume
layout) and if found either load it and set manifest or atomically move/copy it
into the new sessions/<session_id>/conversation.json location so older volumes
retain conversation history.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 06c69f32-9341-4781-b8ce-5388a9285e2a

📥 Commits

Reviewing files that changed from the base of the PR and between 3b6e3ec and b20b5b3.

⛔ Files ignored due to path filters (1)

src/frontend/src/lib/rlm-api/generated/openapi.ts is excluded by !**/generated/**

📒 Files selected for processing (115)

.codex/agents/sandbox-runtime.toml
.codex/config.toml
.env.example
.factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-agent-files.json
.factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-tool-files.json
.factory/validation/runtime-api-cleanup/scrutiny/reviews/runtime-consolidate-execution.json
.factory/validation/runtime-api-cleanup/scrutiny/synthesis.json
.ignore
CHANGELOG.md
CONTRIBUTING.md
README.md
docs/SUMMARY.md
docs/how-to-guides/codex-environment.md
docs/how-to-guides/deploying-server.md
docs/how-to-guides/developer-setup.md
docs/how-to-guides/dspy-integration.md
docs/how-to-guides/installation.md
docs/how-to-guides/releasing.md
docs/reference/auth.md
docs/reference/database.md
docs/reference/daytona-architecture.md
docs/reference/frontend-feature-spec.md
docs/reference/index.md
docs/reference/python-api.md
docs/reference/release-notes/0.4.94.md
docs/reference/release-notes/0.4.99.md
docs/reference/release-notes/0.5.0.md
docs/reference/sandbox-api.md
docs/tutorials/01-basic-usage.md
openapi.yaml
output/longcot-eval/README.md
output/longcot-eval/final/comparison_report.md
output/longcot-eval/final/comparison_summary.md
output/longcot-eval/final/direct_100_tasks.jsonl
output/longcot-eval/final/direct_eval.json
output/longcot-eval/final/rlm_100_tasks.jsonl
output/longcot-eval/final/rlm_eval.json
output/longcot-eval/longcot_gepa_dataset.jsonl
output/longcot-eval/partial-runs/chess-math/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_095814.jsonl
output/longcot-eval/partial-runs/cs-chem/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_095754.jsonl
output/longcot-eval/partial-runs/longcot_all_longcot-mini_20260429_162551.jsonl
output/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_165416.jsonl
output/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_171028.jsonl
output/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_171619.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_072531.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_081152.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_114407.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_120033.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_143848.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_153040.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_162545.jsonl
output/longcot-eval/partial-runs/longcot_or_deepseek_v4_flash_all_longcot-mini_20260430_170108.jsonl
output/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_165305.jsonl
output/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_204150.jsonl
output/longcot-eval/pilot/direct_20_tasks.jsonl
output/longcot-eval/pilot/rlm_20_tasks.jsonl
plan/README.md
scripts/check_docs_quality.py
src/fleet_rlm/api/config.py
src/fleet_rlm/api/dependencies.py
src/fleet_rlm/api/routers/sandboxes.py
src/fleet_rlm/api/routers/ws/session.py
src/fleet_rlm/api/routers/ws/stream.py
src/fleet_rlm/api/runtime_services/diagnostics.py
src/fleet_rlm/api/runtime_services/sandbox_service.py
src/fleet_rlm/api/runtime_services/sandboxes.py
src/fleet_rlm/api/runtime_services/session_paths.py
src/fleet_rlm/api/schemas/optimization.py
src/fleet_rlm/cli/runners.py
src/fleet_rlm/integrations/config/config_full.yaml
src/fleet_rlm/integrations/config/env.py
src/fleet_rlm/integrations/config/runtime_settings.py
src/fleet_rlm/integrations/daytona/bridge.py
src/fleet_rlm/integrations/daytona/config.py
src/fleet_rlm/integrations/daytona/sandbox_executor.py
src/fleet_rlm/integrations/local_store.py
src/fleet_rlm/integrations/observability/mlflow_runtime.py
src/fleet_rlm/quality/datasets.py
src/fleet_rlm/runtime/agent/runtime.py
src/fleet_rlm/runtime/modules/skill_selection.py
src/fleet_rlm/runtime/schemas.py
src/fleet_rlm/scaffold/skills/delegation/SKILL.md
src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md
src/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.py
src/fleet_rlm/scaffold/skills/diagnostics/tests/test_diagnose.py
src/fleet_rlm/scaffold/skills/long-context/scripts/rank_chunks.py
src/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.py
src/fleet_rlm/scaffold/skills/long-context/tests/test_rank_chunks.py
src/fleet_rlm/scaffold/skills/long-context/tests/test_semantic_chunk.py
src/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.md
src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md
src/fleet_rlm/scaffold/skills/rlm/references/architecture.md
src/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.md
src/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.md
src/frontend/.env.example
src/frontend/openapi/fleet-rlm.openapi.yaml
src/frontend/src/features/workspace/workbench/__tests__/run-workbench.test.tsx
src/frontend/src/lib/auth/__tests__/token-store.test.ts
src/frontend/src/lib/data/mock/runtime.ts
src/frontend/src/lib/rlm-api/__tests__/backend-contract.test.ts
src/frontend/src/lib/rlm-api/__tests__/config.test.ts
src/frontend/src/lib/rlm-api/__tests__/ws-frame-parser.test.ts
src/frontend/src/router.tsx
tests/unit/api/test_chat_persistence.py
tests/unit/api/test_dependencies.py
tests/unit/api/test_sandboxes.py
tests/unit/api/test_session_paths.py
tests/unit/cli/test_mlflow_cli.py
tests/unit/integrations/test_daytona_config.py
tests/unit/integrations/test_env_config.py
tests/unit/integrations/test_memory_db.py
tests/unit/integrations/test_observability.py
tests/unit/runtime/test_execution.py
tests/unit/runtime/test_knowledge_tools.py
tests/unit/runtime/test_skill_selection.py

💤 Files with no reviewable changes (21)

output/longcot-eval/final/comparison_summary.md
output/longcot-eval/README.md
output/longcot-eval/pilot/rlm_20_tasks.jsonl
.factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-tool-files.json
output/longcot-eval/final/comparison_report.md
output/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_171619.jsonl
.factory/validation/dead-code-removal/scrutiny/reviews/delete-legacy-agent-files.json
output/longcot-eval/final/direct_eval.json
output/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_204150.jsonl
output/longcot-eval/partial-runs/longcot_bedrock_nemotron_120b_all_longcot-mini_20260429_165416.jsonl
tests/unit/api/test_sandboxes.py
output/longcot-eval/final/rlm_eval.json
tests/unit/api/test_dependencies.py
src/fleet_rlm/api/runtime_services/diagnostics.py
output/longcot-eval/partial-runs/longcot_rlm_all_longcot-mini_20260430_165305.jsonl
src/fleet_rlm/integrations/daytona/config.py
src/fleet_rlm/api/runtime_services/session_paths.py
src/fleet_rlm/api/runtime_services/sandbox_service.py
src/fleet_rlm/integrations/daytona/bridge.py
tests/unit/integrations/test_daytona_config.py
src/fleet_rlm/api/config.py

✅ Files skipped from review due to trivial changes (44)

src/frontend/src/router.tsx
docs/reference/release-notes/0.4.99.md
docs/how-to-guides/deploying-server.md
src/frontend/src/lib/rlm-api/tests/ws-frame-parser.test.ts
docs/reference/release-notes/0.4.94.md
docs/SUMMARY.md
.ignore
src/frontend/.env.example
docs/reference/auth.md
src/frontend/src/features/workspace/workbench/tests/run-workbench.test.tsx
CONTRIBUTING.md
.factory/validation/runtime-api-cleanup/scrutiny/reviews/runtime-consolidate-execution.json
docs/tutorials/01-basic-usage.md
docs/reference/frontend-feature-spec.md
docs/reference/python-api.md
docs/reference/index.md
docs/how-to-guides/developer-setup.md
.factory/validation/runtime-api-cleanup/scrutiny/synthesis.json
src/fleet_rlm/runtime/schemas.py
docs/reference/sandbox-api.md
tests/unit/integrations/test_observability.py
src/fleet_rlm/api/routers/ws/stream.py
docs/how-to-guides/dspy-integration.md
src/frontend/src/lib/rlm-api/tests/config.test.ts
src/fleet_rlm/cli/runners.py
src/fleet_rlm/runtime/agent/runtime.py
src/fleet_rlm/scaffold/skills/rlm/references/architecture.md
docs/reference/database.md
src/fleet_rlm/api/schemas/optimization.py
docs/how-to-guides/installation.md
src/fleet_rlm/quality/datasets.py
src/fleet_rlm/scaffold/skills/volume-bootstrap/SKILL.md
src/fleet_rlm/scaffold/skills/diagnostics/references/contract-surfaces.md
tests/unit/runtime/test_execution.py
docs/reference/release-notes/0.5.0.md
src/fleet_rlm/scaffold/skills/optimization/references/gepa-pipeline.md
src/fleet_rlm/integrations/local_store.py
src/frontend/openapi/fleet-rlm.openapi.yaml
openapi.yaml
src/fleet_rlm/integrations/config/runtime_settings.py
src/fleet_rlm/integrations/observability/mlflow_runtime.py
src/fleet_rlm/scaffold/skills/optimization/references/evaluation-patterns.md
plan/README.md
CHANGELOG.md

🚧 Files skipped from review as they are similar to previous changes (4)

src/fleet_rlm/scaffold/skills/delegation/SKILL.md
src/fleet_rlm/scaffold/skills/volume-bootstrap/references/filesystem-contract.md
src/fleet_rlm/scaffold/skills/long-context/scripts/semantic_chunk.py
src/fleet_rlm/scaffold/skills/long-context/scripts/rank_chunks.py

coderabbitai · 2026-05-30T05:24:08Z

+- Session manifests on durable storage live under `sessions/<session_id>/conversation.json`.
+- Manifest readers use the current session conversation path only.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Manifest path contradicts the Persistent Memory Model section.

Lines 175–176 establish that session manifests now live at sessions/<session_id>/conversation.json, but Line 200 still states that "session manifests and workspace provenance belong under meta/workspaces/...". Since the PR removes the legacy meta/workspaces/.../react-session-*.json fallback, Line 200 should be reconciled to avoid documenting two conflicting durable manifest locations.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/daytona-architecture.md` around lines 175 - 176, Update the Persistent Memory Model description to match the new manifest location: replace the reference to "session manifests and workspace provenance belong under `meta/workspaces/...`" with the new durable path `sessions/<session_id>/conversation.json` (or state that workspace provenance remains under `meta/workspaces/...` if only session manifests moved), and remove any mention of the legacy `meta/workspaces/.../react-session-*.json` fallback; ensure the text consistently uses `sessions/<session_id>/conversation.json` where session manifest is discussed so the lines about session manifest location (previously around "Session manifests on durable storage live under `sessions/<session_id>/conversation.json`") and the Persistent Memory Model do not conflict.

coderabbitai · 2026-05-30T05:24:08Z

 def _list_labels_filter(
    *,
    owner_labels: dict[str, str] | None,
-    allow_unlabeled_legacy: bool,
 ) -> dict[str, str] | None:
-    if allow_unlabeled_legacy or not owner_labels:
+    if not owner_labels:
        return None
    owner_value = owner_labels.get(SANDBOX_OWNER_LABEL)
    return {SANDBOX_OWNER_LABEL: owner_value} if owner_value else None


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This strands legacy unlabeled sandboxes.

Returning False whenever the owner label is missing means pre-label sandboxes now 404 on detail/delete/archive, and load_sandbox_list() silently returns an empty page if a caller forgets to pass owner_labels. That removes the only cleanup path for older resources. Please keep a controlled legacy path or add a relabel/migration step before making owner labels mandatory.

Also applies to: 354-375

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/runtime_services/sandboxes.py` around lines 168 - 175, The current _list_labels_filter (and the duplicate logic around lines 354-375) excludes sandboxes that lack SANDBOX_OWNER_LABEL, stranding unlabeled legacy sandboxes; change the logic to preserve a controlled legacy path by returning a filter that matches either the requested owner label OR the absence of the owner label (i.e., include unlabeled sandboxes) when SANDBOX_OWNER_LABEL is missing in owner_labels, or alternatively trigger a migration/relabel step before enforcing owner-only filtering; update _list_labels_filter and the same duplicated block to implement the “include unlabeled (legacy)” branch or call a relabel/migration helper so old sandboxes remain visible to detail/delete/archive operations.

Refine scaffold skills for plugin eval

3b6e3ec

github-advanced-security AI found potential problems May 30, 2026

View reviewed changes

Comment thread src/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.py Fixed

gemini-code-assist Bot reviewed May 30, 2026

View reviewed changes

Fix ty checks in scaffold skill helpers

99cc0e1

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

Zochory added 4 commits May 30, 2026 06:22

Fix agent harness docs and validation guidance

d5702ab

codex: fix skill runtime review feedback

5e3edb7

codex: redact diagnostics env output

675db54

codex: avoid printing secret status values

c13c21a

github-advanced-security AI found potential problems May 30, 2026

View reviewed changes

Comment thread src/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.py Fixed

Zochory added 2 commits May 30, 2026 06:40

codex: keep env checks out of status output

b806856

codex: simplify diagnostics status output

be71b6c

github-advanced-security AI found potential problems May 30, 2026

View reviewed changes

Comment thread src/fleet_rlm/scaffold/skills/diagnostics/scripts/diagnose.py Fixed

Zochory added 2 commits May 30, 2026 06:46

codex: remove generic diagnostics status sink

db479bc

codex: address remaining skill review threads

b20b5b3

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

Zochory merged commit 7f02373 into main May 30, 2026
16 checks passed

coderabbitai Bot mentioned this pull request Jun 6, 2026

Refactor RLM runtime, event streaming, and frontend chat adapters #271

Open

		- Session manifests on durable storage live under `sessions/<session_id>/conversation.json`.
		- Manifest readers use the current session conversation path only.

Uh oh!

Conversation

Zochory commented May 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Release Notes

Uh oh!

chatgpt-codex-connector Bot commented May 30, 2026

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Zochory commented May 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 30, 2026 •

edited

Loading