Refactor RLM runtime, event streaming, and frontend chat adapters#271
Conversation
Add browser-aware document/runtime tooling, Daytona broker and sandbox slot recovery, richer MLflow trace context, and frontend rendering for trajectory/tool state. Cover the review follow-ups around SSRF guarding, broker cooldown retry, browser skill selection, scoped trajectory de-duplication, and safe sandbox reconciliation.
feat(phase7): align RLM recursion with reference implementation and tighten persistence contracts - Update daytona to 0.184.0 and fastapi to 0.136.3 - Add Phase 7 documentation covering history management, bounded child snapshots, token-aware compaction, explicit depth tracking, and fallback behavior - Tighten PersistenceDep type from Any to PersistenceProtocol and raise RuntimeError when backend not initialized - Remove ServerState compatibility proxy attributes (_SERVER_STATE_PROXY_ATTRS) - Move RuntimeEventContext from api/events/
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
Warning Review limit reached
More reviews will be available in 25 minutes and 12 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (26)
📝 WalkthroughWalkthroughRefactors websocket streaming into modular services, introduces a typed RuntimeEvent schema and projections, overhauls lifecycle/persistence, adds Daytona browser tooling and slot reconciliation, extends MCP integration, updates frontend adapters/UI, and broadens tests and docs. Minor dependency/version and scripts updates. ChangesWebSocket runtime, RLM, tools, and frontend integration
Sequence Diagram(s)sequenceDiagram
participant Client
participant WS_Endpoint
participant ConnLoop
participant TurnRunner
participant AgentRuntime
participant Lifecycle
participant RepoDB
Client->>WS_Endpoint: Connect / send message
WS_Endpoint->>ConnLoop: _chat_message_loop(...)
ConnLoop->>TurnRunner: run_streaming_turn(...)
TurnRunner->>Lifecycle: initialize_turn_lifecycle()
TurnRunner->>AgentRuntime: aiter_chat_turn_stream(...)
AgentRuntime-->>TurnRunner: RuntimeEvent (tool/reasoning/status)
TurnRunner->>Lifecycle: emit_step(event)
Lifecycle->>RepoDB: append_step (async batch)
TurnRunner-->>Client: WS frame (project_chat)
AgentRuntime-->>TurnRunner: RuntimeEvent DONE/ERROR
TurnRunner->>Lifecycle: complete_run(status, summary)
Lifecycle->>RepoDB: update_run_status
TurnRunner-->>Client: execution_completed
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
|
There was a problem hiding this comment.
Code Review
This pull request introduces a Phase 0 safety net with golden-payload capture and regression verification, integrates DSPy-native MCP tools, adds browser automation via Playwright, refactors the WebSocket connection loop, and aligns RLM recursion and history management with the reference implementation. The review feedback highlights several critical bugs and improvements: a baseline deletion issue in the capture script, a missing _interpreter assignment in EscalatingFleetModule, TypeError risks when calling scorer methods in mlflow_cli.py, potential synchronous/asynchronous mismatches in session_persistence.py, robustness issues in the browser fetch sandbox code, and unhandled orphan <think> tags in mlflow_runtime.py.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 2
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/frontend/src/lib/workspace/__tests__/backend-chat-event-adapter.test.ts (1)
1-1302:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix frontend formatting before merge (
vp checkis failing in CI).Please run
vp check --fix(or the repo’s formatting lane) and commit the result so the frontend check passes.As per coding guidelines, frontend TS/TSX changes must go through the formatting/lint workflow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend/src/lib/workspace/__tests__/backend-chat-event-adapter.test.ts` around lines 1 - 1302, Formatting check failed for frontend TypeScript files; run the repo formatter and commit the fixes. Run the project's frontend formatting/lint pipeline (e.g., vp check --fix or the repo's formatting lane), reformat the changed test file(s) that reference applyWsFrameToMessages and related tests, then commit the updated files so the vp check passes in CI.Sources: Coding guidelines, Pipeline failures
src/fleet_rlm/api/runtime_services/chat_persistence.py (1)
131-149:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDisallow persisting while the run is intentionally left alive.
If
cancel_active_run=Falsebutpersist_on_disconnectstaysTrue, this path leavesstream_taskrunning and then immediately callslocal_persist(... release_idle_session=True). That can snapshot stale session state and release the Daytona session underneath the background execution. Make this combination impossible, or forcepersist_on_disconnect=Falsewhenever the run is allowed to continue.Suggested guard
async def handle_chat_disconnect( *, pending_receive_task: asyncio.Task[object] | None, stream_task: asyncio.Task[str | None] | None, cancel_flag: dict[str, bool], local_persist: Callable[..., Awaitable[None]], lifecycle: ExecutionLifecycleManager | None, cancel_active_run: bool = True, persist_on_disconnect: bool = True, ) -> None: """Cleanly stop the active websocket loop after a client disconnect.""" + if not cancel_active_run and persist_on_disconnect: + raise ValueError( + "persist_on_disconnect must be False when background execution is left running" + ) if cancel_active_run: cancel_flag["cancelled"] = True🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/runtime_services/chat_persistence.py` around lines 131 - 149, When cancel_active_run is False we must not persist because stream_task continues running; update the logic in the shutdown function (references: cancel_active_run, persist_on_disconnect, stream_task, local_persist, cancel_task, _log_background_disconnect_task_result) to prevent calling local_persist when a run is intentionally left alive — either by forcing persist_on_disconnect = False before the early/else branch where stream_task is left running or by adding a guard that returns/sets persist_on_disconnect=False whenever cancel_active_run is False, ensuring local_persist(...) with release_idle_session=True is never invoked while stream_task may still be active.src/fleet_rlm/runtime/agent/runtime.py (1)
503-508:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftKeep the public stream on a single event type.
aiter_chat_turn_stream()still promisesAsyncIterator[StreamEvent], but both the post-hoc path and the native tool/clarification path now yieldRuntimeEvent. That already breaks type-checking in CI, and it also means callers can receive a mixed event object shape within one stream depending on which branch executes. Convert everything at this boundary to one event class before yielding, or update the signature and all downstream consumers in the same change.Also applies to: 805-805, 822-931
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/runtime/agent/runtime.py` around lines 503 - 508, The post-hoc path in _aiter_chat_turn_stream_posthoc is yielding RuntimeEvent objects while aiter_chat_turn_stream promises AsyncIterator[StreamEvent], which mixes event shapes; modify _aiter_chat_turn_stream_posthoc (and the other post-hoc/native branches noted around the other ranges) to convert/normalize every RuntimeEvent into the public StreamEvent shape before yielding (map fields, types, and any nested payloads to StreamEvent), keeping the aiter_chat_turn_stream signature unchanged; alternatively, if you prefer to expose RuntimeEvent, update aiter_chat_turn_stream's return type and all downstream consumers in the same change—pick one approach and apply consistently across _aiter_chat_turn_stream_posthoc, the native tool/clarification branch, and the other affected ranges so the stream always emits a single event class.Source: Pipeline failures
🟠 Major comments (24)
src/fleet_rlm/integrations/observability/mlflow_runtime.py-515-523 (1)
515-523:⚠️ Potential issue | 🟠 Major | ⚡ Quick winOrphan
</think>cleanup is currently unreachable in callback flow.
_strip_think_tags()now handles orphan closings, but the callback only invokes it when a full<think>...</think>pair is detected. Orphan-only</think>payloads still bypass sanitization.Proposed fix
- if isinstance(content, str) and _THINK_TAG_RE.search(content): + if isinstance(content, str) and ( + _THINK_TAG_RE.search(content) or _ORPHAN_THINK_CLOSE_RE.search(content) + ): message["content"] = _strip_think_tags(content) @@ - if isinstance(text, str) and _THINK_TAG_RE.search(text): + if isinstance(text, str) and ( + _THINK_TAG_RE.search(text) or _ORPHAN_THINK_CLOSE_RE.search(text) + ): choice["text"] = _strip_think_tags(text)Also applies to: 555-560
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/integrations/observability/mlflow_runtime.py` around lines 515 - 523, The callback path currently only strips think tags when a full <think>...</think> pair is found, so orphan-only payloads like "</think>" bypass sanitization; update the callback(s) around the existing tag-handling logic (the branches referenced around the regions that call _THINK_TAG_RE processing at ~515 and ~555) to invoke _strip_think_tags(text) before any further handling (or at minimum when _ORPHAN_THINK_CLOSE_RE.search(text) is true), ensuring orphan closing tags are removed; keep using the existing _strip_think_tags, _THINK_TAG_RE and _ORPHAN_THINK_CLOSE_RE symbols so the single sanitization routine is the canonical cleanup for all branches.src/fleet_rlm/integrations/observability/mlflow_context.py-155-169 (1)
155-169:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix
_coerce_trajectory_stepstyping so CI typecheck passes.Line 157 and Line 166 currently trigger mypy/pyright overload errors from
dict(step, index=step.get("index", index)). Build normalized step dicts first, then setindexexplicitly.Proposed fix
+def _normalize_trajectory_step(step: dict[Any, Any], index: int) -> dict[str, Any]: + normalized: dict[str, Any] = {str(key): value for key, value in step.items()} + if normalized.get("index") is None: + normalized["index"] = index + return normalized + + def _coerce_trajectory_steps(raw: Any) -> list[dict[str, Any]]: if isinstance(raw, list): - return [dict(step, index=step.get("index", index)) for index, step in enumerate(raw) if isinstance(step, dict)] + steps: list[dict[str, Any]] = [] + for index, step in enumerate(raw): + if isinstance(step, dict): + steps.append(_normalize_trajectory_step(step, index)) + return steps if not isinstance(raw, dict): return [] for key in ("trajectory", "steps"): nested = raw.get(key) if isinstance(nested, list): - return [ - dict(step, index=step.get("index", index)) - for index, step in enumerate(nested) - if isinstance(step, dict) - ] + steps: list[dict[str, Any]] = [] + for index, step in enumerate(nested): + if isinstance(step, dict): + steps.append(_normalize_trajectory_step(step, index)) + return steps🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/integrations/observability/mlflow_context.py` around lines 155 - 169, In _coerce_trajectory_steps, mypy/pyright complain about using dict(step, index=step.get("index", index)); instead, first ensure each raw step is a dict by calling dict(step) or copy into a new variable (e.g., normalized = dict(step)), then set normalized["index"] = normalized.get("index", index") or index, and return those normalized dicts; apply this change in both branches where dict(step, index=...) is used so the function returns list[dict[str, Any]] without type errors.Source: Pipeline failures
src/fleet_rlm/api/routers/ws/connection_loop.py-411-420 (1)
411-420:⚠️ Potential issue | 🟠 Major | ⚡ Quick winA received message can be dropped in the streaming branch.
When
_handle_message_while_streaming(...)returnsFalse, the unconditionalcontinuestill discards that message instead of deferring it to the idle path.Proposed fix
- if await _handle_message_while_streaming( + handled = await _handle_message_while_streaming( websocket=self.websocket, msg=msg, agent=self.agent, runtime=self.runtime, session=self.session, local_persist=self.local_persist, - ): + ) + if handled: continue + self.pending_message = msg continue🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/routers/ws/connection_loop.py` around lines 411 - 420, The streaming branch currently discards messages because it calls _handle_message_while_streaming(...) and then always executes an unconditional continue; change the control flow so you only continue the outer loop when _handle_message_while_streaming(...) returns True (keep the existing continue inside the True branch) and let execution fall through to the idle handling path when it returns False so the same msg is processed by the idle logic; locate the call to _handle_message_while_streaming in connection_loop.py and remove or guard the extra unconditional continue so that only the True result causes the loop to continue.scripts/capture_phase0_baseline.sh-23-30 (1)
23-30:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBaseline artifacts are deleted before script completion.
Line 23 removes
tests/contracts/golden_payloadsafter baselines are copied, soopenapi_baseline.yamlandopenapi_client_baseline.tsare not actually preserved even though Lines 28-30 report success.Proposed fix
# Run golden payload capture tests echo "Running golden payload capture tests..." # Temporarily remove golden payloads directory to trigger capture rm -rf tests/contracts/golden_payloads uv run pytest tests/contracts/test_golden_payloads.py::test_capture_chat_websocket_golden_payloads -v uv run pytest tests/contracts/test_golden_payloads.py::test_capture_passive_events_websocket_golden_payloads -v + +# Recreate directory and persist OpenAPI/client baselines after recapture +mkdir -p tests/contracts/golden_payloads +cp openapi.yaml tests/contracts/golden_payloads/openapi_baseline.yaml +cp src/frontend/src/lib/rlm-api/generated/openapi.ts tests/contracts/golden_payloads/openapi_client_baseline.ts🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/capture_phase0_baseline.sh` around lines 23 - 30, The script currently executes "rm -rf tests/contracts/golden_payloads" after the baseline capture steps, which deletes the artifacts that the later echo lines claim were saved; remove or relocate that command so baselines are preserved: either delete the "rm -rf tests/contracts/golden_payloads" line entirely or move it to run before the pytest commands (so it clears old artifacts first), ensuring the pytest invocations (uv run pytest ... test_capture_*_golden_payloads) produce and retain openapi_baseline.yaml and openapi_client_baseline.ts referenced by the echo messages.scripts/verify_phase0_regression.sh-16-26 (1)
16-26:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRegression diffs don’t fail the script.
When baselines drift, the script only warns and exits successfully. That makes this lane non-blocking even on real contract regressions.
Proposed fix
+failed=0 + # Compare openapi.yaml echo "Comparing openapi.yaml against baseline..." if ! diff -u tests/contracts/golden_payloads/openapi_baseline.yaml openapi.yaml; then echo "WARNING: openapi.yaml has changed from baseline" echo "Review the diff above. If changes are intentional, update the baseline." + failed=1 fi # Compare generated client echo "Comparing generated client against baseline..." if ! diff -u tests/contracts/golden_payloads/openapi_client_baseline.ts src/frontend/src/lib/rlm-api/generated/openapi.ts; then echo "WARNING: Generated client has changed from baseline" echo "Review the diff above. If changes are intentional, update the baseline." + failed=1 fi +if [ "$failed" -ne 0 ]; then + exit 1 +fi + echo "=== Phase 0 regression verification complete ==="🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/verify_phase0_regression.sh` around lines 16 - 26, The script currently only warns when generated artifacts differ (the two diff checks in verify_phase0_regression.sh) but does not fail the run; update both diff failure branches that start with "if ! diff -u tests/contracts/golden_payloads/openapi_baseline.yaml ..." and "if ! diff -u tests/contracts/golden_payloads/openapi_client_baseline.ts ..." to cause a non-zero exit (e.g., add an exit 1 in each then-branch or set a failing flag and exit non-zero after checks) so the script fails CI on regression diffs.src/fleet_rlm/api/routers/ws/stream_summary.py-66-78 (1)
66-78:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNamespace the runtime failure metadata consistently.
These four fields are emitted as
runtime_degraded/runtime_failure_*, while the rest of this helper writesfleet_rlm.*. That splits one trace schema in two, so degraded/failure filters built against thefleet_rlm.namespace will miss these terminal cases.Suggested fix
for key in ( "runtime_degraded", "runtime_failure_category", "runtime_failure_phase", "runtime_fallback_used", ): value = payload.get(key, runtime.get(key)) if value in (None, "", False): if key in {"runtime_degraded", "runtime_fallback_used"} and value is False: - metadata[key] = False + metadata[f"fleet_rlm.{key}"] = False continue - metadata[key] = value + metadata[f"fleet_rlm.{key}"] = value🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/routers/ws/stream_summary.py` around lines 66 - 78, The runtime failure keys ("runtime_degraded", "runtime_failure_category", "runtime_failure_phase", "runtime_fallback_used") must be stored under the fleet_rlm namespace so traces remain consistent: when reading a value check both the namespaced and legacy keys (e.g. payload.get("fleet_rlm.runtime_degraded") or payload.get("runtime_degraded") then fallback to runtime.get(...)), and when writing to metadata set the namespaced key (metadata["fleet_rlm.runtime_degraded"], etc.). Preserve the special-case handling for False on runtime_degraded and runtime_fallback_used but assign the boolean to the namespaced key. Update the loop that iterates those keys and uses payload, runtime and metadata accordingly.src/fleet_rlm/api/routers/ws/stream_summary.py-247-273 (1)
247-273:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBackfill payload
sourcesandattachmentson therun_resultpath.When
run_resultis present, this branch merges warnings andfinal_artifactbut never copies top-levelpayload["sources"]orpayload["attachments"]. The fallback branch does, so the completion shape changes depending on whether the runtime returnedrun_result, and citations/artifacts disappear on the common path.Suggested fix
normalized.setdefault("duration_ms", summary_payload.get("duration_ms")) normalized.setdefault("warnings", warnings) + normalized.setdefault("sources", list(payload.get("sources") or [])) + normalized.setdefault("attachments", list(payload.get("attachments") or [])) nested_summary = _as_record(normalized.get("summary"))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/routers/ws/stream_summary.py` around lines 247 - 273, The run_result branch never preserves top-level payload["sources"] and payload["attachments"], causing citations/artifacts to vanish when runtime returns run_result; in the block handling run_result (where normalized is created and final_artifact is set) add code to backfill sources and attachments into normalized (e.g., normalized.setdefault("sources", payload.get("sources")) and normalized.setdefault("attachments", payload.get("attachments"))) so existing run_result values are preserved but payload sources/attachments are present when missing; keep this logic alongside the existing setting of "final_artifact" and "warnings".src/fleet_rlm/api/routers/ws/turn_runner.py-59-66 (1)
59-66:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep persisted run status aligned with the terminal summary.
A
doneevent that requires human review is normalized tostatus="needs_human_review"inbuild_execution_completion_summary(), but_terminal_run_status()still returnsRunStatus.COMPLETED. That gives the same run two terminal truths depending on whether consumers read the DB status or the completion payload.Also applies to: 95-99
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/routers/ws/turn_runner.py` around lines 59 - 66, _terminal_run_status currently maps a terminal "done" event to COMPLETED/FAILED but doesn't account for the "needs_human_review" normalization done by build_execution_completion_summary; update _terminal_run_status to inspect the final payload (same way build_execution_completion_summary does) and return RunStatus.NEEDS_HUMAN_REVIEW when the payload indicates human review is required (e.g. payload.get("status") == "needs_human_review" or via the same helper used to detect that state), and make the same change in the other terminal-status mapping at the later block referenced (the similar logic around lines 95-99) so DB run status and completion summary remain consistent.scripts/validate_rlm_e2e_trace.py-128-129 (1)
128-129:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not treat
execution_completedas a conversational terminal event.
execution_completedis the passive execution-stream terminal signal; accepting it as the chat terminal (Line 128 / Line 401) can let this harness pass even when/api/v1/ws/executionnever emits afinalevent.Suggested fix
- if payload.get("type") == "execution_completed": - return events, payload - if payload.get("type") != "event": continue kind = payload.get("data", {}).get("kind") if kind in {"final", "error", "cancelled"}: return events, payload @@ - terminal_kind = terminal_chat_payload.get("data", {}).get("kind") or terminal_chat_payload.get("type") - if terminal_kind not in {"final", "execution_completed"}: + terminal_kind = terminal_chat_payload.get("data", {}).get("kind") or terminal_chat_payload.get("type") + if terminal_kind != "final": raise RuntimeError( - f"Terminal chat event kind is {terminal_kind!r}; expected 'final' or " - "'execution_completed'." + f"Terminal chat event kind is {terminal_kind!r}; expected 'final'." )Based on learnings:
/api/v1/ws/executionis conversational, while/api/v1/ws/execution/eventsis subscription-only/passive.Also applies to: 400-405
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/validate_rlm_e2e_trace.py` around lines 128 - 129, Do not treat payload.get("type") == "execution_completed" as a conversational terminal; update the event handling in the code that currently returns on payload.get("type") == "execution_completed" so it only returns/terminates on the conversational "final" event (e.g., payload.get("type") == "final") and keep "execution_completed" as a passive/subscription-only signal (handle but do not use to short-circuit/accept the harness). Locate references to payload.get("type") == "execution_completed" (including the occurrence around the current check and the similar block later) and replace the return condition to check for "final" instead, leaving any logging/collection of "execution_completed" intact. Ensure the behavior for /api/v1/ws/execution (conversational) relies on "final" while subscription/event-stream handling still observes "execution_completed" without treating it as terminal.Sources: Coding guidelines, Learnings
src/fleet_rlm/api/events/project_chat.py-23-29 (1)
23-29:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMap status/progress runtime events to
execution_started, not genericexecution_step.This mapping currently downgrades non-terminal events to
execution_step, but the frontend adapter only runs sandbox-progress projection and status-tone inference onexecution_started. As a result, sandbox progress/status events can lose their canonical rendering path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/events/project_chat.py` around lines 23 - 29, The _frame_kind function currently maps non-terminal RuntimeEventKind values to "execution_step", which downgrades status/progress events; update _frame_kind to map status/progress runtime events to "execution_started" instead of "execution_step". Concretely, add (or use) a set like _STATUS_PROGRESS_KINDS containing the appropriate RuntimeEventKind members (e.g., status/progress kinds) and change the logic in _frame_kind to return "execution_started" when event_kind is in _TURN_STARTED_KINDS OR in _STATUS_PROGRESS_KINDS, keep returning "execution_completed" for _TERMINAL_KINDS, and only return "execution_step" for true intermediate steps not in those two sets.src/frontend/src/lib/workspace/backend-chat-event-adapter.ts-671-673 (1)
671-673:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
resolve_hitlfailure detection is too narrow.Only treating
"error"as failure will misclassify statuses like"failed","rejected", or"cancelled"as success and incorrectly apply resolution.Suggested fix
- const succeeded = - asOptionalText(result?.status)?.toLowerCase() !== "error"; + const status = asOptionalText(result?.status)?.toLowerCase(); + const succeeded = + status == null || + !["error", "failed", "failure", "rejected", "cancelled"].includes(status);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend/src/lib/workspace/backend-chat-event-adapter.ts` around lines 671 - 673, The current detection for resolve_hitl failures only treats result?.status === "error" as a failure; update the succeeded computation in backend-chat-event-adapter.ts (the succeeded variable that uses asOptionalText(result?.status)) to normalize (trim().toLowerCase()) and treat a broader set of failure statuses (e.g., "error","failed","rejected","cancelled") as failures—or alternatively only treat an explicit success set (e.g., "ok","success","succeeded") as success—and use that check when applying resolve_hitl logic so statuses like "failed" or "rejected" are not misclassified as successful.tests/contracts/test_golden_payloads.py-54-57 (1)
54-57:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
skipifpredicates are inverted for capture vs regression modes.Right now capture tests are skipped when no baseline exists, while regression tests run and fail on missing golden files. Flip the predicates so capture runs only when baseline is absent, and regression runs only when baseline exists.
💡 Proposed fix
`@pytest.mark.skipif`( - not GOLDEN_PAYLOADS_DIR.exists(), - reason="Golden payloads directory exists - run once to capture baseline", + GOLDEN_PAYLOADS_DIR.exists(), + reason="Golden payloads already captured - skip baseline capture", ) @@ `@pytest.mark.skipif`( - not GOLDEN_PAYLOADS_DIR.exists(), - reason="Golden payloads directory exists - run once to capture baseline", + GOLDEN_PAYLOADS_DIR.exists(), + reason="Golden payloads already captured - skip baseline capture", ) @@ `@pytest.mark.skipif`( - GOLDEN_PAYLOADS_DIR.exists(), - reason="Golden payloads already captured - use for regression testing", + not GOLDEN_PAYLOADS_DIR.exists(), + reason="Golden payloads missing - run baseline capture first", ) @@ `@pytest.mark.skipif`( - GOLDEN_PAYLOADS_DIR.exists(), - reason="Golden payloads already captured - use for regression testing", + not GOLDEN_PAYLOADS_DIR.exists(), + reason="Golden payloads missing - run baseline capture first", )Also applies to: 88-91, 126-129, 158-161
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/contracts/test_golden_payloads.py` around lines 54 - 57, The skipif predicates are inverted: for capture tests flip the condition to skip when GOLDEN_PAYLOADS_DIR.exists() (so capture runs only when baseline is absent) and update the reason text accordingly, and for regression tests flip their condition to skip when not GOLDEN_PAYLOADS_DIR.exists() (so regression runs only when baseline exists); update each pytest.mark.skipif(...) instance in this file (the three other occurrences shown) to use the corrected boolean and matching reason string.src/fleet_rlm/api/runtime_services/run_lifecycle.py-214-224 (1)
214-224:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid potential deadlock when stopping the persistence worker.
Line 218 can block indefinitely if the worker already exited and the queue is full (no consumer left), which can hang
complete_run().💡 Proposed fix
async def _stop_persist_worker(self) -> None: if self._persist_worker_task is None: return - if self._persist_queue is not None: - await self._persist_queue.put(None) + task = self._persist_worker_task + queue = self._persist_queue + if queue is not None and not task.done(): + try: + queue.put_nowait(None) + except asyncio.QueueFull: + task.cancel() try: - await self._persist_worker_task + await task except asyncio.CancelledError: pass self._persist_worker_task = None self._persist_queue = None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/runtime_services/run_lifecycle.py` around lines 214 - 224, The stop routine _stop_persist_worker can deadlock on "await self._persist_queue.put(None)" if the worker has already exited and the queue is full; change the code to do a non-blocking enqueue of the sentinel (use self._persist_queue.put_nowait(None)) and catch asyncio.QueueFull (or Exception) to skip sending the sentinel when the queue is full, then proceed to await the worker task and clear _persist_worker_task and _persist_queue as before.src/fleet_rlm/api/runtime_services/session_persistence.py-157-173 (1)
157-173:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe local-store fallback is a silent no-op for the UUID-based backend shape described here.
This helper only writes when
update_chat_sessionexposesexternal_session_id. Theelsebranch explicitly skips thetenant_id/session_idsignature called out in the comment forLocalStore/FleetRepository, so the "durable fallback" never persists anything for those implementations. Please thread the canonical session identifiers into this helper or add a persistence API that can update by external session id instead of silently dropping the write.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/runtime_services/session_persistence.py` around lines 157 - 173, The helper currently only calls update_chat_session when it accepts external_session_id, leaving implementations like LocalStore.update_chat_session / FleetRepository (which expect tenant_id and session_id UUIDs) as a silent no-op; change the helper to accept or derive the canonical tenant_id and session_id and call update_fn(tenant_id=..., session_id=..., metadata_json={...}) in the else branch (or add a persistence API on the repository that accepts external_session_id and use that). Locate the code using update_fn (the inspect.signature block) and either thread the tenant_id/session_id into this helper from the caller or add/consume a new persistence method that maps external_session_id => tenant_id/session_id, then call update_fn with the UUID params and metadata_json to ensure the durable fallback actually persists.src/fleet_rlm/api/routers/ws/stream_events.py-135-137 (1)
135-137:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid unconditionally awaiting
preparehooks.
prepareis typed asCallable[[], Any], but it is always awaited. A synchronous hook will raiseTypeErrorand terminate streaming.Suggested fix
+import inspect @@ - if request.prepare is not None: - await request.prepare() + if request.prepare is not None: + prepared = request.prepare() + if inspect.isawaitable(prepared): + await prepared🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/routers/ws/stream_events.py` around lines 135 - 137, The code unconditionally awaits request.prepare() which fails for synchronous hooks; change the call in stream handling (around request.prepare and the async for using request.agent.aiter_chat_turn_stream and _build_agent_stream_kwargs) to call request.prepare(), capture its return value, and only await it if it's awaitable (use inspect.isawaitable or asyncio.iscoroutine to detect); ensure synchronous prepare hooks are invoked without awaiting and preserve the existing behavior for async prepare hooks.src/fleet_rlm/api/routers/ws/stream_events.py-123-124 (1)
123-124:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard payload normalization to mappings only.
dict(getattr(event, "payload", {}) or {})can throw for string/list/object payloads, which converts malformed event data into a stream crash.Suggested fix
+from collections.abc import Mapping @@ - return WorkspaceEvent( + raw_payload = getattr(event, "payload", None) + payload = dict(raw_payload) if isinstance(raw_payload, Mapping) else {} + + return WorkspaceEvent( kind=kind, text=str(getattr(event, "text", "") or ""), - payload=dict(getattr(event, "payload", {}) or {}), + payload=payload, timestamp=timestamp, terminal=is_terminal_stream_event_kind(kind), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/api/routers/ws/stream_events.py` around lines 123 - 124, The current payload normalization line in stream_events.py uses dict(getattr(event, "payload", {}) or {}), which will raise when payload is a non-mapping (string/list/etc.); change it to guard for mapping types: fetch payload = getattr(event, "payload", None), if isinstance(payload, Mapping) then use dict(payload) else use {}. Import Mapping from collections.abc and replace the direct dict(...) call with this guarded logic in the code path that constructs the event payload (the variable/argument named payload in the event creation).src/fleet_rlm/integrations/daytona/runtime.py-96-109 (1)
96-109:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBrowser skill snapshot selection can degrade to a non-browser image on cold environments
Line 96 introduces browser snapshot selection, but when that snapshot is missing/inactive,
resolve_sandbox_spec_snapshot()falls back throughfallback_to_declarative_image()which currently builds only the base image. That drops Playwright/Chromium deps and can make browser skill runs fail despite selecting browser capabilities.💡 Proposed fix
diff --git a/src/fleet_rlm/integrations/daytona/snapshots.py b/src/fleet_rlm/integrations/daytona/snapshots.py @@ def fallback_to_declarative_image(spec: SandboxSpec) -> SandboxSpec: """Replace a snapshot-based spec with a declarative image build.""" - image = build_base_snapshot_image() + if spec.snapshot == BROWSER_SNAPSHOT_NAME: + image = build_browser_snapshot_image() + else: + image = build_base_snapshot_image() return dataclasses.replace(spec, image=image, snapshot=None)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/integrations/daytona/runtime.py` around lines 96 - 109, resolve_snapshot_for_skills can pick the browser snapshot but downstream fallback_to_declarative_image currently only builds the base image, dropping Playwright/Chromium; update the fallback path so the selected snapshot flavor is preserved: modify fallback_to_declarative_image to accept a snapshot_name or a browser_flag and, when called from resolve_sandbox_spec_snapshot with a browser-selected name (BROWSER_SNAPSHOT_NAME returned by resolve_snapshot_for_skills), build the declarative fallback that includes browser deps (the same layers/packages as the browser snapshot) instead of always using DEFAULT_SNAPSHOT_NAME; update resolve_sandbox_spec_snapshot to pass the chosen snapshot (or browser flag) into fallback_to_declarative_image so browser-capable runs still get Playwright/Chromium when the original snapshot is missing.src/fleet_rlm/runtime/tools/document_tools.py-361-367 (1)
361-367:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEnforce an aggregate bundle budget.
Each downloaded document still gets its own 50 MiB cap, so a single root fetch can now concatenate roughly 150 MiB of text into one tool response. That's enough to overwhelm the sandbox and any downstream LLM context. Stop bundling once a small combined text budget is reached, or skip auxiliary documents when the primary document is already large.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/runtime/tools/document_tools.py` around lines 361 - 367, The bundling currently concatenates all combined_parts into text without an overall size cap, so update the bundling logic that builds text, metadata, bundled_sources and bundled_source_count to enforce a global combined budget (e.g., a small byte/char limit) by iterating combined_parts and adding them until adding the next part would exceed the budget; skip or truncate auxiliary parts (auxiliary_errors sources) when the primary document already uses most of the budget, and ensure bundled_sources and bundled_source_count only reflect the actually included parts and that metadata["auxiliary_errors"] is adjusted to reflect skipped/truncated items; operate on the same symbols (combined_parts, bundled_sources, auxiliary_errors, metadata) and keep the per-document cap but stop further concatenation once the aggregate budget is reached.src/fleet_rlm/runtime/tools/document_tools.py-175-193 (1)
175-193:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAuxiliary bundling is not actually limited to root URLs.
The docstring says this is for root documentation URLs, but the current check only excludes paths with a suffix. URLs like
https://docs.example/apiorhttps://docs.example/guide/still qualify and will silently pull in/llms.txtand/sitemap.xml, changing a page fetch into a site-wide bundle. Gate this on/or a stricter root heuristic before adding auxiliary sources.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/runtime/tools/document_tools.py` around lines 175 - 193, The function _documentation_auxiliary_urls currently treats non-file paths like "/api" or "/guide/" as roots and will add site-level companions; change the gating logic to only allow auxiliary bundling for true root pages by normalizing and checking the path equals "/" (or empty) before constructing candidates. Specifically, replace the current heuristic that only tests Path(path).suffix with a stricter check on parsed.path (e.g., normalize with parsed.path or parsed.path.rstrip("/") and require it to be "/" or empty) before using origin and _AUXILIARY_DOC_PATHS to build candidates (keep variables current, candidates, origin, and _AUXILIARY_DOC_PATHS) and update the docstring to reflect the stricter root-only behavior.src/fleet_rlm/integrations/daytona/bridge_callbacks.py-42-46 (1)
42-46:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
semantic_callbacks_enabledis bypassable from sandbox code.Hiding
llm_query*frombridge_tools()does not actually disable them: code running inside the sandbox can still POSTtool_name="llm_query"to the local broker, andinvoke_tool()will execute it because the dispatcher never checks this flag. If this flag is meant to be a policy boundary, enforce it ininvoke_tool()or whitelist only the names returned bybridge_tools().🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/integrations/daytona/bridge_callbacks.py` around lines 42 - 46, The current check of semantic_callbacks_enabled in bridge_callbacks.py only omits adding "llm_query" and "llm_query_batched" to the tools map but does not prevent sandbox code from calling them; update the runtime enforcement by making invoke_tool() validate requested tool names against the allowed toolset returned by bridge_tools() (or an explicit allowed_tools list created in bridge_callbacks.py), and if semantic_callbacks_enabled is False reject or raise on any invocation of "llm_query" / "llm_query_batched" (or any tool not present in the allowed tools). Concretely, ensure invoke_tool() consults the tools dict produced by bridge_tools() (and the semantic_callbacks_enabled flag or an allowed_tools whitelist) before executing interpreter.llm_query or interpreter.llm_query_batched to enforce the policy boundary.src/fleet_rlm/integrations/daytona/bridge.py-143-158 (1)
143-158:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep the broker and wrapper timeouts in sync.
/tool_callnow waitsbroker_tool_call_timeout, but the generated wrapper still hard-times out its HTTP POST at 130 seconds later in this file. With the default 180-second broker timeout, callbacks that take 130-180 seconds will fail client-side first and surface as transport errors even though the broker is still waiting. Use the same timeout in both places or clamp the broker timeout to the wrapper limit.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/integrations/daytona/bridge.py` around lines 143 - 158, The wrapper currently uses a hardcoded __DAYTONA_TOOL_CALL_TIMEOUT_S__ for waiting on call_id results, which can be shorter than the broker's broker_tool_call_timeout and cause client-side timeouts; update the loop in bridge.py (the code that waits on call_id, touches _results, _pending_requests, and calls _send_json) to use the same broker timeout value (broker_tool_call_timeout) or clamp broker_tool_call_timeout to __DAYTONA_TOOL_CALL_TIMEOUT_S__ so both sides share the same timeout; locate where __DAYTONA_TOOL_CALL_TIMEOUT_S__ is referenced and replace/derive it from broker_tool_call_timeout (or apply min(broker_tool_call_timeout, __DAYTONA_TOOL_CALL_TIMEOUT_S__)) so callbacks that take up to the broker timeout do not fail client-side.src/fleet_rlm/integrations/daytona/sandbox_executor.py-848-867 (1)
848-867:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThe 5-minute broker latch survives explicit resets.
_owner_broker_start_error()checks_BROKER_START_FAILURESbefore the owner field, so after one broker start failure the next run still short-circuits here for the full cooldown even ifsoft_reset()orclose_bridge()was called. That makes the new reset-time_bridge_start_error = Noneineffective for same-sandbox recovery. Clear the session-wide cache on explicit reset/close, or scope the cache to the executor instance instead of the sandbox id.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/integrations/daytona/sandbox_executor.py` around lines 848 - 867, The session-wide broker failure cache (_BROKER_START_FAILURES) causes _owner_broker_start_error(session, ...) to short-circuit new runs even after owner._bridge_start_error is cleared; update the reset/close flow (e.g., in soft_reset() and close_bridge()) to also clear the session entry from _BROKER_START_FAILURES by calling _clear_broker_start_error(session) (or remove the session key), or alternatively refactor the cache to be scoped to the executor instance rather than the sandbox id so that setting owner._bridge_start_error = None actually permits immediate recovery; ensure changes reference the existing symbols _BROKER_START_FAILURES, _owner_broker_start_error, _clear_broker_start_error, soft_reset, close_bridge, and owner._bridge_start_error.src/fleet_rlm/runtime/agent/runtime.py-125-138 (1)
125-138:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDon't unwrap the escalating runtime into its inner ReAct here.
When
self.agentis the defaultEscalatingFleetModule, this probe can return its inner.reactprogram and the native path then drives that child directly with onlychat_history/user_message. That skips the outer module’s routing/escalation flow and drops the inputs built by_escalation_call_args(...)(execution_mode,conversation_summary,core_memory, boundedhistory), so URL-document RLM routing and other escalation behavior can be silently bypassed during streaming. Keep native streaming restricted to a trueFleetAgent, or make the escalating module itself implement the stepwise interface instead of unwrapping it.Also applies to: 822-823, 847-850
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/runtime/agent/runtime.py` around lines 125 - 138, The probe in _get_streamable_react_program incorrectly unwraps an EscalatingFleetModule by returning its .react child; change the logic so streaming is only allowed for true FleetAgent instances (or objects that implement the full stepwise streaming interface) and refuse to return the inner .react when the outer program is an EscalatingFleetModule (or when the original program exposes escalation-specific APIs like _escalation_call_args / routing). Concretely, update _get_streamable_react_program to detect EscalatingFleetModule (or presence of escalation attributes) and skip returning candidate if the outer program is an escalation module; ensure the same check is applied in the other similar probes referenced around lines 822-823 and 847-850 so native streaming cannot bypass escalation inputs.src/fleet_rlm/runtime/modules/escalating.py-276-278 (1)
276-278:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRLM fallback object is incompatible with the
_run_rlmcall contractLine 277 sets
self._rlmtoChainOfThought(RLMReActChatSignature), but Lines 579–595 invoke that object withtask/promptkwargs. In forced RLM mode without an interpreter, this will raise at runtime instead of cleanly degrading.Suggested fix
else: - self._rlm = dspy.ChainOfThought(RLMReActChatSignature) + self._rlm = NoneAlso applies to: 579-595
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/runtime/modules/escalating.py` around lines 276 - 278, The fallback assigned to self._rlm (ChainOfThought(RLMReActChatSignature)) doesn't accept the keyword args used by _run_rlm (it is invoked with task= / prompt= in the 579–595 call site), so replace the direct ChainOfThought assignment with an adapter that matches the _run_rlm call contract: either construct a wrapper callable (or small adapter class with __call__(self, *, task=None, prompt=None, **kwargs)) that forwards task/prompt into the ChainOfThought invocation in the expected positional form, or change the _run_rlm invocation to use the ChainOfThought-compatible signature; ensure the symbol names referenced are self._rlm, ChainOfThought, RLMReActChatSignature and _run_rlm so the adapter preserves the original behavior while accepting task/prompt kwargs.
🧹 Nitpick comments (4)
tests/unit/api/test_dependencies.py (1)
95-113: ⚡ Quick winAdd a regression test for the uninitialized persistence branch.
get_persistencenow raises when bothrepositoryandlocal_storeare absent, but that new branch isn’t asserted here.Suggested test addition
+def test_get_persistence_raises_when_backend_uninitialized(clean_runtime_env): + dependencies_module = importlib.import_module("fleet_rlm.api.dependencies") + app = SimpleNamespace(state=SimpleNamespace(persistence_deps=dependencies_module.PersistenceDeps())) + request = _build_request(app) + try: + dependencies_module.get_persistence(request) + assert False, "Expected RuntimeError when persistence backend is uninitialized" + except RuntimeError as exc: + assert "Persistence backend not initialized" in str(exc)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/api/test_dependencies.py` around lines 95 - 113, Extend the existing test_get_persistence_prefers_repository_then_local_store to cover the new uninitialized branch: create an app whose state.persistence_deps is dependencies_module.PersistenceDeps() with neither repository nor local_store, build the request with _build_request(app), and assert that calling dependencies_module.get_persistence(request) raises the expected exception (use pytest.raises to check the raise). Ensure you reference get_persistence and PersistenceDeps when locating the code under test.src/fleet_rlm/runtime/modules/skill_selection.py (1)
68-99: ⚡ Quick winReduce overlap between
long-contextandbrowser-interactionkeywords.
http://,https://,fetch,browse, andscrapeinlong-contextcan overshadow browser-intent routing, makingbrowser-interactionless likely to be selected when it should be.Suggested direction
"long-context": [ @@ - "http://", - "https://", - "fetch", - "browse", - "scrape", - "analyze https", - "analyze http", + "analyze https", + "analyze http", ], "browser-interaction": [ + "http://", + "https://", + "fetch", + "browse", + "scrape", "render the page", "rendered page", "javascript page",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fleet_rlm/runtime/modules/skill_selection.py` around lines 68 - 99, The keyword lists in skill_selection.py have overlapping tokens causing browser-intent to be underselected; remove the URL- and browser-specific tokens ('http://', 'https://', 'fetch', 'browse', 'scrape') from the "long-context" list and ensure they remain (or are added) only in the "browser-interaction" list so routing prefers the browser skill; update the "long-context" and "browser-interaction" arrays accordingly in the file (referencing the "long-context" and "browser-interaction" keys) and run any selection/unit tests to verify correct intent routing.src/frontend/src/components/agent-elements/input/model-picker.tsx (2)
13-13: ⚡ Quick winUse the canonical
cn()import path.Line [13] uses a relative
cnimport; this should come from the shared alias to keep frontend utilities consistent.Proposed change
-import { cn } from "../utils/cn"; +import { cn } from "`@/lib/utils`";As per coding guidelines, use
@/lib/utilsas the canonicalcn()import path for frontend TypeScript files.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend/src/components/agent-elements/input/model-picker.tsx` at line 13, The import for cn in model-picker.tsx uses a relative path ("../utils/cn"); update it to use the canonical shared alias by importing cn from "`@/lib/utils`" instead, ensuring the component continues to use cn() for className concatenation and keeping frontend utility imports consistent across the codebase.Source: Coding guidelines
65-65: ⚡ Quick winReplace arbitrary Tailwind pixel classes with shared utilities/tokens.
Lines [65], [84], [101], and [131] use bracket pixel values (e.g.,
text-[12px],rounded-[6px],rounded-[8px]). These should be converted to canonical typography/radius utilities (or added via shared@utilitytokens) to match the design-system contract.As per coding guidelines, arbitrary Tailwind values should be eliminated in favor of semantic/shared utilities.
Also applies to: 84-84, 101-101, 131-131
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/frontend/src/components/agent-elements/input/model-picker.tsx` at line 65, Replace the arbitrary Tailwind pixel classes in the model-picker.tsx file with canonical typography and radius utilities. Specifically, convert text-[12px] to the appropriate shared typography utility token, rounded-[6px] to the corresponding canonical radius utility, and rounded-[8px] at line 131 to its shared equivalent. Apply these replacements across all four occurrences mentioned at lines 65, 84, 101, and 131 to ensure consistency with the design system contract and eliminate arbitrary bracket notation values in favor of semantic shared utilities defined via `@utility` tokens.Source: Coding guidelines
Summary
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Improvements