Skip to content

feat(ai): AI Engine — provider-agnostic LLM gateway + 8 connectors#354

Open
thewrz wants to merge 34 commits into
mainfrom
epic/ai-engine
Open

feat(ai): AI Engine — provider-agnostic LLM gateway + 8 connectors#354
thewrz wants to merge 34 commits into
mainfrom
epic/ai-engine

Conversation

@thewrz

@thewrz thewrz commented May 25, 2026

Copy link
Copy Markdown
Collaborator

Summary

Integration PR for the AI Engine Back-end Redesign epic. Replaces the single hardcoded Anthropic recommendation path with a provider-agnostic, per-DJ LLM gateway.

What lands

Provider notes

  • xAI / OpenRouter subclass the OpenAI-compatible adapter (fixed base URLs).
  • Gemini uses the native function_declarations tool schema.
  • Azure OpenAI uses per-deployment URLs + api-key header (with URL-injection hardening).
  • Bedrock signs SigV4 manually over httpx (no boto3), per-family tool translation (Anthropic / Llama).

Closes #329 #330 #331 #332 #334

Test plan

  • Full GitHub CI green (this PR finally triggers ci.yml — epic sub-PRs did not)
  • Local: backend ruff/format/bandit/pytest (~87% cov), frontend lint/tsc/vitest, alembic single-head + no drift
  • Manual: connect + test each connector type from /settings/ai; run a recommendation through a non-Anthropic connector
  • Admin: toggle llm_apikey_connectors_enabled, confirm gating

Summary by CodeRabbit

Release Notes

  • New Features

    • Added LLM Gateway system supporting multiple AI providers (OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, OpenRouter, xAI) with per-DJ connector management.
    • Introduced per-feature AI model preferences for recommendations and set builder.
    • Added admin oversight dashboard with LLM policy controls, usage reporting, and audit trails.
    • Implemented monthly token caps to control per-DJ AI usage.
    • Added background health monitoring for AI connectors with DJ notifications.
    • Moved DJ AI provider settings from /settings/ai to account page.
  • Bug Fixes

    • Fixed SSE connection pool leak preventing proper resource cleanup.
  • Documentation

    • Updated configuration guides and added LLM plugin adapter documentation.

Verification status (added on review)

  • OpenAI (openai_apikey) — verified live end-to-end on the testing server: connector test green, recommendation generated via gpt-5.4-mini, confirmed in llm_call_log.
  • Gemini / xAI / OpenRouter / Bedrock — implemented + full unit matrix (happy/401/429/5xx/timeout/malformed; Bedrock SigV4 checked against AWS's published test vector), reviewed, merged. Not yet exercised against live provider endpoints, but their acceptance criteria are met at the unit/integration level.
  • Azure OpenAI ([AI Engine] Add Azure OpenAI provider adapter #333) — implemented + unit-tested, but its acceptance criterion is "works against a real Azure OpenAI test deployment", which has not been performed. [AI Engine] Add Azure OpenAI provider adapter #333 intentionally left OUT of the Closes list until live Azure verification is done.

Follow-ups (not blocking this PR)

thewrz and others added 7 commits May 25, 2026 08:33
Sub-PRs target the long-lived epic/ai-engine branch, not main.
Final epic->main merge happens once after end-to-end validation.
…on migration (#348)

* feat(ai): provider-agnostic LLM gateway + 3 connectors + recommendation migration

Implements the MVP of the AI Engine back-end redesign per
docs/superpowers/specs/2026-05-24-admin-ai-oauth-design.md.

Ships:
- services/llm/ gateway with per-DJ MRU connector resolution + org default fallback
- Adapters: openai_apikey (httpx), openai_compatible (Hermes/Ollama/etc), anthropic_apikey (SDK)
- Models: LlmConnector (encrypted credentials), LlmCallLog (counts-only telemetry),
  LlmAuditEvent (credential lifecycle audit trail)
- Alembic migration 046 — creates 3 tables, adds 3 system_settings columns, data-migrates
  ANTHROPIC_API_KEY env var into an "anthropic_apikey" connector owned by the first admin
- REST API: /api/llm/connectors (per-DJ CRUD + test) + /api/admin/llm/* (policy, force-revoke, usage)
- DJ UI: /settings/ai (connect/test/delete + Hermes ChatGPT-subscription onboarding link)
- Admin UI: /admin/ai gains connector policy + per-DJ table + usage rollup
- Recommendation engine routes through the gateway (actor=event.created_by, purpose="recommendation");
  legacy direct-Anthropic path retained as fallback for callers without db/actor

Security:
- All credentials encrypted at rest via EncryptedText (Fernet)
- Custom OpenAI-compatible URLs validated (HTTPS any host, HTTP loopback + RFC1918 only,
  no embedded creds, no query/fragment)
- Sanitised error responses — no provider error bodies, stack traces, or bearer values leak
- Audit events for every credential lifecycle event (create, rotate, delete, admin-revoke, auth-invalid)
- Rate limited: create 5/min, test 10/min, rotate credentials 5/min
- No new dependencies (httpx + anthropic already in tree, no openai SDK, no litellm)

Tests: 104 new backend tests (gateway, adapters, tool translation, URL validation, API CRUD,
recommendation regression), frontend tests for both new pages, +4 api client tests.
Total backend coverage: 86.95% (above 85% threshold).

Closes #329

* fix(llm): address CodeRabbit review on PR #348

Backend correctness:
- llm_audit_event.target_connector_id FK now ondelete=SET NULL (migration +
  model) so audited connectors can be deleted without FK violations
- gateway MRU resolver uses nulls_last() so never-used connectors sort after
  recently-used ones on PostgreSQL
- tool_translation rejects Anthropic tool_use blocks missing id/name instead
  of emitting "None" tool calls
- _httpx_openai flattens dict-style content blocks (was dropping their text)
- connector_storage.update_metadata enforces the same 80-char model_hint bound
  as create
- llm.py update path maps unique-constraint rename collisions to 409 (was 500)
- admin_llm PATCH treats explicit null llm_default_connector_id as clear
- registry.register_adapter raises on conflicting duplicate registration
- is_llm_available(db, actor) mirrors gateway resolver semantics; callers in
  events.py pass the event owner as actor
- schemas enforce per-type credential requirements (ConnectorCreate) and a
  non-empty rotation payload (ConnectorCredentialsRotate); openapi.json
  regenerated

Frontend:
- DJ AI settings: block "Add provider" when policy disables all types; render
  create failures with error styling instead of success
- admin AI page: Promise.allSettled so one failing gateway request no longer
  blanks the others
- api-types derives LlmConnectorType/Status from generated schema

Docs + tests:
- CLAUDE.md / .env.example clarify ANTHROPIC_API_KEY deprecation + org-default
  connector semantics
- add regression tests for tool_use validation and URL validator IPv6/edge cases

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an xAI Grok connector type (xai_apikey) to the LLM gateway. xAI
exposes an OpenAI-compatible /chat/completions surface, so the adapter
subclasses OpenAICompatibleAdapter with a fixed base_url
(https://api.x.ai/v1) and api_key credential extraction, inheriting the
OpenAI-compatible request/response and tool-use translation logic.

- Reuses the generic llm_apikey_connectors_enabled admin flag (no
  per-provider column, no Alembic migration).
- Stores the api_key in the encrypted credentials blob.
- xAI key-format validation (xai- prefix).
- Adapter test matrix: happy path, 401, 429, 5xx, timeout, malformed
  response, tool-use via inherited path, plus connector CRUD + policy
  gating through the API.

Closes #331
* feat(ai): add OpenRouter LLM provider adapter

Add an openrouter_apikey connector type that routes to many upstream
models via a single OpenRouter API key. Subclasses OpenAICompatibleAdapter
with a fixed base_url (https://openrouter.ai/api/v1) and surfaces the stored
api_key as the Bearer token. Adds a model-hint dropdown backed by a cached
(hourly TTL) fetch of OpenRouter's public /models catalogue, exposed via a
minimal GET /api/llm/openrouter/models endpoint.

Reuses the generic llm_apikey_connectors_enabled admin flag (no per-provider
flag) and stores credentials in the encrypted creds blob — no DB columns and
no Alembic migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ai): include OpenRouter in admin connector-policy copy

The admin "Allow API-key connectors" label still listed only
"OpenAI, Anthropic" after openrouter_apikey was added as a
supported connector type. Update the copy to include OpenRouter
so the policy toggle messaging matches the available connectors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ai): add AWS Bedrock LLM provider adapter

Adds a "bedrock" connector type to the provider-agnostic LLM gateway so DJs
can route AI features to their own AWS Bedrock account (AWS-billed inference).

- New adapter (services/llm/adapters/bedrock.py) signs InvokeModel requests
  with AWS SigV4 implemented manually over the existing httpx client — no boto3
  or any new dependency (per the CLAUDE.md CVE/dependency rule).
- Dependency-free SigV4 signer (services/llm/sigv4.py), pinned against AWS's
  published signing-key test vector.
- Per-family tool translation keyed off aws_model_id: anthropic.* reuses the
  Anthropic tool schema/parser; meta.*/llama* uses prompt-embedded tools with
  JSON tool-call parsing (added to tool_translation.py).
- Config (aws_region, aws_model_id) + creds (aws_access_key_id,
  aws_secret_access_key) live in the encrypted credentials blob — NO new DB
  columns and NO Alembic migration.
- Reuses the generic llm_apikey_connectors_enabled admin flag (no per-provider
  flag). Creds rotate via the existing PUT /credentials route (partial rotation
  supported).
- Frontend /settings/ai gains a Bedrock option (gated like other api-key
  connectors) with the four AWS credential inputs.

Closes #334

* fix(llm): address CodeRabbit review on Bedrock adapter

- bedrock: pass empty set (not None) for tool_names when no tools are
  configured, so generated {"name","input"} JSON is never misclassified
  as a tool call (would clear text + flip stop_reason to tool_use)
- ai settings page: null model_hint for Bedrock so a stale hint left over
  from a prior connector-type selection is never posted
- tests: assert aws_secret_access_key round-trips on create and
  aws_model_id is preserved during partial Bedrock credential rotation
* feat(ai): add Azure OpenAI LLM provider adapter

Closes #333.

Adds an azure_openai connector type to the LLM gateway. Azure uses a
per-deployment endpoint and the api-key auth header (not Authorization:
Bearer), so the adapter builds its own URL + headers while reusing the
shared OpenAI request-body shaping and response parsing.

All Azure config (resource name, deployment name, api version) plus the
api key live in the encrypted credentials blob — no new DB columns, no
migration. Admins can rotate any subset (resource/deployment/version or
just the key) via the existing PUT /credentials route without recreating
the connector. Gated by the generic llm_apikey_connectors_enabled policy
flag (no per-provider flag).

* fix(ai): harden Azure connector input validation (CodeRabbit)

Address three CodeRabbit findings on the Azure OpenAI adapter:

- _build_azure_endpoint now validates resource/deployment/api-version
  and URL-encodes path/query components, preventing URL-component
  injection (authority/path/query rewrite) from a malformed blob.
- Pydantic create/rotate validators reject whitespace-only credential
  values via a shared _provided() helper.
- Azure rotation distinguishes None (field omitted, keep current) from
  "" (explicit empty), passing empties to _build_azure_creds so they are
  rejected instead of silently preserving the old value.

Adds regression tests for endpoint hardening, whitespace rejection, and
empty-string rotation rejection.
* feat(ai): add Gemini provider adapter

Add a native Google Gemini (generativelanguage API) connector to the LLM
gateway alongside OpenAI/Anthropic. Implements the LlmAdapter ABC with
function_declarations tool translation, header-based API-key auth, and the
full error matrix (401/403, 429, 5xx, timeout, malformed). Reuses the generic
llm_apikey_connectors_enabled admin flag — no per-provider flag, no migration.
Credentials stored in the encrypted blob. Frontend surfaces the Gemini option
when api-key connectors are enabled.

Closes #330

* fix(llm): harden Gemini adapter + drop key-shaped test literals

Address CodeRabbit review on PR #351:
- Flatten dict-based content blocks in Gemini message translation so
  list content like {"text": "..."} is no longer silently dropped.
- Normalize malformed Gemini functionCall args to ToolTranslationError
  instead of raw TypeError/ValueError from dict(...).
- Replace AIza-shaped GCP key literals in tests with non-secret
  placeholders / runtime-built valid-shape keys to stop secret scanners.
- Document gemini_apikey credential semantics in ConnectorCreate docstring
  and regenerate openapi.json.
- Make admin AI policy checkbox label generic (provider-agnostic).
- Add regression tests for list-content flattening and non-object args.

* fix(llm): guard non-object Gemini functionCall payload

CodeRabbit follow-up on PR #351: a truthy non-dict functionCall value
(e.g. a string/list) would reach fn.get("name") and raise a raw
AttributeError, bubbling as a 500. Normalize to ToolTranslationError,
matching the args-handling pattern. Regression test added.
@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an LLM gateway with adapters, DB models/migrations, admin/DJ APIs, streaming tests, health monitor, quotas, and moves recommendations to use the gateway. Frontend gains API types/clients and admin/DJ UIs. OpenAPI regenerated, SSE endpoint pool-safety fixed, and docs/env updated.

Changes

LLM Gateway + Admin/DJ Management

Layer / File(s) Summary
End-to-end gateway, adapters, APIs, UI, tests
server/app/services/llm/..., server/app/api/..., server/app/models/..., dashboard/lib/..., dashboard/components/..., dashboard/app/..., server/alembic/..., server/openapi.json, server/tests/*, docs/*
Implements provider-agnostic LLM gateway and adapters, schema/migrations, admin/DJ endpoints and UIs, streaming and health monitoring, monthly caps, and routes recommendations via the gateway; updates OpenAPI, tests, and docs.

Sequence Diagram(s)

sequenceDiagram
  participant DJ Client
  participant Dashboard API
  participant Gateway
  participant Adapter
  participant Provider
  participant DB
  DJ Client->>Dashboard API: Create/Test connector / Request recommendation
  Dashboard API->>Gateway: dispatch(ChatRequest, actor, purpose)
  Gateway->>DB: resolve connector (prefs/default/MRU), enforce monthly cap
  Gateway->>Adapter: chat()/stream()
  Adapter->>Provider: Provider-specific call
  Provider-->>Adapter: Response/Stream chunks
  Adapter-->>Gateway: ChatResponse/Chunks (+usage)
  Gateway->>DB: log_call, audit, status/health updates
  Gateway-->>Dashboard API: Result/stream
  Dashboard API-->>DJ Client: UI updates/CSV/usage
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • wrzonance/WrzDJ#348: Earlier gateway/connector + recommendation migration foundation touched the same gateway, migrations, and endpoints.
  • wrzonance/WrzDJ#379: Shares the streaming gateway path and /api/llm/connectors/{id}/stream-test SSE client/server wiring.
  • wrzonance/WrzDJ#377: Implements monthly token caps that this PR enforces/admin-manages across backend and admin UI.

Poem

A rabbit taps code in moonlit byte,
Gateways hum, models take flight.
Keys kept safe, audits aglow,
Streams like rivers softly flow.
DJs pick paths, defaults align—
Caps guard costs, all is fine.
Thump-thump: shipped on rabbit time! 🐇✨

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch epic/ai-engine

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

🧹 Nitpick comments (5)
server/tests/test_llm_gateway.py (1)

194-231: ⚡ Quick win

This test currently validates system-context fallback, not admin-actor fallback.

Line 230 calls Gateway.dispatch(..., None, ...), so the admin-actor path described by the test isn’t exercised. Consider dispatching with admin_user_actor (or splitting into two tests).

🎯 Minimal adjustment
-        resp = await Gateway.dispatch(db, None, gateway_request, purpose="test")
+        resp = await Gateway.dispatch(db, admin_user_actor, gateway_request, purpose="test")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/tests/test_llm_gateway.py` around lines 194 - 231, The test
test_falls_back_to_system_default is calling Gateway.dispatch with None as the
actor, so it exercises system-default fallback rather than the admin-actor
fallback described; update the call to Gateway.dispatch to pass admin_user_actor
(or split into two tests: one that calls Gateway.dispatch(db, admin_user_actor,
gateway_request, ...) to verify admin-specific connector behavior and another
that calls Gateway.dispatch(db, None, gateway_request, ...) to verify
system-default fallback) so the admin-actor path is actually exercised.
server/tests/test_llm_tool_translation.py (1)

66-69: ⚡ Quick win

Add missing invalid-force test for Anthropic tool translation.

to_anthropic_tools has a guarded error branch for unknown forced tools, but this suite doesn’t currently assert it (unlike OpenAI/Gemini). Add a pytest.raises(ToolTranslationError) case for "does_not_exist" to prevent silent regressions.

Proposed test addition
 class TestAnthropicTools:
@@
     def test_force_tool(self):
         tools, choice = to_anthropic_tools([TOOL], "rank_recommendations")
         assert choice == {"type": "tool", "name": "rank_recommendations"}
+
+    def test_force_tool_not_in_list(self):
+        with pytest.raises(ToolTranslationError):
+            to_anthropic_tools([TOOL], "does_not_exist")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/tests/test_llm_tool_translation.py` around lines 66 - 69, Add a
negative test inside the test_force_tool case to exercise the guarded error
branch in to_anthropic_tools: call to_anthropic_tools([TOOL], "does_not_exist")
inside a pytest.raises(ToolTranslationError) context and assert that it raises;
keep the existing valid forced-tool assertion (to_anthropic_tools([TOOL],
"rank_recommendations") -> choice ==
{"type":"tool","name":"rank_recommendations"}) and add the raises case
immediately after to prevent silent regressions.
dashboard/app/(dj)/settings/ai/__tests__/page.test.tsx (1)

68-85: ⚡ Quick win

Align policy-filter test with the DJ auth boundary

This test currently mocks getAdminLlmPolicy() success while the page is running in DJ auth mode. Please add a case that models the real non-admin policy-fetch path so provider filtering behavior is validated under actual DJ constraints.

As per coding guidelines server/app/**/*.py: “Use get_current_admin dependency for all endpoints under /api/admin/*.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dashboard/app/`(dj)/settings/ai/__tests__/page.test.tsx around lines 68 - 85,
The test currently only mocks api.getAdminLlmPolicy for an admin path but the
page runs in DJ auth mode; add a second case that models the non-admin
policy-fetch flow by having api.getAdminLlmPolicy fail (e.g., reject with a 403)
and mock the non-admin endpoint api.getLlmPolicy (or the equivalent non-admin
policy getter used by SettingsAIPage) to return the policy {
llm_apikey_connectors_enabled: false, llm_compatible_connector_enabled: true,
llm_default_connector_id: null }, while keeping api.listLlmConnectors mocked;
then render SettingsAIPage, open the Add provider dropdown and assert the
provider select options equal ['openai_compatible'] so provider filtering is
validated under the DJ (non-admin) auth boundary.
server/app/services/llm/openrouter_models.py (1)

71-75: ⚡ Quick win

Return a copy of cached model lists to avoid accidental cache mutation by callers.

The function currently returns the cache list by reference, so any downstream mutation can poison the process-wide cache.

💡 Proposed fix
@@
         fetched_at, models = _cache
         if _now() - fetched_at < CACHE_TTL_SECONDS:
-            return models
+            return list(models)
@@
         logger.warning("Failed to fetch OpenRouter model catalogue")
         if _cache is not None:
-            return _cache[1]
+            return list(_cache[1])
         return []
@@
     elif _cache is not None:
         # Empty parse but we had a prior good list — keep serving it.
-        return _cache[1]
-    return models
+        return list(_cache[1])
+    return list(models)

Also applies to: 83-93

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/llm/openrouter_models.py` around lines 71 - 75, The
cached model list is returned by reference, allowing callers to mutate
process-wide _cache; update the cache-returning paths in the function that reads
_cache (the block checking "if not force_refresh and _cache is not None" and the
later branch that sets/returns _cache after fetching) to return a shallow copy
of the models (e.g., list(models) or models.copy()) instead of the original list
so callers cannot mutate the cached list stored in _cache; ensure both the early
return and the post-fetch return use the same copying approach.
server/openapi.json (1)

1904-2193: 🏗️ Heavy lift

Encode connector credential requirements in the OpenAPI schema (not just runtime validators).

ConnectorCreate / ConnectorCredentialsRotate in server/openapi.json make all credential fields optional/nullable, so generated TS clients can submit invalid payloads (e.g., an empty rotation object), even though the source Pydantic models enforce these rules at runtime (_require_credentials_for_type and _require_at_least_one). Consider expressing these constraints in the emitted schema (e.g., discriminated oneOf/anyOf variants and/or minProperties) so the OpenAPI contract matches the validation behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/openapi.json` around lines 1904 - 2193, The OpenAPI schema currently
allows all credential fields optional which mismatches runtime validators;
update the ConnectorCreate and ConnectorCredentialsRotate definitions in
server/openapi.json to encode the credential requirements enforced by
_require_credentials_for_type and _require_at_least_one: for ConnectorCreate
replace the loose object with a discriminated oneOf (or anyOf) keyed by
connector_type that lists per-connector required properties (e.g., for
"openai_apikey"/"anthropic_apikey"/"openrouter_apikey"/"xai_apikey"/"gemini_apikey"
require api_key; for "openai_compatible" require base_url; for "bedrock" require
aws_access_key_id/aws_secret_access_key/aws_region/aws_model_id; for
"azure_openai" require
api_key/azure_resource_name/azure_deployment_name/azure_api_version) matching
build_create_payload; for ConnectorCredentialsRotate add minProperties:1 (or an
anyOf that requires at least one credential field) so generated TS clients
cannot send empty rotation objects and so rotate_credentials validation is
reflected in the contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dashboard/app/`(dj)/settings/ai/page.tsx:
- Around line 95-100: The Promise.all handler uses fetchPolicySoft() which
returns null on failure and the caller (in this file) treats null as “allow
all,” letting DJs see restricted connectors; change the client-side handling in
the Promise.all().then(([rows, p]) => { ... }) so that if fetchPolicySoft()
returns null you setPolicy to a conservative fallback (deny/disallow connectors
or an explicit empty/forbidden policy) instead of treating null as permissive;
alternatively make fetchPolicySoft() return an explicit restrictive policy on
error. Also ensure the server-side policy endpoint is protected with the
admin-only dependency (use get_current_admin) so non-admin DJ requests cannot
retrieve admin policy data. Apply changes around functions/api calls:
api.listLlmConnectors(), fetchPolicySoft(), and the setPolicy usage referenced
in this file.

In `@dashboard/app/admin/ai/page.tsx`:
- Around line 19-25: TYPE_LABELS is missing mappings for some supported provider
enum values so admin views show raw enum keys; update the TYPE_LABELS object(s)
to include entries for "xai_apikey" => "xAI", "bedrock" => "AWS Bedrock" and
"azure_openai" => "Azure OpenAI" (and any duplicate TYPE_LABELS declarations in
this file: the declarations around the other referenced ranges) so those labels
render human-readable names in the admin UI.

In `@server/app/api/events.py`:
- Around line 943-947: The LLM availability check uses is_llm_available(db,
actor=user) but the response later is built without actor (so llm_available may
be incorrect for connector-specific setups) and the 503 message is stale; change
the availability check and any later llm_available evaluation to use the same
actor context used for the response (make both calls pass the same actor
variable, not hardcoded user), and update the HTTPException detail string to a
generic message that mentions connector/API key configuration rather than only
ANTHROPIC_API_KEY; reference is_llm_available and the
llm_available/HTTPException usage to locate and fix the code paths.

In `@server/app/api/llm.py`:
- Around line 317-324: Before calling delete_connector and committing, check
whether the system-wide default llm_default_connector_id equals the connector
being deleted and, if so, clear that field (set to None/empty) on the
system/settings record and persist the change; perform this update prior to
delete_connector/db.commit() so the global default isn't left stale (this
mirrors the logic used in the admin revoke path).

In `@server/app/models/llm_connector.py`:
- Line 88: The model's Mapped field "status" defines an ORM default but lacks
the DB server default present in the migration; update the mapped_column call
for status to include a matching server_default (e.g.,
server_default=sqlalchemy.text("'<STATUS>'") using the STATUS_ACTIVE value) so
the model exactly matches the migration (type String(20), nullable=False,
default=STATUS_ACTIVE and server_default present), and add the necessary import
for sqlalchemy.text if not already imported.

In `@server/app/models/system_settings.py`:
- Around line 31-33: The two boolean Mapped columns
llm_apikey_connectors_enabled and llm_compatible_connector_enabled currently
only set ORM defaults; update their mapped_column calls to include matching DB
server defaults (server_default) so the model exactly mirrors the migration—add
a server_default representing true (matching the migration's DB-side default)
for both columns in the SystemSettings model (symbols:
llm_apikey_connectors_enabled, llm_compatible_connector_enabled) and ensure the
column type and nullable settings remain identical to the migration.

In `@server/app/schemas/llm.py`:
- Around line 204-210: The AdminPolicyPatch model allows contradictory payloads
where clear_default=True and llm_default_connector_id is non-null; add
validation on AdminPolicyPatch (e.g., a pydantic `@root_validator` or validator)
that checks if clear_default is True and llm_default_connector_id is not None
and raises a ValueError (or ValidationError) to reject such payloads,
referencing the AdminPolicyPatch class and the clear_default and
llm_default_connector_id fields so clients cannot submit both simultaneously.

In `@server/app/services/llm/adapters/anthropic_apikey.py`:
- Around line 118-122: The list-content branch in anthropic_apikey.py flattens
only objects with a .text attribute, so dict-backed blocks like
{"type":"text","text":"..."} become empty; update the logic where content =
m.content and the list comprehension (the expression joining getattr(b, "text",
"") or "") to handle dicts by checking isinstance(b, dict) and using
b.get("text", "") (falling back to getattr(b, "text", "") when not a dict), so
any element with a 'text' key is included when computing text.
- Around line 81-94: The AsyncAnthropic client created by
self._client(timeout=timeout) in AnthropicApiKeyAdapter.chat() is never closed;
update chat() to use an async context (async with AsyncAnthropic(...) as
client:) or ensure await client.close() in a finally block so the per-request
HTTP resources are released; make the same change pathing through health_check()
which calls chat(), and keep existing exception handling (APITimeoutError,
APIConnectionError, APIStatusError, APIError) and return via
parse_anthropic_response(message).

In `@server/app/services/llm/adapters/bedrock.py`:
- Around line 109-117: The SigV4 signing call in the Bedrock adapter is missing
the temporary session token: update the sign_request(...) invocation in
bedrock.py to forward creds["aws_session_token"] (when present) so temporary STS
credentials are included; specifically add an
aws_session_token=creds.get("aws_session_token") argument to the
sign_request(...) call (function name: sign_request, variable: creds) so
connectors using session credentials authenticate correctly.
- Around line 193-197: The current Bedrock translators iterate m.content and
only read attribute .text, which drops dict-backed blocks; update the logic that
sets the local variable text (the branches handling m.content/list and the
similar block at the other translator) to handle dict items as well—when
iterating content, for each element b use getattr(b, "text", None) first and if
that is None and isinstance(b, dict) then use b.get("text") (or b.get("content")
if that is the canonical key your callers use), concatenating those values so
dict-backed blocks are preserved instead of collapsing to empty strings; apply
the same fix to both occurrences (the m.content handling around the text
assignment and the second occurrence mentioned at the other block).

In `@server/app/services/llm/adapters/openai_compatible.py`:
- Around line 24-57: The extracted base_url from _extract_credentials must be
validated with the URL validator before any outbound HTTP call: after calling
self._extract_credentials() in both chat and health_check (or perform the
validation inside _extract_credentials itself), pass the base_url through the
validator from url_validator.py (using the same policy: HTTPS any host; HTTP
only loopback+RFC1918) and if validation fails raise AuthInvalid (or similar) to
stop the request; only call call_openai_chat when the validator approves the
base_url and keep passing bearer as before.

In `@server/app/services/llm/tool_translation.py`:
- Around line 61-70: The current parsing loop over raw_tool_calls may produce
non-dict inputs (or other exception types) that escape as raw exceptions; update
the parsing logic in the loop that handles raw_args (the block building
input_obj from json.loads or raw_args) to explicitly validate that input_obj is
a dict and, if not, raise ToolTranslationError with a clear message; also catch
JSONDecodeError as already done but ensure any other unexpected types (e.g., if
raw_args is None or a primitive) are converted into ToolTranslationError instead
of letting TypeError/ValueError propagate; apply the same validation and error
normalization to the analogous parsing block around the code referenced at lines
150-163 so both ToolCall creation sites (using variables raw_args, input_obj,
ToolCall(...)) consistently raise ToolTranslationError for malformed/non-dict
tool inputs.

In `@server/app/services/recommendation/llm_client.py`:
- Around line 287-305: The current path always calls Gateway.dispatch when db is
present which skips the legacy Anthropic fallback; modify the logic in the block
that builds ChatRequest and calls Gateway.dispatch (symbols: ChatRequest,
Gateway.dispatch, _parse_tool_response, _legacy_call, db, actor, user_message)
so that after awaiting Gateway.dispatch you detect a failed/no-connector or
non-actionable response (e.g., dispatch returns None, an error flag, or a
response without tool output) and in that case call and await
_legacy_call(user_message) and set result to that value; otherwise keep using
_parse_tool_response(response) as now. Ensure the fallback path preserves
async/await semantics and error propagation.

In `@server/openapi.json`:
- Around line 2304-2335: Update the source Pydantic/FastAPI model that generates
ConnectorPatch so that display_name is optional but not nullable (remove
Optional[...]=None or default=None usage and instead use a sentinel/unset
default), e.g., change the field declaration used to produce ConnectorPatch so
its type is just str with a missing-field default rather than allowing None;
then regenerate the OpenAPI (server/openapi.json) so ConnectorPatch.display_name
no longer includes "type": "null". This aligns the schema with update_metadata()
in server/app/services/llm/connector_storage.py (which treats None as "do not
update") and with ConnectorOut.display_name being always a string. Ensure the
model change propagates to whatever model/class produces ConnectorPatch (the
source model class that defines display_name).

In `@server/tests/test_llm_api.py`:
- Around line 436-444: The test test_create_rejects_unknown_type is using a
supported connector_type ("gemini_apikey"), so change the payload sent to
client.post("/api/llm/connectors") to use a truly unknown connector_type (e.g.,
"unknown_connector" or "not_a_connector") so the request exercises the
server-side unknown-type handling and the assertion resp.status_code in (400,
422) is meaningful.

---

Nitpick comments:
In `@dashboard/app/`(dj)/settings/ai/__tests__/page.test.tsx:
- Around line 68-85: The test currently only mocks api.getAdminLlmPolicy for an
admin path but the page runs in DJ auth mode; add a second case that models the
non-admin policy-fetch flow by having api.getAdminLlmPolicy fail (e.g., reject
with a 403) and mock the non-admin endpoint api.getLlmPolicy (or the equivalent
non-admin policy getter used by SettingsAIPage) to return the policy {
llm_apikey_connectors_enabled: false, llm_compatible_connector_enabled: true,
llm_default_connector_id: null }, while keeping api.listLlmConnectors mocked;
then render SettingsAIPage, open the Add provider dropdown and assert the
provider select options equal ['openai_compatible'] so provider filtering is
validated under the DJ (non-admin) auth boundary.

In `@server/app/services/llm/openrouter_models.py`:
- Around line 71-75: The cached model list is returned by reference, allowing
callers to mutate process-wide _cache; update the cache-returning paths in the
function that reads _cache (the block checking "if not force_refresh and _cache
is not None" and the later branch that sets/returns _cache after fetching) to
return a shallow copy of the models (e.g., list(models) or models.copy())
instead of the original list so callers cannot mutate the cached list stored in
_cache; ensure both the early return and the post-fetch return use the same
copying approach.

In `@server/openapi.json`:
- Around line 1904-2193: The OpenAPI schema currently allows all credential
fields optional which mismatches runtime validators; update the ConnectorCreate
and ConnectorCredentialsRotate definitions in server/openapi.json to encode the
credential requirements enforced by _require_credentials_for_type and
_require_at_least_one: for ConnectorCreate replace the loose object with a
discriminated oneOf (or anyOf) keyed by connector_type that lists per-connector
required properties (e.g., for
"openai_apikey"/"anthropic_apikey"/"openrouter_apikey"/"xai_apikey"/"gemini_apikey"
require api_key; for "openai_compatible" require base_url; for "bedrock" require
aws_access_key_id/aws_secret_access_key/aws_region/aws_model_id; for
"azure_openai" require
api_key/azure_resource_name/azure_deployment_name/azure_api_version) matching
build_create_payload; for ConnectorCredentialsRotate add minProperties:1 (or an
anyOf that requires at least one credential field) so generated TS clients
cannot send empty rotation objects and so rotate_credentials validation is
reflected in the contract.

In `@server/tests/test_llm_gateway.py`:
- Around line 194-231: The test test_falls_back_to_system_default is calling
Gateway.dispatch with None as the actor, so it exercises system-default fallback
rather than the admin-actor fallback described; update the call to
Gateway.dispatch to pass admin_user_actor (or split into two tests: one that
calls Gateway.dispatch(db, admin_user_actor, gateway_request, ...) to verify
admin-specific connector behavior and another that calls Gateway.dispatch(db,
None, gateway_request, ...) to verify system-default fallback) so the
admin-actor path is actually exercised.

In `@server/tests/test_llm_tool_translation.py`:
- Around line 66-69: Add a negative test inside the test_force_tool case to
exercise the guarded error branch in to_anthropic_tools: call
to_anthropic_tools([TOOL], "does_not_exist") inside a
pytest.raises(ToolTranslationError) context and assert that it raises; keep the
existing valid forced-tool assertion (to_anthropic_tools([TOOL],
"rank_recommendations") -> choice ==
{"type":"tool","name":"rank_recommendations"}) and add the raises case
immediately after to prevent silent regressions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f51edd89-6135-47f9-98a6-0da1165ba44f

📥 Commits

Reviewing files that changed from the base of the PR and between e095380 and 4d36057.

📒 Files selected for processing (56)
  • .env.example
  • CLAUDE.md
  • dashboard/app/(dj)/settings/ai/__tests__/page.test.tsx
  • dashboard/app/(dj)/settings/ai/page.tsx
  • dashboard/app/admin/ai/__tests__/page.test.tsx
  • dashboard/app/admin/ai/page.tsx
  • dashboard/lib/__tests__/api.test.ts
  • dashboard/lib/api-types.generated.ts
  • dashboard/lib/api-types.ts
  • dashboard/lib/api.ts
  • docs/superpowers/specs/2026-05-24-admin-ai-oauth-design.md
  • server/alembic/versions/046_admin_ai_oauth.py
  • server/app/api/__init__.py
  • server/app/api/admin_llm.py
  • server/app/api/events.py
  • server/app/api/llm.py
  • server/app/models/__init__.py
  • server/app/models/llm_connector.py
  • server/app/models/system_settings.py
  • server/app/schemas/llm.py
  • server/app/services/llm/__init__.py
  • server/app/services/llm/adapters/__init__.py
  • server/app/services/llm/adapters/_httpx_openai.py
  • server/app/services/llm/adapters/anthropic_apikey.py
  • server/app/services/llm/adapters/azure_openai.py
  • server/app/services/llm/adapters/bedrock.py
  • server/app/services/llm/adapters/gemini_apikey.py
  • server/app/services/llm/adapters/openai_apikey.py
  • server/app/services/llm/adapters/openai_compatible.py
  • server/app/services/llm/adapters/openrouter_apikey.py
  • server/app/services/llm/adapters/xai_apikey.py
  • server/app/services/llm/base.py
  • server/app/services/llm/connector_storage.py
  • server/app/services/llm/exceptions.py
  • server/app/services/llm/gateway.py
  • server/app/services/llm/openrouter_models.py
  • server/app/services/llm/registry.py
  • server/app/services/llm/sigv4.py
  • server/app/services/llm/tool_translation.py
  • server/app/services/llm/url_validator.py
  • server/app/services/recommendation/llm_client.py
  • server/app/services/recommendation/llm_hooks.py
  • server/app/services/recommendation/service.py
  • server/app/services/system_settings.py
  • server/openapi.json
  • server/scripts/export_openapi.py
  • server/tests/test_llm_adapters.py
  • server/tests/test_llm_api.py
  • server/tests/test_llm_bedrock_adapter.py
  • server/tests/test_llm_gateway.py
  • server/tests/test_llm_hooks.py
  • server/tests/test_llm_openrouter_models.py
  • server/tests/test_llm_recommendation_via_gateway.py
  • server/tests/test_llm_sigv4.py
  • server/tests/test_llm_tool_translation.py
  • server/tests/test_llm_url_validator.py

Comment thread dashboard/components/AiProvidersSection.tsx Outdated
Comment thread dashboard/app/admin/ai/page.tsx
Comment thread server/app/api/events.py
Comment thread server/app/api/llm.py
Comment thread server/app/models/llm_connector.py Outdated
Comment thread server/app/services/llm/adapters/openai_compatible.py
Comment thread server/app/services/llm/tool_translation.py
Comment thread server/app/services/recommendation/llm_client.py Outdated
Comment thread server/openapi.json
Comment thread server/tests/test_llm_api.py
…endation model

OpenAI Platform GPT-5/o-series models reject the legacy `max_tokens` field
(HTTP 400 unsupported_parameter) and require `max_completion_tokens`; Azure
OpenAI serves the same models. Thread a `max_tokens_field` through the shared
OpenAI caller so the Platform + Azure adapters opt into `max_completion_tokens`,
while third-party compatible / xAI / OpenRouter endpoints keep legacy
`max_tokens`.

Drop the 1-token cap from the shared connector healthcheck: reasoning models
spend their completion budget on internal reasoning tokens before producing any
visible output, so a 1-token budget yields zero output and an HTTP 400.

Propagate the gateway's actual ChatResponse.model through LLMSuggestionResult →
LLMRecommendationResult → the /recommendations/llm response, replacing the
hardcoded anthropic_model default. The recommendation badge now reflects the
connector that actually ran (e.g. gpt-5.4-mini) instead of always "Haiku".

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
server/app/services/recommendation/llm_hooks.py (1)

43-46: 💤 Low value

Consider adding type hints for db and actor parameters.

The parameters lack type annotations, unlike the equivalent signature in llm_client.py:call_llm which has db: Session | None = None, actor: User | None = None. Adding them improves IDE support and consistency.

     *,
-    db=None,
-    actor=None,
+    db: "Session | None" = None,
+    actor: "User | None" = None,

(Use string literals or add imports at top of file.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/recommendation/llm_hooks.py` around lines 43 - 46, Add
explicit type annotations for the db and actor parameters in the function
signature in llm_hooks.py to match the equivalent in llm_client.py: use db:
Session | None = None and actor: User | None = None (either import Session and
User at the top of llm_hooks.py or use string-literal forward refs like
"Session" and "User") so IDEs and type checkers get the same hints as call_llm;
update the function signature where db and actor appear to include these types.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@server/app/services/recommendation/llm_hooks.py`:
- Around line 43-46: Add explicit type annotations for the db and actor
parameters in the function signature in llm_hooks.py to match the equivalent in
llm_client.py: use db: Session | None = None and actor: User | None = None
(either import Session and User at the top of llm_hooks.py or use string-literal
forward refs like "Session" and "User") so IDEs and type checkers get the same
hints as call_llm; update the function signature where db and actor appear to
include these types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 858bc04f-dd67-42ff-9cc8-b7d4815e735b

📥 Commits

Reviewing files that changed from the base of the PR and between 4d36057 and 3baf828.

📒 Files selected for processing (10)
  • dashboard/app/(dj)/events/[code]/components/RecommendationsCard.tsx
  • server/app/api/events.py
  • server/app/services/llm/adapters/_httpx_openai.py
  • server/app/services/llm/adapters/azure_openai.py
  • server/app/services/llm/adapters/openai_apikey.py
  • server/app/services/recommendation/llm_client.py
  • server/app/services/recommendation/llm_hooks.py
  • server/app/services/recommendation/service.py
  • server/tests/test_llm_adapters.py
  • server/tests/test_llm_recommendation_via_gateway.py

…session token, admin labels, policy validation, SSRF revalidation, server defaults)

- admin AI page: add xAI/Bedrock/Azure OpenAI to TYPE_LABELS
- events.py: refresh 503 LLM message for connector gateway; pass actor=user
- llm.py: DJ delete clears system default connector (parity with admin revoke)
- models: server_default on connector.status and settings LLM policy bools
- schemas: AdminPolicyPatch rejects clear_default + non-null default id
- anthropic adapter: close client via async with; handle dict text blocks
- bedrock adapter: forward aws_session_token to SigV4; handle dict text blocks
- openai_compatible: re-validate base_url at call time (SSRF boundary)
- tool_translation: malformed OpenAI tool calls raise ToolTranslationError
- tests: cover delete-clears-default, policy validator, unsupported type

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
dashboard/app/admin/ai/page.tsx (1)

19-28: ⚡ Quick win

Type this map from the shared connector union.

Record<string, string> lets future backend connector additions compile while this label table silently falls out of date again. Key it from LlmAdminConnector['connector_type'] so missing labels fail at type-check time.

♻️ Proposed fix
-const TYPE_LABELS: Record<string, string> = {
+const TYPE_LABELS: Record<LlmAdminConnector['connector_type'], string> = {

Based on learnings Pydantic models should mirror backend response schemas with TypeScript types in frontend.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dashboard/app/admin/ai/page.tsx` around lines 19 - 28, The TYPE_LABELS map is
currently typed as Record<string, string>, which allows missing labels to pass;
change its type to use the shared connector union so missing labels cause a type
error—replace Record<string, string> with a type keyed by
LlmAdminConnector['connector_type'] (or an equivalent mapped union) and ensure
TYPE_LABELS includes an entry for every member of
LlmAdminConnector['connector_type'] so the compiler enforces label coverage;
update usages if needed to satisfy the stricter type.
server/app/services/llm/adapters/bedrock.py (1)

109-119: ⚡ Quick win

Add regression coverage for the Bedrock-specific fixes.

These paths were already fragile enough to need follow-up fixes, but there still isn't Bedrock adapter coverage here for STS session credentials or dict-backed content blocks. A small adapter test that asserts aws_session_token reaches SigV4 signing and that mixed block shapes flatten correctly would keep both regressions from slipping back in.

Based on learnings: For each bug fix, pin a regression test that references the commit SHA to prevent the bug from recurring.

Also applies to: 190-258

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/llm/adapters/bedrock.py` around lines 109 - 119, Add
regression tests for the Bedrock adapter to prevent the STS session-token and
mixed/block-shape regressions from returning: write a test that calls the
Bedrock request-building/send flow (the code path that calls sign_request with
utcnow()) and asserts that creds["aws_session_token"] is forwarded into
sign_request (i.e., X-Amz-Security-Token is included in the SigV4-signed
headers) and a separate test that exercises dict-backed/mixed content blocks
through the adapter's payload flattening to ensure the shape is normalized the
same way as in the fix; make the tests reference the commit SHA that introduced
the fixes and target the Bedrock adapter functions that build payloads and call
sign_request so future changes will fail if the token forwarding or block
flattening regresses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/app/services/llm/adapters/openai_compatible.py`:
- Around line 34-45: The code currently falls back to
self.connector.base_url_plain when blob lacks "base_url", which exposes the
plaintext admin mirror as an active source of truth; change the logic in the
routine that reads the blob so it requires the encrypted blob to contain
"base_url" (do not use self.connector.base_url_plain as a fallback), and if
missing raise AuthInvalid("Connector is missing a base_url"); keep the existing
validate_compatible_base_url(str(base_url)) call and the InvalidBaseUrlError ->
AuthInvalid conversion logic unchanged so validation still occurs at call time.

In `@server/app/services/llm/tool_translation.py`:
- Around line 71-76: The current parsing in tool_translation.py wrongly coerces
falsy non-object inputs (e.g., [] or False) into {} which lets invalid tool
arguments pass; in the block handling raw_args (inside the try/except that
raises ToolTranslationError) change the logic so only None becomes an empty dict
and all other non-string values are left as-is (e.g., use “if raw_args is None
then {} else raw_args” or explicit branches), then validate that the resulting
input_obj is a dict and raise ToolTranslationError for anything else; update the
code paths around json.loads(raw_args) and the subsequent isinstance check to
reflect this behavior (refer to the try/except, json.loads usage and the
ToolTranslationError raises).

---

Nitpick comments:
In `@dashboard/app/admin/ai/page.tsx`:
- Around line 19-28: The TYPE_LABELS map is currently typed as Record<string,
string>, which allows missing labels to pass; change its type to use the shared
connector union so missing labels cause a type error—replace Record<string,
string> with a type keyed by LlmAdminConnector['connector_type'] (or an
equivalent mapped union) and ensure TYPE_LABELS includes an entry for every
member of LlmAdminConnector['connector_type'] so the compiler enforces label
coverage; update usages if needed to satisfy the stricter type.

In `@server/app/services/llm/adapters/bedrock.py`:
- Around line 109-119: Add regression tests for the Bedrock adapter to prevent
the STS session-token and mixed/block-shape regressions from returning: write a
test that calls the Bedrock request-building/send flow (the code path that calls
sign_request with utcnow()) and asserts that creds["aws_session_token"] is
forwarded into sign_request (i.e., X-Amz-Security-Token is included in the
SigV4-signed headers) and a separate test that exercises dict-backed/mixed
content blocks through the adapter's payload flattening to ensure the shape is
normalized the same way as in the fix; make the tests reference the commit SHA
that introduced the fixes and target the Bedrock adapter functions that build
payloads and call sign_request so future changes will fail if the token
forwarding or block flattening regresses.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4e5cd8e8-c20a-4c64-aa74-555a03852671

📥 Commits

Reviewing files that changed from the base of the PR and between 3baf828 and 27a9dd7.

📒 Files selected for processing (12)
  • dashboard/app/admin/ai/page.tsx
  • server/app/api/events.py
  • server/app/api/llm.py
  • server/app/models/llm_connector.py
  • server/app/models/system_settings.py
  • server/app/schemas/llm.py
  • server/app/services/llm/adapters/anthropic_apikey.py
  • server/app/services/llm/adapters/bedrock.py
  • server/app/services/llm/adapters/openai_compatible.py
  • server/app/services/llm/tool_translation.py
  • server/tests/test_llm_adapters.py
  • server/tests/test_llm_api.py

Comment thread server/app/services/llm/adapters/openai_compatible.py Outdated
Comment thread server/app/services/llm/tool_translation.py
thewrz added 3 commits May 25, 2026 22:22
* feat(ai): add admin LLM audit-trail browse + CSV export endpoints (#341)

Add read-only admin endpoints over the existing llm_audit_event table:
- GET /api/admin/llm/audit: paginated, filterable JSON browse (event_type,
  actor_user_id, target_connector_id, days) with joined actor username and
  target connector display name.
- GET /api/admin/llm/audit.csv: streaming CSV export honoring the same
  filters, capped at 10k rows.

Both join the connector display name only — credential material is never
read or returned. Adds frontend API client methods + types (no migration).

* feat(ai): add Audit trail card to admin AI page (#341)

Add a filterable, paginated audit-trail section to /admin/ai that browses
llm_audit_event via the new admin endpoints. Filters: event type, actor
user ID, target connector, date range. Columns: timestamp, actor, event
type, connector display name, notes. Includes CSV export honoring active
filters and prev/next pagination.

* fix(ai): address CodeRabbit review on PR #360 (CSV formula injection, text/csv contract, test coverage)

- Neutralize spreadsheet-formula injection in audit CSV export by prefixing
  cells starting with =, +, -, @, tab, or CR with a single quote
- Declare text/csv response media type on GET /audit.csv so the OpenAPI
  contract + regenerated TS types model it as a CSV download (was application/json)
- Add unauthenticated 401 guard test + formula-injection regression test for
  the CSV endpoint; assert CSV export honors the active event-type filter

* refactor(ai): dedupe CSV formula-injection guard into core/csv_safe

The audit CSV export hand-rolled a private _sanitize_csv_cell that duplicated
services/export.py:sanitize_csv_value (same defang, less robust — no None
handling). Promote the shared primitive to core/csv_safe.py (alongside the
other cross-cutting primitives) and point both the audit export and the
event exports at it. stdlib csv.writer still handles serialization; only the
formula-injection guard is app-layer. No behavior change.

* fix(ai): escape leading LF in CSV sanitizer to close formula-injection bypass

sanitize_csv_value guarded = + - @ tab CR but not LF. A leading newline can be
stripped by spreadsheet importers, exposing a trailing formula. Add "\n" to the
prefix set (now covers both the audit and event CSV exports) + regression test.
Add ChatRequest.fallback_policy ("none" | "org_default" |
"retry_then_org_default") so a DJ's connector that fails mid-event with a
transient/credential error can optionally fall back to the org-default
connector instead of failing the call.

- Gateway extracts a single-attempt helper (_attempt) and a bounded
  fallback chain: at most one same-connector retry (retry_then_org_default)
  plus one org-default attempt. Never loops.
- Fallback-eligible triggers: RateLimited, AuthInvalid, ProviderUnavailable,
  QuotaExceeded. ToolTranslationError / generic LlmError are NOT eligible.
- Writes a fallback_triggered:<trigger> LlmAuditEvent (reusing the existing
  audit_event helper) targeting the fallback connector id. Trigger encoded in
  the event_type string to fit the existing String(60) column — no migration.
- Default "none" preserves the current fail-fast behavior; callers opt in
  per-call.
* feat(ai): DJ-readable LLM connector policy endpoint (#355)

Add GET /api/llm/policy (get_current_active_user) exposing the
non-sensitive connector-policy subset DJs need: the two enablement
toggles plus a server-computed allowed_connector_types set. Admin-only
fields (llm_default_connector_id) are intentionally excluded.

The /settings/ai page now reads this DJ endpoint instead of the
admin-only /api/admin/llm/policy (which 403s for non-admin DJs). It
fails *closed*: when the policy can't be read it offers no providers,
rather than the old behavior of treating a 403 as "all types allowed"
and surfacing providers the admin disabled (which the create call would
reject with a 403).

allowed_connector_types is computed server-side so the picker exactly
matches what the create endpoint accepts, eliminating UX/enforcement
drift. No schema/migration — reads existing SystemSettings flags.

* fix(ai): document 401/403 on DJ policy route, harden DJ-scoped policy test

Add explicit 401/403 responses to GET /api/llm/policy so the regenerated
OpenAPI contract and TS client cover the unauthenticated and pending-DJ
failure modes the route already enforces via get_current_active_user.
Regenerate openapi.json + api-types.generated.ts.

Harden the "reads the DJ-scoped policy endpoint" test to assert the admin
policy endpoint is never called, fully enforcing the regression guard.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dashboard/app/`(dj)/settings/ai/page.tsx:
- Around line 128-134: The current useMemo treats policy === null as “disabled”
even while the policy is still loading; change the guard to only return an empty
list when policy === null AND loading is false (i.e. if (policy === null &&
!loading) return []), otherwise allow the memo to defer to the actual policy
value (or a loading/fallback) so the UI doesn't show “disabled by admin” during
initial load; apply the same change to the other occurrence referencing policy
(lines ~286-290) so both spots gate on the loading flag.

In `@dashboard/app/admin/ai/__tests__/page.test.tsx`:
- Around line 258-283: The test currently stubs the global URL with
vi.stubGlobal(...) and calls vi.unstubAllGlobals() at the end, which can leak if
an assertion fails; update the test that renders AdminAISettingsPage so that the
vi.stubGlobal('URL', { ...URL, createObjectURL, revokeObjectURL }) is paired
with a try/finally block that calls vi.unstubAllGlobals() in the finally clause
to guarantee restoration even on test failure, keeping the rest of the test
logic (render, waitFor assertions, fireEvent calls, auditSpy/exportSpy checks)
unchanged.

In `@dashboard/app/admin/ai/page.tsx`:
- Around line 33-41: The AUDIT_EVENT_TYPES list is missing gateway-emitted
fallback_triggered events; update the array (variable AUDIT_EVENT_TYPES) to
include a fallback entry such as { value: 'fallback_triggered', label: 'Fallback
triggered (all)' } (and add any specific fallback flavors if you expect typed
variants, e.g. 'fallback_triggered:connector') so the admin dropdown can filter
those audit entries; make the same addition to the duplicate list used later in
the file (the other AUDIT_EVENT_TYPES occurrence around the bottom) so both
dropdowns include fallback-triggered options.

In `@server/app/core/csv_safe.py`:
- Around line 9-24: sanitize_csv_value currently only checks the very first
character and fails to handle values with leading normal spaces (e.g. "
=HYPERLINK(...)"), and also misses a final return for non-formula values; update
sanitize_csv_value (and keep _FORMULA_PREFIXES) to locate the first non-space
character in value (e.g. find the index of the first char not equal to ' '),
check that character against _FORMULA_PREFIXES, and if it matches return "'" +
value, otherwise return the original value (still return "" for falsy input).

In `@server/tests/test_export.py`:
- Around line 323-327: Add a regression case to the CSV sanitization tests to
cover formulas that start with a leading space before a dangerous payload;
update the test_sanitizes_line_feed (or add a new test) to call
sanitize_csv_value with the string " =cmd|' /C calc'!A0" and assert the returned
value begins with a single-quote (e.g., result.startswith("'")), ensuring
sanitize_csv_value handles the space-prefix bypass path the same as a leading
newline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 3198dcc6-8dd9-47d1-8f2c-551adef1adc0

📥 Commits

Reviewing files that changed from the base of the PR and between 27a9dd7 and fd7e650.

📒 Files selected for processing (20)
  • dashboard/app/(dj)/settings/ai/__tests__/page.test.tsx
  • dashboard/app/(dj)/settings/ai/page.tsx
  • dashboard/app/admin/ai/__tests__/page.test.tsx
  • dashboard/app/admin/ai/page.tsx
  • dashboard/lib/api-types.generated.ts
  • dashboard/lib/api-types.ts
  • dashboard/lib/api.ts
  • docs/superpowers/plans/2026-05-25-llm-audit-trail-admin-ui.md
  • server/app/api/admin_llm.py
  • server/app/api/llm.py
  • server/app/core/csv_safe.py
  • server/app/schemas/llm.py
  • server/app/services/export.py
  • server/app/services/llm/base.py
  • server/app/services/llm/gateway.py
  • server/openapi.json
  • server/tests/test_export.py
  • server/tests/test_llm_admin_audit.py
  • server/tests/test_llm_api.py
  • server/tests/test_llm_gateway.py
✅ Files skipped from review due to trivial changes (2)
  • docs/superpowers/plans/2026-05-25-llm-audit-trail-admin-ui.md
  • dashboard/lib/api-types.generated.ts

Comment thread dashboard/components/AiProvidersSection.tsx
Comment thread dashboard/app/admin/ai/__tests__/page.test.tsx Outdated
Comment thread dashboard/app/admin/ai/page.tsx
Comment thread server/app/core/csv_safe.py Outdated
Comment thread server/tests/test_export.py
thewrz and others added 8 commits May 25, 2026 22:39
…review)

- openai_compatible adapter: drop the base_url_plain fallback. The encrypted
  blob is the sole routing source of truth (storage always writes base_url into
  it on create/rotate); base_url_plain is a display-only admin mirror that could
  be stale. Fail closed if the blob lacks base_url.
- tool_translation: stop coercing falsy non-objects ([], false, 0) into {} —
  only None (and empty-string, sent by some compatible servers for no-arg calls)
  falls back to {}; everything else flows to the existing dict check and raises
  instead of silently running a tool with empty input.
- Regression tests for both.
…back filter, UX/test hygiene)

- csv_safe: add leading space to formula-prefix set (" =FORMULA" bypasses after
  importer trims), + regression test.
- admin/ai audit filter: add the four fallback_triggered:<trigger> event types the
  gateway emits (#338) so admins can filter auto-fallback incidents — the audit
  query is an exact event_type match, so each variant needs its own option.
- settings/ai: gate the "disabled by admin policy" message on !loading to avoid a
  false disabled flash during initial policy load.
- admin/ai test: wrap the URL global stub in try/finally so it can't leak into
  later tests if an assertion fails early.
Relocate the DJ-facing AI connector UI (connect/test/delete, model hint,
Hermes onboarding) from the standalone /settings/ai route into the existing
/account page as an inline "AI / Model providers" card section.

- Extract the connector UI into a reusable AiProvidersSection component;
  the parent page owns auth gating so the component does none of its own.
- Add a permanent (308) redirect /settings/ai -> /account in next.config.js
  so existing bookmarks/links don't 404.
- Delete the old /settings/ai route + test; port the connector tests to the
  component and assert the section renders inside /account.
- Fail-closed policy behavior is preserved verbatim: when the DJ-scoped
  policy endpoint can't be read, no provider types are offered.
- Admin /admin/ai UI is untouched.
…362)

* chore(ai): remove deprecated ANTHROPIC_API_KEY env-var reads (#343)

The LLM Gateway connector system has been the source of truth for
credentials since the MVP, and every production caller of the
recommendation engine passes db + actor (so the gateway path always
runs). The legacy direct-Anthropic env-var fallback was therefore dead
code.

Removed:
- llm_client._legacy_call() + the `db is None` branch + the AsyncAnthropic
  import; call_llm now requires a db session.
- _resolve_max_tokens()'s settings read (inlined DEFAULT_MAX_TOKENS = 1024).
- is_llm_available()'s final ANTHROPIC_API_KEY env-var fallback; without db
  it now returns False (connector-backed only).
- config.anthropic_max_tokens / anthropic_timeout_seconds (only the deleted
  fallback used them).
- Deprecated ANTHROPIC_* lines from .env.example; refreshed CLAUDE.md.

Kept (documented in PR): config.anthropic_api_key + anthropic_model — still
read by the admin AI-settings/model-listing endpoints (live observability +
API contract) and the recommendation response model-name default. The
historical data migration 046_admin_ai_oauth (which reads ANTHROPIC_API_KEY
once to seed a connector) is left intact for replay.

Tests updated to drive the gateway path; full backend suite green (2302
passed, 87% coverage).

* fix(ai): harden LLM tool-payload parsing against malformed provider output (#362 review)

Defensively validate the forced-tool `queries` payload in _parse_tool_response
before constructing LLMSuggestionQuery objects. Custom OpenAI-compatible
endpoints (Ollama, vLLM) may not enforce the tool JSON schema, so one malformed
item should not crash the whole recommendation flow.

- Extract shared _parse_query_items() used by both the canonical ChatResponse
  path and the legacy Anthropic Message-like path.
- Skip non-dict payloads/items, missing or blank/non-string search_query, and
  coerce non-string reasoning to "".
- Add regression tests for malformed payloads (CodeRabbit out-of-diff finding).
* feat(ai): configurable llm_call_log retention (#342)

Make the previously-hardcoded 30-day llm_call_log retention configurable by
admins (7-365 days) and wire up the daily cleanup job that the model docstring
referenced but which did not yet exist.

Backend:
- Add SystemSettings.llm_call_log_retention_days (int, default 30, NOT NULL)
  with matching alembic migration 047 (server_default 30).
- Surface it on the LLM policy endpoint (/api/admin/llm/policy). GET returns
  the value; PATCH accepts it with Pydantic bounds (ge=7, le=365) so
  out-of-range values are rejected at the API level (422).
- Add purge_call_log_older_than() helper to connector_storage.
- Add a daily (24h) cleanup loop in the FastAPI lifespan that reads the
  retention window from settings on each run, so an admin change takes effect
  within 24h without a restart (no hardcoded constant).

Frontend:
- Add a "Call log retention (days)" number input to the Connector policy card
  on /admin/ai; clamps and patches on blur. Regenerated OpenAPI TS types.

Tests:
- API bounds (min/max/boundaries) + persistence; purge helper window behavior;
  cleanup-reads-settings (honors admin change, keeps in-window rows); frontend
  retention input persist + clamp.

* fix(ai): address PR #363 review (retention bounds guard, blur fallback)

- Guard purge_call_log_older_than against out-of-bounds retention_days
  (fail closed on values outside 7-365 so a corrupt/tampered persisted
  setting cannot push the cutoff to now/future and wipe call-log history;
  the daily cleanup loop catches the ValueError and retries next pass).
- Admin /admin/ai retention input: on invalid blur (NaN), fall back to the
  current policy value instead of a hardcoded 30, so clearing the field no
  longer silently overwrites a previously saved window (e.g. 90 -> 30).
- Add regression tests: backend bounds-guard rejection (no deletion) and a
  frontend below-min clamp test locking the 7-365 contract symmetrically.
OSV advisory MAL-2026-4750 flags fastapi 0.136.3 for an undocumented
'fastar' dependency. It is a false positive (withdrawn by OSV 2026-05-26):
'fastar' is a legitimate Rust-tar-bindings package predating the release,
and it is only pulled via fastapi's [standard] extra, which WrzDJ does not
install (we use plain fastapi + uvicorn[standard]) - so it never enters our
dependency tree. We stay on 0.136.3 to keep its underscore-header rejection
(PR #15589) and SSE field validation (PR #15588). Remove this ignore once
pip-audit's feed drops the withdrawn entry.
test_tampered_signature_returns_false flipped the last base64url char of
the signature. base64url's final char encodes only the high bits of the
last decoded byte; the low bits are discarded on decode and _b64decode()
re-pads, so the mutated char frequently round-trips to the SAME sig_bytes.
When it did, hmac.compare_digest passed, the cookie validated, and the
`verified is False` assertion failed (~2/5 runs).

Tamper at the decoded-bytes level instead: decode the signature with the
module's own _b64decode, flip a whole byte (sig[0] ^= 0xFF), re-encode with
_b64encode. This guarantees sig_bytes differs from the expected signature
on every run. Production human_verification.py logic is unchanged.

Closes #364
…#366)

* fix(sse): don't pin a pooled DB connection for the SSE stream lifetime (#356)

event_stream declared `db: Session = Depends(get_db)`, so FastAPI kept that
session (and its checked-out QueuePool connection) open until the request
finished. An EventSource request never finishes while the browser holds it
open, so the connection sat idle but pinned for the whole stream. ~15
concurrent guest viewers exhausted the pool (size 5 + overflow 10), after
which every request 500'd with QueuePool TimeoutError — a low-effort DoS on
an unauthenticated guest-facing endpoint.

Fix: run the one-shot existence/auth check inside a short-lived
`with SessionLocal() as db:` block that returns its connection to the pool
BEFORE the EventSourceResponse is returned. The async generator holds no DB
session (and documents that any future per-tick DB access must open its own
short-lived session). Existence/auth behavior (404 unknown, 410
archived/expired) is preserved exactly.

Adds tests/test_sse_pool.py driving a real QueuePool engine and asserting
25 concurrent idle streams hold 0 checked-out connections.

* test(sse): fix _make_request helper to model live idle stream (CodeRabbit PR #366)

The helper's nested receive() previously returned {"type": "http.disconnect"}
immediately, which made StarletteRequest.is_disconnected() true on the first
generator iteration. The SSE generator exited before it could ever await
queue.get(), so test_n_concurrent_idle_streams_hold_zero_pool_connections was
asserting that instantly-disconnected streams hold zero pool connections —
trivially true, useless as a regression for issue #356.

Fix receive() to suspend forever on a never-set asyncio.Event, matching a real
live, idle SSE client. The handler now suspends on queue.get() as intended, and
checked_out == 0 is a meaningful assertion across 25 concurrent idle streams.

Also:
- Add sanity assertion that primer tasks remain not-done (per CodeRabbit's
  suggested follow-up) so the test fails loudly if receive() ever regresses
  back to instant-disconnect semantics.
- Await primer-task cancellation before aclose() to avoid the
  "asynchronous generator is already running" RuntimeError that occurs when
  cancelling a generator suspended on queue.get().

Verification:
- 10/10 loop runs pass with the new helper.
- Reverted server/app/api/sse.py to the pre-fix shape (generator holds a
  pooled connection); test failed with "Expected 0 pooled connections held
  by 25 idle SSE streams, got 15" — confirming the helper now actually
  exercises the pinned-connection path and catches the original regression.
thewrz added 2 commits May 28, 2026 15:14
…oader (#370)

* feat(ai): LLM adapter plug-in SDK with docs, skeleton, and optional loader

Defines the public extension surface for third-party LLM provider adapters
so forks and community contributors can add new providers without modifying
any file under server/app/services/llm/.

Deliverables:
- docs/LLM-PLUGIN.md — full contract: LlmAdapter ABC, ChatRequest/Response
  fields, exception hierarchy (when to raise AuthInvalid / RateLimited /
  QuotaExceeded / ProviderUnavailable / ToolTranslationError), stable-vs-
  internal API table, tool-translation delegation, registration mechanics,
  test matrix link, 5-minute quick-start.
- docs/examples/echo_adapter.py — minimal LlmAdapter skeleton that registers
  via the public registry and round-trips through chat() with no network
  call. Exercised by the contract test so it can't silently rot.
- server/app/services/llm/plugin_loader.py — optional LLM_PLUGIN_DIR env-var
  loader. Imports every *.py in the dir (non-recursive, skips _-prefixed
  files), fails open on broken plug-ins with a logged stack trace, never
  mutates sys.path. Wired into registry._bootstrap so built-ins always load
  first (a plug-in cannot shadow a built-in — registry refuses
  double-registration).
- server/tests/test_llm_adapter_contract.py — parametrised contract test
  over every registered adapter (8 built-ins + the docs skeleton). Covers
  subclassing, connector_type non-empty + matches registration key,
  async chat/health_check, constructor accepts a connector row, and the
  malformed-credentials boundary that prevents JSONDecodeError leakage.
- server/tests/test_llm_plugin_loader.py — 6 tests pinning loader behaviour
  (no env var, missing dir, file-selection rules, broken-plug-in resilience,
  sys.modules cleanup, sys.path isolation).

Security posture documented in docs/LLM-PLUGIN.md: LLM_PLUGIN_DIR loading
grants the plug-in the full backend process — same trust boundary as
`pip install`. Production deployments are encouraged to keep the env-var
unset and ship trusted plug-ins as ordinary Python modules.

* chore(ai): address CodeRabbit nitpicks on #370

- Remove tautological isinstance(exc_info.value, AuthInvalid | LlmError)
  assertion in test_adapter_raises_typed_error_on_malformed_credentials.
  pytest.raises(LlmError) already guarantees the exception type; the union
  reduces to LlmError since AuthInvalid is a subclass. The docstring already
  documents the intent (accept any LlmError subclass).
- Add 'text' language specifier to the ASCII architecture diagram fence in
  docs/LLM-PLUGIN.md for Markdown-linter compliance.
* feat(ai): per-DJ explicit default LLM connector (#336)

Adds a per-DJ "pinned default" so DJs can force routing through a specific
connector instead of relying on the gateway's MRU heuristic.

- New column LlmConnector.is_default (bool, default false, NOT NULL)
- Partial unique index per user_id where is_default = true (Postgres / SQLite)
- Service helpers set_default_for_user / unset_default_for_user with
  atomic sibling-clear semantics
- Gateway resolution: prefer pinned default; fall through to MRU when no
  default is set or the pinned default is inactive
- API: POST /api/llm/connectors/{id}/default sets, DELETE clears;
  audit events connector_default_set / connector_default_unset
- Setting an inactive connector as default returns 400 (footgun guard)
- Alembic migration 048 backfills each DJ's MRU active connector as default
- Frontend: radio + "Default" badge on each card; optimistic update

* docs(ai): document 400/404 responses on connector default endpoints

Address CodeRabbit threads on PR #371: the POST /api/llm/connectors/{id}/default
endpoint can return 400 (inactive connector) and 404 (non-owned connector), and
the DELETE counterpart can return 404. Only 200/422 were declared in the FastAPI
route metadata, which left the generated OpenAPI schema and dashboard
api-types.generated.ts out of sync with runtime behavior and forced frontend
error-handling to fall back to generic catches.

Add `responses=` entries to both routes, regenerate server/openapi.json, and
re-run openapi-typescript so dashboard/lib/api-types.generated.ts reflects the
new error contracts.
thewrz and others added 10 commits May 28, 2026 15:26
…in (#340, #346) (#372)

Adds a background task that periodically runs `health_check` on every active
LLM connector to catch expired/revoked credentials before a DJ hits one
mid-event, and surfaces the last result in the admin per-DJ connectors table.

- New columns: `llm_connectors.last_health_check_at` (timestamp) +
  `last_health_check_status` (one of: ok / auth_invalid / rate_limited /
  quota_exceeded / provider_unavailable / error).
- Shared `services/llm/health_check.py` helper used by both the DJ-triggered
  test button (`POST /api/llm/connectors/{id}/test`) AND the background monitor
  — guarantees identical observability semantics regardless of trigger source.
- Background loop wakes every 5 minutes, picks active connectors whose last
  check is older than `LLM_HEALTH_CHECK_INTERVAL_HOURS` (default 6, clamped
  to 1..168h) times a per-connector deterministic jitter factor (0.7..1.3).
- Sequential per-pass execution with a 1s sleep between calls — respects
  per-provider rate limits even when many DJs share one upstream account.
- On an active->auth_invalid transition: flips status, writes a
  `connector_health_check_failed` audit row (distinct from per-check noise),
  and attempts an email notification to the DJ via Resend; falls back to
  logging when the DJ has no email or Resend isn't configured.
- Admin per-DJ connectors table gains two sortable columns
  (Last health check / Result) with a colour-coded status badge; default
  sort is `last_health_check_at DESC` for fastest triage.

Both issues add the same two columns to `llm_connectors`. Splitting them
across two PRs would mean two clashing Alembic revisions (047->048 from each
PR with conflicting downgrade ops). One PR ships the schema + monitor + admin
UI atomically so there's no half-state on the epic branch.

Closes #340
Closes #346
The SSE endpoint (`app.api.sse`) opens its own `with SessionLocal()`
block for the existence check, bypassing `get_db`. Tests that exercise
the endpoint hit the real `DATABASE_URL` (Postgres in CI) instead of
the in-memory SQLite test DB, failing with "relation does not exist".

Scoped rebind in conftest: swap only `app.api.sse.SessionLocal`, not
the production sessionmaker, so other modules importing the production
`SessionLocal` aren't disturbed. Caught by 4 `test_sse_security.py`
failures in PR #354 backend CI.
Consolidate byte-identical logic copy-pasted across the LLM adapters into
single sources of truth:

- new adapters/_shared.py: parse_retry_after, raise_for_status (with optional
  throttle_detector for Bedrock's 400-throttle path), extract_api_key,
  extract_fixed_base_credentials
- tool_translation.py: content_to_text + to_anthropic_messages (shared by the
  Anthropic SDK adapter and Bedrock's anthropic family), _validate_force, and
  one table-driven _normalise_finish_reason replacing four near-identical
  per-provider normalizers
- removed dead _normalise_role passthrough in _httpx_openai

Behavior-preserving: error messages, exception types, retry-after values, and
JSON shapes unchanged. Anthropic's SDK-based _raise_for_status and Gemini's
str()-wrapping message translator left intact (distinct semantics).
Net -106 LOC across the llm tree; 2429 backend tests pass, coverage 87.78%.
Collapse repeated boilerplate across the connector API + storage layer into
shared helpers — single source of truth, behavior-preserving:

- api/llm.py: _get_owned_connector_or_404 (6 call sites), _audit_and_return
  (the audit -> commit -> refresh -> ConnectorOut epilogue, 4 endpoints),
  _raise_if_duplicate_label (the uq_dj_connector_label -> 409 mapping, 2 sites)
- admin_llm.py: _connector_to_admin_out for the AdminConnectorOut column
  reflection (list + revoke)
- connector_storage.py: _APIKEY_CONNECTOR_TYPES constant, _normalise_model_hint
  and _validate_api_key_blob shared by build_create_payload + rotate_credentials
  + update_metadata

Every ValueError/HTTPException message preserved verbatim (the create/rotate/
update paths phrase model_hint + api_key errors differently, so those strings
stay caller-supplied). Skipped the ConnectorCreate/Rotate credential-field
mixin: Pydantic v2 reorders inherited fields, which would churn the generated
OpenAPI schema + api-types.generated.ts.

Net -5 LOC; api/llm.py -16. 2429 backend tests pass, coverage 87.31%.
)

Behavior-preserving DRY cleanup of the two AI dashboard surfaces:

- AiProvidersSection.tsx: replace two nested-ternary placeholder chains with
  API_KEY_PLACEHOLDERS / MODEL_HINT_PLACEHOLDERS lookup tables; collapse 15
  per-field onChange closures into a single handleField(key) factory
  (connector_type stays inline — it needs the LlmConnectorType cast).
- admin/ai/page.tsx: add formatTimestamp() for the 3 toLocaleString cells;
  reuse the existing PlainHeader component in the Usage + Audit tables instead
  of repeating the inline <th> style; extract onAuditFilterChange() for the 4
  filter handlers that all reset pagination to page 0.

Rendered DOM + behavior unchanged. Skipped the PROVIDER_FIELDS form-block
data-driving: the bedrock/azure/apikey/compatible blocks genuinely diverge
(help text, Hermes <details>, select-vs-input model hint), so a descriptor
system would cost as much as it saves and risk shifting the DOM the component
tests query.

966 frontend tests pass; lint + tsc clean; coverage above thresholds.
Behavior-preserving test contraction — same case counts, no assertions dropped:

- test_llm_url_validator.py: 26 single-assert methods → 2 @parametrize tables
  (accept/normalise + reject) + keep the case-normalization test standalone
  (its startswith/endswith assertions don't pin netloc casing).
- test_llm_bedrock_adapter.py: TestBedrockErrorMapping's 8 methods → one
  parametrized status→exception test (7 cases) + keep the 429 test separate
  (it additionally asserts retry_after_seconds). _err lifted to module scope so
  parametrize args resolve at collection time.
- test_llm_gateway.py: _patch_chat() helper replaces the verbose
  patch.object(__import__("...openai_apikey", ...).OpenAIApiKeyAdapter, "chat")
  block (×15) via a plain top-level import; _make_org_default() dedupes the
  org-owner + connector + wire-default setup (×6).

Deliberately NOT collapsed (verifier-flagged coverage risk): the cross-adapter
error matrix in test_llm_adapters.py, the api happy-path/policy tests (two
distinct setup mechanisms), and the per-provider tool-translation tests — those
are assertion-divergent and parametrizing would drop distinct cases.

Net -226 LOC. 2429 backend tests pass (unchanged count), coverage 87.94%.
* feat(llm): add ChatResponseChunk + streaming ABC default

Adds ToolCallDelta / ChatResponseChunk canonical models alongside
ChatResponse, a StreamingUnsupported typed error, and a non-abstract
LlmAdapter.stream() default that raises it so non-streaming adapters
degrade gracefully.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): OpenAI-wire streaming generator + event parser

Adds streaming.parse_openai_stream_event (text + partial tool-call JSON
fragments + terminal usage) and _httpx_openai.stream_openai_chat, an
async generator that streams Chat Completions SSE and maps non-2xx
statuses to canonical typed errors. Consumer disconnect closes the
httpx stream context, cancelling the upstream request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): streaming for OpenAI platform + compatible adapters

Both adapters delegate to stream_openai_chat, preserving the
max_completion_tokens field override on the Platform adapter and the
SSRF-validated base URL on the compatible adapter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): Anthropic provider-native streaming (text + tool_use deltas)

Consumes the anthropic SDK messages.stream() typed event stream:
text_delta blocks become text chunks, tool_use content_block_start +
input_json_delta blocks become partial tool-call fragments grouped by
content-block index. The terminal chunk carries the canonical
stop_reason and completion token count. Shared _build_kwargs keeps the
chat and stream request translation in lockstep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): Gateway.stream with counts-only logging + cancellation

Adds Gateway.stream mirroring dispatch connector resolution. A single
counts-only llm_call_log row (plus auth audit on 401) is written in
_attempt_stream's finally, so it fires on success, typed error, AND
consumer cancellation. The outer generator explicitly aclose()s the
inner one so client disconnect synchronously closes the upstream
connection. Auto-fallback is intentionally disabled for streaming
(chunks already delivered — a restart would corrupt output).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(llm): backend lint/format fix-ups for streaming endpoint

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(llm): cover unknown OpenAI finish_reason → error mapping

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai-ui): SSE stream consumer for connector stream-test

apiClient.streamConnectorTest uses fetch + ReadableStream (EventSource
cannot send the Bearer header), parses text/event-stream data frames,
and invokes onChunk per frame. Accepts an AbortSignal so callers can
cancel — aborting closes the connection, which the backend treats as a
client disconnect and cancels the upstream provider request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai-ui): live Stream test button on DJ AI providers section

Wires the streamConnectorTest consumer into the DJ-owned connector rows
(AiProvidersSection) — the correct surface, since the stream-test
endpoint is DJ-scoped and 404s for connectors the caller doesn't own.
Streamed text_delta accumulates into an italic live-text line under the
connector. The admin /admin/ai page is intentionally NOT wired: it lists
other DJs' connectors the admin doesn't own.

Recommendation UI migration to SSE is deferred (background pipeline
returns a final JSON payload, not a token feed) — the reusable SSE
plumbing lands here per issue #335's 'reasonable judgment on scope'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm-stream): address CodeRabbit review on PR #379

- api.ts: parse SSE `event:` field so `event: error` frames surface as a
  thrown ApiError instead of passing through as an inert chunk.
- _httpx_openai: raise ToolTranslationError on a malformed `data:` SSE JSON
  frame instead of silently dropping it (keepalive/comment frames start with
  ":" and never reach this branch).
- anthropic stream: canonicalise stop_reason via the shared
  normalise_anthropic_stop_reason() helper so stream and chat agree on
  pause_turn / refusal / unknown values (-> "error", not "end_turn").
- streaming: guard int() coercion of provider-supplied index / token counts
  via _as_int() so null/non-numeric fields don't abort an otherwise-usable
  stream.
- Regression tests for all four (referencing #379).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(llm): add LlmFeaturePreference model + migration 050

New table maps (user_id, feature) -> connector_id with a UNIQUE constraint
so each DJ has at most one pinned connector per agentic feature. Both FKs
cascade on delete. Feature allowlist (recommendation, set_builder) lives in
the model module.

Refs #337

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): feature-preference CRUD helpers in connector_storage

set_feature_preference upserts (replace-in-place), clear returns whether a
row was removed, plus list/get helpers — all caller-commits.

Refs #337

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): gateway resolves per-feature pin first, falls back gracefully

_resolve_connector now takes the dispatch purpose and checks the DJ's
LlmFeaturePreference for that feature before the per-DJ default / MRU /
org-default chain. A pinned connector that was deleted or is no longer
active is skipped so a stale/broken pin never silently breaks the DJ.

Refs #337

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): feature-preference API schemas

FeatureKey Literal (OpenAPI enum), FeaturePreferenceOut / ...ListOut /
...Set request+response models.

Refs #337

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): set/clear/list feature-preference endpoints

GET/POST /api/llm/feature-preferences and DELETE
/api/llm/feature-preferences/{feature}, scoped to the current DJ and
rate-limited like the existing connector endpoints. POST validates
connector ownership (404, not leaking another DJ's connectors) and rejects
pinning a non-active connector (400). The {feature} path param is the
FeatureKey Literal, so unknown features 422 at the boundary. Adds a guard
test that FeatureKey stays in sync with KNOWN_FEATURES.

Refs #337

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai-ui): api client methods + types for feature preferences

Regenerated OpenAPI types (FeaturePreferenceOut/...ListOut/...Set), added
LlmFeaturePreference* re-exports, and listLlmFeaturePreferences /
setLlmFeaturePreference / clearLlmFeaturePreference on the ApiClient.

Refs #337

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai-ui): per-feature defaults section on DJ AI settings

Adds a 'Per-feature defaults' section to AiProvidersSection: one connector
picker per known feature (Recommendations, Set builder). Selecting a
connector pins it; 'Use account default' clears the pin. Only active
connectors are offered. The section fetches soft and hides itself if the
preferences endpoint is unavailable.

Refs #337

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): make set_feature_preference a conflict-safe upsert

Two concurrent pins for the same (user_id, feature) could both miss the
existing-row check and race to INSERT; the loser tripped the
uq_llm_feature_pref_user_feature UNIQUE constraint, bubbling a 500.

Replace the read-then-insert with a DB-native ON CONFLICT DO UPDATE
upsert (dialect-aware: postgresql in prod, sqlite in tests) so a
concurrent re-pin resolves deterministically (last writer wins) in a
single atomic statement. Adds a regression test pinning the behavior.

Addresses CodeRabbit review on PR #378.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(llm): per-DJ monthly token cap + gateway pre-flight enforcement

Add monthly_token_cap column (NULL = unlimited) on LlmConnector with
migration 050, a current-month token-usage aggregation helper, and a
gateway dispatch pre-flight check that raises the new QuotaCapReached
exception before any provider call. Cap edits never disrupt in-flight
calls. Direct aggregation over the indexed llm_call_log.created_at
column — no rollup table needed at current volume.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(llm): admin cap endpoint + expose usage + 429 DJ-facing error

Add PATCH /api/admin/llm/connectors/{id}/cap (admin-only) to set/clear a
connector's monthly token cap, with Pydantic ge=0 validation and a
defensive service-layer guard. Expose monthly_token_cap +
current_month_tokens on the admin connector listing for the usage-vs-cap
bar. Map QuotaCapReached to a 429 with the fixed DJ-facing message in the
/recommendations/llm endpoint (no internal leakage). Cap changes reuse the
policy_changed audit event.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai-ui): admin per-DJ monthly cap input + usage-vs-cap progress bar

Add a Monthly cap column to the admin AI per-DJ connectors table: an
editable cap input (empty = unlimited), current-month usage text, and a
colour-escalating progress bar (green < 80% < amber < 100% red). Wire
setAdminLlmConnectorCap into the api client + regenerate OpenAPI types.
Update test fixtures for the new monthly_token_cap / current_month_tokens
fields.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): batch month-usage query + require cap field in patch (#377)

Address CodeRabbit review on PR #377:

- Batch current-month usage lookup in the admin connector list endpoint:
  add current_month_token_usage_bulk() — one grouped aggregate over
  llm_call_log instead of an N+1 per-connector query.
- Make monthly_token_cap required in AdminConnectorCapPatch (Field(...)):
  an empty {} body is now rejected with 422 rather than silently treated
  as null, which would wipe a configured cap. Frontend already sends the
  field explicitly (null to clear), so no client change needed.
- Regenerate openapi.json + dashboard api-types.generated.ts.
- Add regression tests: bulk-usage aggregation/parity/empty-input, and
  empty-body PATCH rejection leaving the cap untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(merge): regenerate openapi + api-types after epic merge

git auto-merge of the generated openapi.json / api-types.generated.ts
dropped endpoints from one side (streaming stream-test + per-feature
preference). Regenerated from the FastAPI app to restore the full spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(merge): add monthly_token_cap to per-feature test fixture

#339 added monthly_token_cap (required) to ConnectorOut; the per-feature
test fixture from #337 predated it. current_month_tokens stays admin-only
(AdminConnectorOut), so it's not added here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve 4 conflicts + linearize the alembic migration fork:

- server/app/api/sse.py: keep epic's pool-safety `with SessionLocal()` block
  (#356/#366 — don't pin a pooled connection for the stream lifetime) while
  adopting main's dual-code resolver `get_event_by_public_code_with_status`
  (#382/#383 — accepts collection OR join code, fixes SSE channel-key bug).
- account/page.tsx: both sides appended a card; keep BOTH as separate cards —
  AiProvidersSection (#357) and the Guest Experience frictionless toggle (#380).
- account/page.test.tsx: hang the AI-section stubs on the shared hoisted
  `mockApi` object instead of spreading, so `vi.spyOn(mockApi, ...)` keeps
  rebinding the same reference the page calls.
- api.test.ts: keep both appended suites — `streamConnectorTest` (#379) and
  `frictionless join api` (#369).
- a11334c031bb migration: re-parent down_revision 045 -> 051. main branched the
  frictionless migration off 045 while epic built 046..051 on 045, creating two
  alembic heads. Columns don't overlap the AI tables, so ordering after 051 is
  safe. `alembic upgrade head` + `alembic check` pass on Postgres.
@thewrz

thewrz commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

Heads-up on what changed since your last review: main was merged into epic/ai-engine (merge commit). Please focus on the 4 conflict resolutions and the migration re-parent:

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and read the relevant files
echo "=== server/app/api/sse.py ==="
cat server/app/api/sse.py

echo ""
echo "=== dashboard/app/(dj)/account/page.tsx ==="
cat "dashboard/app/(dj)/account/page.tsx"

echo ""
echo "=== Alembic migration a11334c ==="
fd -e py . server/alembic/versions/ | xargs grep -l "a11334c\|down_revision.*051\|down_revision.*045" 2>/dev/null | head -5

echo ""
echo "=== All recent alembic migrations ==="
ls -la server/alembic/versions/ | tail -20

Length of output: 16424


🏁 Script executed:

#!/bin/bash
echo "=== a11334c migration (re-parented) ==="
cat server/alembic/versions/a11334c031bb_add_frictionless_join_flags.py

echo ""
echo "=== 5bb46508476b migration ==="
cat "server/alembic/versions/5bb46508476b_add_now_playing_and_play_history_tables.py"

echo ""
echo "=== account page test ==="
cat "dashboard/app/(dj)/account/__tests__/page.test.tsx"

echo ""
echo "=== lib/__tests__/api.test.ts (last 150 lines to see LLM additions) ==="
tail -150 "dashboard/lib/__tests__/api.test.ts"

echo ""
echo "=== alembic env.py to check head config ==="
grep -n "head\|run_migrations\|include_schemas" server/alembic/env.py | head -20

Length of output: 18113



@thewrz All four conflict resolutions and the migration re-parent look good. Details below.


server/app/api/sse.py

Clean merge. The pool-safety and dual-code resolver changes coexist correctly:

  • with SessionLocal() as db: is entered and exited before EventSourceResponse is returned — the pooled connection is released before the long-lived stream begins. ✅
  • event_code = event.code is extracted as a plain string inside the block, so no ORM object escapes the session (no risk of DetachedInstanceError). ✅
  • get_event_by_public_code_with_status covers all three result variants (NOT_FOUND → 404, ARCHIVED/EXPIRED → 410). ✅
  • The docstring in _event_generator documenting the no-session-held contract is a nice guard for future contributors.

dashboard/app/(dj)/account/page.tsx

Both cards are present and ordered correctly: Change Password → Change Email → AiProvidersSectionGuest Experience. No bleed between the two branches.


✅ Test mocks (account/page.test.tsx + lib/__tests__/api.test.ts)

  • mockApi correctly stubs listLlmConnectors (→ []) and getLlmPolicy (→ rejects), giving the AI providers section a fail-closed render in all account-page tests. The new "renders the relocated AI / Model providers section" test confirms the heading survives. ✅
  • api.test.tsstreamConnectorTest covers happy-path SSE, non-OK HTTP error, and the event: error frame case (#379 — verified it rejects the promise and still delivers leading chunks). frictionless join api block covers getJoinConfig, both ensureGuestName variants, and updateMyPreferences PATCH. ✅

server/alembic/versions/a11334c031bb_… re-parent

down_revision = "051" — correctly placed at the tip of the AI engine chain (045 → 046 → … → 051 → a11334c031bb). The two-head fork is collapsed. ✅


⚠️ Pre-existing two-head risk: 5bb46508476b_add_now_playing_and_play_history_tables.py

This file (likely brought in from main via the merge commit) has:

down_revision: str | None = "003"

That branches off revision 003 — far back in the chain — and is never re-joined, creating a second Alembic head alongside the a11334c031bb tip. Running alembic upgrade head will raise an error; you'd need alembic upgrade heads or a merge migration (alembic merge). This isn't introduced by this PR's resolutions, but it's now surfaced in the branch and will block clean deploys if left unaddressed. Worth a quick alembic heads check and a merge migration before landing.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@thewrz

thewrz commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai Thanks for the thorough pass — the four conflict resolutions and the migration re-parent confirmations are appreciated.

On the ⚠️ 5bb46508476b second-head concern: this is a false positive. That revision is not a head — it's consumed by the very next migration:

$ alembic heads
a11334c031bb (head)        # single head

# 5bb46508476b is re-joined immediately:
003           -> 5bb46508476b   (add now_playing + play_history)
5bb46508476b  -> 004            (004_add_now_playing_visibility.py: down_revision = "5bb46508476b")
004           -> ...            (linear from here)

004_add_now_playing_visibility.py sets down_revision = "5bb46508476b", so the chain is linear 003 → 5bb46508476b → 004 → … → 051 → a11334c031bb. It's a long-standing rev-003-era migration, not introduced by this merge.

Empirically confirmed against Postgres before pushing:

alembic upgrade head   # ran cleanly through 051 -> a11334c031bb
alembic check          # "No new upgrade operations detected" (no model drift)

So alembic upgrade head does not error and there is a single head. No merge migration needed. Resolving as not-an-issue.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

@thewrz Thanks for the clear correction and for backing it up with the empirical alembic heads / alembic check output. You're right — I only inspected 5bb46508476b in isolation and missed that 004_add_now_playing_visibility.py immediately consumes it as down_revision, keeping the chain linear. That was an analysis gap on my end. The single-head / no-merge-migration conclusion stands. Marking this resolved.

(ᵕ—ᴗ—)


✏️ Learnings added
Learnt from: thewrz
URL: https://github.com/wrzonance/WrzDJ/pull/354

Timestamp: 2026-06-03T05:25:56.696Z
Learning: In the wrzonance/WrzDJ repo, `server/alembic/versions/5bb46508476b_add_now_playing_and_play_history_tables.py` has `down_revision = "003"` but is NOT a second Alembic head. It is immediately consumed by `004_add_now_playing_visibility.py` which sets `down_revision = "5bb46508476b"`, making the full chain linear: `003 → 5bb46508476b → 004 → … → 051 → a11334c031bb` (single head). Do not flag `5bb46508476b` as a dangling or second-head migration in future reviews.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: thewrz
Repo: wrzonance/WrzDJ PR: 377
File: server/app/services/llm/gateway.py:99-101
Timestamp: 2026-05-29T05:53:37.669Z
Learning: In `server/app/services/llm/gateway.py`, the monthly-token-cap pre-flight check in `_enforce_monthly_cap` is deliberately a read-then-dispatch pattern (not atomic) by design. The cap is a *soft monthly budget guardrail* (not a hard per-request billing limit): token counts (`tokens_in`/`tokens_out`) are only known post-hoc after the provider responds and are logged to `llm_call_log`, so there is no quantity to atomically reserve before dispatch. Concurrency exposure is negligible because enforcement is per-DJ-connector and the only caller is the recommendation engine (one LLM call per DJ-initiated request, not a fan-out). The worst case is a small one-call overshoot that self-corrects on the next call. The pre-flight-only pattern also satisfies the issue `#339` acceptance criterion: "editing caps must not disrupt in-flight calls." Do not flag this as a TOCTOU issue in future reviews — it is a tracked and intentional design tradeoff.

Learnt from: thewrz
Repo: wrzonance/WrzDJ PR: 354
File: server/app/services/recommendation/llm_client.py:289-307
Timestamp: 2026-05-25T23:00:13.086Z
Learning: In `server/app/services/recommendation/llm_client.py`, when `db` is present the `call_llm` function intentionally does NOT fall back to the legacy env-var Anthropic key if `Gateway.dispatch` raises `NoLlmConfigured`. The `is_llm_available(db, actor)` check upstream already gates the path and returns an HTTP 503 when no connector is configured, prompting the DJ/admin to set one up. This is a deliberate design decision: per-DJ/opt-in LLM keys are a core goal; silently reusing a shared org Anthropic env-var key would contradict it. The `_legacy_call` path is reserved exclusively for `db=None` callers (e.g., unit tests that mock `AsyncAnthropic` directly).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
dashboard/app/(dj)/account/page.tsx (1)

46-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Prevent the initial getMe() response from clobbering an edited preference.

If the user flips the checkbox before this request settles, the late setFrictionlessDefault(user.frictionless_join_default) can revert the UI back to the stale server value. Track whether the toggle has been edited locally, or wait for the initial preference load before enabling the control.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dashboard/app/`(dj)/account/page.tsx around lines 46 - 54, The initial
api.getMe() handler can overwrite a user toggle change made locally; add a local
flag (e.g., hasEditedFrictionless or frictionlessInitialized) and update
useEffect to only call setFrictionlessDefault(user.frictionless_join_default)
when the control has not been edited yet (or when not initialized), and update
that flag in the onChange handler for the toggle to prevent later API responses
from clobbering the local change; locate the logic around useEffect, api.getMe,
setFrictionlessDefault and the toggle's onChange to implement this guard.
dashboard/app/(dj)/account/__tests__/page.test.tsx (1)

171-179: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep this getMe fixture aligned with the current response shape.

This override drops frictionless_join_default, so this test is now exercising a payload shape AccountPage no longer gets from api.getMe(). Add the field here too so the pending-email path still covers the real contract.

💡 Suggested fix
     mockGetMe.mockResolvedValue({
       id: 1,
       username: 'testuser',
       role: 'dj',
       help_pages_seen: [],
       pending_email: 'pending@example.com',
       email: null,
+      frictionless_join_default: false,
     });

Based on learnings: When adding fields to shared TypeScript interfaces (e.g., PublicRequestInfo), update test fixtures in test files that construct those types.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dashboard/app/`(dj)/account/__tests__/page.test.tsx around lines 171 - 179,
The test overrides mockGetMe to return a payload missing the new field; update
the fixture returned by mockGetMe in the test ('shows pending email from getMe
on load') to include frictionless_join_default (matching the shape returned by
api.getMe/PublicRequestInfo) so AccountPage receives the real contract; locate
the mockGetMe.mockResolvedValue call and add a frictionless_join_default
property with an appropriate value alongside pending_email and other fields.
server/app/services/llm/adapters/gemini_apikey.py (1)

135-146: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Fix Gemini tool response mapping (functionResponse.name must be the function name, not the call id)

Gemini requires functionResponse.name to match the declared function name (FunctionDeclaration/FunctionCall name), and uses a separate functionResponse.id for the call id. The current adapter sets "name": m.tool_call_id, but the canonical Message only provides an opaque tool_call_id (no tool/function name), so follow-up turns are likely to break when id ≠ function name. Adjust translation to use the tool/function name for name and keep the opaque id in functionResponse.id (using ToolCall.name/ToolCall.id in the canonical model). [critical mismatch with Gemini contract]

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/llm/adapters/gemini_apikey.py` around lines 135 - 146,
The adapter currently maps tool messages by setting functionResponse.name to the
opaque tool_call_id (see m.role == "tool" branch and m.tool_call_id), but Gemini
expects functionResponse.name to be the declared function name and
functionResponse.id to hold the call id; update the translation to use the
canonical ToolCall.name for functionResponse.name and place m.tool_call_id into
functionResponse.id (or use ToolCall.id if present) so the output uses function
name for name and the opaque call id for id (adjust any references in the m ->
ToolCall mapping logic accordingly).
♻️ Duplicate comments (1)
server/app/services/llm/tool_translation.py (1)

247-259: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate tool_use.input before coercing it.

dict(input_obj or {}) still lets malformed Anthropic payloads slip through: [] becomes {}, while primitives can leak a raw TypeError. That breaks the typed error contract and can silently run a tool with missing arguments.

💡 Proposed fix
         elif btype == "tool_use":
             if isinstance(block, dict):
                 name = block.get("name")
                 tool_id = block.get("id") or name
                 input_obj = block.get("input")
             else:
                 name = getattr(block, "name", None)
                 tool_id = getattr(block, "id", None) or name
                 input_obj = getattr(block, "input", None)
             if not name or not tool_id:
                 raise ToolTranslationError("Anthropic tool_use block missing id/name")
+            if input_obj is None:
+                input_dict: dict = {}
+            elif isinstance(input_obj, dict):
+                input_dict = input_obj
+            else:
+                raise ToolTranslationError("Anthropic tool_use input must be an object")
             tool_calls.append(
-                ToolCall(id=str(tool_id), name=str(name), input=dict(input_obj or {}))
+                ToolCall(id=str(tool_id), name=str(name), input=input_dict)
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/llm/tool_translation.py` around lines 247 - 259, The code
currently coerces tool inputs with dict(input_obj or {}), which lets
lists/primitives slip through; update the block handling around input_obj (used
to construct ToolCall) to explicitly validate that input_obj is either None or a
mapping before coercion: if input_obj is None treat as {}, if
isinstance(input_obj, dict) use dict(input_obj), otherwise raise
ToolTranslationError with a clear message (mention the offending tool id/name)
so malformed Anthropic payloads (e.g. lists/primitives) fail fast instead of
producing TypeError or silently becoming {} when constructing
ToolCall(id=str(tool_id), name=str(name), input=...).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dashboard/lib/api.ts`:
- Around line 1272-1297: The SSE parser in streamConnectorTest currently only
handles LF-only framing (uses buffer.indexOf('\n\n') and frame.split('\n')), so
CRLF-delimited frames (`\r\n` / `\r\n\r\n`) can be missed; fix by normalizing
line endings and boundary detection: when reading into buffer (and before
splitting frames) replace CRLF with LF (e.g., buffer = buffer.replace(/\r\n/g,
'\n')) or detect both separators (check for '\r\n\r\n' and '\n\n'), and split
frame lines with a CRLF-tolerant pattern (e.g., frame.split(/\r?\n/)); update
references in streamConnectorTest where buffer.indexOf, frame.split, eventType,
and dataLines are used so `event: error` frames are correctly parsed.

In `@server/alembic/versions/051_llm_connector_monthly_token_cap.py`:
- Around line 26-34: The migration allows negative monthly_token_cap values; add
a DB-level check constraint in upgrade() to enforce monthly_token_cap >= 0
(e.g., create a constraint named like
ck_llm_connectors_monthly_token_cap_nonnegative via op.create_check_constraint
on table "llm_connectors"), and in downgrade() drop that constraint
(op.drop_constraint) before dropping the monthly_token_cap column so the DB
won't accept negative caps and the rollback removes the constraint cleanly.

In `@server/app/services/llm/adapters/_httpx_openai.py`:
- Around line 206-223: The loop currently JSON-decodes each "data:" line
individually which breaks SSE events that span multiple data lines; change the
logic in the resp.aiter_lines iteration (the block around the async for line in
resp.aiter_lines()) to accumulate successive "data:" lines into a single event
buffer (concatenate their payloads preserving newlines) until you hit the SSE
event delimiter (a blank line) or a non-"data:" line, then handle the special
"[DONE]" sentinel and only call json.loads + parse_openai_stream_event on the
fully assembled event string; on JSON decode errors raise ToolTranslationError
as before but only for complete events so partial fragments don't prematurely
raise.

In `@server/app/services/llm/adapters/anthropic_apikey.py`:
- Around line 183-240: _translate_anthropic_event currently only uses
getattr(...) so plain-dict events drop through; update its attribute accesses to
fall back to dict-key lookup when getattr returns None (e.g., read etype =
getattr(event, "type", None) or event.get("type") if event is a dict), do the
same for content_block/block fields (block = getattr(event, "content_block",
None) or block = block.get(...) ), delta, usage, index (accept
event.get("index") too) and the inner fields like delta["text"] /
delta.get("partial_json") as fallbacks; preserve existing behavior for SDK
objects and keep conversions (int(...) for index/output_tokens) and the same
ChatResponseChunk and ToolCallDelta construction.

In `@server/app/services/llm/streaming.py`:
- Around line 70-85: The parser currently returns None for events lacking
text/tool deltas even when they are terminal usage-only events; move or compute
usage_payload before the early-return and change the guard so it only returns
None when there is no text_delta, no tool_call_deltas, not done, and no
usage_payload. Specifically, keep the existing logic that reads finish_reason
and sets done/stop_reason via _FINISH_REASON_OPENAI, but ensure usage_payload =
payload.get("usage") or {} is evaluated before the if-not-text/tool/done check
and include a check for usage_payload (and populate usage via TokenUsage and
_as_int as shown) so usage-only terminal events are not discarded.

---

Outside diff comments:
In `@dashboard/app/`(dj)/account/__tests__/page.test.tsx:
- Around line 171-179: The test overrides mockGetMe to return a payload missing
the new field; update the fixture returned by mockGetMe in the test ('shows
pending email from getMe on load') to include frictionless_join_default
(matching the shape returned by api.getMe/PublicRequestInfo) so AccountPage
receives the real contract; locate the mockGetMe.mockResolvedValue call and add
a frictionless_join_default property with an appropriate value alongside
pending_email and other fields.

In `@dashboard/app/`(dj)/account/page.tsx:
- Around line 46-54: The initial api.getMe() handler can overwrite a user toggle
change made locally; add a local flag (e.g., hasEditedFrictionless or
frictionlessInitialized) and update useEffect to only call
setFrictionlessDefault(user.frictionless_join_default) when the control has not
been edited yet (or when not initialized), and update that flag in the onChange
handler for the toggle to prevent later API responses from clobbering the local
change; locate the logic around useEffect, api.getMe, setFrictionlessDefault and
the toggle's onChange to implement this guard.

In `@server/app/services/llm/adapters/gemini_apikey.py`:
- Around line 135-146: The adapter currently maps tool messages by setting
functionResponse.name to the opaque tool_call_id (see m.role == "tool" branch
and m.tool_call_id), but Gemini expects functionResponse.name to be the declared
function name and functionResponse.id to hold the call id; update the
translation to use the canonical ToolCall.name for functionResponse.name and
place m.tool_call_id into functionResponse.id (or use ToolCall.id if present) so
the output uses function name for name and the opaque call id for id (adjust any
references in the m -> ToolCall mapping logic accordingly).

---

Duplicate comments:
In `@server/app/services/llm/tool_translation.py`:
- Around line 247-259: The code currently coerces tool inputs with
dict(input_obj or {}), which lets lists/primitives slip through; update the
block handling around input_obj (used to construct ToolCall) to explicitly
validate that input_obj is either None or a mapping before coercion: if
input_obj is None treat as {}, if isinstance(input_obj, dict) use
dict(input_obj), otherwise raise ToolTranslationError with a clear message
(mention the offending tool id/name) so malformed Anthropic payloads (e.g.
lists/primitives) fail fast instead of producing TypeError or silently becoming
{} when constructing ToolCall(id=str(tool_id), name=str(name), input=...).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e1e3f4b7-6317-4f69-8688-fc95c528980c

📥 Commits

Reviewing files that changed from the base of the PR and between fd7e650 and 5a8b858.

📒 Files selected for processing (91)
  • .env.example
  • .github/workflows/ci.yml
  • CLAUDE.md
  • dashboard/app/(dj)/account/__tests__/page.test.tsx
  • dashboard/app/(dj)/account/page.tsx
  • dashboard/app/admin/ai/__tests__/page.test.tsx
  • dashboard/app/admin/ai/page.tsx
  • dashboard/components/AiProvidersSection.tsx
  • dashboard/components/__tests__/AiProvidersSection.featurePrefs.test.tsx
  • dashboard/components/__tests__/AiProvidersSection.test.tsx
  • dashboard/lib/__tests__/api.test.ts
  • dashboard/lib/api-types.generated.ts
  • dashboard/lib/api-types.ts
  • dashboard/lib/api.ts
  • dashboard/next.config.js
  • docs/LLM-PLUGIN.md
  • docs/examples/echo_adapter.py
  • docs/superpowers/plans/2026-05-26-move-dj-ai-settings-to-account.md
  • docs/superpowers/plans/2026-05-26-remove-deprecated-anthropic-env-reads.md
  • docs/superpowers/plans/2026-05-26-sse-stream-pooled-db-connection-leak.md
  • docs/superpowers/plans/2026-05-28-llm-cost-quota-caps.md
  • docs/superpowers/plans/2026-05-28-llm-gateway-streaming.md
  • docs/superpowers/plans/2026-05-28-per-feature-connector-preference.md
  • server/alembic/versions/047_llm_call_log_retention.py
  • server/alembic/versions/048_llm_connector_is_default.py
  • server/alembic/versions/049_llm_connector_health_check_columns.py
  • server/alembic/versions/050_llm_feature_preference.py
  • server/alembic/versions/051_llm_connector_monthly_token_cap.py
  • server/alembic/versions/a11334c031bb_add_frictionless_join_flags.py
  • server/app/api/admin_llm.py
  • server/app/api/events.py
  • server/app/api/llm.py
  • server/app/api/sse.py
  • server/app/core/config.py
  • server/app/core/csv_safe.py
  • server/app/main.py
  • server/app/models/__init__.py
  • server/app/models/llm_connector.py
  • server/app/models/llm_feature_preference.py
  • server/app/models/system_settings.py
  • server/app/schemas/llm.py
  • server/app/services/email_sender.py
  • server/app/services/llm/__init__.py
  • server/app/services/llm/adapters/_httpx_openai.py
  • server/app/services/llm/adapters/_shared.py
  • server/app/services/llm/adapters/anthropic_apikey.py
  • server/app/services/llm/adapters/azure_openai.py
  • server/app/services/llm/adapters/bedrock.py
  • server/app/services/llm/adapters/gemini_apikey.py
  • server/app/services/llm/adapters/openai_apikey.py
  • server/app/services/llm/adapters/openai_compatible.py
  • server/app/services/llm/adapters/openrouter_apikey.py
  • server/app/services/llm/adapters/xai_apikey.py
  • server/app/services/llm/base.py
  • server/app/services/llm/connector_storage.py
  • server/app/services/llm/exceptions.py
  • server/app/services/llm/gateway.py
  • server/app/services/llm/health_check.py
  • server/app/services/llm/health_monitor.py
  • server/app/services/llm/plugin_loader.py
  • server/app/services/llm/registry.py
  • server/app/services/llm/streaming.py
  • server/app/services/llm/tool_translation.py
  • server/app/services/recommendation/llm_client.py
  • server/app/services/recommendation/llm_hooks.py
  • server/app/services/system_settings.py
  • server/openapi.json
  • server/tests/conftest.py
  • server/tests/test_export.py
  • server/tests/test_llm_adapter_contract.py
  • server/tests/test_llm_adapters.py
  • server/tests/test_llm_api.py
  • server/tests/test_llm_bedrock_adapter.py
  • server/tests/test_llm_call_log_retention.py
  • server/tests/test_llm_client.py
  • server/tests/test_llm_default_connector.py
  • server/tests/test_llm_feature_preference.py
  • server/tests/test_llm_gateway.py
  • server/tests/test_llm_gateway_stream.py
  • server/tests/test_llm_health_check.py
  • server/tests/test_llm_health_monitor.py
  • server/tests/test_llm_hooks.py
  • server/tests/test_llm_plugin_loader.py
  • server/tests/test_llm_quota_cap.py
  • server/tests/test_llm_recommendation_via_gateway.py
  • server/tests/test_llm_stream_endpoint.py
  • server/tests/test_llm_streaming.py
  • server/tests/test_llm_tool_translation.py
  • server/tests/test_llm_url_validator.py
  • server/tests/test_sse_pool.py
  • server/tests/test_verify_status_endpoint.py
✅ Files skipped from review due to trivial changes (3)
  • server/alembic/versions/049_llm_connector_health_check_columns.py
  • docs/superpowers/plans/2026-05-26-remove-deprecated-anthropic-env-reads.md
  • dashboard/lib/api-types.generated.ts
🚧 Files skipped from review as they are similar to previous changes (12)
  • server/app/services/llm/exceptions.py
  • server/app/services/llm/init.py
  • server/app/models/init.py
  • server/app/services/system_settings.py
  • server/app/models/system_settings.py
  • server/app/services/llm/registry.py
  • CLAUDE.md
  • server/app/services/llm/adapters/azure_openai.py
  • server/app/services/llm/adapters/xai_apikey.py
  • server/app/core/csv_safe.py
  • server/app/api/events.py
  • dashboard/app/admin/ai/page.tsx

Comment thread dashboard/lib/api.ts
Comment on lines +1272 to +1297
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
try {
for (;;) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// SSE frames are separated by a blank line.
let sep: number;
while ((sep = buffer.indexOf('\n\n')) !== -1) {
const frame = buffer.slice(0, sep);
buffer = buffer.slice(sep + 2);
// A frame may carry an `event:` name plus one or more `data:` lines.
// The backend emits `event: error` for typed gateway failures, so we
// must inspect the event type — not just blindly parse `data:`.
let eventType = 'message';
const dataLines: string[] = [];
for (const line of frame.split('\n')) {
if (line.startsWith('event:')) {
eventType = line.slice('event:'.length).trim();
} else if (line.startsWith('data:')) {
dataLines.push(line.slice('data:'.length).trim());
}
}
const data = dataLines.join('\n').trim();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the relevant function/snippet
rg -n "streamConnectorTest|SSE|event:|data:" dashboard/lib/api.ts

# Print the exact region around the referenced lines
sed -n '1220,1360p' dashboard/lib/api.ts | nl -ba

# Also check for any other SSE parsing helpers/usages in the file
rg -n "getReader\\(\\)|TextDecoder\\(|indexOf\\(\"\\\\n\\\\n\"\\)|\\\\r\\\\n" dashboard/lib/api.ts

Repository: wrzonance/WrzDJ

Length of output: 1960


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the exact region around streamConnectorTest
sed -n '1210,1335p' dashboard/lib/api.ts | cat -n

echo "----"

# Search for CRLF-related code in this file
rg -n "\\\\r\\\\n|\\\\r\\\\r|\\r\\n|\\r" dashboard/lib/api.ts || true

Repository: wrzonance/WrzDJ

Length of output: 5685


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "indexOf\\('\\\\n\\\\n'\\)|split\\(/\\\\r\\?\\\\n/\\)|split\\('\\\\n'\\)" dashboard/lib/api.ts dashboard -S || true

Repository: wrzonance/WrzDJ

Length of output: 375


Handle CRLF-delimited SSE frames in streamConnectorTest()

The parser only detects frame boundaries via buffer.indexOf('\n\n') and splits lines with frame.split('\n'). For standard SSE \r\n\r\n / \r\n framing, the boundary match can fail, leaving frames unparsed (including event: error).

💡 Suggested fix
     const reader = response.body.getReader();
     const decoder = new TextDecoder();
     let buffer = '';
+    const frameBoundary = /\r?\n\r?\n/;
     try {
       for (;;) {
         const { done, value } = await reader.read();
         if (done) break;
         buffer += decoder.decode(value, { stream: true });
         // SSE frames are separated by a blank line.
         let sep: number;
-        while ((sep = buffer.indexOf('\n\n')) !== -1) {
+        while ((sep = buffer.search(frameBoundary)) !== -1) {
           const frame = buffer.slice(0, sep);
-          buffer = buffer.slice(sep + 2);
+          const boundary = buffer.slice(sep).match(frameBoundary)?.[0] ?? '\n\n';
+          buffer = buffer.slice(sep + boundary.length);
           // A frame may carry an `event:` name plus one or more `data:` lines.
           // The backend emits `event: error` for typed gateway failures, so we
           // must inspect the event type — not just blindly parse `data:`.
           let eventType = 'message';
           const dataLines: string[] = [];
-          for (const line of frame.split('\n')) {
+          for (const line of frame.split(/\r?\n/)) {
             if (line.startsWith('event:')) {
               eventType = line.slice('event:'.length).trim();
             } else if (line.startsWith('data:')) {
               dataLines.push(line.slice('data:'.length).trim());
             }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
try {
for (;;) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// SSE frames are separated by a blank line.
let sep: number;
while ((sep = buffer.indexOf('\n\n')) !== -1) {
const frame = buffer.slice(0, sep);
buffer = buffer.slice(sep + 2);
// A frame may carry an `event:` name plus one or more `data:` lines.
// The backend emits `event: error` for typed gateway failures, so we
// must inspect the event type — not just blindly parse `data:`.
let eventType = 'message';
const dataLines: string[] = [];
for (const line of frame.split('\n')) {
if (line.startsWith('event:')) {
eventType = line.slice('event:'.length).trim();
} else if (line.startsWith('data:')) {
dataLines.push(line.slice('data:'.length).trim());
}
}
const data = dataLines.join('\n').trim();
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
const frameBoundary = /\r?\n\r?\n/;
try {
for (;;) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// SSE frames are separated by a blank line.
let sep: number;
while ((sep = buffer.search(frameBoundary)) !== -1) {
const frame = buffer.slice(0, sep);
const boundary = buffer.slice(sep).match(frameBoundary)?.[0] ?? '\n\n';
buffer = buffer.slice(sep + boundary.length);
// A frame may carry an `event:` name plus one or more `data:` lines.
// The backend emits `event: error` for typed gateway failures, so we
// must inspect the event type — not just blindly parse `data:`.
let eventType = 'message';
const dataLines: string[] = [];
for (const line of frame.split(/\r?\n/)) {
if (line.startsWith('event:')) {
eventType = line.slice('event:'.length).trim();
} else if (line.startsWith('data:')) {
dataLines.push(line.slice('data:'.length).trim());
}
}
const data = dataLines.join('\n').trim();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dashboard/lib/api.ts` around lines 1272 - 1297, The SSE parser in
streamConnectorTest currently only handles LF-only framing (uses
buffer.indexOf('\n\n') and frame.split('\n')), so CRLF-delimited frames (`\r\n`
/ `\r\n\r\n`) can be missed; fix by normalizing line endings and boundary
detection: when reading into buffer (and before splitting frames) replace CRLF
with LF (e.g., buffer = buffer.replace(/\r\n/g, '\n')) or detect both separators
(check for '\r\n\r\n' and '\n\n'), and split frame lines with a CRLF-tolerant
pattern (e.g., frame.split(/\r?\n/)); update references in streamConnectorTest
where buffer.indexOf, frame.split, eventType, and dataLines are used so `event:
error` frames are correctly parsed.

Comment on lines +26 to +34
def upgrade() -> None:
op.add_column(
"llm_connectors",
sa.Column("monthly_token_cap", sa.Integer(), nullable=True),
)


def downgrade() -> None:
op.drop_column("llm_connectors", "monthly_token_cap")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Enforce non-negative caps in the migration.

monthly_token_cap currently accepts negative integers. A bad write here will make the connector effectively "always over budget" once the gateway compares current usage against the stored cap. Add a DB check so invalid values cannot persist.

💡 Proposed fix
 def upgrade() -> None:
     op.add_column(
         "llm_connectors",
         sa.Column("monthly_token_cap", sa.Integer(), nullable=True),
     )
+    op.create_check_constraint(
+        "ck_llm_connectors_monthly_token_cap_nonnegative",
+        "llm_connectors",
+        "monthly_token_cap IS NULL OR monthly_token_cap >= 0",
+    )
 
 
 def downgrade() -> None:
+    op.drop_constraint(
+        "ck_llm_connectors_monthly_token_cap_nonnegative",
+        "llm_connectors",
+        type_="check",
+    )
     op.drop_column("llm_connectors", "monthly_token_cap")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/alembic/versions/051_llm_connector_monthly_token_cap.py` around lines
26 - 34, The migration allows negative monthly_token_cap values; add a DB-level
check constraint in upgrade() to enforce monthly_token_cap >= 0 (e.g., create a
constraint named like ck_llm_connectors_monthly_token_cap_nonnegative via
op.create_check_constraint on table "llm_connectors"), and in downgrade() drop
that constraint (op.drop_constraint) before dropping the monthly_token_cap
column so the DB won't accept negative caps and the rollback removes the
constraint cleanly.

Comment on lines +206 to +223
async for line in resp.aiter_lines():
if not line or not line.startswith("data:"):
continue
data = line[len("data:") :].strip()
if data == "[DONE]":
break
try:
obj = json.loads(data)
except json.JSONDecodeError as exc:
# SSE comment / keepalive frames start with ":" (or are
# blank) and never reach here — they fail the "data:"
# prefix check above. A "data:" line that isn't "[DONE]"
# yet won't parse is a genuine protocol fault; surface it
# rather than silently truncating the stream.
raise ToolTranslationError("Upstream returned malformed SSE JSON") from exc
chunk = parse_openai_stream_event(obj)
if chunk is not None:
yield chunk

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parse SSE events instead of individual data: lines.

This loop JSON-decodes every data: line separately. SSE frames are event-delimited, and a compliant OpenAI-compatible server may split one event across multiple data: lines. In that case this will raise ToolTranslationError on a partial fragment and truncate an otherwise valid stream.

💡 Proposed fix
-                async for line in resp.aiter_lines():
-                    if not line or not line.startswith("data:"):
-                        continue
-                    data = line[len("data:") :].strip()
-                    if data == "[DONE]":
-                        break
-                    try:
-                        obj = json.loads(data)
-                    except json.JSONDecodeError as exc:
-                        # SSE comment / keepalive frames start with ":" (or are
-                        # blank) and never reach here — they fail the "data:"
-                        # prefix check above. A "data:" line that isn't "[DONE]"
-                        # yet won't parse is a genuine protocol fault; surface it
-                        # rather than silently truncating the stream.
-                        raise ToolTranslationError("Upstream returned malformed SSE JSON") from exc
-                    chunk = parse_openai_stream_event(obj)
-                    if chunk is not None:
-                        yield chunk
+                event_data: list[str] = []
+                async for line in resp.aiter_lines():
+                    if line.startswith(":"):
+                        continue
+                    if line.startswith("data:"):
+                        event_data.append(line[len("data:") :].lstrip())
+                        continue
+                    if line != "":
+                        continue
+
+                    if not event_data:
+                        continue
+                    data = "\n".join(event_data).strip()
+                    event_data.clear()
+                    if data == "[DONE]":
+                        break
+                    try:
+                        obj = json.loads(data)
+                    except json.JSONDecodeError as exc:
+                        raise ToolTranslationError("Upstream returned malformed SSE JSON") from exc
+                    chunk = parse_openai_stream_event(obj)
+                    if chunk is not None:
+                        yield chunk
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async for line in resp.aiter_lines():
if not line or not line.startswith("data:"):
continue
data = line[len("data:") :].strip()
if data == "[DONE]":
break
try:
obj = json.loads(data)
except json.JSONDecodeError as exc:
# SSE comment / keepalive frames start with ":" (or are
# blank) and never reach here — they fail the "data:"
# prefix check above. A "data:" line that isn't "[DONE]"
# yet won't parse is a genuine protocol fault; surface it
# rather than silently truncating the stream.
raise ToolTranslationError("Upstream returned malformed SSE JSON") from exc
chunk = parse_openai_stream_event(obj)
if chunk is not None:
yield chunk
event_data: list[str] = []
async for line in resp.aiter_lines():
if line.startswith(":"):
continue
if line.startswith("data:"):
event_data.append(line[len("data:") :].lstrip())
continue
if line != "":
continue
if not event_data:
continue
data = "\n".join(event_data).strip()
event_data.clear()
if data == "[DONE]":
break
try:
obj = json.loads(data)
except json.JSONDecodeError as exc:
raise ToolTranslationError("Upstream returned malformed SSE JSON") from exc
chunk = parse_openai_stream_event(obj)
if chunk is not None:
yield chunk
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/llm/adapters/_httpx_openai.py` around lines 206 - 223,
The loop currently JSON-decodes each "data:" line individually which breaks SSE
events that span multiple data lines; change the logic in the resp.aiter_lines
iteration (the block around the async for line in resp.aiter_lines()) to
accumulate successive "data:" lines into a single event buffer (concatenate
their payloads preserving newlines) until you hit the SSE event delimiter (a
blank line) or a non-"data:" line, then handle the special "[DONE]" sentinel and
only call json.loads + parse_openai_stream_event on the fully assembled event
string; on JSON decode errors raise ToolTranslationError as before but only for
complete events so partial fragments don't prematurely raise.

Comment on lines +183 to +240
def _translate_anthropic_event(
event: Any,
) -> tuple[ChatResponseChunk | None, bool, str | None, int | None]:
"""Translate one Anthropic SDK stream event into stream state.

Returns ``(chunk_or_None, saw_tool_use, stop_reason_or_None,
output_tokens_or_None)``. Mirrors the dual dict/object access style of
``tool_translation.parse_anthropic_response`` so it tolerates either the
typed SDK events or plain dicts (used in tests).
"""
etype = getattr(event, "type", None)

if etype == "content_block_start":
block = getattr(event, "content_block", None)
if getattr(block, "type", None) == "tool_use":
idx = int(getattr(event, "index", 0))
chunk = ChatResponseChunk(
tool_call_deltas=[
ToolCallDelta(
index=idx,
id=getattr(block, "id", None),
name=getattr(block, "name", None),
)
]
)
return chunk, True, None, None
return None, False, None, None

if etype == "content_block_delta":
delta = getattr(event, "delta", None)
dtype = getattr(delta, "type", None)
if dtype == "text_delta":
return ChatResponseChunk(text_delta=getattr(delta, "text", "") or ""), False, None, None
if dtype == "input_json_delta":
idx = int(getattr(event, "index", 0))
chunk = ChatResponseChunk(
tool_call_deltas=[
ToolCallDelta(
index=idx,
input_json_fragment=getattr(delta, "partial_json", "") or "",
)
]
)
return chunk, False, None, None
return None, False, None, None

if etype == "message_delta":
delta = getattr(event, "delta", None)
stop_reason = getattr(delta, "stop_reason", None)
usage = getattr(event, "usage", None)
output_tokens = None
if usage is not None:
ot = getattr(usage, "output_tokens", None)
if ot is not None:
output_tokens = int(ot)
return None, False, stop_reason, output_tokens

return None, False, None, None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle dict-backed stream events here too.

_translate_anthropic_event() says it accepts SDK objects or plain dicts, but this branch only uses getattr(...). For dict fixtures, etype, delta, content_block, and usage all resolve to None, so text/tool deltas and stop metadata get dropped silently.

💡 Minimal fix
+def _field(obj: Any, name: str) -> Any:
+    return obj.get(name) if isinstance(obj, dict) else getattr(obj, name, None)
+
 def _translate_anthropic_event(
     event: Any,
 ) -> tuple[ChatResponseChunk | None, bool, str | None, int | None]:
@@
-    etype = getattr(event, "type", None)
+    etype = _field(event, "type")
@@
-        block = getattr(event, "content_block", None)
-        if getattr(block, "type", None) == "tool_use":
-            idx = int(getattr(event, "index", 0))
+        block = _field(event, "content_block")
+        if _field(block, "type") == "tool_use":
+            idx = int(_field(event, "index") or 0)
             chunk = ChatResponseChunk(
                 tool_call_deltas=[
                     ToolCallDelta(
                         index=idx,
-                        id=getattr(block, "id", None),
-                        name=getattr(block, "name", None),
+                        id=_field(block, "id"),
+                        name=_field(block, "name"),
                     )
                 ]
             )
@@
-        delta = getattr(event, "delta", None)
-        dtype = getattr(delta, "type", None)
+        delta = _field(event, "delta")
+        dtype = _field(delta, "type")
         if dtype == "text_delta":
-            return ChatResponseChunk(text_delta=getattr(delta, "text", "") or ""), False, None, None
+            return ChatResponseChunk(text_delta=_field(delta, "text") or ""), False, None, None
         if dtype == "input_json_delta":
-            idx = int(getattr(event, "index", 0))
+            idx = int(_field(event, "index") or 0)
             chunk = ChatResponseChunk(
                 tool_call_deltas=[
                     ToolCallDelta(
                         index=idx,
-                        input_json_fragment=getattr(delta, "partial_json", "") or "",
+                        input_json_fragment=_field(delta, "partial_json") or "",
                     )
                 ]
             )
@@
-        delta = getattr(event, "delta", None)
-        stop_reason = getattr(delta, "stop_reason", None)
-        usage = getattr(event, "usage", None)
+        delta = _field(event, "delta")
+        stop_reason = _field(delta, "stop_reason")
+        usage = _field(event, "usage")
         output_tokens = None
         if usage is not None:
-            ot = getattr(usage, "output_tokens", None)
+            ot = _field(usage, "output_tokens")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/llm/adapters/anthropic_apikey.py` around lines 183 - 240,
_translate_anthropic_event currently only uses getattr(...) so plain-dict events
drop through; update its attribute accesses to fall back to dict-key lookup when
getattr returns None (e.g., read etype = getattr(event, "type", None) or
event.get("type") if event is a dict), do the same for content_block/block
fields (block = getattr(event, "content_block", None) or block = block.get(...)
), delta, usage, index (accept event.get("index") too) and the inner fields like
delta["text"] / delta.get("partial_json") as fallbacks; preserve existing
behavior for SDK objects and keep conversions (int(...) for index/output_tokens)
and the same ChatResponseChunk and ToolCallDelta construction.

Comment on lines +70 to +85
finish_reason = choice.get("finish_reason")
done = finish_reason is not None

stop_reason: CanonicalStopReason | None = None
usage: TokenUsage | None = None
if done:
stop_reason = _FINISH_REASON_OPENAI.get(finish_reason, "error")
usage_payload = payload.get("usage") or {}
if usage_payload:
usage = TokenUsage(
prompt=_as_int(usage_payload.get("prompt_tokens", 0)),
completion=_as_int(usage_payload.get("completion_tokens", 0)),
)

if not text_delta and not tool_call_deltas and not done:
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't discard usage-only terminal events.

When include_usage is enabled, the final OpenAI-style stream payload can carry only usage metadata and no text/tool delta. This parser returns None for that event because done is driven only by finish_reason, so streamed calls lose token accounting.

💡 Proposed fix
-    stop_reason: CanonicalStopReason | None = None
-    usage: TokenUsage | None = None
-    if done:
-        stop_reason = _FINISH_REASON_OPENAI.get(finish_reason, "error")
-        usage_payload = payload.get("usage") or {}
-        if usage_payload:
-            usage = TokenUsage(
-                prompt=_as_int(usage_payload.get("prompt_tokens", 0)),
-                completion=_as_int(usage_payload.get("completion_tokens", 0)),
-            )
+    stop_reason: CanonicalStopReason | None = None
+    usage: TokenUsage | None = None
+    if done:
+        stop_reason = _FINISH_REASON_OPENAI.get(finish_reason, "error")
+    usage_payload = payload.get("usage") or {}
+    if usage_payload:
+        usage = TokenUsage(
+            prompt=_as_int(usage_payload.get("prompt_tokens", 0)),
+            completion=_as_int(usage_payload.get("completion_tokens", 0)),
+        )
 
-    if not text_delta and not tool_call_deltas and not done:
+    if not text_delta and not tool_call_deltas and not done and usage is None:
         return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/app/services/llm/streaming.py` around lines 70 - 85, The parser
currently returns None for events lacking text/tool deltas even when they are
terminal usage-only events; move or compute usage_payload before the
early-return and change the guard so it only returns None when there is no
text_delta, no tool_call_deltas, not done, and no usage_payload. Specifically,
keep the existing logic that reads finish_reason and sets done/stop_reason via
_FINISH_REASON_OPENAI, but ensure usage_payload = payload.get("usage") or {} is
evaluated before the if-not-text/tool/done check and include a check for
usage_payload (and populate usage via TokenUsage and _as_int as shown) so
usage-only terminal events are not discarded.

thewrz and others added 2 commits June 2, 2026 23:00
…igrations

The PR #354 conflict resolution anchored a11334c031bb (frictionless join,
already applied to prod via main) ABOVE 046-051. Prod's alembic_version
reads a11334c031bb, so after merge 'alembic upgrade head' would see
current == head and silently skip all six AI-engine migrations — no llm
tables would ever be created in production.

Fix: restore a11334c031bb to down_revision=045 (byte-identical to main's
copy) and re-anchor 046 onto a11334c031bb. New chain:
045 -> a11334c031bb -> 046 -> ... -> 051 (head).

Validated on PostgreSQL 16:
- prod-path simulation: main chain to a11334c031bb, then epic chain
  applies 046-051 and creates llm_connectors/llm_call_log/
  llm_audit_event/llm_feature_preferences
- fresh-DB full chain passes, single head (051), alembic check clean
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AI Engine] MVP — provider-agnostic LLM gateway + 3 connectors + recommendation migration

1 participant