
Main 3#27

Merged
raphaeltm merged 600 commits into main from main-3
May 21, 2026

@raphaeltm

Collaborator

No description provided.

raphaeltm and others added 30 commits May 11, 2026 14:42
Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
…tainer-cache-experiments-01krb4

Experiment with Cloudflare devcontainer cache backends
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Four production fixes: disable unattended-upgrades on VMs, deduplicate
workspace creation to close a race condition, extract the task callback
route to fix a 401 auth failure, and implement an MCP token sliding window
with an 8h TTL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tm#966)

* fix: extract task callback route before projectsRoutes

The task callback route (POST /:projectId/tasks/:taskId/status/callback)
was blocked by projectsRoutes.use('/*', requireAuth()) which leaks session
auth middleware to all sibling subrouters at the same base path. The leaked
requireAuth() ran BEFORE the callback route's own verifyCallbackToken JWT
auth, rejecting the VM agent's Bearer token request with 401.

Fix: Extract the callback route into its own Hono subrouter (callback.ts)
and mount it at /api/projects BEFORE projectsRoutes in index.ts, following
the same pattern used for deploymentIdentityTokenRoute and
nodeAcpHeartbeatRoute.
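The ordering problem can be illustrated with a small, dependency-free model. This is not Hono's implementation: `Entry`, `dispatch`, and the `requireAuth`/`callbackRoute` stand-ins are hypothetical. The model only assumes what the message states, namely that a `use('/*')` middleware registered under a base path runs, in registration order, for every request under that path.

```typescript
// Toy model of path-prefix dispatch (NOT Hono's real implementation).
type Req = { path: string; headers: Record<string, string> };
type Res = { status: number; body: string };
// A handler returns a response to short-circuit, or undefined to fall through.
type Handler = (req: Req) => Res | undefined;
type Entry = { prefix: string; handler: Handler };

// Entries run in registration order; the first one that returns a response wins.
function dispatch(entries: Entry[], req: Req): Res {
  for (const { prefix, handler } of entries) {
    if (!req.path.startsWith(prefix)) continue;
    const res = handler(req);
    if (res) return res;
  }
  return { status: 404, body: "not found" };
}

// Stand-in for session auth: rejects anything without a session cookie.
const requireAuth: Handler = (req) =>
  req.headers["cookie"]?.includes("session=") ? undefined : { status: 401, body: "session required" };

// Stand-in for the callback route: handles only its own path and checks a Bearer token.
const callbackRoute: Handler = (req) => {
  if (!/^\/api\/projects\/[^/]+\/tasks\/[^/]+\/status\/callback$/.test(req.path)) return undefined;
  return req.headers["authorization"]?.startsWith("Bearer ")
    ? { status: 200, body: "callback accepted" }
    : { status: 401, body: "bad token" };
};

const base = "/api/projects/";
// Buggy order: projectsRoutes' use('/*', requireAuth()) is registered first and leaks onto siblings.
const leaky: Entry[] = [
  { prefix: base, handler: requireAuth },
  { prefix: base, handler: callbackRoute },
];
// Fixed order: the callback subrouter is mounted before projectsRoutes.
const fixed: Entry[] = [
  { prefix: base, handler: callbackRoute },
  { prefix: base, handler: requireAuth },
];
```

With the fixed order, non-callback paths still fall through to session auth, so the extraction narrows the exemption to exactly one route.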

This is the fourth instance of the Hono middleware scope leak bug class.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add MCP token sliding window refresh + 8h TTL

MCP tokens had a 4-hour TTL with no refresh mechanism. Agents running
tasks longer than 4 hours lost MCP tool access permanently.

Changes:
- Increase DEFAULT_MCP_TOKEN_TTL_SECONDS from 4h to 8h (inactivity timeout)
- Add DEFAULT_MCP_TOKEN_MAX_LIFETIME_SECONDS (24h hard cap)
- Add sliding window to validateMcpToken(): refresh the KV TTL on use, but
  only once more than 50% of the TTL has elapsed since the last refresh,
  to avoid excessive KV writes
- Add lastRefreshedAt field to McpTokenData for throttle tracking
- Fail-closed: malformed createdAt causes immediate token revocation
- Add MCP_TOKEN_MAX_LIFETIME_SECONDS env var (configurable per principle XI)
- Pass env through to all validateMcpToken call sites
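The sliding-window decision in the list above can be sketched as a pure function. `slidingWindowDecision` and the `Decision` shape are hypothetical; the field names `createdAt`/`lastRefreshedAt`, the 8h and 24h defaults, the >50% throttle, the lifetime-capped TTL, and fail-closed handling come from the message.

```typescript
// Hypothetical helper illustrating the policy; not the real validateMcpToken().
interface McpTokenData {
  createdAt: string; // ISO timestamp set when the token is issued
  lastRefreshedAt?: string; // ISO timestamp of the last KV TTL refresh
}

const DEFAULT_MCP_TOKEN_TTL_SECONDS = 8 * 60 * 60; // 8h inactivity timeout
const DEFAULT_MCP_TOKEN_MAX_LIFETIME_SECONDS = 24 * 60 * 60; // 24h hard cap

type Decision =
  | { action: "revoke"; reason: string }
  | { action: "keep" }
  | { action: "refresh"; expirationTtl: number; updated: McpTokenData };

function slidingWindowDecision(
  data: McpTokenData,
  nowMs: number,
  ttlSeconds = DEFAULT_MCP_TOKEN_TTL_SECONDS,
  maxLifetimeSeconds = DEFAULT_MCP_TOKEN_MAX_LIFETIME_SECONDS,
): Decision {
  const createdMs = Date.parse(data.createdAt);
  // Fail closed: if the token's age cannot be computed, revoke it.
  if (Number.isNaN(createdMs)) return { action: "revoke", reason: "malformed createdAt" };

  const ageSeconds = (nowMs - createdMs) / 1000;
  if (ageSeconds >= maxLifetimeSeconds) return { action: "revoke", reason: "max lifetime exceeded" };

  // Throttle: skip the KV write until more than half the TTL has elapsed since the last refresh.
  const parsedLast = data.lastRefreshedAt ? Date.parse(data.lastRefreshedAt) : createdMs;
  const lastMs = Number.isNaN(parsedLast) ? createdMs : parsedLast;
  if ((nowMs - lastMs) / 1000 <= ttlSeconds / 2) return { action: "keep" };

  // Cap the refreshed TTL so the entry can never outlive the hard lifetime.
  const remainingSeconds = Math.floor(maxLifetimeSeconds - ageSeconds);
  return {
    action: "refresh",
    expirationTtl: Math.min(ttlSeconds, remainingSeconds),
    updated: { ...data, lastRefreshedAt: new Date(nowMs).toISOString() },
  };
}
```

One wrinkle to check against the real code: Cloudflare KV documents a 60-second minimum for `expirationTtl`, so a token within a minute of its hard cap needs explicit handling (revoking it outright is one option).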

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add callback auth routing + MCP sliding window tests

- Integration test proving task callback accepts Bearer JWT through
  combined app routes (not blocked by session auth middleware)
- Unit tests for MCP token sliding window: throttle, max lifetime,
  capped TTL, fail-closed on malformed createdAt
- Fix existing mcp-token test for new max-lifetime behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add task callback middleware leak post-mortem + update .env.example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: move task to active

* fix: update test fixtures for MCP token max-lifetime validation

MCP token sliding window adds a createdAt-based max lifetime check.
Test fixtures using hardcoded dates from months ago now exceed the 24h
cap. Update all MCP token fixtures to use new Date().toISOString().

Also update task-runner-completion source contract test to read from
callback.ts instead of crud.ts after route extraction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: archive completed task

* fix: address security review findings

- Enforce expectedScope: 'workspace' on task callback JWT verification
  (prevents node-scoped tokens from reaching workspace-mutation endpoint)
- Return updatedData from validateMcpToken after sliding window refresh
  (callers now receive current state matching what is persisted in KV)
- Add non-atomicity comment to sliding window refresh (documents known
  KV race condition consistent with checkMcpRateLimit pattern)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make KV delete best-effort in token expiry paths

kv.delete() on expired/malformed tokens is a cleanup courtesy — the KV
TTL will expire the entry anyway. Making it fire-and-forget prevents a
KV service hiccup from turning a token expiry into an unhandled 500.
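A fire-and-forget delete can be sketched against a minimal KV-like interface. `KvLike`, `bestEffortDelete`, and `rejectExpiredToken` are hypothetical names; only the idea (cleanup failures must not turn an expiry into a 500) comes from the message.

```typescript
// Minimal KV-like interface; Cloudflare's KVNamespace.delete also returns Promise<void>.
interface KvLike {
  delete(key: string): Promise<void>;
}

// Returns a promise that never rejects; failures are only logged.
function bestEffortDelete(
  kv: KvLike,
  key: string,
  log: (msg: string) => void = console.warn,
): Promise<void> {
  // Promise.resolve().then(...) also captures a synchronous throw from kv.delete.
  return Promise.resolve()
    .then(() => kv.delete(key))
    .catch((err: unknown) => {
      log(`best-effort KV delete failed for ${key}: ${String(err)}`);
    });
}

// Hypothetical expiry path: the caller always gets a clean "expired" result.
function rejectExpiredToken(
  kv: KvLike,
  key: string,
  log?: (msg: string) => void,
): { ok: false; reason: "expired" } {
  void bestEffortDelete(kv, key, log);
  return { ok: false, reason: "expired" };
}
```

In a Worker, the returned promise would typically be handed to `ctx.waitUntil()` so the runtime does not drop the cleanup once the response is sent; that wiring is omitted here.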

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…aphaeltm#968)

* fix: disable apt-daily timers and harden IPv6 firewall in cloud-init

- Disable apt-daily.timer, apt-daily-upgrade.timer, and
  unattended-upgrades in cloud-init runcmd before vm-agent starts.
  These Ubuntu timers can trigger a systemd daemon-reexec, which kills the
  vm-agent mid-work. Ephemeral VMs gain nothing from auto-upgrades.

- Load ip6_tables kernel module before ip6tables commands in the
  firewall script. Some Hetzner images ship without the module loaded,
  causing all ip6tables commands to fail silently. The IPv6 firewall
  block is now conditional — if the module can't be loaded, IPv6 rules
  are skipped with a log warning instead of failing the entire script.

- Make ip6tables-save error-tolerant for systems without IPv6 support.

- Add 5 new tests covering timer disables and IPv6 module loading.
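A hypothetical TypeScript generator for the two cloud-init pieces makes the ordering and conditionality contracts concrete. The unit names and the `modprobe ip6_tables` guard come from the message; `buildRunCmd`, `buildIpv6FirewallBlock`, the `|| true` tolerances, and the `/etc/iptables/rules.v6` save path are assumptions.

```typescript
// Units named in the commit message.
const APT_UNITS = ["apt-daily.timer", "apt-daily-upgrade.timer", "unattended-upgrades.service"];

// runcmd entries: disable auto-upgrade machinery BEFORE the agent starts,
// so it cannot daemon-reexec systemd underneath the running vm-agent.
function buildRunCmd(startAgentCmd: string): string[] {
  return [
    // `|| true` tolerates images where one of the units is not installed.
    `systemctl disable --now ${APT_UNITS.join(" ")} || true`,
    startAgentCmd,
  ];
}

// IPv6 rules run only if the kernel module loads; otherwise log and skip.
function buildIpv6FirewallBlock(rules: string[]): string {
  return [
    "if modprobe ip6_tables 2>/dev/null; then",
    ...rules.map((rule) => `  ip6tables ${rule}`),
    // Saving is best-effort on hosts without full IPv6 support (path is an assumption).
    "  ip6tables-save > /etc/iptables/rules.v6 2>/dev/null || true",
    "else",
    '  echo "WARN: ip6_tables module unavailable; skipping IPv6 firewall rules" >&2',
    "fi",
  ].join("\n");
}
```

Keeping every `ip6tables` rule between `then` and `else` is what lets a test assert the DROP/ACCEPT rules never run unconditionally.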

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: move cloud-init firewall hygiene task to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: address review findings — conditionality test, YAML-parsed timer tests

- Add test verifying ip6tables DROP/ACCEPT rules are inside the modprobe
  conditional block, not executed unconditionally (MEDIUM finding)
- Refactor timer ordering tests to use YAML.parse instead of string splitting
- Update stale ip6tables-save assertion to match current contract

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add retry policy for transient Hetzner capacity failures (422)

Hetzner VM provisioning can return HTTP 422 when capacity is temporarily
exhausted for a server type/region. These are now retried with bounded
exponential backoff (15s initial, 2min max, 5 attempts default) while
permanent 422s (invalid config) are thrown immediately.

- Add isTransientCapacityError() to classify 422s by message pattern
- Wrap placement loop with capacity retry in createVM()
- Log retry attempts with server_type, location, attempt#, delay
- Distinguish "capacity exhausted" from "invalid configuration" errors
- All retry params configurable via constructor/HetznerProviderConfig
- Add comprehensive tests for retry, backoff, exhaustion, and logging
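The retry policy above can be sketched as a classifier plus a capped exponential backoff loop. `HttpError`, `withCapacityRetry`, and the injectable `sleep` are hypothetical, and `CAPACITY_PATTERN` is assembled only from the message variants named in the follow-up tests (the real pattern list may be longer). The 15s / 2min / 5-attempt defaults come from the message.

```typescript
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

// Assumed pattern: "resources temporarily unavailable", "resource unavailable", "could not find".
const CAPACITY_PATTERN = /resources? (temporarily )?unavailable|could not find/i;

interface CapacityRetryConfig { initialDelayMs: number; maxDelayMs: number; maxAttempts: number }
const DEFAULT_CAPACITY_RETRY: CapacityRetryConfig = { initialDelayMs: 15_000, maxDelayMs: 120_000, maxAttempts: 5 };

// Only 422s whose message looks like a capacity problem are transient.
function isTransientCapacityError(err: unknown): boolean {
  return err instanceof HttpError && err.status === 422 && CAPACITY_PATTERN.test(err.message);
}

// Delay before retry n (1-based): initial * 2^(n-1), capped at maxDelayMs.
function backoffDelay(retry: number, cfg: CapacityRetryConfig): number {
  return Math.min(cfg.initialDelayMs * 2 ** (retry - 1), cfg.maxDelayMs);
}

async function withCapacityRetry<T>(
  attempt: () => Promise<T>,
  cfg: CapacityRetryConfig = DEFAULT_CAPACITY_RETRY,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let n = 1; ; n++) {
    try {
      return await attempt();
    } catch (err) {
      // Permanent errors, including non-capacity 422s, are rethrown immediately.
      if (!isTransientCapacityError(err)) throw err;
      if (n >= cfg.maxAttempts) {
        const exhausted = new Error(`capacity exhausted after ${n} attempts`) as Error & { cause?: unknown };
        exhausted.cause = err; // keep the last provider error for diagnostics
        throw exhausted;
      }
      await sleep(backoffDelay(n, cfg));
    }
  }
}
```

Injecting `sleep` keeps tests fast and lets them assert the exact delay schedule (15s, 30s, 60s, 120s for the defaults).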

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire capacity retry env vars to API layer

Add HETZNER_CAPACITY_RETRY_INITIAL_DELAY_MS, HETZNER_CAPACITY_RETRY_MAX_DELAY_MS,
and HETZNER_CAPACITY_RETRY_MAX_ATTEMPTS to Env interface and buildProviderConfig.
Operators can now tune retry behavior without code changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: address test-engineer review findings for capacity retry

- Add tests for untested regex variants (resources temporarily unavailable,
  resource unavailable, could not find)
- Add maxAttempts=1 edge case test
- Add .cause assertion on capacity exhaustion error
- Add mixed 412+422 scenario test
- Add assertion that console.warn is NOT emitted on final exhaustion attempt
- Clean up double-call pattern in non-capacity 422 test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: reduce test duplication in capacity retry tests

Extract helper functions (capacityErrorResponse, placementErrorResponse,
successResponse, mockAlwaysCapacityError) to eliminate repeated Response
construction patterns flagged by SonarCloud duplication check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Auto-committed by SAM on agent completion.
raphaeltm#971)

Add two tests to the task callback auth routing regression suite:
- Invalid Bearer token is handled by callback auth (not session auth)
- Workspace ID mismatch returns 403

These tests verify the callback route's own auth gates work correctly,
complementing the existing regression tests that verify session auth
middleware doesn't leak onto callback routes.

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Auto-committed by SAM on agent completion.
When a task fails, the session can stay "Active" because the
stopSession RPC to the ProjectData DO is best-effort and can
fail silently. The UI should cross-reference task status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
)

* chore: move task to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add priority/updates/all tabs to notification panel

Replace the existing All/Unread tabs with three new tabs:
- Priority: shows needs_input and task_complete notifications
- Updates: shows progress, error, session_ended, pr_created
- All: shows everything (unchanged behavior)

The Priority tab is the default, helping users quickly find
agent input requests and completed tasks without scrolling
through status updates.
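The tab filtering and badge counts can be expressed as pure functions. `AppNotification`, `matchesTab`, and `unreadCounts` are hypothetical names; the type-to-tab mapping comes from the message, and the bell counting every unread item while the Priority badge counts only priority items matches the badge test described further down.

```typescript
type NotificationType = "needs_input" | "task_complete" | "progress" | "error" | "session_ended" | "pr_created";
type Tab = "priority" | "updates" | "all";
interface AppNotification { id: string; type: NotificationType; read: boolean }

const PRIORITY_TYPES: ReadonlySet<NotificationType> = new Set<NotificationType>(["needs_input", "task_complete"]);
const UPDATE_TYPES: ReadonlySet<NotificationType> = new Set<NotificationType>([
  "progress", "error", "session_ended", "pr_created",
]);

function matchesTab(n: AppNotification, tab: Tab): boolean {
  if (tab === "priority") return PRIORITY_TYPES.has(n.type);
  if (tab === "updates") return UPDATE_TYPES.has(n.type);
  return true; // "all" keeps the previous unfiltered behavior
}

function unreadCounts(items: AppNotification[]) {
  const unread = items.filter((n) => !n.read);
  return {
    bell: unread.length, // bell icon: every unread notification
    priority: unread.filter((n) => matchesTab(n, "priority")).length,
    updates: unread.filter((n) => matchesTab(n, "updates")).length,
  };
}
```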

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Playwright visual audit for notification panel tabs

Covers Priority/Updates/All tab filtering, empty states, long text,
many items, and multi-project grouping at mobile (375x667) and
desktop (1280x800) viewports. Asserts no horizontal overflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: archive notification-panel-priority-tabs task

* test: add bell badge vs priority badge distinction test

Adds a unit test verifying the bell icon shows total unread count (4)
while the Priority tab badge shows only priority unread count (2).
Also checks off completed task file items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add ARIA tab semantics, updates badge, and touch targets

- Add role="tablist", role="tab", aria-selected, aria-controls to
  notification filter tabs for screen reader accessibility
- Add role="tabpanel" with aria-labelledby to notification list
- Add arrow key navigation between tabs
- Add updatesUnreadCount badge to Updates tab
- Increase tab touch targets to 44px minimum (min-h-[44px])
- Update test selectors from getByRole('button') to getByRole('tab')
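The arrow-key navigation and ARIA wiring can be sketched without React. `nextTabIndex` and `tabProps` are hypothetical helpers, and the `tab-`/`panel-` id scheme is an assumption. Wrap-around and Home/End follow the WAI-ARIA Authoring Practices tabs pattern; whether the panel handles Home/End is also an assumption.

```typescript
// Roving focus within a role="tablist": Left/Right wrap, Home/End jump to the ends.
function nextTabIndex(current: number, key: string, count: number): number {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count;
    case "ArrowLeft":
      return (current - 1 + count) % count;
    case "Home":
      return 0;
    case "End":
      return count - 1;
    default:
      return current; // other keys do not move focus
  }
}

// Attributes for one tab; only the selected tab is in the Tab-key order.
function tabProps(id: string, selected: boolean) {
  return {
    role: "tab",
    id: `tab-${id}`,
    "aria-selected": selected,
    "aria-controls": `panel-${id}`, // the tabpanel uses aria-labelledby={`tab-${id}`}
    tabIndex: selected ? 0 : -1,
  };
}
```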

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add 99+ badge test, desktop All tab and Updates empty state tests

Addresses test-engineer review findings:
- Unit test for 99+ badge truncation branch
- Playwright desktop All tab test with overflow assertion
- Playwright desktop Updates empty state test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: trigger CI re-run for preflight markers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…eltm#974)

* chore: move task to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: reconcile session state when task reaches terminal status

Three-layer fix for sessions staying "Active" when tasks fail:

1. UI: getSessionState() and isActiveSession() now cross-reference
   task.status — if the task is failed/completed/cancelled, the session
   shows as terminated regardless of the session DO status.

2. Backend: Add failSession() (distinct from stopSession) so sessions
   record failure explicitly. failTask() now calls failSession with
   a single retry before giving up.

3. Handle 'failed' session status in the UI alongside 'stopped'.
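The UI-side precedence rules can be sketched as a single state function. The `SessionLike`/`TaskLike` shapes, the `isIdle`/`agentCompleted` flags, and the `SessionState` names are hypothetical; the rule that a terminal task outranks the session DO status, and that `failed` is treated like `stopped`, comes from the message.

```typescript
type TaskStatus = "in_progress" | "completed" | "failed" | "cancelled";
type SessionStatus = "active" | "stopped" | "failed";
type SessionState = "terminated" | "agent_completed" | "idle" | "active";

interface SessionLike { status: SessionStatus; isIdle?: boolean; agentCompleted?: boolean }
interface TaskLike { status: TaskStatus }

const TERMINAL_TASK_STATUSES = new Set<TaskStatus>(["completed", "failed", "cancelled"]);

function getSessionState(session: SessionLike, task?: TaskLike): SessionState {
  // A terminal task wins, even if the best-effort stopSession RPC never landed.
  if (task && TERMINAL_TASK_STATUSES.has(task.status)) return "terminated";
  // The session's own 'failed' status is handled like 'stopped'.
  if (session.status === "stopped" || session.status === "failed") return "terminated";
  if (session.agentCompleted) return "agent_completed";
  if (session.isIdle) return "idle";
  return "active";
}

function isActiveSession(session: SessionLike, task?: TaskLike): boolean {
  return getSessionState(session, task) !== "terminated";
}
```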

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update test fixture to reflect active session with in-progress task

The test for "does not show fork button for active sessions" had a
session with status='active' but task.status='failed'. With the new
task-status cross-referencing, this is correctly treated as terminated.
Updated to use task.status='in_progress' for a truly active session.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: archive task file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use failSession in crud.ts task-failure path + update docs

Address review findings:
- crud.ts task status update route now calls failSession() instead of
  stopSession() when transitioning to 'failed' status
- Added 'failed' session status to workspace-lifecycle.md docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address cloudflare specialist review findings

HIGH fixes:
- isTerminated in parseChatSessionListRow now includes 'failed' status
- session.failed added to SESSION_LIFECYCLE_EVENTS for sidebar refresh
- useChatWebSocket handles session.failed alongside session.stopped
- failSession() uses cursor.rowsWritten to skip duplicate events

MEDIUM fixes:
- ActivityFeed formats session.failed events with error message
- Retry sleep reduced from 1000ms to 100ms (less DO event loop blocking)
- Removed unused _errorMessage param from sessions.failSession()
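The duplicate-event guard and the single retry can be modeled in memory. In a real Durable Object the conditional `UPDATE` would go through `ctx.storage.sql.exec()`, whose cursor exposes `rowsWritten`; here `SessionStore`, `conditionalUpdate`, and `retryOnce` are hypothetical stand-ins, and the status filter in the `UPDATE` is an assumption about how already-terminal rows are skipped.

```typescript
interface WriteResult { rowsWritten: number }
type TerminalStatus = "stopped" | "failed";

class SessionStore {
  private readonly status = new Map<string, string>();
  readonly events: string[] = [];

  constructor(ids: string[]) {
    for (const id of ids) this.status.set(id, "active");
  }

  // Models: UPDATE sessions SET status = ? WHERE id = ? AND status NOT IN ('stopped', 'failed')
  private conditionalUpdate(id: string, next: TerminalStatus): WriteResult {
    const current = this.status.get(id);
    if (current === undefined || current === "stopped" || current === "failed") return { rowsWritten: 0 };
    this.status.set(id, next);
    return { rowsWritten: 1 };
  }

  // Shared terminate helper: emit a lifecycle event only if the row actually changed.
  terminateSession(id: string, next: TerminalStatus): boolean {
    if (this.conditionalUpdate(id, next).rowsWritten === 0) return false; // duplicate: no event
    this.events.push(`session.${next}:${id}`);
    return true;
  }
}

// One retry after a short pause (the message mentions 100ms); a second failure propagates.
async function retryOnce<T>(fn: () => Promise<T>, delayMs = 100): Promise<T> {
  try {
    return await fn();
  } catch {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    return fn();
  }
}
```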

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add missing tests from test-engineer review

- Priority ordering: task terminal takes precedence over idle/agentCompleted
- isActiveSession: direct failed status path
- Fork button reconciliation: session active + task failed shows fork button

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract shared terminateSession helper to reduce duplication

Reduces code duplication between stopSession and failSession flagged by
SonarCloud (30.1% duplication on new code).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: use it.each to reduce test duplication

Consolidates structurally similar getSessionState and isActiveSession
tests into parameterized test tables to reduce SonarCloud duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ltm#976)

* blur prototype

* feat: add blur overlay, speed and noise size controls to WebGL background

- Add u_scale uniform to shader for configurable noise granularity
- Add speed/noiseSize options to useWebGLBackground hook (defaults: 0.4x, 1.02)
- Add blur+dim overlay div to ProjectAgentChat and SamPrototype (10px blur, 0.10 dim)
- Update prototype with Speed and Noise size sliders, defaults matching preferred settings

Settings: Blur 10px, Dim 0.10, Green 1.00, Speed 0.4x, Noise size 1.02

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: retrigger checks with updated PR body

* ci: retrigger with proper preflight markers

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: keep cancel prompt sessions follow-up ready

* chore: refresh PR evidence checks

* fix: restart opencode after prompt cancel

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
)

* task: move DO-only chat task to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: DO-only chat architecture with typewriter animation

Remove the direct ACP WebSocket connection from WorkspaceChatView.
Route ALL messages through the Durable Object WebSocket (single source).
Send prompts via REST API (POST /sessions/:sessionId/prompt).

- Add TypewriterText component for word-by-word animation of batched content
- Derive agent state (idle/prompting/responding) from message flow
- Remove useProjectAgentSession import from WorkspaceChatView
- Remove dual-source conversationItems merge (was causing React error #185)
- Un-deprecate sendFollowUpPrompt REST API helper

The hook file useProjectAgentSession.ts stays in the codebase — it's
still used by ProjectMessageView via useSessionLifecycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct export ordering in acp-client index

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* task: archive DO-only chat task

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review findings — idle stuck state and a11y

- Reset agentActivity to 'idle' when sendFollowUpPrompt REST call fails,
  preventing the input from being stuck in "Agent is working..." forever
- Transition to 'responding' on any assistant message (not just from
  'prompting'), handling reconnect with in-progress agent output
- Add prefers-reduced-motion guard to TypewriterText (WCAG 2.1 SC 2.3.3)
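Deriving agent activity from message flow amounts to a small reducer. The event names are hypothetical (the real hooks read DO WebSocket messages), and `turn_finished` stands in for whatever signal actually ends a turn; the idle reset on a failed REST send and the "any assistant message means responding" rule come from the fixes above.

```typescript
type AgentActivity = "idle" | "prompting" | "responding";

// Hypothetical event vocabulary for illustration.
type ChatEvent =
  | { kind: "prompt_sent" }
  | { kind: "prompt_failed" }
  | { kind: "assistant_message" }
  | { kind: "turn_finished" };

function nextActivity(_prev: AgentActivity, event: ChatEvent): AgentActivity {
  switch (event.kind) {
    case "prompt_sent":
      return "prompting";
    case "prompt_failed":
      // REST send failed: return to idle so the input is not stuck on "Agent is working...".
      return "idle";
    case "assistant_message":
      // Any assistant output means responding, whatever the previous state (covers reconnects).
      return "responding";
    case "turn_finished":
      return "idle";
  }
}

function deriveActivity(events: ChatEvent[]): AgentActivity {
  return events.reduce<AgentActivity>(nextActivity, "idle");
}
```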

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: convert ProjectMessageView to DO-only chat architecture

Remove ACP WebSocket from ProjectMessageView, matching the DO-only
architecture already applied to WorkspaceChatView. All messages now flow
through the Durable Object WebSocket; prompts are sent via REST API.

- Simplify useSessionLifecycle: remove useProjectAgentSession, derive
  agent activity state (idle/prompting/responding) from message flow
- Simplify useConnectionRecovery: remove 6-mechanism ACP recovery,
  keep DO WebSocket reconnection and idle resume via REST API
- Wire follow-up prompts through sendFollowUpPrompt REST API
- Integrate TypewriterText for latest assistant message animation
- Remove AgentErrorBanner (depended on ACP session types)
- Remove ACP-specific tests (cancel button, ACP connecting, agent
  offline banner, DO+ACP merge, scroll position stability)
- Update resume tests to remove ACP sendPrompt/reconnect assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* task: move restore cancel button to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: restore cancel button in agent working indicator

PR raphaeltm#978 removed the ACP WebSocket from ProjectMessageView, which also
removed the cancel button because it relied on sending session/cancel
over that WebSocket. The VM agent already has a REST cancel endpoint
and the API service function exists — this commit wires them together.

- Add POST /sessions/:sessionId/cancel API route in chat.ts
- Add cancelAgentPrompt() client API function
- Add handleCancelPrompt to useSessionLifecycle hook
- Restore cancel button in ProjectMessageView working indicator
- Add cancel button to WorkspaceChatView working indicator
- Add integration tests for the cancel API route

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: sort exports alphabetically in api/index.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* task: archive restore cancel button task

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address cloudflare specialist review findings

- Add userId filter to agentSessions query for defence-in-depth (HIGH)
- Add double-tap guard (cancellingRef) to prevent duplicate cancel requests
- Don't clear agentActivity on catch — keep spinner visible on real errors
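A framework-free version of the double-tap guard shows the idea behind `cancellingRef`. `createCancelGuard` is a hypothetical name; errors from the cancel call still propagate, consistent with not clearing activity on real errors.

```typescript
// Calls made while a cancel request is in flight are dropped instead of re-sent.
function createCancelGuard(cancel: () => Promise<void>) {
  let inFlight = false; // plays the role of cancellingRef.current
  return async (): Promise<"sent" | "ignored"> => {
    if (inFlight) return "ignored";
    inFlight = true;
    try {
      await cancel(); // errors propagate to the caller
      return "sent";
    } finally {
      inFlight = false; // allow a later cancel once this one settles
    }
  };
}
```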

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add behavioral test for cancel button in ProjectMessageView

Renders the component, triggers 'responding' state via WebSocket message,
clicks the Cancel button, and asserts cancelAgentPrompt was called with
the correct project/session IDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve cancel button accessibility and touch targets

- Add role="status" to agent working indicator bar for screen readers
- Add aria-label="Cancel agent" to both cancel buttons
- Increase touch target to min-h-[44px] for mobile usability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
simple-agent-manager Bot and others added 29 commits May 19, 2026 14:37
* Harden Go CLI quality gates

* Document CLI quality gate gap

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
…wn blur, update Mistral models (raphaeltm#1069)

- Remove sam-glass-card-motion and glass-card-glow from Card glass variant
  so glass cards no longer behave like buttons (hover scale + active press)
- Add backdrop-blur-xl to ModelSelect dropdown for proper glassmorphic blur
- Update Mistral model catalog with latest IDs: Medium 3.5, Small 4,
  Large 3, Medium 3.1, Devstral 2, Codestral, Magistral Medium 1.2,
  Ministral 3 (14B/8B/3B)

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
… models (raphaeltm#1071)

* feat: dynamic Vibe model config + fix Mistral API model IDs

Generate a dynamic [[models]] entry in Vibe's config.toml when the user
selects a model that isn't a built-in alias. This uses the raw Mistral
API model ID as both the TOML alias and API name, so the UI model
catalog can list any Mistral model without requiring vm-agent changes.

Also fixes model catalog IDs to match actual Mistral API identifiers
(e.g. mistral-medium-3-5-2604, not mistral-medium-3-5-26-04).
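The dynamic entry generation can be sketched in TypeScript (the real code lives in the Go vm-agent). `dynamicModelEntry` is hypothetical, and the `[[models]]` keys (`name`, `provider`, `alias`) are assumptions about Vibe's `config.toml` schema; only "use the raw API model ID as both alias and API name" comes from the message.

```typescript
// JSON.stringify output is a valid TOML basic string for typical model IDs
// (letters, digits, dots, dashes); it is not a general-purpose TOML escaper.
function tomlString(s: string): string {
  return JSON.stringify(s);
}

// Returns a [[models]] block for non-built-in IDs, or null when a built-in alias exists.
function dynamicModelEntry(modelId: string, builtInAliases: ReadonlySet<string>): string | null {
  if (builtInAliases.has(modelId)) return null;
  return [
    "[[models]]",
    `name = ${tomlString(modelId)}`, // raw Mistral API model ID used as the API name
    `provider = "mistral"`, // assumed key
    `alias = ${tomlString(modelId)}`, // ...and as the alias the agent selects
  ].join("\n");
}
```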

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: consolidate Vibe config tests into table-driven test to reduce duplication

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* task: move fix-amp-agent-cli-install to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: install @sourcegraph/amp CLI alongside acp-amp bridge

The acp-amp Python package is only the ACP bridge wrapper. The actual
amp CLI binary (@sourcegraph/amp npm package) must also be installed
for acp-amp to function. Chain npm install after uv install in the
amp agent install command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: update amp install command assertion in shared tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* task: archive fix-amp-agent-cli-install

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: drop which-amp guard and add install script passthrough test

Address go-specialist review findings:
- Remove conditional `which amp` guard from install command — npm install -g
  is idempotent, so the guard only creates a partial-install trap
- Add TestAgentInstallScriptAmpPassesThrough to verify the non-npm code path
  leaves the amp install command unchanged
- Add comment on isNpmBased: false explaining the implicit npm dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set amp isNpmBased=true so Node.js bootstrap runs in containers

The amp install command chains `npm install -g @sourcegraph/amp` after
the uv install of acp-amp. With isNpmBased=false, the agentInstallScript
function skipped the Node.js bootstrap preamble, causing `npm: not found`
in devcontainers that don't ship with Node.js pre-installed.

Setting isNpmBased=true ensures the preamble installs nodejs/npm when
missing before the install command runs.
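The effect of `isNpmBased` can be sketched as a TypeScript rendition of the install-script builder (the real `agentInstallScript` is Go). The `NODE_BOOTSTRAP` preamble assumes a Debian/Ubuntu base image, and the exact `uv tool install acp-amp` subcommand is an assumption; the npm package name comes from the message.

```typescript
interface AgentDef { installCommand: string; isNpmBased: boolean }

// Assumed apt-based bootstrap: install Node.js/npm only when npm is missing.
const NODE_BOOTSTRAP = "command -v npm >/dev/null 2>&1 || (apt-get update && apt-get install -y nodejs npm)";

function agentInstallScript(agent: AgentDef): string {
  const lines = ["set -e"];
  // Without this preamble, `npm install -g` fails with "npm: not found" in Node-less containers.
  if (agent.isNpmBased) lines.push(NODE_BOOTSTRAP);
  lines.push(agent.installCommand);
  return lines.join("\n");
}

// amp chains the npm CLI install after the uv-installed ACP bridge.
const amp: AgentDef = {
  installCommand: "uv tool install acp-amp && npm install -g @sourcegraph/amp",
  isNpmBased: true,
};
```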

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Auto-committed by SAM on agent completion.
PR raphaeltm#1068 set go-version to 1.25.0 which doesn't exist, breaking all
CI runs since the workflow file fails validation before any jobs start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
PR raphaeltm#1068 added a duplicate `cli:` key in the changes job outputs,
causing YAML parse failure and breaking all CI runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The cli filter was defined twice in the paths-filter config — once with
basic paths and once with additional paths (sonar, rules). Merged into
a single entry to fix the YAML duplicate key error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. model-catalog.test.ts: devstral model ID was 'devstral-2512' but
   catalog has 'devstral-2-2512', and it's in groups[1] not groups[0].
   Fixed to search all groups with flatMap.

2. packages/cli/go.mod: specified go 1.25.0 which doesn't exist yet,
   causing 'no such tool "covdata"' error. Changed to go 1.24.0.
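The catalog-search fix in item 1 can be sketched with `flatMap`. The `CatalogGroup`/`CatalogModel` shapes, `findModel`, and the group names are hypothetical; the model IDs come from the message.

```typescript
interface CatalogModel { id: string; label: string }
interface CatalogGroup { name: string; models: CatalogModel[] }

// Search every group instead of assuming the model lives in groups[0].
function findModel(groups: CatalogGroup[], id: string): CatalogModel | undefined {
  return groups.flatMap((g) => g.models).find((m) => m.id === id);
}
```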

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… agent support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: explicit SAM provider selection for Claude Code and Codex

Users must now explicitly opt-in to SAM as an AI provider for Claude Code
and OpenAI Codex agents. No more silent platform proxy fallback.

- Add AgentProviderMode type ('sam' | 'user-api-key' | 'oauth') to shared types
- Add provider_mode column to agent_settings table (migration 0054)
- Gate platform proxy on providerMode === 'sam' in runtime.ts (no-silent-fallback)
- Update agent catalog to show configured status based on providerMode
- Add admin AI allowance API (GET/PUT/DELETE per user) for ceilings
- Enforce admin allowance ceilings in user budget validation
- Add provider mode selector UI in AgentSettingsCard for claude-code/codex
- Add tests for budget ceiling enforcement and agent status display
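The no-silent-fallback rule and the admin ceiling can be sketched as two pure checks for Claude Code and Codex. `AgentProviderMode` comes from the message; `resolveCredentialSource`, `validateUserBudget`, the `CredentialSource` names, and rejecting (rather than clamping) an over-ceiling budget are assumptions.

```typescript
type AgentProviderMode = "sam" | "user-api-key" | "oauth";
type CredentialSource = "platform-proxy" | "user-api-key" | "user-oauth" | "unconfigured";

// The platform proxy is used only after an explicit opt-in to "sam".
// An unset mode or a missing user credential is reported as unconfigured, never routed to SAM.
function resolveCredentialSource(mode: AgentProviderMode | undefined, hasUserCredential: boolean): CredentialSource {
  if (mode === "sam") return "platform-proxy";
  if (mode === undefined || !hasUserCredential) return "unconfigured";
  return mode === "oauth" ? "user-oauth" : "user-api-key";
}

type BudgetCheck = { ok: true } | { ok: false; error: string };

// Admin allowance acts as a ceiling on the user's own budget setting.
function validateUserBudget(requestedUsd: number, adminCeilingUsd?: number): BudgetCheck {
  if (!Number.isFinite(requestedUsd) || requestedUsd < 0) {
    return { ok: false, error: "budget must be a non-negative number" };
  }
  if (adminCeilingUsd !== undefined && requestedUsd > adminCeilingUsd) {
    return { ok: false, error: `budget exceeds admin allowance of ${adminCeilingUsd}` };
  }
  return { ok: true };
}
```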

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* task: move explicit SAM provider selection to active

* test: fix SAM provider status import order

* test: fix SAM provider API import order

* fix: require explicit SAM provider only for Claude and Codex

* docs: record explicit SAM provider workflow

* test: cover explicit SAM provider workflow

* docs: sync explicit SAM provider setup

* refactor: reduce AI allowance route duplication

* refactor: share AI budget limit helpers

* refactor: consolidate AI budget limits

* refactor: simplify provider budget helpers

* refactor: simplify AI budget parser

* refactor: inline AI budget limit parsing

* fix: route explicit SAM providers through proxy

* fix: use responses wire api for Codex proxy

* fix: proxy OpenAI responses API for Codex

* refactor: share AI proxy request guards

* fix: satisfy ai proxy import order

* fix: add auth to /models endpoint, update CLAUDE.md for provider modes

- Add prepareAIProxyRequest() gate to GET /ai/v1/models — previously
  unauthenticated, leaking the list of allowed models to any caller
- Update CLAUDE.md Agent Authentication section for three-mode system
  (user-api-key, oauth, sam) and add explicit-sam-provider-selection
  to Recent Changes
- Document monthly cost cap fail-open window with risk context
- File backlog task for allowedModelTiers enforcement at proxy gate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…phaeltm#1084)

* feat: add gpt-5.4-mini and gpt-5.4 to platform AI proxy allowed models

Users selecting gpt-5.4-mini from the model dropdown got:
"Model 'gpt-5.4-mini' is not available"

The model catalog (model-catalog.ts) listed gpt-5.4-mini as a
selectable option for Codex, but the platform AI proxy allowlist
(PLATFORM_AI_MODELS in ai-services.ts) didn't include it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: sync PLATFORM_AI_MODELS with all dropdown models for Claude Code and Codex

Adds all 14 missing models from the UI model catalog dropdown to the
platform AI models allowlist. Previously only 6 OpenAI and 3 Anthropic
models were in the allowlist — users selecting any other model from the
dropdown would get a "model not allowed" error when using SAM provider mode.

Added Anthropic: claude-opus-4-7, claude-sonnet-4-5-20250514,
claude-sonnet-4-20250514, claude-3-5-sonnet-20241022,
claude-3-5-haiku-20241022, claude-3-opus-20240229

Added OpenAI: gpt-5.5-pro, gpt-5.5, gpt-5.3-codex, gpt-5.2-codex,
gpt-5.1-codex-max, gpt-5.1-codex-mini, o4-mini, o3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: sync model catalog with current API offerings

Update both the UI dropdown (model-catalog.ts) and the platform AI
proxy allowlist (ai-services.ts) to match what's actually available
from the Anthropic and OpenAI APIs as of May 2026.

Anthropic changes:
- Fix claude-sonnet-4-5 ID: 20250514 → 20250929 (correct dated version)
- Remove retired models: claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022,
  claude-3-opus-20240229 (all no longer available via API)
- Add claude-opus-4-5-20251101, claude-opus-4-1-20250805 (available legacy)
- Fix context windows: Opus 4.7/4.6 and Sonnet 4.6 are 1M tokens, not 200k

OpenAI changes:
- Add gpt-5.4-pro ($30/$180), gpt-5.4-nano ($0.20/$1.25) — current models
- Add gpt-5-mini to dropdown (still available, deprecating Aug 2026)
- Remove gpt-5.2 (not a valid model ID; gpt-5.2-codex is the correct one)
- Fix pricing from official API docs (gpt-5.5: $5/$30, gpt-5.4: $2.50/$15,
  gpt-5.4-mini: $0.75/$4.50, o4-mini: $0.55/$2.20, o3: $2/$8)
- Fix context windows (5.4 series: 400k, not 1M)
- Better grouping: Latest / Older / Reasoning / Legacy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: move task to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use standard tier for gpt-5.4-nano (low-cost reserved for Workers AI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: archive completed task

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: model routing for o3/o4-mini and test maintenance

- Fix isOpenAIModel() to match 'o3' and 'o4-*' prefixes — previously
  these were misrouted to Workers AI instead of OpenAI AI Gateway
- Update ai-proxy test to reference gpt-5.5 instead of removed gpt-5.2
- Add cross-catalog invariant test ensuring every dropdown model has
  a corresponding PLATFORM_AI_MODELS entry
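The routing fix and the cross-catalog invariant can be sketched as follows. The `o3`/`o4-` handling comes from the message; treating every `gpt-` ID as OpenAI is an assumption, and `isOpenAIModel` here is a hypothetical rendition, not the file's actual code. `missingFromAllowlist` is a hypothetical helper for the invariant test.

```typescript
// "gpt-" models plus the o-series reasoning models ("o3", "o3-*", "o4-*") route to OpenAI.
function isOpenAIModel(model: string): boolean {
  return /^(gpt-|o3(-|$)|o4-)/.test(model);
}

// Invariant: every dropdown model must have a PLATFORM_AI_MODELS-style allowlist entry.
function missingFromAllowlist(dropdownIds: string[], allowlist: ReadonlySet<string>): string[] {
  return dropdownIds.filter((id) => !allowlist.has(id));
}
```

An invariant test would assert that `missingFromAllowlist` returns an empty array for the real catalogs, which catches the `gpt-5.4-mini` "not available" class of bug at CI time.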

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: compact isOpenAIModel to stay under 800-line file size limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ltm#1085)

The glassy transparency (backdrop-filter blur) on the session header tab
was broken by commit 6e00d96 which moved the header from absolute
positioning into the normal document flow. Messages no longer scrolled
behind the header, so there was nothing to blur.

Changes:
- Remove `glass-composited` from SessionHeader and ErrorBanner (the
  `transform: translateZ(0)` created a new stacking context that
  interfered with backdrop-filter rendering)
- Make FloatingHeader absolutely positioned over the scroll content
  so messages pass behind it and the blur effect is visible
- Add spacer in Virtuoso Header to prevent messages from hiding
  behind the overlaid header

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…#1086)

The glass-composited class was intentionally removed in the prior PR
because its transform: translateZ(0) broke backdrop-filter blur.

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…aphaeltm#1087)

* fix: Codex OAuth (BYO auth.json) crashes with missing OPENAI_API_KEY

Two-sided fix for Codex users who bring their own OAuth token (auth.json):

API side: The passthrough proxy exclusion on runtime.ts:133 only checked
for Claude Code + OAuth, allowing Codex + OAuth users to receive an
inferenceConfig with provider "openai-passthrough" they shouldn't get.
Extended the condition to also exclude Codex OAuth credentials.

VM agent side: Belt-and-suspenders guard in
codexProxyProviderConfigFromCredential — when the credential kind is
"oauth-token" (auth-file injection), skip generating a proxy provider
config that would write env_key = "OPENAI_API_KEY" to config.toml.
That env var is never set in auth-file mode, causing Codex to crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: trigger re-run with updated PR description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: trigger with corrected preflight section names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
* Show recovery container status in chat header

* Reduce recovery badge test duplication

* Address recovery badge quality findings

* Consolidate workspace badge test setup

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Publish SAM's 2026-05-21 daily development journal about the node readiness freshness fix.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@raphaeltm raphaeltm merged commit 7cd7afd into main May 21, 2026
11 of 13 checks passed

3 participants