Feature: Live model switching for running sessions #1090
Description
Summary
Add the ability to change the LLM model used by a running agentic session without stopping and recreating it. This preserves conversation history, workspace state, and avoids the overhead of session teardown/setup.
Currently, the model is set at session creation time via llmSettings.model and cannot be changed. Users who want a different model must stop the session and create a new one, losing all context.
API Design
New Endpoint: Patch Session
PATCH /api/projects/{projectName}/agentic-sessions/{sessionName}
A general-purpose partial update endpoint for mutable session properties. Initially supports llmSettings, but is extensible to other fields in the future.
Request Body:

```json
{
  "llmSettings": {
    "model": "claude-opus-4-6"
  }
}
```

Response: 200 OK

```json
{
  "name": "my-session",
  "phase": "Running",
  "spec": {
    "llmSettings": {
      "model": "claude-opus-4-6"
    }
  },
  "previousModel": "claude-sonnet-4-5",
  "modelSwitchedAt": "2026-03-28T14:30:00Z"
}
```

Error Cases:
| Status | Condition |
|---|---|
| 400 | Invalid model name or unsupported model |
| 404 | Session not found |
| 409 | Session is in a terminal phase (Stopped, Failed, Completed) |
| 409 | Model switch already in progress |
| 422 | Session is mid-generation (actively streaming a response) |
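The validation order implied by these error cases can be sketched as a single check function. This is a minimal illustration, not the actual implementation; the names (`validate_patch`, `ALLOWED_MODELS`, the session dict keys) are hypothetical.

```python
# Hypothetical validation order for the PATCH endpoint, mapping
# session state to the HTTP status codes in the table above.
TERMINAL_PHASES = {"Stopped", "Failed", "Completed"}
ALLOWED_MODELS = {"claude-sonnet-4-5", "claude-opus-4-6"}  # illustrative list

def validate_patch(session, new_model):
    """Return an HTTP status code; 200 means the switch may proceed."""
    if session is None:
        return 404  # session not found
    if new_model not in ALLOWED_MODELS:
        return 400  # invalid or unsupported model
    if session["phase"] in TERMINAL_PHASES:
        return 409  # terminal phase
    if session.get("switchInProgress"):
        return 409  # switch already in progress
    if session.get("generating"):
        return 422  # mid-generation
    return 200
```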
Backend Implementation
1. Session Spec: Make llmSettings Mutable
Separate spec fields into immutable (repos, sessionName) and mutable (llmSettings) categories.
```yaml
SessionSpec:
  immutable:
    - sessionName
    - repos
  mutable:
    - llmSettings
    # future: resource limits, tools config, etc.
```
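A PATCH handler would reject payloads touching immutable fields before applying anything. A minimal sketch, assuming the field split above (`check_patch_fields` is a hypothetical helper):

```python
# Hypothetical guard: accept only payloads limited to mutable spec fields.
IMMUTABLE_FIELDS = {"sessionName", "repos"}
MUTABLE_FIELDS = {"llmSettings"}

def check_patch_fields(patch: dict):
    """Return (ok, offending_fields) for a partial-update payload."""
    immutable = [k for k in patch if k in IMMUTABLE_FIELDS]
    unknown = [k for k in patch if k not in IMMUTABLE_FIELDS | MUTABLE_FIELDS]
    return (not immutable and not unknown, immutable + unknown)
```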
2. Model Switch Flow
```
User Request (PATCH)
  |
  v
API Server
  |-- Validate model name against allowed models list
  |-- Check session phase == Running
  |-- Check no active generation in progress
  |
  v
Update Session Record
  |-- Update spec.llmSettings.model in database
  |-- Write audit entry (previousModel, newModel, timestamp)
  |
  v
Notify Session Agent
  |-- Send control message to session pod via internal channel
  |-- Agent acknowledges and picks up new model for next LLM call
  |
  v
Return updated session to caller
```
3. Agent-Side Handling
Recommended: Config Polling (Simple)
- Agent reads its model config from a shared source (configmap, env, or API) before each LLM call.
- On PATCH, the backend updates the config source.
- Next time the agent makes an LLM call, it picks up the new model.
- No IPC infrastructure needed. Change takes effect on the very next LLM call, which in practice means the next user message or tool invocation.
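The polling approach can be sketched in a few lines. This assumes an environment variable as the shared config source purely for illustration; a mounted ConfigMap file or an API call would follow the same shape, and all names here are hypothetical.

```python
import os

def resolve_model(default="claude-sonnet-4-5"):
    # Hypothetical config source: an env var the backend updates on PATCH.
    # Could equally be a ConfigMap-mounted file or an internal API call.
    return os.environ.get("SESSION_MODEL", default)

def call_llm(prompt):
    # Re-read the config source before every LLM call, so a PATCH takes
    # effect on the very next call with no IPC needed.
    model = resolve_model()
    # ... dispatch `prompt` to the provider using `model` ...
    return model, prompt
```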
4. Database / State Changes
Add model history tracking to the session record:
```json
{
  "modelHistory": [
    {
      "model": "claude-sonnet-4-5",
      "from": "2026-03-28T10:00:00Z",
      "to": "2026-03-28T14:30:00Z"
    },
    {
      "model": "claude-opus-4-6",
      "from": "2026-03-28T14:30:00Z",
      "to": null
    }
  ]
}
```

This provides auditability and supports future features like per-model cost tracking.
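Maintaining this history on each switch is a two-step update: close the open entry, then append a new one. A sketch matching the shape above (`record_model_switch` is a hypothetical helper):

```python
from datetime import datetime, timezone

def record_model_switch(history, new_model, now=None):
    """Close the currently open modelHistory entry (to == None)
    and append an entry for the new model."""
    now = now or datetime.now(timezone.utc).isoformat()
    if history and history[-1]["to"] is None:
        history[-1]["to"] = now
    history.append({"model": new_model, "from": now, "to": None})
    return history
```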
Context Injection on Model Switch
This is the critical challenge: when the model changes, the new model has no memory of the session. The agent runtime must bridge this gap.
The Problem
Simply replaying the full raw history into the new model is problematic because:
- Context window size mismatch -- switching from a large-context model to a smaller one may mean the history doesn't fit
- Token cost -- replaying 100k+ tokens of raw history on every subsequent call is expensive
- Format differences -- tool call/result formatting, system prompt conventions, or multi-turn structure may differ subtly between model families
- Compressed history is opaque -- if the prior model's context manager already summarized early turns, those summaries may reference things in a model-specific way
Strategy: Contextual Handoff Message
On model switch, the agent runtime constructs a handoff message -- a structured summary injected as a system-level context block at the start of the new model's conversation.
Handoff Message Structure
```json
{
  "role": "system",
  "content": "[Model Handoff Context]\n\nThis session was previously running on {previousModel}.\n\n## Session Goal\n{initialPrompt}\n\n## Conversation Summary\n{generatedSummary}\n\n## Current Working State\n- Active files: {recentlyReadOrEditedFiles}\n- Current task: {currentTodoState}\n- Last user request: {lastUserMessage}\n- Last assistant action: {lastAssistantSummary}\n\n## Key Decisions Made\n{extractedDecisions}\n\n[End Handoff Context]"
}
```

How to Generate It
Option 1: Self-Summarize Before Switch (Recommended)
Before the model switch takes effect, ask the current model to produce its own handoff summary. This is the highest-quality approach because the outgoing model has full context and can distill what matters.
```
PATCH arrives
  → Agent calls CURRENT model: "Generate a handoff summary for a model transition"
  → Current model returns structured summary
  → Agent stores summary, swaps to new model
  → New model's first call gets: [handoff summary] + [recent conversation tail]
```
Option 2: Runtime-Constructed Summary (Fallback)
If self-summarization isn't possible (model unresponsive, instant switch needed), the runtime constructs a summary mechanically from available state: initial prompt, last N conversation turns, todo list state, recently modified files, and last user message.
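A mechanical summary is just string assembly from session state. A sketch under the assumption that the runtime exposes state as a dict (all keys here are illustrative, not a defined schema):

```python
def mechanical_handoff(state, last_n=5):
    """Option 2: build a handoff block purely from available session
    state, with no extra LLM call. `state` keys are illustrative."""
    turns = state.get("recent_turns", [])[-last_n:]
    lines = [
        "[Model Handoff Context]",
        f"Session goal: {state.get('initial_prompt', '')}",
        f"Last user request: {state.get('last_user_message', '')}",
        f"Todo state: {state.get('todos', [])}",
        f"Recently modified files: {state.get('modified_files', [])}",
        "Recent turns:",
        *[f"- {t}" for t in turns],
        "[End Handoff Context]",
    ]
    return "\n".join(lines)
```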
Option 3: Hybrid (Recommended for production)
Use self-summarization as the default with a 30-second timeout, falling back to mechanical extraction.
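The hybrid strategy is a timeout-with-fallback around the self-summarization call. A minimal sketch; both callables are hypothetical hooks into the agent runtime, and a production version would also cancel the outstanding LLM call rather than let it run to completion:

```python
from concurrent.futures import ThreadPoolExecutor

def build_handoff(self_summarize, mechanical_summary, timeout_s=30):
    """Option 3 (hybrid): prefer the outgoing model's own summary,
    fall back to mechanical extraction on timeout or error."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(self_summarize)
        try:
            return future.result(timeout=timeout_s)
        except Exception:  # includes concurrent.futures.TimeoutError
            return mechanical_summary()
```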
Context Window Budgeting
```
Available context = New model's max context window
                  - System prompt tokens
                  - Tool definitions tokens
                  - Reserve for response generation (~4k tokens)
                  = Budget for handoff context + conversation tail
```
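As arithmetic, the budget above is a straight subtraction; the numbers in the example are illustrative:

```python
def handoff_budget(max_context, system_tokens, tool_tokens, response_reserve=4096):
    """Tokens left for handoff context + conversation tail,
    per the formula above."""
    return max_context - system_tokens - tool_tokens - response_reserve

# e.g. a 200k-context model with 2k system prompt and 3k tool definitions
# leaves 200000 - 2000 - 3000 - 4096 = 190904 tokens for handoff + tail.
```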
| Transition | Context Strategy |
|---|---|
| Small → Large model | Full history replay feasible; handoff summary optional but useful |
| Large → Small model | Handoff summary essential; truncate older turns |
| Same-family (e.g., Sonnet → Opus) | Simplest case; high format compatibility, full replay with handoff header |
| Cross-family | Handoff summary strongly recommended; reformat tool call history if needed |
What Gets Preserved vs. Lost
| Preserved | Potentially Lost |
|---|---|
| Session goal / initial prompt | Nuanced "tone" or style the old model had |
| Explicit decisions and plans | Implicit reasoning chains |
| File modifications (on disk) | Model's internal "mental map" of the codebase |
| Todo list state | Subtleties from compressed early history |
| Recent conversation turns | Very old turns that don't fit the new context |
API Extension for Context Control
Allow the PATCH request to optionally control context behavior:
```json
{
  "llmSettings": { "model": "claude-opus-4-6" },
  "contextStrategy": "self-summarize | replay | mechanical"
}
```

- `self-summarize` (default): current model generates a handoff summary
- `replay`: replay raw conversation history (best for same-family switches)
- `mechanical`: runtime-constructed summary, no extra LLM call (fastest)
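Dispatching on this field is a small branch in the agent runtime. A sketch, assuming a hypothetical runtime object exposing one method per strategy:

```python
def prepare_context(strategy, runtime):
    """Select the context-injection path from the contextStrategy field.
    `runtime` and its methods are hypothetical."""
    if strategy == "replay":
        return runtime.raw_history()
    if strategy == "mechanical":
        return runtime.mechanical_summary()
    return runtime.self_summarize()  # default: "self-summarize"
```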
UI / CLI / MCP Integration
UI:
- Model selector dropdown in session detail view
- Current model badge on session card
- System message in conversation on model change
CLI:
```
acp session update my-session --model claude-opus-4-6
```

MCP Tool: `acp_update_session` for programmatic switching from within sessions
Edge Cases and Safety
| Scenario | Behavior |
|---|---|
| Switch while agent is mid-generation | Reject with 422. Client should wait for turn to complete. |
| Switch to the same model | No-op, return 200 with current state. |
| Switch to unavailable/invalid model | Reject with 400 and list valid models. |
| Rapid successive switches | Last write wins. Each switch recorded in history. |
| Session in terminal phase | Reject with 409. |
| Cost/quota implications | Quota checks validate against new model's limits before allowing switch. |
Implementation Phases
Phase 1 - Core (MVP)
- PATCH endpoint with `llmSettings.model` support
- Config polling in the agent
- Model validation against allowed list
- Basic model history tracking
Phase 2 - Context Handoff
- Self-summarize context injection strategy
- Mechanical fallback with hybrid timeout
- Context window budgeting per model
- Handoff summary caching and storage in model history
Phase 3 - Observability
- System message injected into conversation on model change
- Model history exposed in session detail API
- Per-model token usage tracking
Phase 4 - UX Polish
- UI model selector dropdown
- CLI `session update` command
- MCP `acp_update_session` tool for programmatic switching