
Feature: Live model switching for running sessions #1090

@joshbranham

Description


Summary

Add the ability to change the LLM model used by a running agentic session without stopping and recreating it. This preserves conversation history and workspace state, and avoids the overhead of session teardown and setup.

Currently, the model is set at session creation time via llmSettings.model and cannot be changed. Users who want a different model must stop the session and create a new one, losing all context.


API Design

New Endpoint: Patch Session

PATCH /api/projects/{projectName}/agentic-sessions/{sessionName}

A general-purpose partial update endpoint for mutable session properties. Initially supports llmSettings, but is extensible to other fields in the future.

Request Body:

{
  "llmSettings": {
    "model": "claude-opus-4-6"
  }
}

Response: 200 OK

{
  "name": "my-session",
  "phase": "Running",
  "spec": {
    "llmSettings": {
      "model": "claude-opus-4-6"
    }
  },
  "previousModel": "claude-sonnet-4-5",
  "modelSwitchedAt": "2026-03-28T14:30:00Z"
}

Error Cases:

Status  Condition
400     Invalid model name or unsupported model
404     Session not found
409     Session is in a terminal phase (Stopped, Failed, Completed)
409     Model switch already in progress
422     Session is mid-generation (actively streaming a response)
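The error cases above can be sketched as a single validation step. This is an illustrative Python sketch, not an existing API: the `Session` dataclass, the phase strings, and the allow-list contents are assumptions based on this proposal.

```python
from dataclasses import dataclass
from typing import Optional

TERMINAL_PHASES = {"Stopped", "Failed", "Completed"}
ALLOWED_MODELS = {"claude-opus-4-6", "claude-sonnet-4-5"}  # hypothetical allow-list

@dataclass
class Session:
    name: str
    phase: str
    model: str
    generating: bool = False        # actively streaming a response
    switch_in_progress: bool = False

def validate_model_switch(session: Optional[Session], new_model: str) -> int:
    """Return the HTTP status a PATCH should produce; 200 means proceed."""
    if session is None:
        return 404
    if new_model not in ALLOWED_MODELS:
        return 400
    if session.phase in TERMINAL_PHASES:
        return 409
    if session.switch_in_progress:
        return 409
    if session.generating:
        return 422
    return 200
```

The check order matters: existence first, then model validity, then session state, so the client gets the most actionable error.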

Backend Implementation

1. Session Spec: Make llmSettings Mutable

Separate spec fields into immutable (repos, sessionName) and mutable (llmSettings) categories.

SessionSpec:
  immutable:
    - sessionName
    - repos
  mutable:
    - llmSettings
    - (future: resource limits, tools config, etc.)
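The immutable/mutable split implies a patch-level field check before anything is written. A minimal sketch, assuming top-level JSON keys map directly to spec fields:

```python
MUTABLE_FIELDS = {"llmSettings"}            # extensible: resource limits, tools config, etc.
IMMUTABLE_FIELDS = {"sessionName", "repos"}

def rejected_patch_fields(patch: dict) -> list:
    """Return the patch keys that touch non-mutable fields; empty means acceptable."""
    return [field for field in patch if field not in MUTABLE_FIELDS]
```

A non-empty result would map to a 400/422 response naming the offending fields.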

2. Model Switch Flow

User Request (PATCH)
    |
    v
API Server
    |-- Validate model name against allowed models list
    |-- Check session phase == Running
    |-- Check no active generation in progress
    |
    v
Update Session Record
    |-- Update spec.llmSettings.model in database
    |-- Write audit entry (previousModel, newModel, timestamp)
    |
    v
Notify Session Agent
    |-- Send control message to session pod via internal channel
    |-- Agent acknowledges and picks up new model for next LLM call
    |
    v
Return updated session to caller
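The update-and-notify steps of the flow can be sketched as one function. The in-memory `session` dict stands in for the database record and the `notify` callable for the internal control channel; both are placeholders for illustration, not existing interfaces.

```python
from datetime import datetime, timezone

def apply_model_switch(session: dict, new_model: str, audit_log: list, notify) -> dict:
    """Update the session record, write an audit entry, notify the agent,
    and return the updated session (validation assumed to have passed)."""
    now = datetime.now(timezone.utc).isoformat()
    previous = session["spec"]["llmSettings"]["model"]
    session["spec"]["llmSettings"]["model"] = new_model
    audit_log.append({"previousModel": previous, "newModel": new_model, "timestamp": now})
    notify({"type": "model-switch", "model": new_model})  # control message to the session pod
    session["previousModel"] = previous
    session["modelSwitchedAt"] = now
    return session
```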

3. Agent-Side Handling

Recommended: Config Polling (Simple)

  • Agent reads its model config from a shared source (configmap, env, or API) before each LLM call.
  • On PATCH, the backend updates the config source.
  • Next time the agent makes an LLM call, it picks up the new model.
  • No IPC infrastructure needed. Change takes effect on the very next LLM call, which in practice means the next user message or tool invocation.
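The polling pattern is small enough to sketch directly. `read_config` and `call_llm` are hypothetical stand-ins for the agent's config source and LLM client:

```python
def make_agent_llm_call(read_config, call_llm, prompt: str):
    """Re-read the model from the shared config source before every LLM call,
    so a PATCH-driven config change takes effect on the next call with no IPC."""
    model = read_config()["model"]
    return call_llm(model=model, prompt=prompt)
```

Because the lookup happens per call rather than at agent startup, no restart or push channel is required.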

4. Database / State Changes

Add model history tracking to the session record:

{
  "modelHistory": [
    {
      "model": "claude-sonnet-4-5",
      "from": "2026-03-28T10:00:00Z",
      "to": "2026-03-28T14:30:00Z"
    },
    {
      "model": "claude-opus-4-6",
      "from": "2026-03-28T14:30:00Z",
      "to": null
    }
  ]
}

This provides auditability and supports future features like per-model cost tracking.
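Maintaining this history is a two-step update: close the open entry, then append the new one. A sketch against the record shape shown above:

```python
def record_model_switch(history: list, new_model: str, at: str) -> list:
    """Close the open history entry (to=None) and append the new model's entry."""
    if history and history[-1]["to"] is None:
        history[-1]["to"] = at
    history.append({"model": new_model, "from": at, "to": None})
    return history
```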


Context Injection on Model Switch

This is the critical challenge: when the model changes, the new model has no memory of the session. The agent runtime must bridge this gap.

The Problem

Simply replaying the full raw history into the new model is problematic because:

  1. Context window size mismatch -- switching from a large-context model to a smaller one may mean the history doesn't fit
  2. Token cost -- replaying 100k+ tokens of raw history on every subsequent call is expensive
  3. Format differences -- tool call/result formatting, system prompt conventions, or multi-turn structure may differ subtly between model families
  4. Compressed history is opaque -- if the prior model's context manager already summarized early turns, those summaries may reference things in a model-specific way

Strategy: Contextual Handoff Message

On model switch, the agent runtime constructs a handoff message -- a structured summary injected as a system-level context block at the start of the new model's conversation.

Handoff Message Structure

{
  "role": "system",
  "content": "[Model Handoff Context]\n\nThis session was previously running on {previousModel}.\n\n## Session Goal\n{initialPrompt}\n\n## Conversation Summary\n{generatedSummary}\n\n## Current Working State\n- Active files: {recentlyReadOrEditedFiles}\n- Current task: {currentTodoState}\n- Last user request: {lastUserMessage}\n- Last assistant action: {lastAssistantSummary}\n\n## Key Decisions Made\n{extractedDecisions}\n\n[End Handoff Context]"
}
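Assembling this message from session state is straightforward templating. The field names below mirror the placeholders in the structure above; the exact wording is illustrative, not a fixed format:

```python
def build_handoff_message(previous_model: str, goal: str, summary: str,
                          active_files: list, current_task: str,
                          last_user: str, last_assistant: str, decisions: str) -> dict:
    """Assemble the system-level handoff context block for the new model."""
    content = (
        "[Model Handoff Context]\n\n"
        f"This session was previously running on {previous_model}.\n\n"
        f"## Session Goal\n{goal}\n\n"
        f"## Conversation Summary\n{summary}\n\n"
        "## Current Working State\n"
        f"- Active files: {', '.join(active_files)}\n"
        f"- Current task: {current_task}\n"
        f"- Last user request: {last_user}\n"
        f"- Last assistant action: {last_assistant}\n\n"
        f"## Key Decisions Made\n{decisions}\n\n"
        "[End Handoff Context]"
    )
    return {"role": "system", "content": content}
```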

How to Generate It

Option 1: Self-Summarize Before Switch (Recommended)

Before the model switch takes effect, ask the current model to produce its own handoff summary. This is the highest-quality approach because the outgoing model has full context and can distill what matters.

PATCH arrives
  → Agent calls CURRENT model: "Generate a handoff summary for a model transition"
  → Current model returns structured summary
  → Agent stores summary, swaps to new model
  → New model's first call gets: [handoff summary] + [recent conversation tail]

Option 2: Runtime-Constructed Summary (Fallback)

If self-summarization isn't possible (model unresponsive, instant switch needed), the runtime constructs a summary mechanically from available state: initial prompt, last N conversation turns, todo list state, recently modified files, and last user message.

Option 3: Hybrid (Recommended for production)

Use self-summarization as the default with a 30-second timeout, falling back to mechanical extraction.
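The hybrid strategy reduces to try-with-fallback. The two callables below are placeholders for the real summarization paths (an LLM call and mechanical state extraction):

```python
def generate_handoff_summary(self_summarize, mechanical_summary,
                             timeout_s: float = 30.0) -> str:
    """Hybrid strategy: prefer the outgoing model's own summary; fall back to
    mechanical extraction on timeout or any other failure."""
    try:
        return self_summarize(timeout=timeout_s)
    except Exception:  # timeout, model unresponsive, transport error, etc.
        return mechanical_summary()
```

The mechanical path must never raise, since it is the last resort before an unannotated model swap.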

Context Window Budgeting

Available context = New model's max context window
                  - System prompt tokens
                  - Tool definitions tokens
                  - Reserve for response generation (~4k tokens)
                  = Budget for handoff context + conversation tail
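The budget arithmetic above, clamped so a small target model never yields a negative budget (the ~4k response reserve is the figure assumed above):

```python
def handoff_budget(max_context: int, system_prompt: int, tool_defs: int,
                   response_reserve: int = 4000) -> int:
    """Tokens available for handoff context + conversation tail (never negative)."""
    return max(0, max_context - system_prompt - tool_defs - response_reserve)
```

A zero budget signals that even the handoff summary must be truncated before the switch can proceed.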

Transition                          Context Strategy
Small → Large model                 Full history replay feasible; handoff summary optional but useful
Large → Small model                 Handoff summary essential; truncate older turns
Same-family (e.g., Sonnet → Opus)   Simplest case; high format compatibility, full replay with handoff header
Cross-family                        Handoff summary strongly recommended; reformat tool call history if needed

What Gets Preserved vs. Lost

Preserved                        Potentially Lost
Session goal / initial prompt    Nuanced "tone" or style the old model had
Explicit decisions and plans     Implicit reasoning chains
File modifications (on disk)     Model's internal "mental map" of the codebase
Todo list state                  Subtleties from compressed early history
Recent conversation turns        Very old turns that don't fit the new context

API Extension for Context Control

Allow the PATCH request to optionally control context behavior:

{
  "llmSettings": { "model": "claude-opus-4-6" },
  "contextStrategy": "self-summarize | replay | mechanical"
}
  • self-summarize (default): Current model generates a handoff summary
  • replay: Replay raw conversation history (best for same-family switches)
  • mechanical: Runtime-constructed summary, no extra LLM call (fastest)
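Resolving the optional field reduces to defaulting plus validation. A sketch, assuming an absent field means the default strategy:

```python
from typing import Optional

VALID_STRATEGIES = {"self-summarize", "replay", "mechanical"}

def resolve_context_strategy(requested: Optional[str]) -> str:
    """Default to self-summarize; reject unknown strategy names (maps to a 400)."""
    if requested is None:
        return "self-summarize"
    if requested not in VALID_STRATEGIES:
        raise ValueError(f"unknown contextStrategy: {requested}")
    return requested
```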

UI / CLI / MCP Integration

UI:

  • Model selector dropdown in session detail view
  • Current model badge on session card
  • System message in conversation on model change

CLI:

acp session update my-session --model claude-opus-4-6

MCP Tool: acp_update_session for programmatic switching from within sessions


Edge Cases and Safety

Scenario                               Behavior
Switch while agent is mid-generation   Reject with 422. Client should wait for the turn to complete.
Switch to the same model               No-op; return 200 with current state.
Switch to unavailable/invalid model    Reject with 400 and list valid models.
Rapid successive switches              Last write wins; each switch is recorded in history.
Session in terminal phase              Reject with 409.
Cost/quota implications                Quota checks validate against the new model's limits before allowing the switch.

Implementation Phases

Phase 1 - Core (MVP)

  • PATCH endpoint with llmSettings.model support
  • Config polling in the agent
  • Model validation against allowed list
  • Basic model history tracking

Phase 2 - Context Handoff

  • Self-summarize context injection strategy
  • Mechanical fallback with hybrid timeout
  • Context window budgeting per model
  • Handoff summary caching and storage in model history

Phase 3 - Observability

  • System message injected into conversation on model change
  • Model history exposed in session detail API
  • Per-model token usage tracking

Phase 4 - UX Polish

  • UI model selector dropdown
  • CLI session update command
  • MCP acp_update_session tool for programmatic switching
