
Feature: Live model switching for running sessions #1090

@joshbranham

Description


Summary

Add the ability to change the LLM model used by a running agentic session without stopping and recreating it. This preserves conversation history and workspace state, and avoids the overhead of session teardown and setup.

Currently, the model is set at session creation time via llmSettings.model and cannot be changed. Users who want a different model must stop the session and create a new one, losing all context.


API Design

New Endpoint: Patch Session

PATCH /api/projects/{projectName}/agentic-sessions/{sessionName}

A general-purpose partial update endpoint for mutable session properties. Initially supports llmSettings, but is extensible to other fields in the future.

Request Body:

{
  "llmSettings": {
    "model": "claude-opus-4-6"
  }
}

Response: 200 OK

{
  "name": "my-session",
  "phase": "Running",
  "spec": {
    "llmSettings": {
      "model": "claude-opus-4-6"
    }
  },
  "previousModel": "claude-sonnet-4-5",
  "modelSwitchedAt": "2026-03-28T14:30:00Z"
}

Error Cases:

Status  Condition
400     Invalid model name or unsupported model
404     Session not found
409     Session is in a terminal phase (Stopped, Failed, Completed)
409     Model switch already in progress
422     Session is mid-generation (actively streaming a response)
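The error cases above can be sketched as a single validation step. This is an illustrative Python sketch, not an existing API: the `Session` dataclass, the phase strings, and the allow-list contents are assumptions based on this proposal.

```python
from dataclasses import dataclass
from typing import Optional

TERMINAL_PHASES = {"Stopped", "Failed", "Completed"}
ALLOWED_MODELS = {"claude-opus-4-6", "claude-sonnet-4-5"}  # hypothetical allow-list

@dataclass
class Session:
    name: str
    phase: str
    model: str
    generating: bool = False        # actively streaming a response
    switch_in_progress: bool = False

def validate_model_switch(session: Optional[Session], new_model: str) -> int:
    """Return the HTTP status a PATCH should produce; 200 means proceed."""
    if session is None:
        return 404
    if new_model not in ALLOWED_MODELS:
        return 400
    if session.phase in TERMINAL_PHASES:
        return 409
    if session.switch_in_progress:
        return 409
    if session.generating:
        return 422
    return 200
```

The check order matters: existence first, then model validity, then session state, so the client gets the most actionable error.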

Backend Implementation

1. Session Spec: Make llmSettings Mutable

Separate spec fields into immutable (repos, sessionName) and mutable (llmSettings) categories.

SessionSpec:
  immutable:
    - sessionName
    - repos
  mutable:
    - llmSettings
    - (future: resource limits, tools config, etc.)
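The immutable/mutable split implies a patch-level field check before anything is written. A minimal sketch, assuming top-level JSON keys map directly to spec fields:

```python
MUTABLE_FIELDS = {"llmSettings"}            # extensible: resource limits, tools config, etc.
IMMUTABLE_FIELDS = {"sessionName", "repos"}

def rejected_patch_fields(patch: dict) -> list:
    """Return the patch keys that touch non-mutable fields; empty means acceptable."""
    return [field for field in patch if field not in MUTABLE_FIELDS]
```

A non-empty result would map to a 400/422 response naming the offending fields.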

2. Model Switch Flow

User Request (PATCH)
    |
    v
API Server
    |-- Validate model name against allowed models list
    |-- Check session phase == Running
    |-- Check no active generation in progress
    |
    v
Update Session Record
    |-- Update spec.llmSettings.model in database
    |-- Write audit entry (previousModel, newModel, timestamp)
    |
    v
Notify Session Agent
    |-- Send control message to session pod via internal channel
    |-- Agent acknowledges and picks up new model for next LLM call
    |
    v
Return updated session to caller
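The update-and-notify steps of the flow can be sketched as one function. The in-memory `session` dict stands in for the database record and the `notify` callable for the internal control channel; both are placeholders for illustration, not existing interfaces.

```python
from datetime import datetime, timezone

def apply_model_switch(session: dict, new_model: str, audit_log: list, notify) -> dict:
    """Update the session record, write an audit entry, notify the agent,
    and return the updated session (validation assumed to have passed)."""
    now = datetime.now(timezone.utc).isoformat()
    previous = session["spec"]["llmSettings"]["model"]
    session["spec"]["llmSettings"]["model"] = new_model
    audit_log.append({"previousModel": previous, "newModel": new_model, "timestamp": now})
    notify({"type": "model-switch", "model": new_model})  # control message to the session pod
    session["previousModel"] = previous
    session["modelSwitchedAt"] = now
    return session
```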

3. Agent-Side Handling

Recommended: Config Polling (Simple)

  • Agent reads its model config from a shared source (configmap, env, or API) before each LLM call.
  • On PATCH, the backend updates the config source.
  • Next time the agent makes an LLM call, it picks up the new model.
  • No IPC infrastructure needed. Change takes effect on the very next LLM call, which in practice means the next user message or tool invocation.
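The polling pattern is small enough to sketch directly. `read_config` and `call_llm` are hypothetical stand-ins for the agent's config source and LLM client:

```python
def make_agent_llm_call(read_config, call_llm, prompt: str):
    """Re-read the model from the shared config source before every LLM call,
    so a PATCH-driven config change takes effect on the next call with no IPC."""
    model = read_config()["model"]
    return call_llm(model=model, prompt=prompt)
```

Because the lookup happens per call rather than at agent startup, no restart or push channel is required.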

4. Database / State Changes

Add model history tracking to the session record:

{
  "modelHistory": [
    {
      "model": "claude-sonnet-4-5",
      "from": "2026-03-28T10:00:00Z",
      "to": "2026-03-28T14:30:00Z"
    },
    {
      "model": "claude-opus-4-6",
      "from": "2026-03-28T14:30:00Z",
      "to": null
    }
  ]
}

This provides auditability and supports future features like per-model cost tracking.
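Maintaining this history is a two-step update: close the open entry, then append the new one. A sketch against the record shape shown above:

```python
def record_model_switch(history: list, new_model: str, at: str) -> list:
    """Close the open history entry (to=None) and append the new model's entry."""
    if history and history[-1]["to"] is None:
        history[-1]["to"] = at
    history.append({"model": new_model, "from": at, "to": None})
    return history
```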


Context Injection on Model Switch

This is the critical challenge: when the model changes, the new model has no memory of the session. The agent runtime must bridge this gap.

The Problem

Simply replaying the full raw history into the new model is problematic because:

  1. Context window size mismatch -- switching from a large-context model to a smaller one may mean the history doesn't fit
  2. Token cost -- replaying 100k+ tokens of raw history on every subsequent call is expensive
  3. Format differences -- tool call/result formatting, system prompt conventions, or multi-turn structure may differ subtly between model families
  4. Compressed history is opaque -- if the prior model's context manager already summarized early turns, those summaries may reference things in a model-specific way

Strategy: Contextual Handoff Message

On model switch, the agent runtime constructs a handoff message -- a structured summary injected as a system-level context block at the start of the new model's conversation.

Handoff Message Structure

{
  "role": "system",
  "content": "[Model Handoff Context]\n\nThis session was previously running on {previousModel}.\n\n## Session Goal\n{initialPrompt}\n\n## Conversation Summary\n{generatedSummary}\n\n## Current Working State\n- Active files: {recentlyReadOrEditedFiles}\n- Current task: {currentTodoState}\n- Last user request: {lastUserMessage}\n- Last assistant action: {lastAssistantSummary}\n\n## Key Decisions Made\n{extractedDecisions}\n\n[End Handoff Context]"
}
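Assembling this message from session state is straightforward templating. The field names below mirror the placeholders in the structure above; the exact wording is illustrative, not a fixed format:

```python
def build_handoff_message(previous_model: str, goal: str, summary: str,
                          active_files: list, current_task: str,
                          last_user: str, last_assistant: str, decisions: str) -> dict:
    """Assemble the system-level handoff context block for the new model."""
    content = (
        "[Model Handoff Context]\n\n"
        f"This session was previously running on {previous_model}.\n\n"
        f"## Session Goal\n{goal}\n\n"
        f"## Conversation Summary\n{summary}\n\n"
        "## Current Working State\n"
        f"- Active files: {', '.join(active_files)}\n"
        f"- Current task: {current_task}\n"
        f"- Last user request: {last_user}\n"
        f"- Last assistant action: {last_assistant}\n\n"
        f"## Key Decisions Made\n{decisions}\n\n"
        "[End Handoff Context]"
    )
    return {"role": "system", "content": content}
```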

How to Generate It

Option 1: Self-Summarize Before Switch (Recommended)

Before the model switch takes effect, ask the current model to produce its own handoff summary. This is the highest-quality approach because the outgoing model has full context and can distill what matters.

PATCH arrives
  → Agent calls CURRENT model: "Generate a handoff summary for a model transition"
  → Current model returns structured summary
  → Agent stores summary, swaps to new model
  → New model's first call gets: [handoff summary] + [recent conversation tail]

Option 2: Runtime-Constructed Summary (Fallback)

If self-summarization isn't possible (model unresponsive, instant switch needed), the runtime constructs a summary mechanically from available state: initial prompt, last N conversation turns, todo list state, recently modified files, and last user message.

Option 3: Hybrid (Recommended for production)

Use self-summarization as the default with a 30-second timeout, falling back to mechanical extraction.
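The hybrid strategy reduces to try-with-fallback. The two callables below are placeholders for the real summarization paths (an LLM call and mechanical state extraction):

```python
def generate_handoff_summary(self_summarize, mechanical_summary,
                             timeout_s: float = 30.0) -> str:
    """Hybrid strategy: prefer the outgoing model's own summary; fall back to
    mechanical extraction on timeout or any other failure."""
    try:
        return self_summarize(timeout=timeout_s)
    except Exception:  # timeout, model unresponsive, transport error, etc.
        return mechanical_summary()
```

The mechanical path must never raise, since it is the last resort before an unannotated model swap.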

Context Window Budgeting

Available context = New model's max context window
                  - System prompt tokens
                  - Tool definitions tokens
                  - Reserve for response generation (~4k tokens)
                  = Budget for handoff context + conversation tail
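The budget arithmetic above, clamped so a small target model never yields a negative budget (the ~4k response reserve is the figure assumed above):

```python
def handoff_budget(max_context: int, system_prompt: int, tool_defs: int,
                   response_reserve: int = 4000) -> int:
    """Tokens available for handoff context + conversation tail (never negative)."""
    return max(0, max_context - system_prompt - tool_defs - response_reserve)
```

A zero budget signals that even the handoff summary must be truncated before the switch can proceed.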

Transition                          Context Strategy
Small → Large model                 Full history replay feasible; handoff summary optional but useful
Large → Small model                 Handoff summary essential; truncate older turns
Same-family (e.g., Sonnet → Opus)   Simplest case; high format compatibility, full replay with handoff header
Cross-family                        Handoff summary strongly recommended; reformat tool call history if needed

What Gets Preserved vs. Lost

Preserved                        Potentially Lost
Session goal / initial prompt    Nuanced "tone" or style the old model had
Explicit decisions and plans     Implicit reasoning chains
File modifications (on disk)     Model's internal "mental map" of the codebase
Todo list state                  Subtleties from compressed early history
Recent conversation turns        Very old turns that don't fit the new context

API Extension for Context Control

Allow the PATCH request to optionally control context behavior:

{
  "llmSettings": { "model": "claude-opus-4-6" },
  "contextStrategy": "self-summarize | replay | mechanical"
}
  • self-summarize (default): Current model generates a handoff summary
  • replay: Replay raw conversation history (best for same-family switches)
  • mechanical: Runtime-constructed summary, no extra LLM call (fastest)
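Resolving the optional field reduces to defaulting plus validation. A sketch, assuming an absent field means the default strategy:

```python
from typing import Optional

VALID_STRATEGIES = {"self-summarize", "replay", "mechanical"}

def resolve_context_strategy(requested: Optional[str]) -> str:
    """Default to self-summarize; reject unknown strategy names (maps to a 400)."""
    if requested is None:
        return "self-summarize"
    if requested not in VALID_STRATEGIES:
        raise ValueError(f"unknown contextStrategy: {requested}")
    return requested
```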

UI / CLI / MCP Integration

UI:

  • Model selector dropdown in session detail view
  • Current model badge on session card
  • System message in conversation on model change

CLI:

acp session update my-session --model claude-opus-4-6

MCP Tool: acp_update_session for programmatic switching from within sessions


Edge Cases and Safety

Scenario                               Behavior
Switch while agent is mid-generation   Reject with 422. Client should wait for the turn to complete.
Switch to the same model               No-op; return 200 with current state.
Switch to unavailable/invalid model    Reject with 400 and list valid models.
Rapid successive switches              Last write wins; each switch is recorded in history.
Session in terminal phase              Reject with 409.
Cost/quota implications                Quota checks validate against the new model's limits before allowing the switch.

Implementation Phases

Phase 1 - Core (MVP)

  • PATCH endpoint with llmSettings.model support
  • Config polling in the agent
  • Model validation against allowed list
  • Basic model history tracking

Phase 2 - Context Handoff

  • Self-summarize context injection strategy
  • Mechanical fallback with hybrid timeout
  • Context window budgeting per model
  • Handoff summary caching and storage in model history

Phase 3 - Observability

  • System message injected into conversation on model change
  • Model history exposed in session detail API
  • Per-model token usage tracking

Phase 4 - UX Polish

  • UI model selector dropdown
  • CLI session update command
  • MCP acp_update_session tool for programmatic switching
