44 commits
3590c43
docs: LLM profiles design + example profile
openhands-agent Oct 18, 2025
9b1e3db
llm: add profile_id field to LLM (profile filename identifier) Co-…
openhands-agent Oct 18, 2025
21efefe
feat(llm): add ProfileManager and eagerly register profiles at conver…
openhands-agent Oct 18, 2025
46ca1b7
chore: stop tracking local runtime and worktree files; add to .gitignore
openhands-agent Oct 18, 2025
5efdaee
chore: only ignore bead databases
enyst Oct 18, 2025
9cbf67f
test: cover llm profile manager
enyst Oct 18, 2025
dfab517
Update .gitignore
enyst Oct 18, 2025
441eb25
Improve LLM profile manager persistence
enyst Oct 18, 2025
e7cd039
Add example for managing LLM profiles
enyst Oct 18, 2025
269610a
Document plan for profile references
enyst Oct 18, 2025
d0ab952
Integrate profile-aware persistence
enyst Oct 19, 2025
f74d050
Simplify profile registration logging
enyst Oct 19, 2025
df308fb
Normalize inline_mode naming
enyst Oct 19, 2025
4d293db
Simplify profile_id sync in ProfileManager
enyst Oct 19, 2025
7d1a525
Rename profile sync helper
enyst Oct 19, 2025
ec45ed5
LLMRegistry handles profile management
enyst Oct 19, 2025
1566df4
docs: clarify LLMRegistry profile guidance
enyst Oct 19, 2025
8f8b5b9
refactor: rename profile persistence helpers
enyst Oct 19, 2025
a3efa6e
refactor: split profile transform helpers
enyst Oct 19, 2025
17617aa
style: use f-strings in LLMRegistry logging
enyst Oct 19, 2025
9134aa1
Update openhands/sdk/llm/llm_registry.py
enyst Oct 19, 2025
36ab580
chore: stop tracking scripts/worktree.sh
enyst Oct 19, 2025
cea6a0d
Merge upstream main into agent-sdk-18-profile-manager
enyst Oct 21, 2025
12eec55
fix: remove runtime llm switching
enyst Oct 21, 2025
03b4600
style: use f-string for registry logging
enyst Oct 21, 2025
acf67e3
docs: expand LLM profile example
enyst Oct 21, 2025
218728e
Refine LLM profile persistence
enyst Oct 21, 2025
75e8ecd
Update LLM profile docs for usage_id semantics
enyst Oct 22, 2025
8511524
Merge remote-tracking branch 'upstream/main' into agent-sdk-18-profil…
enyst Oct 23, 2025
1f3adab
Merge branch 'main' into agent-sdk-18-profile-manager
enyst Oct 24, 2025
96ba8e9
Merge branch 'main' into agent-sdk-18-profile-manager
enyst Oct 25, 2025
142faee
fix LLM mutation for profiles to respect immutability; add docstring;…
enyst Oct 25, 2025
82138dd
refactor: keep LLM profile expansion at persistence layer
enyst Oct 25, 2025
b6511a9
Merge branch 'main' of github.com:All-Hands-AI/agent-sdk into agent-s…
enyst Oct 25, 2025
f5404b6
fix: restore LLM profile validation behavior
enyst Oct 26, 2025
85bc698
Merge branch 'main' into agent-sdk-18-profile-manager
enyst Oct 26, 2025
ba4bd50
harden profile handling
enyst Oct 26, 2025
99a422c
Merge branch 'main' into agent-sdk-18-profile-manager
enyst Nov 6, 2025
cbf886e
docs: capture runtime LLM switching investigation
enyst Oct 20, 2025
af1fd40
docs: outline runtime LLM switching plan
enyst Oct 20, 2025
2ec0f9f
feat: allow switching runtime LLM profiles
enyst Oct 20, 2025
3bf69db
docs: add runtime LLM switch example
enyst Oct 21, 2025
12d7264
docs: document inline mode switch rejection
enyst Oct 21, 2025
0b075c9
Delete .openhands/microagents/vscode.md
enyst Nov 6, 2025
5 changes: 3 additions & 2 deletions .gitignore
@@ -203,9 +203,10 @@ cache
/workspace/
openapi.json
.client/

# Local workspace files
+.beads/*.db
-*.db
.worktrees/
-agent-sdk.workspace.code-workspace
+*.code-workspace
+scripts/worktree.sh
101 changes: 101 additions & 0 deletions docs/llm_profiles.md
@@ -0,0 +1,101 @@
# LLM Profiles (design)

## Overview

This document records the design decision for "LLM profiles" (named LLM configuration files) and how they map to the existing LLM model and persistence in the SDK.

## Key decisions

- Reuse the existing LLM Pydantic model schema. A profile file is simply the JSON dump of an LLM instance (the same shape produced by `LLM.model_dump(exclude_none=True)` and accepted by `LLM.load_from_json`).
- Storage location: `~/.openhands/llm-profiles/<profile_name>.json`. The profile name is the filename stem (no extension) used to refer to the profile.
- Do not change ConversationState or Agent serialization format for now. Profiles are a convenience for creating LLM instances and registering them in the runtime LLMRegistry.
- Secrets: do NOT store plaintext API keys in profile files by default. Prefer referencing the API key through an environment variable (loaded via `LLM.load_from_env`) or keep it in the runtime `SecretsManager`. The `LLMRegistry.save_profile` API exposes an `include_secrets` flag; default False.
- LLM.usage_id semantics: keep current behavior (a small set of runtime identifiers such as 'agent', 'condenser', 'title-gen', etc.). Do not use usage_id as the profile name.
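
As a point of reference, a profile file can be produced directly from an `LLM` instance. The snippet below is a minimal sketch; the import path, constructor arguments, and model name are illustrative assumptions, not the final API.

```python
from pathlib import Path

from openhands.sdk.llm import LLM  # assumed import path

# Hypothetical profile: model name and usage_id are examples only.
llm = LLM(model="litellm_proxy/anthropic/claude-sonnet-4", usage_id="agent")

profile_path = Path.home() / ".openhands" / "llm-profiles" / "sonnet-default.json"
profile_path.parent.mkdir(parents=True, exist_ok=True)

# A profile file is just the JSON dump of the LLM (no secrets by default).
profile_path.write_text(llm.model_dump_json(exclude_none=True))
```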

## LLMRegistry profile API (summary)

- `list_profiles() -> list[str]`
- `load_profile(name: str) -> LLM`
- `save_profile(name: str, llm: LLM, include_secrets: bool = False) -> str` (returns the written path)
- `register_profiles(profile_ids: Iterable[str] | None = None) -> None`
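
A minimal usage sketch of the API above, assuming the signatures listed and the default profile directory; error handling is omitted:

```python
from openhands.sdk.llm import LLM, LLMRegistry  # assumed import path

registry = LLMRegistry()  # pass profile_dir=... to override the default location

# Persist an LLM as a named profile; secrets are stripped by default.
path = registry.save_profile(
    "sonnet-default",
    LLM(model="litellm_proxy/anthropic/claude-sonnet-4", usage_id="agent"),
)

# Enumerate stored profiles and rehydrate one by name.
names = registry.list_profiles()
llm = registry.load_profile("sonnet-default")

# Eagerly load every on-disk profile into the in-memory registry.
registry.register_profiles()
```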

## Implementation notes

- LLMRegistry is the single entry point for both in-memory registration and on-disk profile persistence. Pass `profile_dir` to the constructor to override the default location when embedding the SDK.
- Use `LLM.load_from_json(path)` for loading and `llm.model_dump(exclude_none=True)` for saving.
- Default directory: `os.path.expanduser('~/.openhands/llm-profiles/')`
- When loading, do not inject secrets. The runtime should reconcile secrets via ConversationState/Agent `resolve_diff_from_deserialized` or via SecretsManager.
- When saving, respect the `include_secrets` flag; if False, ensure secret fields (`api_key`, `aws_*` keys) are omitted or masked (see the sketch below).
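
One way the `include_secrets=False` path could be implemented; the exact list of secret fields is an assumption based on the fields named above:

```python
from openhands.sdk.llm import LLM  # assumed import path

# Assumed secret fields, per the note above (api_key and aws_* keys).
SECRET_FIELDS = ("api_key", "aws_access_key_id", "aws_secret_access_key")


def profile_payload(llm: LLM, include_secrets: bool = False) -> dict:
    """Dump an LLM for on-disk storage, omitting secret fields by default."""
    data = llm.model_dump(exclude_none=True)
    if not include_secrets:
        for field in SECRET_FIELDS:
            data.pop(field, None)
    return data
```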

## CLI

- Use a single flag: `--llm <profile_name>` to select a profile for the agent LLM.
- Also support an environment fallback: `OPENHANDS_LLM_PROFILE`.
- Provide commands: `openhands llm list` and `openhands llm show <profile_name>` (redacts secrets).
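
Selection precedence is the explicit flag first, then the environment fallback. A minimal sketch of that resolution (the helper name is illustrative):

```python
import os


def resolve_profile_name(cli_value: str | None) -> str | None:
    """Return the profile from --llm, else OPENHANDS_LLM_PROFILE, else None."""
    return cli_value or os.environ.get("OPENHANDS_LLM_PROFILE")
```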

## Migration

- Migration from inline configs to profiles: provide a migration helper script that extracts inline LLMs from `~/.openhands/agent_settings.json` and conversation `base_state.json` files into `~/.openhands/llm-profiles/<name>.json` and updates the references. The migration is a manual, opt-in step.

## Proposed changes for agent-sdk-19 (profile references in persistence)

### Goals
- Allow agent settings and conversation snapshots to reference stored LLM profiles by name instead of embedding full JSON payloads.
- Maintain backward compatibility with existing inline configurations.
- Enable a migration path so that users can opt in to profiles without losing existing data.

### Persistence format updates
- **Agent settings (`~/.openhands/agent_settings.json`)**
- Add an optional `profile_id` (or `llm_profile`) field wherever an LLM is configured (agent, condenser, router, etc.).
- When `profile_id` is present, omit the inline LLM payload in favor of the reference.
- Continue accepting inline definitions when `profile_id` is absent.
- **Conversation base state (`~/.openhands/conversations/<id>/base_state.json`)**
- Store `profile_id` for any LLM that originated from a profile when the conversation was created.
- Inline the full LLM payload only when no profile reference exists.
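
Illustrative shapes for the two storage modes (keys other than `profile_id` are examples, not the exact persisted schema):

```python
# Referenced storage: only the profile name is persisted.
llm_entry_referenced = {"profile_id": "sonnet-default"}

# Inline storage (legacy, or when no profile reference exists): full payload.
llm_entry_inline = {
    "model": "litellm_proxy/anthropic/claude-sonnet-4",
    "usage_id": "agent",
    # ...remaining fields as dumped by model_dump(exclude_none=True)
}
```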

### Loader behavior
- On startup, configuration loaders must detect `profile_id` and load the corresponding LLM via `LLMRegistry.load_profile(profile_id)`.
- If the referenced profile cannot be found, fall back to existing inline data (if available) and surface a clear warning.
- Inject secrets after loading (same flow used today when constructing LLM instances).
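
A sketch of that loader logic, assuming `load_profile` raises `FileNotFoundError` for a missing profile (the actual error type may differ):

```python
import logging

from openhands.sdk.llm import LLM, LLMRegistry  # assumed import path

logger = logging.getLogger(__name__)


def load_llm_entry(entry: dict, registry: LLMRegistry) -> LLM:
    """Prefer the profile reference; fall back to inline data with a warning."""
    profile_id = entry.get("profile_id")
    if profile_id:
        try:
            return registry.load_profile(profile_id)
        except FileNotFoundError:
            logger.warning(f"LLM profile {profile_id!r} not found; using inline data")
    return LLM.model_validate({k: v for k, v in entry.items() if k != "profile_id"})
```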

### Writer behavior
- When persisting updated agent settings or conversation snapshots, write back the `profile_id` whenever the active LLM was sourced from a profile.
- Only write the raw LLM configuration for ad-hoc instances (no associated profile), preserving current behavior.
- Respect the `OPENHANDS_INLINE_CONVERSATIONS` flag (default: true for reproducibility). When enabled, always inline full LLM payloads—even if `profile_id` exists—and surface an error if a conversation only contains `profile_id` entries.
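
A sketch of the write-side decision, assuming the flag is read from the environment (default true, per above) and a profile-backed LLM carries its `profile_id`:

```python
import os

from openhands.sdk.llm import LLM  # assumed import path


def dump_llm_entry(llm: LLM) -> dict:
    """Persist a profile reference when allowed; otherwise inline the payload."""
    inline = os.environ.get("OPENHANDS_INLINE_CONVERSATIONS", "true").lower() == "true"
    if llm.profile_id and not inline:
        return {"profile_id": llm.profile_id}
    return llm.model_dump(exclude_none=True)
```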

### Migration helper
- Provide a utility (script or CLI command) that:
1. Scans existing agent settings and conversation base states for inline LLM configs.
2. Uses `LLMRegistry.save_profile` to serialize them into `~/.openhands/llm-profiles/<generated-name>.json`.
3. Rewrites the source files to reference the new profiles via `profile_id`.
- Keep the migration opt-in and idempotent so users can review changes before adopting profiles.

### Testing & validation
- Extend persistence tests to cover:
- Loading agent settings with `profile_id` only.
- Mixed scenarios (profile reference plus inline fallback).
- Conversation snapshots that retain profile references across reloads.
- Add regression tests ensuring legacy inline-only configurations continue to work.

### Follow-up coordination
- Subsequent tasks (agent-sdk-20/21/22) will build on this foundation to expose CLI flags, update documentation, and improve secrets handling.


## Persistence integration review

### Conversation snapshots vs. profile-aware serialization
- **Caller experience:** Conversations that opt into profile references should behave the same as the legacy inline flow. Callers still receive fully expanded `LLM` payloads when they work with `ConversationState` objects or remote conversation APIs. The only observable change is that persisted `base_state.json` files can shrink to `{ "profile_id": "<name>" }` instead of storing every field.
- **Inline vs. referenced storage:** Conversation persistence previously delegated everything to Pydantic (`model_dump_json` / `model_validate`). The draft implementation added a recursive helper (`compact_llm_profiles` / `resolve_llm_profiles`) that walked arbitrary dictionaries and manually replaced or expanded embedded LLMs. This duplication diverged from the rest of the SDK, where polymorphic models rely on validators and discriminators to control serialization.
- **Relationship to `DiscriminatedUnionMixin`:** That mixin exists so we can ship objects across process boundaries (e.g., remote conversations) without bespoke traversal code. Keeping serialization rules on the models themselves, rather than sprinkling special cases in persistence helpers, lets us benefit from the same rebuild/validation pipeline.

### Remote conversation compatibility
- The agent server still exposes fully inlined LLM payloads to remote clients. Because the manual compaction was only invoked when writing `base_state.json`, remote APIs were unaffected. We need to preserve that behaviour so remote callers do not have to resolve profiles themselves.
- When a conversation is restored on the server (or locally), any profile references in `base_state.json` must be expanded **before** the state is materialised; otherwise, components that expect a concrete `LLM` instance (e.g., secret reconciliation, spend tracking) will break.

### Recommendation
- Move profile resolution/compaction into the `LLM` model:
- A `model_validator(mode="before")` can load `{ "profile_id": ... }` payloads with the `LLMRegistry`, while respecting `OPENHANDS_INLINE_CONVERSATIONS` (raise when inline mode is enforced but only a profile reference is available).
- A `model_serializer(mode="wrap")` can honour the same inline flag via `model_dump(..., context={"inline_llm_persistence": bool})`, returning either the full inline payload or a `{ "profile_id": ... }` stub. Callers that do not provide explicit context will continue to receive inline payloads by default.
- Have `ConversationState._save_base_state` call `model_dump_json` with the appropriate context instead of the bespoke traversal helpers. This keeps persistence logic co-located with the models, reduces drift, and keeps remote conversations working without additional glue.
- With this approach we still support inline overrides (`OPENHANDS_INLINE_CONVERSATIONS=true`), profile-backed storage, and remote access with no behavioural changes for callers.
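
A trimmed sketch of the recommended hooks. The field set is reduced to two fields, `load_profile_payload` is a hypothetical stand-in for the registry lookup, and serialization context requires a recent Pydantic v2; none of this is the final implementation.

```python
import os
from typing import Any

from pydantic import (
    BaseModel,
    ConfigDict,
    SerializationInfo,
    SerializerFunctionWrapHandler,
    model_serializer,
    model_validator,
)


def load_profile_payload(profile_id: str) -> dict:
    """Hypothetical stand-in for the LLMRegistry profile lookup."""
    raise NotImplementedError


class LLM(BaseModel):
    """Trimmed sketch; the real model has many more fields."""

    model_config = ConfigDict(protected_namespaces=())  # "model" field name

    model: str
    profile_id: str | None = None

    @model_validator(mode="before")
    @classmethod
    def _expand_profile_reference(cls, data: Any) -> Any:
        # A bare {"profile_id": ...} stub is rehydrated from the profile store,
        # unless inline persistence is enforced, in which case we raise.
        if isinstance(data, dict) and set(data) == {"profile_id"}:
            if os.environ.get("OPENHANDS_INLINE_CONVERSATIONS", "true") == "true":
                raise ValueError("inline mode enforced, got only a profile reference")
            payload = load_profile_payload(data["profile_id"])
            return {**payload, "profile_id": data["profile_id"]}
        return data

    @model_serializer(mode="wrap")
    def _maybe_compact(
        self, handler: SerializerFunctionWrapHandler, info: SerializationInfo
    ):
        # Callers opt into compact output via serialization context.
        ctx = info.context or {}
        if self.profile_id and not ctx.get("inline_llm_persistence", True):
            return {"profile_id": self.profile_id}
        return handler(self)
```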

68 changes: 68 additions & 0 deletions docs/llm_runtime_switch_investigation.md
@@ -0,0 +1,68 @@
# Runtime LLM Profile Switching – Investigation (agent-sdk-24)

## Current architecture

### LLMRegistry
- Keeps an in-memory mapping `service_to_llm: dict[str, LLM]`.
- Loads/saves JSON profiles under `~/.openhands/llm-profiles` (or a custom directory) via:
- `list_profiles()` / `get_profile_path()`
- `save_profile(profile_id, llm)` – strips secret fields unless explicitly asked not to.
- `load_profile(profile_id)` – rehydrates an `LLM`, ensuring the runtime instance’s `profile_id` matches the file stem via `_load_profile_with_synced_id`.
- `register_profiles(profile_ids=None)` – iterates `list_profiles()`, calling `load_profile` then `add` for each profile; skips invalid payloads or duplicates.
- `validate_profile(data)` – wraps `LLM.model_validate` to report pydantic errors as strings.
- `add(llm)` publishes a `RegistryEvent` to the optional subscriber and records the LLM in `service_to_llm` keyed by `llm.service_id`.
- Currently assumes a one-to-one mapping of service_id ↔ active LLM instance.

### Agent & LLM ownership
- `AgentBase.llm` is a (frozen) `LLM` Pydantic model. Agents may also own other LLMs (e.g., condensers) discovered via `AgentBase.get_all_llms()`.
- `AgentBase.resolve_diff_from_deserialized(persisted)` reconciles a persisted agent with the runtime agent:
- Calls `self.llm.resolve_diff_from_deserialized(persisted.llm)`; this only permits differences in fields listed in `LLM.OVERRIDE_ON_SERIALIZE` (api keys, AWS secrets, etc.). Any other field diff raises.
- Ensures tool names match and the rest of the agent models are identical.
- `LLM.resolve_diff_from_deserialized(persisted)` compares `model_dump(exclude_none=True)` between runtime and persisted objects, allowing overrides only for secret fields. Any other difference triggers a `ValueError`.

### Conversation persistence
- `ConversationState._save_base_state()` -> `compact_llm_profiles(...)` when `OPENHANDS_INLINE_CONVERSATIONS` is false, replacing inline LLM dicts with `{"profile_id": id}` entries.
- `ConversationState.create()` -> `resolve_llm_profiles(...)` prior to validation, so profile references become concrete LLM dicts loaded from `LLMRegistry`.
- When inline mode is enabled (`OPENHANDS_INLINE_CONVERSATIONS=true`), profiles are fully embedded and *any* LLM diff is rejected by the reconciliation flow above.

### Conversation bootstrapping
- `LocalConversation.__init__()` adds all LLMs from the agent to the registry and eagerly calls `register_profiles()` (errors logged at DEBUG level). This ensures the in-memory registry is primed with persisted profiles before a conversation resumes.

## Implications for runtime switching

1. **Registry as switch authority**
- Registry already centralizes active LLM instances and profile management, so introducing a “switch-to-profile” operation belongs here. That operation will need to:
- Load the target profile (if not already loaded).
- Update `service_to_llm` (and notify subscribers) atomically.
- Return the new `LLM` so callers can update their Agent / Conversation state.

2. **Agent/LLM reconciliation barriers**
- Current `resolve_diff_from_deserialized` logic rejects *any* non-secret field change. A runtime profile swap would alter at least `LLM.model` and possibly provider-specific params. We therefore need a sanctioned path that:
- Skips reconciliation when conversations are persisted with profile references (i.e., inline mode disabled).
- Refuses to switch when inline mode is required (e.g., evals with `OPENHANDS_INLINE_CONVERSATIONS=true`). Switching in inline mode would otherwise break diff validation.
- This aligns with the instruction to “REJECT SWITCH for eval mode,” but “JUST SWITCH” when persistence is profile-based.

3. **State & metrics consistency**
- After a switch we must ensure:
- `ConversationState.agent.llm` points at the new object (and any secondary LLM references, e.g., condensers, are updated if needed).
- `ConversationState.stats.service_to_metrics` either resets or continues per usage_id; we must decide what data should carry over when the service swaps to a different profile.
- Event persistence continues to work: future saves should store the new profile ID, and reloads should retrieve the same profile in the registry.

4. **Runtime API surface**
- Need an ergonomic call for agents/conversations to request a new profile by name (manual selection or automated policy). Potential entry points:
- `LLMRegistry.switch_profile(service_id, profile_id)` returning the active `LLM` (sketched after this list).
- Conversation-level helper (e.g., `LocalConversation.switch_llm(profile_id)`) that coordinates registry + agent updates + persistence.

5. **Observer / callback considerations**
- Registry already has a single `subscriber`. If multiple components need to react to switches, we might extend this to a small pub/sub mechanism. Otherwise we can keep a single callback and have the conversation install its own handler.
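
A hypothetical sketch of the registry-level entry point; `switch_profile` does not exist yet, and the subscriber notification is only noted in a comment because the exact `RegistryEvent` shape is not pinned down here.

```python
from openhands.sdk.llm import LLM, LLMRegistry  # assumed import path


def switch_profile(registry: LLMRegistry, service_id: str, profile_id: str) -> LLM:
    """Proposed operation: atomically swap the active LLM for a service."""
    llm = registry.load_profile(profile_id)    # load target (validates payload)
    registry.service_to_llm[service_id] = llm  # update the active mapping
    # A real implementation would also publish a RegistryEvent to subscribers.
    return llm  # caller updates Agent / ConversationState references
```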

## Open questions / risks
- What happens to in-flight operations when the switch occurs? (For initial implementation we can require the agent to be idle.)
- How should token metrics roll over? We likely reset or create a new entry keyed by the new profile.
- Tool / condenser LLMs: do we switch only the primary agent LLM, or should condensers also reference profiles? (Out of scope unless required by the plan.)
- Tests must cover: successful switch, rejected switch in inline mode, persistence after switch, registry events.

## Next steps
1. Capture the desired UX/API in the follow-up planning issue (agent-sdk-25).
2. Decide how to bypass reconciliation safely when profile references are used.
3. Define exact testing matrix (registry unit tests, conversation integration tests, persistence roundtrip).