diff --git a/.gitignore b/.gitignore
index 5dc044af1b..6186775d80 100644
--- a/.gitignore
+++ b/.gitignore
@@ -203,9 +203,10 @@ cache
 /workspace/
 openapi.json
 .client/
-
 # Local workspace files
 .beads/*.db
+*.db
 .worktrees/
 agent-sdk.workspace.code-workspace
-
+*.code-workspace
+scripts/worktree.sh
diff --git a/docs/llm_profiles.md b/docs/llm_profiles.md
new file mode 100644
index 0000000000..0eeede6d1b
--- /dev/null
+++ b/docs/llm_profiles.md
@@ -0,0 +1,101 @@
+LLM Profiles (design)
+
+Overview
+
+This document records the design decision for "LLM profiles" (named LLM configuration files) and how they map to the existing LLM model and persistence in the SDK.
+
+Key decisions
+
+- Reuse the existing LLM Pydantic model schema. A profile file is simply the JSON dump of an LLM instance (the same shape produced by LLM.model_dump(exclude_none=True) or accepted by LLM.load_from_json).
+- Storage location: ~/.openhands/llm-profiles/<profile_name>.json. The profile_name is the filename (no extension) used to refer to the profile.
+- Do not change ConversationState or Agent serialization format for now. Profiles are a convenience for creating LLM instances and registering them in the runtime LLMRegistry.
+- Secrets: do NOT store plaintext API keys in profile files by default. Prefer storing the env var name in LLM.api_key (via LLM.load_from_env) or keep the API key in the runtime SecretsManager. The LLMRegistry.save_profile API exposes an include_secrets flag; default False.
+- LLM.usage_id semantics: keep current behavior (a small set of runtime identifiers such as 'agent', 'condenser', 'title-gen', etc.). Do not use usage_id as the profile name.
+
+LLMRegistry profile API (summary)
+
+- list_profiles() -> list[str]
+- load_profile(name: str) -> LLM
+- save_profile(name: str, llm: LLM, include_secrets: bool = False) -> str (path)
+- register_profiles(profile_ids: Iterable[str] | None = None) -> None
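+
+For orientation, a minimal usage sketch of the API summarized above (names as listed here; exact signatures may differ in the implementation):
+
+```python
+from openhands.sdk.llm import LLM
+from openhands.sdk.llm.llm_registry import LLMRegistry
+
+registry = LLMRegistry()
+
+# Persist a profile; secrets are stripped by default (include_secrets=False).
+llm = LLM(model="litellm_proxy/openai/gpt-5-mini", usage_id="agent")
+path = registry.save_profile("gpt-5-mini", llm)
+
+# Later: discover profiles on disk and rehydrate one by name.
+assert "gpt-5-mini" in registry.list_profiles()
+agent_llm = registry.load_profile("gpt-5-mini")
+```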
+
+Implementation notes
+
+- LLMRegistry is the single entry point for both in-memory registration and on-disk profile persistence. Pass ``profile_dir`` to the constructor to override the default location when embedding the SDK.
+- Use LLM.load_from_json(path) for loading and llm.model_dump(exclude_none=True) for saving.
+- Default directory: os.path.expanduser('~/.openhands/llm-profiles/')
+- When loading, do not inject secrets. The runtime should reconcile secrets via ConversationState/Agent resolve_diff_from_deserialized or via SecretsManager.
+- When saving, respect the include_secrets flag; if False, ensure secret fields (api_key, aws_* keys) are omitted or masked.
+
+CLI
+
+- Use a single flag: --llm to select a profile for the agent LLM.
+- Also support an environment fallback: OPENHANDS_LLM_PROFILE.
+- Provide commands: `openhands llm list`, `openhands llm show <name>` (redacts secrets).
+
+Migration
+
+- Migration from inline configs to profiles: provide a migration helper script to extract inline LLMs from ~/.openhands/agent_settings.json and conversation base_state.json files into ~/.openhands/llm-profiles/<profile_name>.json and update references (manual opt-in by user).
+
+## Proposed changes for agent-sdk-19 (profile references in persistence)
+
+### Goals
+- Allow agent settings and conversation snapshots to reference stored LLM profiles by name instead of embedding full JSON payloads.
+- Maintain backward compatibility with existing inline configurations.
+- Enable a migration path so that users can opt in to profiles without losing existing data.
+
+### Persistence format updates
+- **Agent settings (`~/.openhands/agent_settings.json`)**
+  - Add an optional `profile_id` (or `llm_profile`) field wherever an LLM is configured (agent, condenser, router, etc.).
+  - When `profile_id` is present, omit the inline LLM payload in favor of the reference; the two shapes are sketched below.
+  - Continue accepting inline definitions when `profile_id` is absent.
+- **Conversation base state (`~/.openhands/conversations/<conversation_id>/base_state.json`)**
+  - Store `profile_id` for any LLM that originated from a profile when the conversation was created.
+  - Inline the full LLM payload only when no profile reference exists.
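+
+A minimal sketch of the two persisted shapes (illustrative field values; `profile_id` as proposed above):
+
+```json
+{
+  "llm": {"profile_id": "gpt-5-mini"}
+}
+```
+
+versus the legacy inline form, which remains valid:
+
+```json
+{
+  "llm": {"model": "litellm_proxy/openai/gpt-5-mini", "usage_id": "agent", "temperature": 0.2}
+}
+```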
+
+### Loader behavior
+- On startup, configuration loaders must detect `profile_id` and load the corresponding LLM via `LLMRegistry.load_profile(profile_id)`.
+- If the referenced profile cannot be found, fall back to existing inline data (if available) and surface a clear warning.
+- Inject secrets after loading (same flow used today when constructing LLM instances).
+
+### Writer behavior
+- When persisting updated agent settings or conversation snapshots, write back the `profile_id` whenever the active LLM was sourced from a profile.
+- Only write the raw LLM configuration for ad-hoc instances (no associated profile), preserving current behavior.
+- Respect the `OPENHANDS_INLINE_CONVERSATIONS` flag (default: true for reproducibility). When enabled, always inline full LLM payloads—even if `profile_id` exists—and surface an error if a conversation only contains `profile_id` entries.
+
+### Migration helper
+- Provide a utility (script or CLI command) that:
+  1. Scans existing agent settings and conversation base states for inline LLM configs.
+  2. Uses `LLMRegistry.save_profile` to serialize them into `~/.openhands/llm-profiles/<profile_name>.json`.
+  3. Rewrites the source files to reference the new profiles via `profile_id`.
+- Keep the migration opt-in and idempotent so users can review changes before adopting profiles.
+
+### Testing & validation
+- Extend persistence tests to cover:
+  - Loading agent settings with `profile_id` only.
+  - Mixed scenarios (profile reference plus inline fallback).
+  - Conversation snapshots that retain profile references across reloads.
+- Add regression tests ensuring legacy inline-only configurations continue to work.
+
+### Follow-up coordination
+- Subsequent tasks (agent-sdk-20/21/22) will build on this foundation to expose CLI flags, update documentation, and improve secrets handling.
+
+
+## Persistence integration review
+
+### Conversation snapshots vs. profile-aware serialization
+- **Caller experience:** Conversations that opt into profile references should behave the same as the legacy inline flow. Callers still receive fully expanded `LLM` payloads when they work with `ConversationState` objects or remote conversation APIs. The only observable change is that persisted `base_state.json` files can shrink to `{ "profile_id": "<profile_id>" }` instead of storing every field.
+- **Inline vs. referenced storage:** Conversation persistence previously delegated everything to Pydantic (`model_dump_json` / `model_validate`). The draft implementation added a recursive helper (`compact_llm_profiles` / `resolve_llm_profiles`) that walked arbitrary dictionaries and manually replaced or expanded embedded LLMs. This duplication diverged from the rest of the SDK, where polymorphic models rely on validators and discriminators to control serialization.
+- **Relationship to `DiscriminatedUnionMixin`:** That mixin exists so we can ship objects across process boundaries (e.g., remote conversations) without bespoke traversal code. Keeping serialization rules on the models themselves, rather than sprinkling special cases in persistence helpers, lets us benefit from the same rebuild/validation pipeline.
+
+### Remote conversation compatibility
+- The agent server still exposes fully inlined LLM payloads to remote clients. Because the manual compaction was only invoked when writing `base_state.json`, remote APIs were unaffected. We need to preserve that behaviour so remote callers do not have to resolve profiles themselves.
+- When a conversation is restored on the server (or locally), any profile references in `base_state.json` must be expanded **before** the state is materialised; otherwise, components that expect a concrete `LLM` instance (e.g., secret reconciliation, spend tracking) will break.
+
+### Recommendation
+- Move profile resolution/compaction into the `LLM` model:
+  - A `model_validator(mode="before")` can load `{ "profile_id": ... }` payloads with the `LLMRegistry`, while respecting `OPENHANDS_INLINE_CONVERSATIONS` (raise when inline mode is enforced but only a profile reference is available).
+  - A `model_serializer(mode="json")` can honour the same inline flag via `model_dump(..., context={"inline_llm_persistence": bool})`, returning either the full inline payload or a `{ "profile_id": ... }` stub. Callers that do not provide explicit context will continue to receive inline payloads by default.
+- Have `ConversationState._save_base_state` call `model_dump_json` with the appropriate context instead of the bespoke traversal helpers. This keeps persistence logic co-located with the models, reduces drift, and keeps remote conversations working without additional glue.
+- With this approach we still support inline overrides (`OPENHANDS_INLINE_CONVERSATIONS=true`), profile-backed storage, and remote access with no behavioural changes for callers.
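+
+A condensed sketch of the recommended hooks (simplified; for the actual behavior see `openhands/sdk/llm/llm.py`):
+
+```python
+@model_validator(mode="before")
+@classmethod
+def _coerce_inputs(cls, data, info):
+    # Expand {"profile_id": ...} stubs, honouring the inline flag.
+    if isinstance(data, dict) and data.get("profile_id") and "model" not in data:
+        if (info.context or {}).get("inline_llm_persistence", True):
+            raise ValueError("profile reference found while inline mode is enforced")
+        registry = (info.context or {})["llm_registry"]
+        expanded = registry.load_profile(data["profile_id"]).model_dump(exclude_none=True)
+        expanded["profile_id"] = data["profile_id"]
+        return {**data, **expanded}
+    return data
+
+
+@model_serializer(mode="wrap", when_used="json")
+def _serialize_with_profiles(self, handler, info):
+    data = handler(self)
+    inline = (info.context or {}).get("inline_llm_persistence", True)
+    if not inline and data.get("profile_id"):
+        return {"profile_id": data["profile_id"]}
+    return data
+```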
diff --git a/docs/llm_runtime_switch_investigation.md b/docs/llm_runtime_switch_investigation.md
new file mode 100644
index 0000000000..0a358959e4
--- /dev/null
+++ b/docs/llm_runtime_switch_investigation.md
@@ -0,0 +1,68 @@
+# Runtime LLM Profile Switching – Investigation (agent-sdk-24)
+
+## Current architecture
+
+### LLMRegistry
+- Keeps an in-memory mapping `service_to_llm: dict[str, LLM]`.
+- Loads/saves JSON profiles under `~/.openhands/llm-profiles` (or a custom directory) via:
+  - `list_profiles()` / `get_profile_path()`
+  - `save_profile(profile_id, llm)` – strips secret fields unless explicitly asked not to.
+  - `load_profile(profile_id)` – rehydrates an `LLM`, ensuring the runtime instance’s `profile_id` matches the file stem via `_load_profile_with_synced_id`.
+  - `register_profiles(profile_ids=None)` – iterates `list_profiles()`, calling `load_profile` then `add` for each profile; skips invalid payloads or duplicates.
+  - `validate_profile(data)` – wraps `LLM.model_validate` to report pydantic errors as strings.
+- `add(llm)` publishes a `RegistryEvent` to the optional subscriber and records the LLM in `service_to_llm` keyed by `llm.service_id`.
+- Currently assumes a one-to-one mapping of service_id ↔ active LLM instance.
+
+### Agent & LLM ownership
+- `AgentBase.llm` is a (frozen) `LLM` BaseModel. Agents may also own other LLMs (e.g., condensers) discovered via `AgentBase.get_all_llms()`.
+- `AgentBase.resolve_diff_from_deserialized(persisted)` reconciles a persisted agent with the runtime agent:
+  - Calls `self.llm.resolve_diff_from_deserialized(persisted.llm)`; this only permits differences in fields listed in `LLM.OVERRIDE_ON_SERIALIZE` (API keys, AWS secrets, etc.). Any other field diff raises.
+  - Ensures tool names match and the rest of the agent models are identical.
+- `LLM.resolve_diff_from_deserialized(persisted)` compares `model_dump(exclude_none=True)` between runtime and persisted objects, allowing overrides only for secret fields. Any other difference triggers a `ValueError`; the sketch below illustrates the rule.
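+
+A minimal illustration of that rule (hypothetical values; `api_key` stands in for any `OVERRIDE_ON_SERIALIZE` field):
+
+```python
+from pydantic import SecretStr
+
+runtime = LLM(model="gpt-4o-mini", usage_id="agent", api_key=SecretStr("sk-live"))
+persisted = runtime.model_copy(update={"api_key": None})
+
+# Differences limited to OVERRIDE_ON_SERIALIZE fields reconcile cleanly.
+reconciled = runtime.resolve_diff_from_deserialized(persisted)
+
+# Any other difference (e.g., the model name) is rejected.
+drifted = persisted.model_copy(update={"model": "gpt-4o"})
+runtime.resolve_diff_from_deserialized(drifted)  # raises ValueError
+```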
+
+### Conversation persistence
+- `ConversationState._save_base_state()` -> `compact_llm_profiles(...)` when `OPENHANDS_INLINE_CONVERSATIONS` is false, replacing inline LLM dicts with `{"profile_id": id}` entries.
+- `ConversationState.create()` -> `resolve_llm_profiles(...)` prior to validation, so profile references become concrete LLM dicts loaded from `LLMRegistry`.
+- When inline mode is enabled (`OPENHANDS_INLINE_CONVERSATIONS=true`), profiles are fully embedded and *any* LLM diff is rejected by the reconciliation flow above.
+
+### Conversation bootstrapping
+- `LocalConversation.__init__()` adds all LLMs from the agent to the registry and eagerly calls `register_profiles()` (errors logged at DEBUG level). This ensures the in-memory registry is primed with persisted profiles before a conversation resumes.
+
+## Implications for runtime switching
+
+1. **Registry as switch authority**
+   - The registry already centralizes active LLM instances and profile management, so introducing a “switch-to-profile” operation belongs here. That operation will need to:
+     - Load the target profile (if not already loaded).
+     - Update `service_to_llm` (and notify subscribers) atomically.
+     - Return the new `LLM` so callers can update their Agent / Conversation state.
+
+2. **Agent/LLM reconciliation barriers**
+   - Current `resolve_diff_from_deserialized` logic rejects *any* non-secret field change. A runtime profile swap would alter at least `LLM.model`, and possibly provider-specific params. We therefore need a sanctioned path that:
+     - Skips reconciliation when conversations are persisted with profile references (i.e., inline mode disabled).
+     - Refuses to switch when inline mode is required (e.g., evals with `OPENHANDS_INLINE_CONVERSATIONS=true`). Switching in inline mode would otherwise break diff validation.
+   - This aligns with the instruction to “REJECT SWITCH for eval mode,” but “JUST SWITCH” when persistence is profile-based.
+
+3. **State & metrics consistency**
+   - After a switch we must ensure:
+     - `ConversationState.agent.llm` points at the new object (and any secondary LLM references, e.g., condensers, are updated if needed).
+     - `ConversationState.stats.service_to_metrics` either resets or continues per usage_id; we must decide what data should carry over when the service swaps to a different profile.
+     - Event persistence continues to work: future saves should store the new profile ID, and reloads should retrieve the same profile in the registry.
+
+4. **Runtime API surface**
+   - Need an ergonomic call for agents/conversations to request a new profile by name (manual selection or automated policy). Potential entry points:
+     - `LLMRegistry.switch_profile(service_id, profile_id)` returning the active `LLM`.
+     - Conversation-level helper (e.g., `LocalConversation.switch_llm(profile_id)`) that coordinates registry + agent updates + persistence.
+
+5. **Observer / callback considerations**
+   - The registry already has a single `subscriber`. If multiple components need to react to switches, we might extend this to a small pub/sub mechanism. Otherwise we can keep a single callback and have the conversation install its own handler.
+
+## Open questions / risks
+- What happens to in-flight operations when the switch occurs? (For the initial implementation we can require the agent to be idle.)
+- How should token metrics roll over? We likely reset or create a new entry keyed by the new profile.
+- Tool / condenser LLMs: do we switch only the primary agent LLM, or should condensers also reference profiles? (Out of scope unless required by the plan.)
+- Tests must cover: successful switch, rejected switch in inline mode, persistence after switch, registry events.
+
+## Next steps
+1. Capture the desired UX/API in the follow-up planning issue (agent-sdk-25).
+2. Decide how to bypass reconciliation safely when profile references are used.
+3. Define the exact testing matrix (registry unit tests, conversation integration tests, persistence roundtrip).
diff --git a/docs/llm_runtime_switch_plan.md b/docs/llm_runtime_switch_plan.md
new file mode 100644
index 0000000000..2895be6ae7
--- /dev/null
+++ b/docs/llm_runtime_switch_plan.md
@@ -0,0 +1,92 @@
+# Runtime LLM Profile Switching – Implementation Plan (agent-sdk-25)
+
+This plan builds on the investigation captured in `docs/llm_runtime_switch_investigation.md` and outlines the work required to let callers swap an agent’s primary LLM to another persisted profile at runtime.
+
+## Goals
+
+1. Allow manual or automated selection of a persisted LLM profile while a conversation is active.
+2. Keep the LLMRegistry as the single source of truth for active LLM instances and profile knowledge.
+3. Respect existing persistence behaviour:
+   - Profile-reference mode (`OPENHANDS_INLINE_CONVERSATIONS=false`) → switching is supported.
+   - Inline mode (`OPENHANDS_INLINE_CONVERSATIONS=true`) → switching is rejected early with a clear error.
+4. Maintain or improve unit/integration test coverage.
+
+## Proposed architecture changes
+
+### 1. Extend LLMRegistry
+
+Add a `switch_profile(service_id: str, profile_id: str) -> LLM` method that:
+- Loads `profile_id` via existing helpers (re-using `_load_profile_with_synced_id`).
+- Registers the loaded LLM (using `add` semantics), **replacing** the previous instance for `service_id`.
+- Publishes a `RegistryEvent` (re-using the existing subscriber hook) so listeners can update state.
+- Returns the new `LLM` instance so callers can synchronously update their agent/state.
+
+Implementation notes:
+- If the profile is already active, short-circuit and return the existing LLM.
+- Raise a descriptive error when the profile is missing or when the service ID is unknown.
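+
+A condensed sketch of the intended control flow (simplified; names follow this plan’s vocabulary rather than the final code):
+
+```python
+def switch_profile(self, service_id: str, profile_id: str) -> LLM:
+    if service_id not in self.service_to_llm:
+        raise KeyError(f"Service ID '{service_id}' not found in registry")
+    current = self.service_to_llm[service_id]
+    if current.profile_id == profile_id:
+        return current  # already active: no-op
+    llm = self.load_profile(profile_id)  # raises if the profile is missing
+    llm = llm.model_copy(update={"service_id": service_id})
+    self.service_to_llm[service_id] = llm
+    self.notify(RegistryEvent(llm=llm))  # existing subscriber hook
+    return llm
+```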
+
+### 2. Conversation-level coordination
+
+Introduce a method on `ConversationState` (and a corresponding one on `LocalConversation`) such as `switch_agent_llm(profile_id: str)` which orchestrates the swap:
+
+1. Check `should_inline_conversations()`; if inline mode is enabled, raise an `LLMSwitchError` instructing the caller to disable inline persistence.
+2. Resolve the agent’s `service_id` (current LLM) and call `LLMRegistry.switch_profile(...)`.
+3. Update `state.agent` with a new agent object whose `.llm` is the returned instance.
+   - To bypass the strict `resolve_diff_from_deserialized` diff, introduce an internal helper on `AgentBase`, e.g.
+     `_with_swapped_llm(new_llm: LLM)`, that clones the agent via `model_copy(update={"llm": new_llm})`. This avoids running `resolve_diff_from_deserialized`, which would otherwise reject the change.
+   - Apply the same logic to any condensers or secondary LLMs once the need arises (out of scope for v1).
+4. Update `ConversationState.stats` bookkeeping:
+   - Either reset the metrics entry for the service to zero or continue accumulating under the same key (decision: start a fresh metrics entry to avoid mixing usage between profiles).
+5. Persist immediately by calling existing save hooks (`_save_base_state`) to ensure the new profile ID is captured.
+
+A convenience method on `LocalConversation` (e.g. `switch_llm(profile_id: str)`) will forward to the state method and surface the error if inline mode blocks the operation.
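+
+From the caller’s perspective the flow then looks roughly like this (names per this plan; the concrete error type may differ in the final code):
+
+```python
+conversation = Conversation(agent=agent, workspace=".", persistence_dir="./state")
+
+try:
+    conversation.switch_llm("support-pro")  # swap the primary agent LLM
+except RuntimeError as exc:
+    # e.g. inline persistence enabled, or the agent is not idle
+    print(f"switch rejected: {exc}")
+```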
+
+### 3. User-facing API
+
+Depending on SDK ergonomics, expose the feature through:
+- A direct conversation method (`conversation.switch_llm("profile-a")`).
+- CLI / higher-level integration left for a follow-up once core switching works.
+
+### 4. Prevent mid-operation switches (initial scope)
+
+For the first iteration we can require `state.agent_status == AgentExecutionStatus.IDLE` before switching. If the agent is mid-run we raise an error to avoid edge cases with in-flight steps. Future work can relax this once step coordination is available.
+
+## Testing strategy
+
+1. **Registry unit tests**
+   - New suite in `tests/sdk/llm/test_llm_registry_profiles.py` for `switch_profile` covering:
+     - Successful swap (existing service → new profile).
+     - Swap to the same profile is a no-op.
+     - Unknown profile → raises.
+     - Unknown service → raises.
+     - Subscriber notifications fire.
+
+2. **Conversation integration tests** (extend `tests/sdk/conversation/local/test_state_serialization.py` or create a new module):
+   - Switching succeeds in profile-reference mode, updating `ConversationState.agent.llm.profile_id` and saving the new profile ID to disk.
+   - Switching persists across reloads (start a conversation, switch, re-open the conversation, verify the new profile in state and registry).
+   - Switching is rejected when `OPENHANDS_INLINE_CONVERSATIONS=true`.
+   - Switching while the agent is not idle (if enforced) raises the appropriate error.
+
+3. **Metrics tests**
+   - Verify that stats either reset or continue according to the chosen policy by inspecting `ConversationState.stats.service_to_metrics` before/after the swap.
+
+4. **Serialization tests**
+   - Ensure `compact_llm_profiles` writes the new `profile_id` after a switch, and `resolve_llm_profiles` rehydrates it successfully on reload.
+
+5. **Subscriber behaviour**
+   - If we extend beyond a single subscriber, add tests for multiple listeners; otherwise ensure the existing single-subscriber path still works.
+
+## Rollout steps
+
+1. Implement `LLMRegistry.switch_profile` plus targeted unit tests.
+2. Wire conversation-level orchestration (`ConversationState` and `LocalConversation`).
+3. Expose the public API and update documentation (including usage notes and the inline-mode restriction).
+4. Add/cherry-pick helper utilities (e.g., agent cloning) as required.
+5. Update CLI or higher-level integrations if time permits (optional follow-up).
+6. Ensure documentation references the new feature in the LLM profiles guide.
+
+## Risks & mitigations
+
+- **Strict diff enforcement**: bypassed via controlled cloning instead of calling `resolve_diff_from_deserialized` when profile references are in use.
+- **Persistence consistency**: an immediate re-save after switching guarantees the new profile ID lands on disk.
+- **Concurrent use**: the initial restriction to the idle state avoids race conditions; revisit later if needed.
+- **Backward compatibility**: existing code paths remain untouched unless `switch_llm(...)` is invoked.
diff --git a/examples/01_standalone_sdk/25_llm_profiles.py b/examples/01_standalone_sdk/25_llm_profiles.py
new file mode 100644
index 0000000000..8ef133a48b
--- /dev/null
+++ b/examples/01_standalone_sdk/25_llm_profiles.py
@@ -0,0 +1,124 @@
+"""Create and use an LLM profile with :class:`LLMRegistry`.
+
+Run with::
+
+    uv run python examples/01_standalone_sdk/25_llm_profiles.py
+
+Profiles are stored under ``~/.openhands/llm-profiles/<name>.json`` by default.
+Set ``LLM_PROFILE_NAME`` to pick a profile and ``LLM_API_KEY`` to supply
+credentials when the profile omits secrets.
+"""
+
+import json
+import os
+from pathlib import Path
+
+from pydantic import SecretStr
+
+from openhands.sdk import Agent, Conversation
+from openhands.sdk.llm.llm import LLM
+from openhands.sdk.llm.llm_registry import LLMRegistry
+from openhands.sdk.tool import Tool, register_tool
+from openhands.tools.execute_bash import BashTool
+
+
+PROFILE_NAME = os.getenv("LLM_PROFILE_NAME", "gpt-5-mini")
+
+
+def ensure_profile_exists(registry: LLMRegistry, name: str) -> None:
+    """Create a starter profile in the default directory when missing."""
+
+    if name in registry.list_profiles():
+        return
+
+    profile_defaults = LLM(
+        model="litellm_proxy/openai/gpt-5-mini",
+        base_url="https://llm-proxy.eval.all-hands.dev",
+        temperature=0.2,
+        max_output_tokens=4096,
+        usage_id="agent",
+        metadata={
+            "profile_description": "Sample GPT-5 Mini profile created by example 25.",
+        },
+    )
+    path = registry.save_profile(name, profile_defaults)
+    print(f"Created profile '{name}' at {path}")
+
+
+def load_profile(registry: LLMRegistry, name: str) -> LLM:
+    llm = registry.load_profile(name)
+    if llm.api_key is None:
+        api_key = os.getenv("LLM_API_KEY")
+        if api_key is None:
+            raise RuntimeError(
+                "Set LLM_API_KEY to authenticate, or save the profile with "
+                "include_secrets=True."
+            )
+        llm = llm.model_copy(update={"api_key": SecretStr(api_key)})
+    return llm
+
+
+def main() -> None:
+    registry = LLMRegistry()
+    ensure_profile_exists(registry, PROFILE_NAME)
+
+    llm = load_profile(registry, PROFILE_NAME)
+
+    register_tool("BashTool", BashTool)
+    tools = [Tool(name="BashTool")]
+    agent = Agent(llm=llm, tools=tools)
+
+    workspace_dir = Path(os.getcwd())
+    summary_path = workspace_dir / "summary_readme.md"
+    if summary_path.exists():
+        summary_path.unlink()
+
+    persistence_root = workspace_dir / ".conversations_llm_profiles"
+    conversation = Conversation(
+        agent=agent,
+        workspace=str(workspace_dir),
+        persistence_dir=str(persistence_root),
+        visualize=False,
+    )
+
+    conversation.send_message(
+        "Read README.md in this workspace, create a concise summary in "
+        "summary_readme.md (overwrite it if it exists), and respond with "
+        "SUMMARY_READY when the file is written."
+    )
+    conversation.run()
+
+    if summary_path.exists():
+        print(f"summary_readme.md written to {summary_path}")
+    else:
+        print("summary_readme.md not found after first run")
+
+    conversation.send_message(
+        "Thanks! Delete summary_readme.md from the workspace and respond with "
+        "SUMMARY_REMOVED once it is gone."
+ ) + conversation.run() + + if summary_path.exists(): + print("summary_readme.md still present after deletion request") + else: + print("summary_readme.md removed") + + persistence_dir = conversation.state.persistence_dir + if persistence_dir is None: + raise RuntimeError("Conversation did not persist base state to disk") + + base_state_path = Path(persistence_dir) / "base_state.json" + state_payload = json.loads(base_state_path.read_text()) + llm_entry = state_payload.get("agent", {}).get("llm", {}) + profile_in_state = llm_entry.get("profile_id") + print(f"Profile recorded in base_state.json: {profile_in_state}") + if profile_in_state != PROFILE_NAME: + print( + "Warning: profile_id in base_state.json does not match the profile " + "used at runtime." + ) + + +if __name__ == "__main__": # pragma: no cover + main() diff --git a/examples/01_standalone_sdk/26_runtime_llm_switch.py b/examples/01_standalone_sdk/26_runtime_llm_switch.py new file mode 100644 index 0000000000..47a5be59a9 --- /dev/null +++ b/examples/01_standalone_sdk/26_runtime_llm_switch.py @@ -0,0 +1,151 @@ +"""Demonstrate switching LLM profiles at runtime and persisting the result.""" + +from __future__ import annotations + +import json +import os +import uuid +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, LLMRegistry, Message, TextContent +from openhands.sdk.logger import get_logger + + +logger = get_logger(__name__) + +# --------------------------------------------------------------------------- +# Prerequisites +# --------------------------------------------------------------------------- +# 1. Configure the API key for the provider you want to use +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +# 2. Disable inline conversations so profile references are stored instead +os.environ.setdefault("OPENHANDS_INLINE_CONVERSATIONS", "false") + +# 3. Profiles live under ~/.openhands/llm-profiles by default. We create two +# variants that share the same service_id so they can be swapped at runtime. +registry = LLMRegistry() +service_id = "support-agent" + +base_profile_id = "support-mini" +alt_profile_id = "support-pro" + +base_llm = LLM( + service_id=service_id, + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), + temperature=0.0, +) +alt_llm = base_llm.model_copy( + update={ + "model": "litellm_proxy/anthropic/claude-3-5-sonnet-20240620", + "temperature": 0.4, + } +) + +registry.save_profile(base_profile_id, base_llm) +registry.save_profile(alt_profile_id, alt_llm) + +logger.info("Saved profiles %s and %s", base_profile_id, alt_profile_id) + +# --------------------------------------------------------------------------- +# Start a conversation with the base profile and persist it to disk +# --------------------------------------------------------------------------- +conversation_id = uuid.uuid4() +persistence_dir = Path("./.conversations_switch_demo").resolve() +workspace_dir = Path.cwd() + +agent = Agent(llm=registry.load_profile(base_profile_id), tools=[]) +conversation = Conversation( + agent=agent, + workspace=str(workspace_dir), + persistence_dir=str(persistence_dir), + conversation_id=conversation_id, + visualize=False, +) + +conversation.send_message( + Message( + role="user", + content=[TextContent(text="What model are you using? 
Keep it short.")], + ) +) +conversation.run() + +print("First run finished with profile:", conversation.agent.llm.profile_id) + +# --------------------------------------------------------------------------- +# Switch to the alternate profile while the conversation is idle +# --------------------------------------------------------------------------- +conversation.switch_llm(alt_profile_id) +print("Switched runtime profile to:", conversation.agent.llm.profile_id) + +conversation.send_message( + Message( + role="user", content=[TextContent(text="Now say hello using the new profile.")] + ) +) +conversation.run() + +print("Second run finished with profile:", conversation.agent.llm.profile_id) + +# Verify the persistence artefacts mention the new profile +base_state_path = Path(conversation.state.persistence_dir or ".") / "base_state.json" +print("base_state.json saved to:", base_state_path) +state_payload = json.loads(base_state_path.read_text()) +print("Persisted profile entry:", state_payload["agent"]["llm"]) + +# --------------------------------------------------------------------------- +# Delete the in-memory conversation and reload from disk +# --------------------------------------------------------------------------- +print("\nReloading conversation from disk...") +del conversation + +reloaded_agent = Agent(llm=registry.load_profile(alt_profile_id), tools=[]) +reloaded = Conversation( + agent=reloaded_agent, + workspace=str(workspace_dir), + persistence_dir=str(persistence_dir), + conversation_id=conversation_id, + visualize=False, +) + +print("Reloaded conversation is using profile:", reloaded.state.agent.llm.profile_id) +print("Active LLM model:", reloaded.state.agent.llm.model) + +reloaded.send_message( + Message(role="user", content=[TextContent(text="Confirm you survived a reload.")]) +) +reloaded.run() + +print("Reloaded run finished with profile:", reloaded.state.agent.llm.profile_id) + +# --------------------------------------------------------------------------- +# Part 2: Inline persistence rejects runtime switching +# --------------------------------------------------------------------------- +# When OPENHANDS_INLINE_CONVERSATIONS is true the conversation persists full +# LLM payloads instead of profile references. Switching profiles would break +# the diff reconciliation step, so the SDK deliberately rejects it with a +# RuntimeError. We demonstrate that behaviour below. 
+os.environ["OPENHANDS_INLINE_CONVERSATIONS"] = "true"
+
+inline_persistence_dir = Path("./.conversations_switch_demo_inline").resolve()
+inline_agent = Agent(llm=registry.load_profile(base_profile_id), tools=[])
+inline_conversation = Conversation(
+    agent=inline_agent,
+    workspace=str(workspace_dir),
+    persistence_dir=str(inline_persistence_dir),
+    conversation_id=uuid.uuid4(),
+    visualize=False,
+)
+
+try:
+    inline_conversation.switch_llm(alt_profile_id)
+except RuntimeError as exc:
+    print("Inline mode switch attempt rejected as expected:", exc)
+else:
+    raise AssertionError("Inline mode should have rejected the LLM switch")
diff --git a/examples/llm-profiles/gpt-5-mini.json b/examples/llm-profiles/gpt-5-mini.json
new file mode 100644
index 0000000000..78113638fb
--- /dev/null
+++ b/examples/llm-profiles/gpt-5-mini.json
@@ -0,0 +1,11 @@
+{
+  "model": "litellm_proxy/openai/gpt-5-mini",
+  "base_url": "https://llm-proxy.eval.all-hands.dev",
+  "api_key": null,
+  "temperature": 0.2,
+  "max_output_tokens": 4096,
+  "usage_id": "agent",
+  "metadata": {
+    "profile_description": "Sample configuration for the GPT-5 Mini profile managed by the LLM registry."
+  }
+}
diff --git a/openhands-sdk/openhands/sdk/agent/base.py b/openhands-sdk/openhands/sdk/agent/base.py
index 999559e5fe..f8f1cd1ed6 100644
--- a/openhands-sdk/openhands/sdk/agent/base.py
+++ b/openhands-sdk/openhands/sdk/agent/base.py
@@ -329,10 +329,18 @@ def resolve_diff_from_deserialized(self, persisted: "AgentBase") -> "AgentBase":
         )
         return reconciled
 
+    def _clone_with_llm(self, llm: LLM) -> "AgentBase":
+        """Return a copy of this agent with ``llm`` swapped in."""
+
+        clone = self.model_copy(update={"llm": llm})
+        clone._tools = dict(self._tools)
+        return clone
+
     def model_dump_succint(self, **kwargs):
         """Like model_dump, but excludes None fields by default."""
         if "exclude_none" not in kwargs:
             kwargs["exclude_none"] = True
+
         dumped = super().model_dump(**kwargs)
         # remove tool schema details for brevity
         if "tools" in dumped and isinstance(dumped["tools"], dict):
diff --git a/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py b/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
index 43c5eb06a0..3f8edd46e8 100644
--- a/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
+++ b/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
@@ -79,6 +79,9 @@ def __init__(
                 which agent/conversation is speaking.
             stuck_detection: Whether to enable stuck detection
         """
+        # Initialize the registry early so profile references resolve during resume.
+        self.llm_registry = LLMRegistry()
+
         self.agent = agent
         if isinstance(workspace, str):
             workspace = LocalWorkspace(working_dir=workspace)
@@ -100,6 +103,7 @@ def __init__(
             else None,
             max_iterations=max_iteration_per_run,
             stuck_detection=stuck_detection,
+            llm_registry=self.llm_registry,
         )
 
         # Default callback: persist every event to state
@@ -128,11 +132,16 @@ def _default_callback(e):
         self.agent.init_state(self._state, on_event=self._on_event)
 
         # Register existing llms in agent
-        self.llm_registry = LLMRegistry()
         self.llm_registry.subscribe(self._state.stats.register_llm)
         for llm in list(self.agent.get_all_llms()):
            self.llm_registry.add(llm)
 
+        # Eagerly register LLM profiles from disk.
+        try:
+            self.llm_registry.register_profiles()
+        except Exception:
+            logger.debug("No LLM profiles registered")
+
         # Initialize secrets if provided
         if secrets:
             # Convert dict[str, str] to dict[str, SecretValue]
@@ -169,12 +178,25 @@ def stuck_detector(self) -> StuckDetector | None:
         """Get the stuck detector instance if enabled."""
         return self._stuck_detector
 
+    def switch_llm(self, profile_id: str) -> None:
+        """Switch the active agent LLM to ``profile_id`` at runtime."""
+
+        with self._state:
+            self._state.switch_agent_llm(profile_id, registry=self.llm_registry)
+            self.agent = self._state.agent
+        logger.info(
+            "Switched conversation %s to profile %s",
+            self._state.id,
+            profile_id,
+        )
+
     @observe(name="conversation.send_message")
     def send_message(self, message: str | Message) -> None:
         """Send a message to the agent.
 
         Args:
             message: Either a string (which will be converted to a user message)
+                or a Message object
         """
         # Convert string to Message if needed
diff --git a/openhands-sdk/openhands/sdk/conversation/state.py b/openhands-sdk/openhands/sdk/conversation/state.py
index ddfdd0a793..d68167fb05 100644
--- a/openhands-sdk/openhands/sdk/conversation/state.py
+++ b/openhands-sdk/openhands/sdk/conversation/state.py
@@ -2,7 +2,7 @@
 import json
 from collections.abc import Sequence
 from enum import Enum
-from typing import Any, Self
+from typing import TYPE_CHECKING, Any, Self
 
 from pydantic import AliasChoices, Field, PrivateAttr
 
@@ -17,6 +17,16 @@
 from openhands.sdk.event.base import Event
 from openhands.sdk.io import FileStore, InMemoryFileStore, LocalFileStore
 from openhands.sdk.logger import get_logger
+
+
+if TYPE_CHECKING:
+    from openhands.sdk.llm.llm_registry import LLMRegistry
+
+
+from openhands.sdk.persistence.settings import (
+    INLINE_CONTEXT_KEY,
+    should_inline_conversations,
+)
 from openhands.sdk.security.confirmation_policy import (
     ConfirmationPolicyBase,
     NeverConfirm,
@@ -133,7 +143,13 @@ def _save_base_state(self, fs: FileStore) -> None:
         """
         Persist base state snapshot (no events; events are file-backed).
         """
-        payload = self.model_dump_json(exclude_none=True)
+        inline_mode = should_inline_conversations()
+        # Pass the inline preference down so LLM serialization knows whether to
+        # inline credentials or persist a profile reference.
+        payload = self.model_dump_json(
+            exclude_none=True,
+            context={INLINE_CONTEXT_KEY: inline_mode},
+        )
         fs.write(BASE_STATE, payload)
 
     # ===== Factory: open-or-create (no load/save methods needed) =====
@@ -146,11 +162,16 @@ def create(
         persistence_dir: str | None = None,
         max_iterations: int = 500,
         stuck_detection: bool = True,
+        llm_registry: "LLMRegistry | None" = None,
     ) -> "ConversationState":
         """
         If base_state.json exists: resume (attach EventLog, reconcile agent,
         enforce id). Else: create fresh (agent required), persist base,
         and return.
+
+        Args:
+            llm_registry: Optional registry used to expand profile references when
+                conversations persist profile IDs instead of inline credentials.
         """
         file_store = (
             LocalFileStore(persistence_dir) if persistence_dir else InMemoryFileStore()
@@ -161,9 +182,29 @@ def create(
         except FileNotFoundError:
             base_text = None
 
+        inline_mode = should_inline_conversations()
+        # Keep validation and serialization in sync when loading previously
+        # persisted state.
+        context = {INLINE_CONTEXT_KEY: inline_mode}
+
         # ---- Resume path ----
         if base_text:
-            state = cls.model_validate(json.loads(base_text))
+            base_payload = json.loads(base_text)
+            if inline_mode:
+                if _contains_profile_reference(base_payload):
+                    raise ValueError(
+                        "Persisted base state contains LLM profile references but "
+                        "OPENHANDS_INLINE_CONVERSATIONS is enabled."
+                    )
+            else:
+                registry = llm_registry
+                if registry is None:
+                    from openhands.sdk.llm.llm_registry import LLMRegistry
+
+                    registry = LLMRegistry()
+                _expand_profile_references(base_payload, registry)
+
+            state = cls.model_validate(base_payload, context=context)
 
             # Enforce conversation id match
             if state.id != id:
@@ -260,6 +301,32 @@ def __setattr__(self, name, value):
                     f"State change callback failed for field {name}", exc_info=True
                 )
 
+    def switch_agent_llm(self, profile_id: str, *, registry: "LLMRegistry") -> None:
+        """Swap the agent's primary LLM to ``profile_id`` using ``registry``."""
+
+        if should_inline_conversations():
+            raise RuntimeError(
+                "LLM switching requires OPENHANDS_INLINE_CONVERSATIONS to be false."
+            )
+
+        if self.execution_status not in (
+            ConversationExecutionStatus.IDLE,
+            ConversationExecutionStatus.FINISHED,
+        ):
+            raise RuntimeError("Agent must be idle before switching LLM profiles.")
+
+        usage_id = self.agent.llm.usage_id
+        try:
+            new_llm = registry.switch_profile(usage_id, profile_id)
+        except FileNotFoundError as exc:
+            raise ValueError(str(exc)) from exc
+        except KeyError as exc:
+            raise ValueError(str(exc)) from exc
+
+        self.agent = self.agent._clone_with_llm(new_llm)
+        if self.execution_status == ConversationExecutionStatus.FINISHED:
+            self.execution_status = ConversationExecutionStatus.IDLE
+
     @staticmethod
     def get_unmatched_actions(events: Sequence[Event]) -> list[ActionEvent]:
         """Find actions in the event history that don't have matching observations.
@@ -334,3 +401,36 @@ def owned(self) -> bool:
         Return True if the lock is currently held by the calling thread.
""" return self._lock.owned() + + +def _contains_profile_reference(node: Any) -> bool: + """Return True if ``node`` contains an LLM profile reference payload.""" + + if isinstance(node, dict): + if "profile_id" in node and "model" not in node: + return True + return any(_contains_profile_reference(value) for value in node.values()) + + if isinstance(node, list): + return any(_contains_profile_reference(item) for item in node) + + return False + + +def _expand_profile_references(node: Any, registry: "LLMRegistry") -> None: + """Inline LLM payloads for any profile references contained in ``node``.""" + + if isinstance(node, dict): + if "profile_id" in node and "model" not in node: + profile_id = node["profile_id"] + llm = registry.load_profile(profile_id) + expanded = llm.model_dump(exclude_none=True) + expanded["profile_id"] = profile_id + node.clear() + node.update(expanded) + return + for value in node.values(): + _expand_profile_references(value, registry) + elif isinstance(node, list): + for item in node: + _expand_profile_references(item, registry) diff --git a/openhands-sdk/openhands/sdk/llm/llm.py b/openhands-sdk/openhands/sdk/llm/llm.py index ff6afd4299..a77fd91cd6 100644 --- a/openhands-sdk/openhands/sdk/llm/llm.py +++ b/openhands-sdk/openhands/sdk/llm/llm.py @@ -4,7 +4,7 @@ import json import os import warnings -from collections.abc import Callable, Sequence +from collections.abc import Callable, Mapping, Sequence from contextlib import contextmanager from typing import TYPE_CHECKING, Any, ClassVar, Literal, get_args, get_origin @@ -16,8 +16,12 @@ Field, PrivateAttr, SecretStr, + SerializationInfo, + SerializerFunctionWrapHandler, + ValidationInfo, field_serializer, field_validator, + model_serializer, model_validator, ) from pydantic.json_schema import SkipJsonSchema @@ -77,6 +81,10 @@ from openhands.sdk.llm.utils.retry_mixin import RetryMixin from openhands.sdk.llm.utils.telemetry import Telemetry from openhands.sdk.logger import ENV_LOG_DIR, get_logger +from openhands.sdk.persistence.settings import ( + INLINE_CONTEXT_KEY, + should_inline_conversations, +) logger = get_logger(__name__) @@ -243,6 +251,10 @@ class LLM(BaseModel, RetryMixin, NonNativeToolCallingMixin): "Safety settings for models that support them (like Mistral AI and Gemini)" ), ) + profile_id: str | None = Field( + default=None, + description="Optional profile id (filename under ~/.openhands/llm-profiles).", + ) usage_id: str = Field( default="default", validation_alias=AliasChoices("usage_id", "service_id"), @@ -291,6 +303,31 @@ class LLM(BaseModel, RetryMixin, NonNativeToolCallingMixin): extra="forbid", arbitrary_types_allowed=True ) + @model_serializer(mode="wrap", when_used="json") + def _serialize_with_profiles( + self, handler: SerializerFunctionWrapHandler, info: SerializationInfo + ) -> Mapping[str, Any]: + """Scope LLM serialization to either inline payloads or profile refs. + + We default to inlining the full LLM payload, but when the persistence + layer explicitly opts out (by passing ``inline_llm_persistence=False`` in + ``context``) we strip the payload down to just ``{"profile_id": ...}`` so + the conversation state can round-trip a profile reference without + exposing secrets. 
+ """ + + inline_pref = None + if info.context is not None and INLINE_CONTEXT_KEY in info.context: + inline_pref = info.context[INLINE_CONTEXT_KEY] + if inline_pref is None: + inline_pref = True + + data = handler(self) + profile_id = data.get("profile_id") if isinstance(data, dict) else None + if not inline_pref and profile_id: + return {"profile_id": profile_id} + return data + # ========================================================================= # Validators # ========================================================================= @@ -301,11 +338,38 @@ def _validate_secrets(cls, v: str | SecretStr | None, info) -> SecretStr | None: @model_validator(mode="before") @classmethod - def _coerce_inputs(cls, data): - if not isinstance(data, dict): + def _coerce_inputs(cls, data: Any, info: ValidationInfo): + if not isinstance(data, Mapping): return data d = dict(data) + profile_id = d.get("profile_id") + if profile_id and "model" not in d: + inline_pref = None + if info.context is not None and INLINE_CONTEXT_KEY in info.context: + inline_pref = info.context[INLINE_CONTEXT_KEY] + if inline_pref is None: + inline_pref = should_inline_conversations() + + if inline_pref: + raise ValueError( + "Encountered profile reference for LLM while " + "OPENHANDS_INLINE_CONVERSATIONS is enabled. " + "Inline the profile or set " + "OPENHANDS_INLINE_CONVERSATIONS=false." + ) + + if info.context is None or "llm_registry" not in info.context: + raise ValueError( + "LLM registry required in context to load profile references." + ) + + registry = info.context["llm_registry"] + llm = registry.load_profile(profile_id) + expanded = llm.model_dump(exclude_none=True) + expanded["profile_id"] = profile_id + d.update(expanded) + if "service_id" in d and "usage_id" not in d: warnings.warn( SERVICE_ID_DEPRECATION_MSG, diff --git a/openhands-sdk/openhands/sdk/llm/llm_registry.py b/openhands-sdk/openhands/sdk/llm/llm_registry.py index 640084fbac..16907f2c38 100644 --- a/openhands-sdk/openhands/sdk/llm/llm_registry.py +++ b/openhands-sdk/openhands/sdk/llm/llm_registry.py @@ -1,9 +1,12 @@ +import json +import re import warnings -from collections.abc import Callable -from typing import ClassVar +from collections.abc import Callable, Iterable, Mapping +from pathlib import Path +from typing import Any, ClassVar from uuid import uuid4 -from pydantic import BaseModel, ConfigDict +from pydantic import BaseModel, ConfigDict, ValidationError from openhands.sdk.llm.llm import LLM from openhands.sdk.logger import get_logger @@ -12,6 +15,15 @@ logger = get_logger(__name__) +_SECRET_FIELDS: tuple[str, ...] = ( + "api_key", + "aws_access_key_id", + "aws_secret_access_key", +) +_DEFAULT_PROFILE_DIR = Path.home() / ".openhands" / "llm-profiles" + +_PROFILE_ID_PATTERN = re.compile(r"^[A-Za-z0-9._-]+$") + SERVICE_TO_LLM_DEPRECATION_MSG = ( "LLMRegistry.service_to_llm is deprecated and will be removed in a future " "release; use usage_to_llm instead." @@ -44,16 +56,19 @@ class LLMRegistry: def __init__( self, retry_listener: Callable[[int, int], None] | None = None, + profile_dir: str | Path | None = None, ): """Initialize the LLM registry. Args: retry_listener: Optional callback for retry events. + profile_dir: Optional directory for persisted LLM profiles. 
""" self.registry_id = str(uuid4()) self.retry_listener = retry_listener self._usage_to_llm: dict[str, LLM] = {} self.subscriber: Callable[[RegistryEvent], None] | None = None + self.profile_dir: Path = self._resolve_profile_dir(profile_dir) def subscribe(self, callback: Callable[[RegistryEvent], None]) -> None: """Subscribe to registry events. @@ -91,14 +106,8 @@ def service_to_llm(self) -> dict[str, LLM]: # pragma: no cover - compatibility return self._usage_to_llm def add(self, llm: LLM) -> None: - """Add an LLM instance to the registry. - - Args: - llm: The LLM instance to register. + """Add an LLM instance to the registry.""" - Raises: - ValueError: If llm.usage_id already exists in the registry. - """ usage_id = llm.usage_id if usage_id in self._usage_to_llm: message = ( @@ -114,6 +123,150 @@ def add(self, llm: LLM) -> None: f"[LLM registry {self.registry_id}]: Added LLM for usage {usage_id}" ) + def _ensure_safe_profile_id(self, profile_id: str) -> str: + if not profile_id or profile_id in {".", ".."}: + raise ValueError("Invalid profile ID.") + if Path(profile_id).name != profile_id: + raise ValueError("Profile IDs cannot contain path separators.") + if not _PROFILE_ID_PATTERN.fullmatch(profile_id): + raise ValueError( + "Profile IDs may only contain alphanumerics, '.', '_', or '-'." + ) + return profile_id + + # ------------------------------------------------------------------ + # Profile management helpers + # ------------------------------------------------------------------ + def list_profiles(self) -> list[str]: + """List all profile IDs stored on disk.""" + + return sorted(path.stem for path in self.profile_dir.glob("*.json")) + + def get_profile_path(self, profile_id: str) -> Path: + """Return the path where profile_id is stored.""" + + safe_id = self._ensure_safe_profile_id(profile_id) + return self.profile_dir / f"{safe_id}.json" + + def switch_profile(self, usage_id: str, profile_id: str) -> LLM: + """Replace ``usage_id``'s active LLM with ``profile_id`` and return it.""" + + if usage_id not in self._usage_to_llm: + raise KeyError(f"Usage ID '{usage_id}' not found in registry") + + current_llm = self._usage_to_llm[usage_id] + safe_id = self._ensure_safe_profile_id(profile_id) + if getattr(current_llm, "profile_id", None) == safe_id: + return current_llm + + llm = self.load_profile(safe_id) + llm = llm.model_copy(update={"usage_id": usage_id}) + self._usage_to_llm[usage_id] = llm + self.notify(RegistryEvent(llm=llm)) + logger.info( + f"[LLM registry {self.registry_id}]: Switched usage {usage_id} " + f"to profile {safe_id}" + ) + return llm + + def load_profile(self, profile_id: str) -> LLM: + """Load profile_id from disk and return an LLM.""" + + path = self.get_profile_path(profile_id) + if not path.exists(): + raise FileNotFoundError(f"Profile not found: {profile_id} -> {path}") + return self._load_profile_with_synced_id(path, profile_id) + + def save_profile( + self, profile_id: str, llm: LLM, include_secrets: bool = False + ) -> Path: + """Persist ``llm`` under ``profile_id``.""" + + safe_id = self._ensure_safe_profile_id(profile_id) + path = self.get_profile_path(safe_id) + data = llm.model_dump( + exclude_none=True, + context={"expose_secrets": include_secrets}, + ) + data["profile_id"] = safe_id + if not include_secrets: + for secret_field in _SECRET_FIELDS: + data.pop(secret_field, None) + + with path.open("w", encoding="utf-8") as handle: + json.dump(data, handle, indent=2, ensure_ascii=False) + logger.info(f"Saved profile {safe_id} -> {path}") + return path + + 
+    def register_profiles(self, profile_ids: Iterable[str] | None = None) -> None:
+        """Register profiles from disk into the in-memory registry."""
+
+        candidates = profile_ids if profile_ids is not None else self.list_profiles()
+        for profile_id in candidates:
+            try:
+                safe_id = self._ensure_safe_profile_id(profile_id)
+            except ValueError as exc:
+                logger.warning(f"Skipping profile {profile_id}: {exc}")
+                continue
+
+            try:
+                llm = self.load_profile(safe_id)
+            except Exception as exc:  # noqa: BLE001
+                logger.warning(f"Failed to load profile {safe_id}: {exc}")
+                continue
+
+            try:
+                self.add(llm)
+            except Exception as exc:  # noqa: BLE001
+                logger.info(f"Skipping profile {safe_id}: registry.add failed: {exc}")
+
+    def validate_profile(self, data: Mapping[str, Any]) -> tuple[bool, list[str]]:
+        """Return ``(is_valid, errors)`` after validating a profile payload."""
+
+        try:
+            LLM.model_validate(dict(data))
+        except ValidationError as exc:
+            messages: list[str] = []
+            for error in exc.errors():
+                loc = ".".join(str(piece) for piece in error.get("loc", ()))
+                if loc:
+                    messages.append(f"{loc}: {error.get('msg')}")
+                else:
+                    messages.append(error.get("msg", "Unknown validation error"))
+            return False, messages
+        return True, []
+
+    # ------------------------------------------------------------------
+    # Internal helper methods
+    # ------------------------------------------------------------------
+    def _resolve_profile_dir(self, profile_dir: str | Path | None) -> Path:
+        directory = (
+            Path(profile_dir).expanduser()
+            if profile_dir is not None
+            else _DEFAULT_PROFILE_DIR
+        )
+        directory.mkdir(parents=True, exist_ok=True)
+        return directory
+
+    def _load_profile_with_synced_id(self, path: Path, profile_id: str) -> LLM:
+        """Load an LLM profile while keeping profile metadata aligned.
+
+        Most callers expect the loaded LLM to reflect the profile file name so
+        client apps can surface the active profile (e.g., in conversation history
+        or CLI prompts). We construct a *new* ``LLM`` via :meth:`model_copy`
+        instead of mutating the loaded instance to respect the SDK's immutability
+        conventions.
+
+        We always align ``profile_id`` with the filename so callers get a precise
+        view of which profile is active without mutating the on-disk payload.
+        This mirrors previous behavior while avoiding in-place mutation.
+        """
+
+        llm = LLM.load_from_json(str(path))
+        if getattr(llm, "profile_id", None) != profile_id:
+            return llm.model_copy(update={"profile_id": profile_id})
+        return llm
+
     def get(self, usage_id: str) -> LLM:
         """Get an LLM instance from the registry.
diff --git a/openhands-sdk/openhands/sdk/persistence/__init__.py b/openhands-sdk/openhands/sdk/persistence/__init__.py
new file mode 100644
index 0000000000..700174f618
--- /dev/null
+++ b/openhands-sdk/openhands/sdk/persistence/__init__.py
@@ -0,0 +1,10 @@
+"""Persistence configuration helpers."""
+
+from .settings import INLINE_CONTEXT_KEY, INLINE_ENV_VAR, should_inline_conversations
+
+
+__all__ = [
+    "INLINE_CONTEXT_KEY",
+    "INLINE_ENV_VAR",
+    "should_inline_conversations",
+]
diff --git a/openhands-sdk/openhands/sdk/persistence/settings.py b/openhands-sdk/openhands/sdk/persistence/settings.py
new file mode 100644
index 0000000000..bd8f51e490
--- /dev/null
+++ b/openhands-sdk/openhands/sdk/persistence/settings.py
@@ -0,0 +1,17 @@
+"""Shared helpers for SDK persistence configuration."""
+
+from __future__ import annotations
+
+import os
+
+
+INLINE_ENV_VAR = "OPENHANDS_INLINE_CONVERSATIONS"
+INLINE_CONTEXT_KEY = "inline_llm_persistence"
+_FALSE_VALUES = {"0", "false", "no"}
+
+
+def should_inline_conversations() -> bool:
+    """Return True when conversations should be persisted with inline LLM payloads."""
+
+    value = os.getenv(INLINE_ENV_VAR, "true").strip().lower()
+    return value not in _FALSE_VALUES
diff --git a/tests/sdk/conversation/local/test_state_serialization.py b/tests/sdk/conversation/local/test_state_serialization.py
index a9304755d8..e6b499956c 100644
--- a/tests/sdk/conversation/local/test_state_serialization.py
+++ b/tests/sdk/conversation/local/test_state_serialization.py
@@ -17,7 +17,7 @@
 )
 from openhands.sdk.event.llm_convertible import MessageEvent, SystemPromptEvent
 from openhands.sdk.llm import LLM, Message, TextContent
-from openhands.sdk.llm.llm_registry import RegistryEvent
+from openhands.sdk.llm.llm_registry import LLMRegistry, RegistryEvent
 from openhands.sdk.security.confirmation_policy import AlwaysConfirm
 from openhands.sdk.workspace import LocalWorkspace
 
@@ -128,13 +128,100 @@ def test_conversation_state_persistence_save_load():
     assert isinstance(loaded_state.events[1], MessageEvent)
     assert loaded_state.agent.llm.model == agent.llm.model
     assert loaded_state.agent.__class__ == agent.__class__
-    # Test model_dump equality
-    assert loaded_state.model_dump(mode="json") == state.model_dump(mode="json")
+    # Test model_dump equality ignoring any additional runtime stats
+    loaded_dump = loaded_state.model_dump(mode="json")
+    original_dump = state.model_dump(mode="json")
+    loaded_stats = loaded_dump.pop("stats", None)
+    original_stats = original_dump.pop("stats", None)
+    assert loaded_dump == original_dump
+    if original_stats is not None:
+        assert loaded_stats is not None
+        loaded_metrics = loaded_stats.get("service_to_metrics", {})
+        for key, metric in original_stats.get("service_to_metrics", {}).items():
+            assert key in loaded_metrics
+            assert loaded_metrics[key] == metric
 
     # Also verify key fields are preserved
     assert loaded_state.id == state.id
     assert len(loaded_state.events) == len(state.events)
 
 
+def test_conversation_state_profile_reference_mode(tmp_path, monkeypatch):
+    """When inline persistence is disabled we store profile references."""
+
+    home_dir = tmp_path / "home"
+    monkeypatch.setenv("HOME", str(home_dir))
+    monkeypatch.setenv("OPENHANDS_INLINE_CONVERSATIONS", "false")
+
+    registry = LLMRegistry()
+    llm = LLM(model="litellm_proxy/openai/gpt-5-mini", usage_id="agent")
+    registry.save_profile("profile-tests", llm)
+
+    agent = Agent(llm=registry.load_profile("profile-tests"), tools=[])
+    conv_id = uuid.UUID("12345678-1234-5678-9abc-1234567890ff")
+    persistence_root = tmp_path / "conv"
+    persistence_dir = LocalConversation.get_persistence_dir(persistence_root, conv_id)
+
+    ConversationState.create(
+        workspace=LocalWorkspace(working_dir="/tmp"),
+        persistence_dir=persistence_dir,
+        agent=agent,
+        id=conv_id,
+    )
+
+    base_state = json.loads((Path(persistence_dir) / "base_state.json").read_text())
+    assert base_state["agent"]["llm"] == {"profile_id": "profile-tests"}
+
+    conversation = Conversation(
+        agent=agent,
+        persistence_dir=persistence_root,
+        workspace=LocalWorkspace(working_dir="/tmp"),
+        conversation_id=conv_id,
+    )
+
+    loaded_state = conversation.state
+    assert loaded_state.agent.llm.profile_id == "profile-tests"
+    assert loaded_state.agent.llm.model == llm.model
+
+
+def test_conversation_state_inline_mode_errors_on_profile_reference(
+    tmp_path, monkeypatch
+):
+    """Inline mode raises when encountering a persisted profile reference."""
+
+    home_dir = tmp_path / "home"
+    monkeypatch.setenv("HOME", str(home_dir))
+    monkeypatch.setenv("OPENHANDS_INLINE_CONVERSATIONS", "false")
+
+    registry = LLMRegistry()
+    llm = LLM(model="litellm_proxy/openai/gpt-5-mini", usage_id="agent")
+    registry.save_profile("profile-inline", llm)
+    agent = Agent(llm=registry.load_profile("profile-inline"), tools=[])
+
+    conv_id = uuid.UUID("12345678-1234-5678-9abc-1234567890aa")
+    persistence_root = tmp_path / "conv"
+    persistence_dir = LocalConversation.get_persistence_dir(persistence_root, conv_id)
+
+    ConversationState.create(
+        workspace=LocalWorkspace(working_dir="/tmp"),
+        persistence_dir=persistence_dir,
+        agent=agent,
+        id=conv_id,
+    )
+
+    # Switch env back to inline mode and expect a failure on reload
+    monkeypatch.setenv("OPENHANDS_INLINE_CONVERSATIONS", "true")
+
+    with pytest.raises(ValueError) as exc:
+        Conversation(
+            agent=agent,
+            persistence_dir=persistence_root,
+            workspace=LocalWorkspace(working_dir="/tmp"),
+            conversation_id=conv_id,
+        )
+
+    assert "OPENHANDS_INLINE_CONVERSATIONS" in str(exc.value)
+
+
 def test_conversation_state_incremental_save():
     """Test that ConversationState saves events incrementally."""
     with tempfile.TemporaryDirectory() as temp_dir:
@@ -187,8 +274,18 @@ def test_conversation_state_incremental_save():
         assert conversation.state.persistence_dir == persist_path_for_state
         loaded_state = conversation._state
         assert len(loaded_state.events) == 2
-        # Test model_dump equality
-        assert loaded_state.model_dump(mode="json") == state.model_dump(mode="json")
+        # Test model_dump equality ignoring any additional runtime stats
+        loaded_dump = loaded_state.model_dump(mode="json")
+        original_dump = state.model_dump(mode="json")
+        loaded_stats = loaded_dump.pop("stats", None)
+        original_stats = original_dump.pop("stats", None)
+        assert loaded_dump == original_dump
+        if original_stats is not None:
+            assert loaded_stats is not None
+            loaded_metrics = loaded_stats.get("service_to_metrics", {})
+            for key, metric in original_stats.get("service_to_metrics", {}).items():
+                assert key in loaded_metrics
+                assert loaded_metrics[key] == metric
 
 
 def test_conversation_state_event_file_scanning():
@@ -545,3 +642,103 @@ def test_conversation_with_agent_different_llm_config():
     # Test that the core state structure is preserved (excluding agent differences)
     new_dump = new_conversation._state.model_dump(mode="json", exclude={"agent"})
     assert new_dump == original_state_dump
+
+
+def test_local_conversation_switch_llm_persists_profile(tmp_path, monkeypatch):
+    home_dir = tmp_path / "home"
+    home_dir.mkdir()
+    monkeypatch.setenv("HOME", str(home_dir))
@@ -545,3 +642,103 @@ def test_conversation_with_agent_different_llm_config():
     # Test that the core state structure is preserved (excluding agent differences)
     new_dump = new_conversation._state.model_dump(mode="json", exclude={"agent"})
     assert new_dump == original_state_dump
+
+
+def test_local_conversation_switch_llm_persists_profile(tmp_path, monkeypatch):
+    home_dir = tmp_path / "home"
+    home_dir.mkdir()
+    monkeypatch.setenv("HOME", str(home_dir))
+    monkeypatch.setenv("OPENHANDS_INLINE_CONVERSATIONS", "false")
+
+    registry = LLMRegistry()
+    base_llm = LLM(model="gpt-4o-mini", usage_id="test-llm")
+    registry.save_profile("base", base_llm)
+    alt_llm = LLM(model="gpt-4o", usage_id="alternate", temperature=0.4)
+    registry.save_profile("alt", alt_llm)
+
+    agent = Agent(llm=registry.load_profile("base"), tools=[])
+    workspace_dir = tmp_path / "workspace"
+    persistence_dir = tmp_path / "persist"
+
+    conversation = Conversation(
+        agent=agent,
+        workspace=str(workspace_dir),
+        persistence_dir=str(persistence_dir),
+        visualize=False,
+    )
+    assert isinstance(conversation, LocalConversation)
+
+    conversation.switch_llm("alt")
+
+    assert conversation.agent.llm.profile_id == "alt"
+    assert conversation.state.agent.llm.profile_id == "alt"
+    assert conversation.agent.llm.usage_id == "test-llm"
+    assert conversation.llm_registry.get("test-llm").model == alt_llm.model
+
+    persistence_path = conversation.state.persistence_dir
+    assert persistence_path is not None
+    base_state_path = Path(persistence_path) / "base_state.json"
+    data = json.loads(base_state_path.read_text())
+    assert data["agent"]["llm"] == {"profile_id": "alt"}
+
+    reloaded_agent = Agent(llm=registry.load_profile("alt"), tools=[])
+    reloaded = Conversation(
+        agent=reloaded_agent,
+        workspace=str(workspace_dir),
+        persistence_dir=str(persistence_dir),
+        conversation_id=conversation.id,
+        visualize=False,
+    )
+    assert isinstance(reloaded, LocalConversation)
+    assert reloaded.state.agent.llm.profile_id == "alt"
+
+
+def test_local_conversation_switch_llm_inline_mode_rejected(tmp_path, monkeypatch):
+    home_dir = tmp_path / "home"
+    home_dir.mkdir()
+    monkeypatch.setenv("HOME", str(home_dir))
+    monkeypatch.setenv("OPENHANDS_INLINE_CONVERSATIONS", "true")
+
+    registry = LLMRegistry()
+    base_llm = LLM(model="gpt-4o-mini", usage_id="test-llm")
+    registry.save_profile("base", base_llm)
+    registry.save_profile("alt", LLM(model="gpt-4o", usage_id="alternate"))
+
+    agent = Agent(llm=registry.load_profile("base"), tools=[])
+    conversation = Conversation(
+        agent=agent,
+        workspace=str(tmp_path / "workspace"),
+        persistence_dir=str(tmp_path / "persist"),
+        visualize=False,
+    )
+    assert isinstance(conversation, LocalConversation)
+
+    with pytest.raises(RuntimeError, match="OPENHANDS_INLINE_CONVERSATIONS"):
+        conversation.switch_llm("alt")
+
+
+def test_local_conversation_switch_llm_requires_idle(tmp_path, monkeypatch):
+    home_dir = tmp_path / "home"
+    home_dir.mkdir()
+    monkeypatch.setenv("HOME", str(home_dir))
+    monkeypatch.setenv("OPENHANDS_INLINE_CONVERSATIONS", "false")
+
+    registry = LLMRegistry()
+    base_llm = LLM(model="gpt-4o-mini", usage_id="test-llm")
+    registry.save_profile("base", base_llm)
+    registry.save_profile("alt", LLM(model="gpt-4o", usage_id="alternate"))
+
+    agent = Agent(llm=registry.load_profile("base"), tools=[])
+    conversation = Conversation(
+        agent=agent,
+        workspace=str(tmp_path / "workspace"),
+        persistence_dir=str(tmp_path / "persist"),
+        visualize=False,
+    )
+    assert isinstance(conversation, LocalConversation)
+
+    with conversation.state:
+        conversation.state.execution_status = ConversationExecutionStatus.RUNNING
+
+    with pytest.raises(RuntimeError, match="Agent must be idle"):
+        conversation.switch_llm("alt")
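Before the registry tests, it helps to see the two on-disk shapes that the conversation tests above assert on. A condensed sketch of what `base_state.json` can hold for `agent.llm` in each mode (field values illustrative, not taken from a real run):

# Referenced form, written when OPENHANDS_INLINE_CONVERSATIONS=false; the full
# configuration lives in ~/.openhands/llm-profiles/alt.json.
referenced = {"agent": {"llm": {"profile_id": "alt"}}}

# Inline form (the default): the complete payload is embedded, so the snapshot
# stays reproducible without any profile files on disk.
inline = {"agent": {"llm": {"model": "gpt-4o", "usage_id": "agent", "temperature": 0.4}}}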
diff --git a/tests/sdk/llm/test_llm_registry_profiles.py b/tests/sdk/llm/test_llm_registry_profiles.py
new file mode 100644
index 0000000000..2c06a283c9
--- /dev/null
+++ b/tests/sdk/llm/test_llm_registry_profiles.py
@@ -0,0 +1,170 @@
+import json
+
+import pytest
+from pydantic import SecretStr
+
+from openhands.sdk.llm.llm import LLM
+from openhands.sdk.llm.llm_registry import LLMRegistry
+from openhands.sdk.persistence.settings import INLINE_CONTEXT_KEY
+
+
+def test_list_profiles_returns_sorted_names(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    (tmp_path / "b.json").write_text("{}", encoding="utf-8")
+    (tmp_path / "a.json").write_text("{}", encoding="utf-8")
+
+    assert registry.list_profiles() == ["a", "b"]
+
+
+def test_save_profile_excludes_secret_fields(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    llm = LLM(
+        model="gpt-4o-mini",
+        usage_id="service",
+        api_key=SecretStr("secret"),
+        aws_access_key_id=SecretStr("id"),
+        aws_secret_access_key=SecretStr("value"),
+    )
+
+    path = registry.save_profile("sample", llm)
+    data = json.loads(path.read_text(encoding="utf-8"))
+
+    assert data["profile_id"] == "sample"
+    assert data["usage_id"] == "service"
+    assert "api_key" not in data
+    assert "aws_access_key_id" not in data
+    assert "aws_secret_access_key" not in data
+
+
+def test_save_profile_can_include_secret_fields(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    llm = LLM(
+        model="gpt-4o-mini",
+        usage_id="service",
+        api_key=SecretStr("secret"),
+        aws_access_key_id=SecretStr("id"),
+        aws_secret_access_key=SecretStr("value"),
+    )
+
+    path = registry.save_profile("sample", llm, include_secrets=True)
+    data = json.loads(path.read_text(encoding="utf-8"))
+
+    assert data["api_key"] == "secret"
+    assert data["aws_access_key_id"] == "id"
+    assert data["aws_secret_access_key"] == "value"
+
+
+def test_load_profile_assigns_profile_id_when_missing(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    profile_path = tmp_path / "foo.json"
+    profile_path.write_text(
+        json.dumps({"model": "gpt-4o-mini", "usage_id": "svc"}),
+        encoding="utf-8",
+    )
+
+    llm = registry.load_profile("foo")
+
+    assert llm.profile_id == "foo"
+    assert llm.usage_id == "svc"
+
+
+def test_register_profiles_skips_invalid_and_duplicate_profiles(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+
+    llm = LLM(model="gpt-4o-mini", usage_id="shared")
+    registry.save_profile("alpha", llm)
+
+    duplicate_data = llm.model_dump(exclude_none=True)
+    duplicate_data["profile_id"] = "beta"
+    (tmp_path / "beta.json").write_text(
+        json.dumps(duplicate_data),
+        encoding="utf-8",
+    )
+
+    (tmp_path / "gamma.json").write_text("{", encoding="utf-8")
+
+    registry.register_profiles()
+
+    assert registry.list_usage_ids() == ["shared"]
+
+
+def test_llm_serializer_respects_inline_context():
+    llm = LLM(model="gpt-4o-mini", usage_id="service", profile_id="sample")
+
+    inline_payload = llm.model_dump(mode="json")
+    assert inline_payload["model"] == "gpt-4o-mini"
+
+    referenced = llm.model_dump(mode="json", context={INLINE_CONTEXT_KEY: False})
+    assert referenced == {"profile_id": "sample"}
+
+
+def test_llm_validator_loads_profile_reference(tmp_path, monkeypatch):
+    monkeypatch.setenv("OPENHANDS_INLINE_CONVERSATIONS", "false")
+    registry = LLMRegistry(profile_dir=tmp_path)
+    source_llm = LLM(model="gpt-4o-mini", usage_id="service")
+    registry.save_profile("profile-tests", source_llm)
+
+    parsed = LLM.model_validate(
+        {"profile_id": "profile-tests"},
+        context={INLINE_CONTEXT_KEY: False, "llm_registry": registry},
+    )
+
+    assert parsed.model == source_llm.model
+    assert parsed.profile_id == "profile-tests"
+    assert parsed.usage_id == source_llm.usage_id
+
+
+def test_validate_profile_reports_errors(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+
+    ok, errors = registry.validate_profile({"model": "gpt-4o-mini", "usage_id": "svc"})
+    assert ok
+    assert errors == []
+
+    ok, errors = registry.validate_profile({"usage_id": "svc"})
+    assert not ok
+    assert any("model" in message for message in errors)
+
+
+def test_get_profile_path_rejects_traversal(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    with pytest.raises(ValueError):
+        registry.get_profile_path("../secret")
+
+
+def test_switch_profile_replaces_active_llm(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    base_llm = LLM(model="gpt-4o-mini", usage_id="primary")
+    registry.add(base_llm)
+    registry.save_profile("alternate", LLM(model="gpt-4o", usage_id="alternate"))
+
+    events: list = []
+    registry.subscribe(events.append)
+
+    switched = registry.switch_profile("primary", "alternate")
+
+    assert switched.profile_id == "alternate"
+    assert switched.usage_id == "primary"
+    assert registry.get("primary") is switched
+    assert switched.model == "gpt-4o"
+    assert len(events) == 1
+    assert events[0].llm is switched
+
+    # Switching to the same profile should be a no-op
+    again = registry.switch_profile("primary", "alternate")
+    assert again is switched
+    assert len(events) == 1
+
+
+def test_switch_profile_unknown_usage(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    with pytest.raises(KeyError):
+        registry.switch_profile("missing", "profile")
+
+
+def test_switch_profile_missing_profile(tmp_path):
+    registry = LLMRegistry(profile_dir=tmp_path)
+    registry.add(LLM(model="gpt-4o-mini", usage_id="primary"))
+
+    with pytest.raises(FileNotFoundError):
+        registry.switch_profile("primary", "does-not-exist")
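Taken together, these tests pin down the registry's profile surface. A short walk-through of that surface as a hedged sketch, using only calls exercised above (the profile directory path is illustrative; the default is ~/.openhands/llm-profiles/):

from openhands.sdk.llm.llm import LLM
from openhands.sdk.llm.llm_registry import LLMRegistry

# Illustrative directory for the example.
registry = LLMRegistry(profile_dir="/tmp/llm-profiles")

# Persist two named profiles; secrets are omitted unless include_secrets=True.
registry.save_profile("fast", LLM(model="gpt-4o-mini", usage_id="agent"))
registry.save_profile("strong", LLM(model="gpt-4o", usage_id="alt"))
assert registry.list_profiles() == ["fast", "strong"]

# Activate one profile, then hot-swap it in place; the runtime usage_id
# ("agent") is preserved while profile_id and model change.
registry.add(registry.load_profile("fast"))
switched = registry.switch_profile("agent", "strong")
assert switched.usage_id == "agent"
assert switched.profile_id == "strong"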