feat: design doc for stateful model providers #712

# Strands: Stateful Model Providers

**Status**: Proposed

**Date**: 2026-03-26

## Overview

> **Contributor:** One high-level comment: I'd like to see more dev story / DevX. What do devs use today, and how will this model interact with them? Use cases, etc.

We've been asked to add stateful model provider support to the Strands Python SDK, targeting the OpenAI Responses API on Amazon Bedrock (Project Mantle). The SDK already supports the Responses API in stateless mode via `OpenAIResponsesModel`. The ask is to enable stateful server-side conversation management: the server tracks context across turns, so the SDK sends only the latest message instead of the full history each time. The Responses API on Bedrock also brings compute environment selection, server-side context compaction, and reasoning effort control.

## Background

The OpenAI Responses API is hosted on AWS Bedrock's Mantle endpoint (`bedrock-mantle.{region}.api.aws`). It uses an OpenAI-compatible format and supports stateful server-side conversation management, where the server tracks context across turns so the client only sends the latest message.

> **Contributor:** Is this basically AgentCore memory?
>
> **Author:** I'm not sure how they have it implemented, but I think it is something more native to Mantle. Either way, how it works is opaque to the user. They don't need to care about how it is set up, just that it works.

### Features
- **Stateful conversations**: Server tracks context across turns (`previous_response_id`, `conversation`)
- **Context management**: Automatic truncation (`truncation`) and server-side compaction (`context_management`) for long conversations
- **Inference controls**: `temperature`, `top_p`, `max_output_tokens`
- **Reasoning**: Effort control from none to xhigh (`reasoning.effort`) with optional summaries (`reasoning.summary`)
- **Tools**: Function tools (client-side, same as today) plus server-side built-in tools like web search, file search, and code interpreter
- **Output format**: Plain text, JSON schema enforcement, JSON mode (`text.format`), verbosity control (`text.verbosity`)
- **Execution**: Streaming (`stream`) and background/async modes (`background`), parallel tool calls (`parallel_tool_calls`, `max_tool_calls`)
- **Storage**: Response persistence (`store`) and metadata tagging (`metadata`)
- **Caching**: Prompt caching (`prompt_cache_key`, `prompt_cache_retention`)
- **Service tiers**: Default, flex, priority (`service_tier`)
- **Compute environments**: e.g., AgentCore Runtime (`compute_environment`)

### Usage

```python
# Turn 1: No conversation ID yet, send full input
request = {
    "model": "us.anthropic.claude-sonnet-4-20250514",
    "input": [{"role": "user", "content": [{"type": "input_text", "text": "Hello"}]}],
    "instructions": "You are a helpful assistant.",
    "stream": True,
}
# Server responds with id: "resp_abc123"

# Turn 2: Include previous_response_id, send only the latest message
request = {
    "model": "us.anthropic.claude-sonnet-4-20250514",
    "previous_response_id": "resp_abc123",
    "input": [{"role": "user", "content": [{"type": "input_text", "text": "What did I just say?"}]}],
    "instructions": "You are a helpful assistant.",
    "stream": True,
}
# Server rebuilds context from the chain, responds with id: "resp_def456"
```

The `previous_response_id` forms a linked list of turns. The server walks the chain to rebuild context. There is also a newer `conversation` parameter that provides a persistent container (similar to the old Assistants API threads), but `previous_response_id` is the established mechanism.
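
To make the chaining mechanic concrete, here is a purely illustrative sketch of how a server could rebuild context by walking the `previous_response_id` chain. The data layout and function name are hypothetical; the real server-side storage is opaque to the SDK.

```python
# Illustrative only: rebuilding context from a previous_response_id chain.
responses = {
    "resp_abc123": {
        "previous_response_id": None,
        "turn": [("user", "Hello"), ("assistant", "Hi there!")],
    },
    "resp_def456": {
        "previous_response_id": "resp_abc123",
        "turn": [("user", "What did I just say?"), ("assistant", 'You said "Hello".')],
    },
}

def rebuild_context(response_id):
    """Walk the chain backwards, then reverse to chronological order."""
    turns = []
    while response_id is not None:
        node = responses[response_id]
        turns.append(node["turn"])
        response_id = node["previous_response_id"]
    return [message for turn in reversed(turns) for message in turn]

context = rebuild_context("resp_def456")
# context now holds all four messages in chronological order
```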

## Solution

What follows is the full vision for stateful model support in Strands. Some of this we may reach iteratively, for example starting with stateful mode on `OpenAIResponsesModel` and adding the `BedrockModel` subpackage later. The goal is to align the team on direction so that incremental work stays on track.

### Model Provider

`BedrockModel` is refactored from a single file (`bedrock.py`) into a subpackage:

```
strands/models/bedrock/
├── __init__.py   # exports BedrockModel, backward-compatible imports
├── base.py       # shared config, region resolution, boto session, facade logic
├── converse.py   # current Converse/ConverseStream (extracted from bedrock.py)
└── responses.py  # new Responses API implementation
```

> **Contributor:** Do we need a separate provider though? I mean, it's the same API spec, right? Why don't we just use responses, or extend that one? … nvm, I got my answer a couple paragraphs later 😅
>
> **Author:** It is not, and it's different endpoints as well. There is the bedrock-runtime Converse API, which we support today but which doesn't offer the features listed above. This doc is about supporting the bedrock-mantle Responses API, which is a new endpoint and spec. To expand a bit more on this: I thought it would be weird to only allow this new Bedrock endpoint to be hit by the OpenAI model provider. Optics-wise, I think it is important to also support it through `BedrockModel`. Of course, we can iterate on that; we don't have to support it right away.
>
> **Contributor:** But I think of it as the Responses API now. The endpoint should probably be just a config with a default. There are many implementations.

`BedrockModel` becomes a facade. The `api` parameter controls dispatch:

> **Member:** Given the decisions we made in TS, why go for a unified interface?
>
> **Author:** Ergonomics, in short. And apologies, but I think you missed the follow-up discussion. For TS, we decided to maintain the unified interface there as well; no more namespacing based on API. I'll note that this unified interface is better for our model-driven approach because you can now switch APIs with a config as opposed to initializing a whole new Model instance.
>
> **Member:** I agree as far as the configs are similar; if we start having a situation where ProviderConfig1 looks very different from ProviderConfig2, then the idea falls apart.

```python
# Converse API (default, current behavior, nothing changes)
model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514")

# Responses API (new, targets Mantle endpoint)
model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514", api="responses")

# Responses API with compute environment
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514",
    api="responses",
    compute_environment="agentcore",
)

# Pass-through for any Responses API parameter
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514",
    api="responses",
    params={"reasoning": {"effort": "high"}, "truncation": "auto"},
)
```

> **Contributor:** How do I set whether I want local orchestration or cloud orchestration? Can't I use the Responses API with full conversation history, as it is normally done with Strands?
>
> **Author:** Apologies, accidentally left out a config. Meant to also demonstrate a …
>
> **Contributor:** Can you please expand on the `compute_environment` functionality? Is the goal to have this run on Lambda or AgentCore or any compute? How does that work?
>
> **Author:** I would probably direct you toward https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html for more details on this specific feature. Unfortunately, I haven't had a chance to experiment yet with the compute environments.

- The Converse path uses boto3; the Responses path uses the OpenAI Python SDK with SigV4 signing via a custom httpx transport that resolves credentials from the same boto session
- Bedrock API key auth is also supported as a simpler alternative
- Request formatting and streaming event parsing are extracted into shared utilities used by both `bedrock/responses.py` and the existing `OpenAIResponsesModel`
- Provider-specific logic (auth, endpoint, client creation) stays in each provider

> **Member:** What overlap is there between `BedrockModel` and a theoretical `BedrockResponsesModel`?

### Model State

> **Member:** Could models also store other state? I'm thinking/wondering: do we need a session manager in this case, or could we store it all in the backend provider?
>
> **Contributor:** ^ example: the Gemini thought signature. We have a basic implementation of it; should that also be model state?

We introduce a new framework-managed dict called `model_state` that flows between the Agent and model provider. This keeps model providers stateless while enabling stateful conversation tracking.

- Owned by the Agent, not the model provider (providers remain stateless)
- Passed to `model.stream()` as a keyword argument (existing providers ignore it via `**kwargs`)
- Model reads `conversation_id` from `model_state` and writes the updated ID back after each response
- Persisted in sessions via `_internal_state` in `SessionAgent` (works with all session manager implementations)
- Accessible in hooks via `event.model_state`
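
The handoff described above can be sketched with a toy provider. All names here are illustrative stand-ins, not the final SDK API: the point is only that the provider reads and writes `model_state` while holding no state of its own.

```python
# Minimal sketch of the model_state handoff (names hypothetical).
import itertools

_ids = itertools.count(1)

class SketchResponsesModel:
    """Stateless provider: conversation tracking lives entirely in model_state."""

    def stream(self, messages, *, model_state=None, **kwargs):
        state = model_state if model_state is not None else {}
        previous_id = state.get("conversation_id")  # None on the first turn
        # ...build the request with previous_response_id=previous_id, call server...
        new_id = f"resp_{next(_ids)}"  # stand-in for the server-generated id
        state["conversation_id"] = new_id  # written back for the next turn
        return {"id": new_id, "chained_from": previous_id}

model = SketchResponsesModel()
state = {}  # owned by the Agent; persisted by the session manager
first = model.stream([{"role": "user", "content": "Hello"}], model_state=state)
second = model.stream([{"role": "user", "content": "And then?"}], model_state=state)
# second chains off the id that the first call wrote back into model_state
```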

### Messages

When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.

> **Member:** This seems very API-specific. Could this instead be an option on the model? à la …
>
> **Member:** I see below you're introducing …
>
> **Author:** Oh yeah, the actual check we can make a bit more explicit. I think something like …
>
> **Contributor:** Does that mean conversation is fully managed by the server? What happens if I want to change my model provider? (idk how common that use case is) Or what would snapshots look like for evals? Do we depend on traces explicitly?
>
> **Author:** You would have to be careful about changing your model provider in this scenario. We can put in protections to help avoid common footguns. Regarding snapshots for evals, I would have to consult with @afarntrog and @poshinchen on this. We can expose methods to retrieve conversation history from the server if need be.
>
> **Member:** Would it make sense to add an explicit key in `model_state` like …

```python
agent = Agent(model=BedrockModel(api="responses"))

result1 = agent("Hello")
# agent.messages has: [user: "Hello", assistant: "Hi there!"]

result2 = agent("What's the weather?")
# agent.messages has: [user: "What's the weather?", assistant: "Let me check..."]
# (previous invocation's messages are cleared)
# Server still has full context via previous_response_id
```

> **Author:** Apologies, meant to demonstrate an explicit param to turn on stateful mode: `agent = Agent(model=BedrockModel(api="responses", stateful=True))`

- The server owns conversation history in stateful mode, so clearing locally avoids confusion about what the model sees and prevents unbounded memory growth
- `MessageAddedEvent` hooks still fire for each message during the invocation
- Session managers persist messages as they happen via hooks
- Nothing changes within an invocation; only cross-invocation behavior differs

> **Contributor:** Session management currently hydrates the agent on init. When using this stateful approach, would people be guided to not use session management? I think this would replace the need for it.
>
> **Author:** It wouldn't entirely replace the need for it. Customers have custom state data they session manage. Interrupt state is session managed as well. The conversation IDs proposed here would be session managed as well, so there is a need for it. The one big difference is messages: you wouldn't hydrate the messages array in the Agent when resuming a stateful model session.

### Conversations

The Responses implementation maps user-defined conversation IDs to server-generated response IDs in `model_state`. Users work with their own meaningful IDs and never need to manage server-generated ones. By default, all invocations use a `"default"` conversation. Users who need multiple conversations pass their own `conversation_id` on invoke:

> **Member:** Given that `model_state` is per agent, it feels odd that this isn't an agent-level …
>
> **Author:** See the example below. The proposal is to support switching conversations on the same agent instance when running a stateful model.

```python
agent = Agent(model=BedrockModel(api="responses"))

# Single conversation (uses "default" implicitly)
agent("Hello")
agent("What's the capital of France?")
agent("What river runs through it?")  # server knows "it" = Paris

# Multi-conversation with user-defined IDs
agent("Help with billing", conversation_id="billing")
agent("What was my last charge?", conversation_id="billing")

agent("Track my order", conversation_id="orders")
agent("Any updates?", conversation_id="orders")

# Switch back
agent("One more billing question", conversation_id="billing")
```

> **Contributor:** Are these isolated entirely? Is it safe to use two different "users" with the same agent but with different …
>
> **Author:** I don't see why we couldn't. There of course will be some more quirks to this not called out in this document. For example, if you interrupt an agent instance, we should be careful about resuming on the same conversation.

- `model_state` maintains the mapping (e.g., `{"default": "resp_abc", "billing": "resp_def", "orders": "resp_xyz"}`)
- Session manager persists the mapping automatically, so all conversations survive restarts
- Users never need to capture or manage server-generated IDs
- Defaults to `NullConversationManager` when the model is operating in stateful mode
- If the user provides a different conversation manager, we emit a warning (not an exception)
- `ContextWindowOverflowException` is not retried client-side in stateful mode since the server handles context management
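
The ID mapping itself is a small dict in `model_state`. A sketch of the bookkeeping, with hypothetical helper names (the SDK's actual internals may differ):

```python
# Sketch of the conversation-ID bookkeeping (helper names hypothetical).
model_state = {"conversations": {}}  # user-defined ID -> latest server response ID

def previous_response_id(conversation_id="default"):
    """None on the first turn of a conversation, so the full input is sent."""
    return model_state["conversations"].get(conversation_id)

def record_response(response_id, conversation_id="default"):
    """Called after each response; later turns chain off this ID."""
    model_state["conversations"][conversation_id] = response_id

assert previous_response_id("billing") is None  # new conversation, no chain yet
record_response("resp_def", "billing")
record_response("resp_xyz", "orders")
record_response("resp_abc")  # implicit "default" conversation
resumed = previous_response_id("billing")  # switching back resumes from resp_def
```

Because the whole mapping is one serializable dict, persisting it through the session manager is what lets every conversation survive a restart.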

### Session Management

> **Member:** How does `session_id` play compared to `conversation_id`? Are they assumed to be the same?
>
> **Contributor:** I could use some more clarity on the different responsibilities between a session manager and the stateful server-side conversation management.

`model_state` (including the full conversation ID mapping) is persisted in `_internal_state` within `SessionAgent`. On session restore, the Agent restores `model_state` and subsequent requests resume their server-side conversations.

```python
# Session 1: Start conversations
session_mgr = RepositorySessionManager(session_id="user-123", ...)
agent = Agent(model=BedrockModel(api="responses"), session_manager=session_mgr)
agent("Help with my order", conversation_id="support")
agent("Check my balance", conversation_id="billing")

# Session 2: Resume (maybe after process restart)
session_mgr = RepositorySessionManager(session_id="user-123", ...)
agent = Agent(model=BedrockModel(api="responses"), session_manager=session_mgr)
agent("Any update on my order?", conversation_id="support")  # resumes support conversation
agent("What was my last charge?", conversation_id="billing")  # resumes billing conversation
```

- All conversation mappings survive agent restarts
- All session manager implementations (file, S3, DynamoDB, custom) get this automatically since `_internal_state` is already serialized

### Multi-Agent

> **Contributor:** FYI, the same considerations apply to agent-as-tool: strands-agents/sdk-python#1932 cc @notowen333

Each agent in a swarm or graph has its own independent `model_state` and conversation ID mapping. `model_state` is reset alongside `messages` and `state` in `reset_executor_state()`, following the existing reset pattern.

- When `model_state` is reset (no conversation ID), the first request sends the full message history (including prefilled messages and context summaries), starting a new server-side conversation
- Text-based context passing (`_build_node_input`) works unchanged in both swarm and graph
- In graph, `reset_executor_state()` only runs when `reset_on_revisit` is enabled and a node is revisited; on revisit without reset, the agent resumes its existing server-side conversation
- Parallel node execution in graph is safe since `model_state` is per-agent, not per-model

> **Contributor:** but don't we clear the …
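
The per-agent reset behavior can be pictured with a toy executor state. The class body here is illustrative; only the `reset_executor_state()` name and the messages/state/model_state trio come from the proposal above.

```python
# Toy illustration of per-agent state reset in multi-agent execution.
class NodeAgent:
    def __init__(self):
        self.messages = []
        self.state = {}
        self.model_state = {}  # per-agent, so parallel nodes never collide

    def reset_executor_state(self):
        # model_state joins the existing reset of messages and state
        self.messages = []
        self.state = {}
        self.model_state = {}  # next request starts a new server conversation

a, b = NodeAgent(), NodeAgent()
a.model_state["conversation_id"] = "resp_abc"
b.model_state["conversation_id"] = "resp_xyz"
a.reset_executor_state()  # only a's conversation chain is dropped
```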

### Plugin Pattern

Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:

> **Member:** I think I love & hate this at the same time 😆 What about the `conversation_id` in this case? Would that have to be initialized with the model?
>
> **Contributor:** I like this.
>
> **Contributor:** The DevX is scaring me here a bit, though. Would devs need to add to both model and plugins? Or do we combine behind the scenes (and assume any param can also be a plugin)?

```python
class BedrockModel(Model, Plugin):
    name = "strands:bedrock-model"

    @hook
    def _on_before_invocation(self, event: BeforeInvocationEvent):
        if event.agent.model_state.get("conversation_id"):
            event.agent.messages.clear()
```

> **Author:** Apologies, meant to make this part of `Model`, not specifically `BedrockModel`. This way it applies to all implementations.

- Keeps the Agent generic with no stateful-mode special cases
- Any stateful provider can self-describe its behaviors through the existing hook/plugin system

## Questions

- **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?

> **Member:** What's the use case here?
>
> **Contributor:** I think it's long-running agents. If your agent is running for 50 minutes, you don't want to depend on a single stream being open for that amount of time.
>
> **Contributor:** I'd take it as P1, because I consider streaming to be the default. If we can merge it, that's great. If not, I think streaming is a good start.

- **Mantle feature parity**: Which Converse features (guardrails, prompt caching) are NOT available through the Responses API?

> **Member:** I think this helps drive the Bedrock facade decision too.

- **Model availability**: Which models are available on the Mantle endpoint beyond OpenAI GPT OSS?
- **Conversation object**: Does Mantle support the `conversation` parameter, or only `previous_response_id`?
- **Conversation retention**: How long does the server maintain conversation state?

## Resources

- [AWS Bedrock Mantle docs](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html)
- [AWS Bedrock supported APIs](https://docs.aws.amazon.com/bedrock/latest/userguide/apis.html)
- [AWS Bedrock API key usage](https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-use.html)
- [OpenAI Responses API reference](https://platform.openai.com/docs/api-reference/responses/create)
- [OpenAI conversation state guide](https://platform.openai.com/docs/guides/conversation-state)
- [OpenAI Responses API background mode](https://platform.openai.com/docs/guides/background)
- [Exploring Mantle CLI (blog post)](https://dev.to/aws/exploring-the-openai-compatible-apis-in-amazon-bedrock-a-cli-journey-through-project-mantle-2114)

> **Author:** Now that I have started working on the implementation, I have identified a more appropriate path forward. You'll see it in the PR, but I wanted to give a heads-up that I'll be utilizing the conversation manager instead of a model plugin to handle message clearing. I am also going to stream a new event type to capture the generated previous response ID. These, together with users enabling stateful mode on Model init, will create a better DevX in my opinion.
>
> A few other notes: we are clearing after invocation now instead of before. This helps avoid removing message prefills in graph and swarm. Also, we won't be passing model state into model stream; we just need to pass in the previous response ID.