feat: design doc for stateful model providers #712
pgrayy wants to merge 1 commit into strands-agents:main
Conversation
Documentation Preview Ready — Your documentation preview has been successfully deployed! Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-712/docs/user-guide/quickstart/overview/ Updated at: 2026-03-26T17:38:28.135Z
Force-pushed from 0bb6aea to a04a907
> ## Background
>
> The OpenAI Responses API is hosted on AWS Bedrock's Mantle endpoint (`bedrock-mantle.{region}.api.aws`). It uses an OpenAI-compatible format and supports stateful server-side conversation management, where the server tracks context across turns so the client only sends the latest message.
is this basically agentcore memory?
I'm not sure how they have it implemented, but I think it is something more native to Mantle. Either way, how it works is opaque to the user. They don't need to care about how it is set up, just that it works.
Key takeaways
- Amazon and OpenAI will co-create a Stateful Runtime Environment that allows AI agents to maintain context and access compute resources.
- Developers can build production-scale AI applications without starting from scratch each time.
- The environment represents the next generation of how frontier AI models will be used.
> ```python
> model = BedrockModel(
>     model_id="us.anthropic.claude-sonnet-4-20250514",
>     api="responses",
>     compute_environment="agentcore",
> ```
Can you please expand on this functionality? Is the goal to have this run on Lambda, or AgentCore, or any compute? How does that work?
I would probably direct you toward https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html for more details on this specific feature. Unfortunately, I haven't had a chance to experiment yet with the compute environments.
> ```
> └── responses.py   # new Responses API implementation
> ```
>
> `BedrockModel` becomes a facade. The `api` parameter controls dispatch:
Given the decisions we made in TS, why go for a unified interface?
Ergonomics, in short. And apologies, but I think you missed the follow-up discussion. For TS, we decided to maintain the unified interface there as well. No more namespacing based on API. I'll note that this unified interface is better for our model-driven approach because you can now switch APIs with a config as opposed to initializing a whole new Model instance.
> I'll note that this unified interface is better for our model driven approach because you can now switch APIs with a config as opposed to initializing a whole new Model instance.
I agree as long as the configs are similar; if we get into a situation where ProviderConfig1 looks very different from ProviderConfig2, the idea falls apart.
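A minimal sketch of the facade dispatch under discussion. The `ConverseImpl`/`ResponsesImpl` names are hypothetical stand-ins, not the actual module layout:

```python
class ConverseImpl:
    def stream(self, messages):
        return f"converse({len(messages)} messages)"

class ResponsesImpl:
    def stream(self, messages):
        return f"responses({len(messages)} messages)"

class BedrockModel:
    """Facade sketch: the `api` config selects the underlying implementation."""
    _impls = {"converse": ConverseImpl, "responses": ResponsesImpl}

    def __init__(self, api: str = "converse", **config):
        self.config = {"api": api, **config}
        self._impl = self._impls[api]()

    def update_config(self, **overrides):
        # Switching APIs is a config change, not a new Model instance
        self.config.update(overrides)
        self._impl = self._impls[self.config["api"]]()

    def stream(self, messages):
        return self._impl.stream(messages)

model = BedrockModel(api="converse")
model.update_config(api="responses")  # swap implementations in place
```

This is the ergonomic argument made above: callers keep one `Model` object and flip a config key, while dispatch stays an internal detail.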
> ```
> ├── __init__.py    # exports BedrockModel, backward-compatible imports
> ├── base.py        # shared config, region resolution, boto session, facade logic
> ├── converse.py    # current Converse/ConverseStream (extracted from bedrock.py)
> └── responses.py   # new Responses API implementation
> ```
do we need a separate provider though? I mean it's the same API spec, right?
why don't we just use responses, or extend that one?
nvm, I got my answer a couple paragraphs later 😅
It is not, and they're different endpoints as well. There is the bedrock-runtime Converse API, which we support today but which doesn't offer the features listed above. This doc is about supporting the bedrock-mantle Responses API, which is a new endpoint and spec.
> why don't we just use responses, or extend that one?
To expand a bit more on this: I thought it would be weird to only allow this new Bedrock endpoint to be hit by the OpenAI model provider. Optics-wise, I think it is important to also support it through `BedrockModel`. Of course, we can iterate on that; we don't have to support it right away.
But I think of it as the Responses API now. The endpoint should probably just be a config with a default; there are many implementations.
> ### Messages
>
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
> When `model_state` contains a conversation ID,

This seems very API-specific. Could this instead be an option on the model, à la `Model.is_stateful`?
I see below you're introducing `conversation_id` as a top-level concept, which makes a bit more sense.
Oh yeah, we can make the actual check a bit more explicit. I think something like `Model.is_stateful` would be better.
> - The server owns conversation history in stateful mode, so clearing locally avoids confusion about what the model sees and prevents unbounded memory growth
> - `MessageAddedEvent` hooks still fire for each message during the invocation
> - Session managers persist messages as they happen via hooks
Session management currently hydrates the agent on init. When using this stateful approach, would people be guided not to use session management? I think this would replace the need for it.
It wouldn't entirely replace the need for it. Customers have custom state data they manage in sessions. Interrupt state is session-managed as well. The conversation IDs proposed here would be session-managed too. So there is a need for it. The one big difference is messages: you wouldn't hydrate the messages array in the Agent when resuming a stateful model session.
> ```python
> agent("What was my last charge?", conversation_id="billing")
>
> agent("Track my order", conversation_id="orders")
> ```
Are these isolated entirely? Is it safe to use two different 'users' with the same agent but with different conversation_ids?
I don't see why not. There will of course be some quirks not called out in this document. For example, if you interrupt an agent instance, we should be careful about resuming on the same conversation.
> ### Conversations
>
> The Responses implementation maps user-defined conversation IDs to server-generated response IDs in `model_state`. Users work with their own meaningful IDs and never need to manage server-generated ones. By default, all invocations use a `"default"` conversation. Users who need multiple conversations pass their own `conversation_id` on invoke:
Why is `conversation_id` on invoke and not at the agent level? Given that `model_state` is per agent, it feels odd that this isn't agent-level.
See example below. The proposal is to support switching conversations on the same agent instance when running a stateful model.
> ### Messages
>
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
Does that mean the conversation is fully managed by the server? What happens if I want to change my model provider? (idk how common that use case is) Or what would snapshots look like for evals? Do we depend on traces explicitly?
You would have to be careful about changing your model provider in this scenario. We can put in protections to help avoid common footguns.
Regarding snapshots for evals, I would have to consult with @afarntrog and @poshinchen on this. We can expose methods to retrieve conversation history from the server if need be.
> - If the user provides a different conversation manager, we emit a warning (not an exception)
> - `ContextWindowOverflowException` is not retried client-side in stateful mode since the server handles context management
>
> ### Session Management
How does `session_id` play compared to `conversation_id`? Are they assumed to be the same?
> - Request formatting and streaming event parsing are extracted into shared utilities used by both `bedrock/responses.py` and the existing `OpenAIResponsesModel`
> - Provider-specific logic (auth, endpoint, client creation) stays in each provider
>
> ### Model State
Could models also store other state? I'm wondering: do we need a session manager in this case, or could we store it all in the backend provider?
^ Example: the Gemini thought signature. We have a basic implementation of it; should that also be model state?
> - If the user provides a different conversation manager, we emit a warning (not an exception)
> - `ContextWindowOverflowException` is not retried client-side in stateful mode since the server handles context management
>
> ### Session Management
I could use some more clarity on the different responsibilities between a session manager and stateful server-side conversation management.
> Each agent in a swarm or graph has its own independent `model_state` and conversation ID mapping. `model_state` is reset alongside `messages` and `state` in `reset_executor_state()`, following the existing reset pattern.
>
> - When `model_state` is reset (no conversation ID), the first request sends the full message history (including prefilled messages and context summaries), starting a new server-side conversation
But don't we clear `agent.messages`? How can we reset? What data do we reset to?
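The reset semantics can be sketched like this, with hypothetical helpers mirroring the `reset_executor_state()` behavior quoted above (what counts as "prefill" here is an assumption for illustration):

```python
def reset_executor_state(agent_state: dict) -> None:
    """Reset per-executor state between swarm/graph runs (illustrative)."""
    agent_state["messages"] = list(agent_state.get("prefill", []))
    agent_state["state"] = {}
    agent_state["model_state"] = {}  # drops the conversation ID mapping

def build_first_request(agent_state: dict) -> dict:
    """After a reset there is no conversation ID, so the first request
    carries the full local history instead of only the latest message."""
    if agent_state["model_state"].get("conversations"):
        return {"input": agent_state["messages"][-1:]}
    return {"input": agent_state["messages"]}

state = {"prefill": [{"role": "user", "content": "You are a billing helper"}]}
reset_executor_state(state)
state["messages"].append({"role": "user", "content": "hi"})
request = build_first_request(state)
```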
> ### Plugin Pattern
>
> Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
I think I love & hate this at the same time 😆
What about the `conversation_id` in this case? Would that have to be initialized with the model?
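A minimal sketch of the plugin shape under discussion. The hook names and the `Plugin` surface here are assumptions, not the SDK's actual API:

```python
from typing import Callable

class HookRegistry:
    """Tiny stand-in for the SDK's hook system (illustrative)."""

    def __init__(self):
        self._hooks: dict[str, list[Callable]] = {}

    def add(self, event: str, callback: Callable) -> None:
        self._hooks.setdefault(event, []).append(callback)

    def fire(self, event: str, agent) -> None:
        for cb in self._hooks.get(event, []):
            cb(agent)

class StatefulModelPlugin:
    """The model provider registers its own lifecycle behavior via hooks,
    so the Agent needs no special-case `if stateful:` branches."""

    def register(self, registry: HookRegistry) -> None:
        registry.add("before_invocation", self._clear_messages)

    def _clear_messages(self, agent) -> None:
        agent.messages.clear()

class Agent:
    def __init__(self, plugins):
        self.messages = [{"role": "user", "content": "stale"}]
        self.hooks = HookRegistry()
        for plugin in plugins:
            plugin.register(self.hooks)

    def __call__(self, prompt: str):
        self.hooks.fire("before_invocation", self)
        self.messages.append({"role": "user", "content": prompt})

agent = Agent(plugins=[StatefulModelPlugin()])
agent("hello")
```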
> - All conversation mappings survive agent restarts
> - All session manager implementations (file, S3, DynamoDB, custom) get this automatically since `_internal_state` is already serialized
>
> ### Multi-Agent
FYI, the same considerations apply to agent-as-tool: strands-agents/sdk-python#1932. cc @notowen333
> ## Questions
>
> - **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
I think it's for long-running agents. If your agent is running for 50 minutes, you don't want to depend on a single stream being open for that amount of time.
> ### Messages
>
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
Would it make sense to add an explicit key in `model_state`, like `api_type == "responses"`, so this special-casing is more explicit?
> ### Plugin Pattern
>
> Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
>
> ## Questions
>
> - **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
> - **Mantle feature parity**: Which Converse features (guardrails, prompt caching) are NOT available through the Responses API?
I think this helps drive the Bedrock facade decision too
> ### Plugin Pattern
>
> Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
The DevX scares me a bit here, though. Would devs need to add to both model and plugins? Or do we combine behind the scenes (and assume any param can also be a plugin)?
> ## Questions
>
> - **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
I'd make it P1, because I consider streaming to be the default. If we can merge it, great. If not, streaming should be a good start.
> ```
> )
> ```
>
> - The Converse path uses boto3; the Responses path uses the OpenAI Python SDK with SigV4 signing via a custom httpx transport that resolves credentials from the same boto session
What overlap is there between `BedrockModel` and a theoretical `BedrockResponsesModel`?
> **Date**: 2026-03-26
>
> ## Overview
One high-level comment: I'd like to see more of the dev story / DevX. What do devs use today, and how will this model interact with that? Use cases, etc.
> ```python
> # Responses API with compute environment
> model = BedrockModel(
>     model_id="us.anthropic.claude-sonnet-4-20250514",
>     api="responses",
> ```
How do I choose between local orchestration and cloud orchestration? Can't I use the Responses API with full conversation history, as is normally done with Strands?
Apologies, accidentally left out a config. I meant to also demonstrate a `stateful=True` parameter to explicitly turn on stateful mode.
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
>
> ```python
> agent = Agent(model=BedrockModel(api="responses"))
> ```
Apologies, meant to demonstrate an explicit param to turn on stateful mode:

```python
agent = Agent(model=BedrockModel(api="responses", stateful=True))
```

> ```python
> class BedrockModel(Model, Plugin):
>     name = "strands:bedrock-model"
> ```
Apologies, I meant to make this part of `Model`, not specifically `BedrockModel`. This way it applies to all implementations.
Also (just thought of this) I don't think this model would work with …
So now that I have started working on the implementation, I have identified a more appropriate path forward. You'll see it in the PR, but I wanted to give a heads-up that I'll be utilizing the conversation manager instead of a model plugin to handle message clearing. I am also going to stream a new event type to capture the generated previous response ID. Together with users enabling stateful mode on Model init, these will create a better DevX in my opinion.
A few other notes: we are now clearing after the invocation instead of before, which helps avoid removing message prefills in graph and swarm. Also, we won't be passing model state into model stream; we just need to pass in the previous response ID.
Proposes stateful model provider support for the Strands Python SDK, targeting the OpenAI Responses API on Amazon Bedrock (Project Mantle). Covers the BedrockModel subpackage refactor, model_state for stateless providers, per-invocation message clearing, user-defined conversation ID mapping, session persistence, and multi-agent compatibility.