feat: design doc for stateful model providers #712
pgrayy wants to merge 1 commit into strands-agents:main
Conversation
Documentation Preview Ready — Your documentation preview has been successfully deployed! Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-712/docs/user-guide/quickstart/overview/ Updated at: 2026-03-26T17:38:28.135Z
Force-pushed from 0bb6aea to a04a907
> ## Background
>
> The OpenAI Responses API is hosted on AWS Bedrock's Mantle endpoint (`bedrock-mantle.{region}.api.aws`). It uses an OpenAI-compatible format and supports stateful server-side conversation management, where the server tracks context across turns so the client only sends the latest message.
is this basically agentcore memory?
I'm not sure how they have it implemented, but I think it is something more native to Mantle. Either way, how it works is opaque to the user. They don't need to care about how it is set up, just that it works.
Key takeaways
- Amazon and OpenAI will co-create a Stateful Runtime Environment that allows AI agents to maintain context and access compute resources.
- Developers can build production-scale AI applications without starting from scratch each time.
- The environment represents the next generation of how frontier AI models will be used.
> ```python
> model = BedrockModel(
>     model_id="us.anthropic.claude-sonnet-4-20250514",
>     api="responses",
>     compute_environment="agentcore",
> ```
Can you please expand on this functionality? Is the goal to have this run on Lambda, or AgentCore, or any compute? How does that work?
I would probably direct you toward https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html for more details on this specific feature. Unfortunately, I haven't had a chance to experiment yet with the compute environments.
> ```
> └── responses.py   # new Responses API implementation
> ```
>
> `BedrockModel` becomes a facade. The `api` parameter controls dispatch:
Given the decisions we made in TS, why go for a unified interface?
Ergonomics, in short. And apologies, but I think you missed the follow-up discussion. For TS, we decided to maintain the unified interface there as well. No more namespacing based on API. I'll note that this unified interface is better for our model-driven approach because you can now switch APIs with a config as opposed to initializing a whole new Model instance.
> I'll note that this unified interface is better for our model driven approach because you can now switch APIs with a config as opposed to initializing a whole new Model instance.
I agree as long as the configs are similar; if we get into a situation where ProviderConfig1 looks very different from ProviderConfig2, the idea falls apart.
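A minimal sketch of the facade dispatch under discussion. The `ConverseImpl`/`ResponsesImpl` names are hypothetical stand-ins, not the actual module layout:

```python
class ConverseImpl:
    def stream(self, messages):
        return f"converse({len(messages)} messages)"

class ResponsesImpl:
    def stream(self, messages):
        return f"responses({len(messages)} messages)"

class BedrockModel:
    """Facade sketch: the `api` config selects the underlying implementation."""
    _impls = {"converse": ConverseImpl, "responses": ResponsesImpl}

    def __init__(self, api: str = "converse", **config):
        self.config = {"api": api, **config}
        self._impl = self._impls[api]()

    def update_config(self, **overrides):
        # Switching APIs is a config change, not a new Model instance
        self.config.update(overrides)
        self._impl = self._impls[self.config["api"]]()

    def stream(self, messages):
        return self._impl.stream(messages)

model = BedrockModel(api="converse")
model.update_config(api="responses")  # swap implementations in place
```

This is the ergonomic argument made above: callers keep one `Model` object and flip a config key, while dispatch stays an internal detail.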
> ```
> ├── __init__.py    # exports BedrockModel, backward-compatible imports
> ├── base.py        # shared config, region resolution, boto session, facade logic
> ├── converse.py    # current Converse/ConverseStream (extracted from bedrock.py)
> └── responses.py   # new Responses API implementation
> ```
do we need a separate provider though? I mean it's the same API spec, right?
why don't we just use responses, or extend that one?
nvm, I got my answer a couple paragraphs later 😅
It is not, and they're different endpoints as well. There is the bedrock-runtime Converse API, which we support today but which doesn't offer the features listed above. This doc is about supporting the bedrock-mantle Responses API, which is a new endpoint and spec.
> why don't we just use responses, or extend that one?
To expand a bit more on this: I thought it would be weird to only allow this new Bedrock endpoint to be hit by the OpenAI model provider. Optics-wise, I think it is important to also support it through `BedrockModel`. Of course, we can iterate on that; we don't have to support it right away.
But I think of it as the Responses API now. The endpoint should probably just be a config with a default; there are many implementations.
> ### Messages
>
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
> When `model_state` contains a conversation ID,

This seems very API-specific. Could this instead be an option on the model, à la `Model.is_stateful`?
I see below you're introducing `conversation_id` as a top-level concept, which makes a bit more sense.
Oh yeah, we can make the actual check a bit more explicit. I think something like `Model.is_stateful` would be better.
> - The server owns conversation history in stateful mode, so clearing locally avoids confusion about what the model sees and prevents unbounded memory growth
> - `MessageAddedEvent` hooks still fire for each message during the invocation
> - Session managers persist messages as they happen via hooks
Session management currently hydrates the agent on init. When using this stateful approach, would people be guided not to use session management? I think this would replace the need for it.
It wouldn't entirely replace the need for it. Customers have custom state data they manage in sessions. Interrupt state is session-managed as well. The conversation IDs proposed here would be session-managed too. So there is a need for it. The one big difference is messages: you wouldn't hydrate the messages array in the Agent when resuming a stateful model session.
> ```python
> agent("What was my last charge?", conversation_id="billing")
>
> agent("Track my order", conversation_id="orders")
> ```
Are these isolated entirely? Is it safe to use two different 'users' with the same agent but with different conversation_ids?
I don't see why not. There will of course be some quirks not called out in this document. For example, if you interrupt an agent instance, we should be careful about resuming on the same conversation.
> ### Conversations
>
> The Responses implementation maps user-defined conversation IDs to server-generated response IDs in `model_state`. Users work with their own meaningful IDs and never need to manage server-generated ones. By default, all invocations use a `"default"` conversation. Users who need multiple conversations pass their own `conversation_id` on invoke:
Why is `conversation_id` on invoke and not at the agent level? Given that `model_state` is per agent, it feels odd that this isn't agent-level.
See example below. The proposal is to support switching conversations on the same agent instance when running a stateful model.
> ### Messages
>
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
Does that mean the conversation is fully managed by the server? What happens if I want to change my model provider? (idk how common that use case is) Or what would snapshots look like for evals? Do we depend on traces explicitly?
You would have to be careful about changing your model provider in this scenario. We can put in protections to help avoid common footguns.
Regarding snapshots for evals, I would have to consult with @afarntrog and @poshinchen on this. We can expose methods to retrieve conversation history from the server if need be.
> - If the user provides a different conversation manager, we emit a warning (not an exception)
> - `ContextWindowOverflowException` is not retried client-side in stateful mode since the server handles context management
>
> ### Session Management
How does `session_id` play compared to `conversation_id`? Are they assumed to be the same?
> - Request formatting and streaming event parsing are extracted into shared utilities used by both `bedrock/responses.py` and the existing `OpenAIResponsesModel`
> - Provider-specific logic (auth, endpoint, client creation) stays in each provider
>
> ### Model State
Could models also store other state? I'm wondering: do we need a session manager in this case, or could we store it all in the backend provider?
^ Example: the Gemini thought signature. We have a basic implementation of it; should that also be model state?
> - If the user provides a different conversation manager, we emit a warning (not an exception)
> - `ContextWindowOverflowException` is not retried client-side in stateful mode since the server handles context management
>
> ### Session Management
I could use some more clarity on the different responsibilities between a session manager and stateful server-side conversation management.
> Each agent in a swarm or graph has its own independent `model_state` and conversation ID mapping. `model_state` is reset alongside `messages` and `state` in `reset_executor_state()`, following the existing reset pattern.
>
> - When `model_state` is reset (no conversation ID), the first request sends the full message history (including prefilled messages and context summaries), starting a new server-side conversation
But don't we clear `agent.messages`? How can we reset? What data do we reset to?
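The reset semantics can be sketched like this, with hypothetical helpers mirroring the `reset_executor_state()` behavior quoted above (what counts as "prefill" here is an assumption for illustration):

```python
def reset_executor_state(agent_state: dict) -> None:
    """Reset per-executor state between swarm/graph runs (illustrative)."""
    agent_state["messages"] = list(agent_state.get("prefill", []))
    agent_state["state"] = {}
    agent_state["model_state"] = {}  # drops the conversation ID mapping

def build_first_request(agent_state: dict) -> dict:
    """After a reset there is no conversation ID, so the first request
    carries the full local history instead of only the latest message."""
    if agent_state["model_state"].get("conversations"):
        return {"input": agent_state["messages"][-1:]}
    return {"input": agent_state["messages"]}

state = {"prefill": [{"role": "user", "content": "You are a billing helper"}]}
reset_executor_state(state)
state["messages"].append({"role": "user", "content": "hi"})
request = build_first_request(state)
```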
> ### Plugin Pattern
>
> Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
I think I love & hate this at the same time 😆
What about the `conversation_id` in this case? Would that have to be initialized with the model?
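A minimal sketch of the plugin shape under discussion. The hook names and the `Plugin` surface here are assumptions, not the SDK's actual API:

```python
from typing import Callable

class HookRegistry:
    """Tiny stand-in for the SDK's hook system (illustrative)."""

    def __init__(self):
        self._hooks: dict[str, list[Callable]] = {}

    def add(self, event: str, callback: Callable) -> None:
        self._hooks.setdefault(event, []).append(callback)

    def fire(self, event: str, agent) -> None:
        for cb in self._hooks.get(event, []):
            cb(agent)

class StatefulModelPlugin:
    """The model provider registers its own lifecycle behavior via hooks,
    so the Agent needs no special-case `if stateful:` branches."""

    def register(self, registry: HookRegistry) -> None:
        registry.add("before_invocation", self._clear_messages)

    def _clear_messages(self, agent) -> None:
        agent.messages.clear()

class Agent:
    def __init__(self, plugins):
        self.messages = [{"role": "user", "content": "stale"}]
        self.hooks = HookRegistry()
        for plugin in plugins:
            plugin.register(self.hooks)

    def __call__(self, prompt: str):
        self.hooks.fire("before_invocation", self)
        self.messages.append({"role": "user", "content": prompt})

agent = Agent(plugins=[StatefulModelPlugin()])
agent("hello")
```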
> - All conversation mappings survive agent restarts
> - All session manager implementations (file, S3, DynamoDB, custom) get this automatically since `_internal_state` is already serialized
>
> ### Multi-Agent
FYI, the same considerations apply to agent-as-tool: strands-agents/sdk-python#1932. cc @notowen333
> ## Questions
>
> - **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
I think it's for long-running agents. If your agent is running for 50 minutes, you don't want to depend on a single stream being open for that amount of time.
> ### Messages
>
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
Would it make sense to add an explicit key in `model_state`, like `api_type == "responses"`, so this special-casing is more explicit?
> ### Plugin Pattern
>
> Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
>
> ## Questions
>
> - **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
> - **Mantle feature parity**: Which Converse features (guardrails, prompt caching) are NOT available through the Responses API?
I think this helps drive the Bedrock facade decision too
> ### Plugin Pattern
>
> Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
The DevX scares me a bit here, though. Would devs need to add to both model and plugins? Or do we combine behind the scenes (and assume any param can also be a plugin)?
> ## Questions
>
> - **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
I'd make it P1, because I consider streaming to be the default. If we can merge it, great. If not, streaming should be a good start.
> ```
> )
> ```
>
> - The Converse path uses boto3; the Responses path uses the OpenAI Python SDK with SigV4 signing via a custom httpx transport that resolves credentials from the same boto session
What overlap is there between `BedrockModel` and a theoretical `BedrockResponsesModel`?
> **Date**: 2026-03-26
>
> ## Overview
One high-level comment: I'd like to see more of the dev story / DevX. What do devs use today, and how will this model interact with that? Use cases, etc.
> ```python
> # Responses API with compute environment
> model = BedrockModel(
>     model_id="us.anthropic.claude-sonnet-4-20250514",
>     api="responses",
> ```
How do I choose between local orchestration and cloud orchestration? Can't I use the Responses API with full conversation history, as is normally done with Strands?
Apologies, accidentally left out a config. I meant to also demonstrate a `stateful=True` parameter to explicitly turn on stateful mode.
> When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
>
> ```python
> agent = Agent(model=BedrockModel(api="responses"))
> ```
Apologies, meant to demonstrate an explicit param to turn on stateful mode:

```python
agent = Agent(model=BedrockModel(api="responses", stateful=True))
```

> ```python
> class BedrockModel(Model, Plugin):
>     name = "strands:bedrock-model"
> ```
Apologies, I meant to make this part of `Model`, not specifically `BedrockModel`. This way it applies to all implementations.
Also (just thought of this) I don't think this model would work with …
So now that I have started working on the implementation, I have identified a more appropriate path forward. You'll see it in the PR, but I wanted to give a heads-up that I'll be utilizing the conversation manager instead of a model plugin to handle message clearing. I am also going to stream a new event type to capture the generated previous response ID. Together with users enabling stateful mode on Model init, these will create a better DevX in my opinion.
A few other notes: we are now clearing after the invocation instead of before, which helps avoid removing message prefills in graph and swarm. Also, we won't be passing model state into model stream; we just need to pass in the previous response ID.
Proposes stateful model provider support for the Strands Python SDK, targeting the OpenAI Responses API on Amazon Bedrock (Project Mantle). Covers the BedrockModel subpackage refactor, model_state for stateless providers, per-invocation message clearing, user-defined conversation ID mapping, session persistence, and multi-agent compatibility.