
feat: design doc for stateful model providers#712

Open
pgrayy wants to merge 1 commit into strands-agents:main from pgrayy:designs/stateful-models

Conversation

@pgrayy (Member) commented Mar 26, 2026

Proposes stateful model provider support for the Strands Python SDK, targeting the OpenAI Responses API on Amazon Bedrock (Project Mantle). Covers the BedrockModel subpackage refactor, model_state for stateless providers, per-invocation message clearing, user-defined conversation ID mapping, session persistence, and multi-agent compatibility.

@github-actions bot (Contributor) commented Mar 26, 2026

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-712/docs/user-guide/quickstart/overview/

Updated at: 2026-03-26T17:38:28.135Z

@pgrayy force-pushed the designs/stateful-models branch from 0bb6aea to a04a907 on March 26, 2026 at 17:29

## Background

The OpenAI Responses API is hosted on AWS Bedrock's Mantle endpoint (`bedrock-mantle.{region}.api.aws`). It uses an OpenAI-compatible format and supports stateful server-side conversation management, where the server tracks context across turns so the client only sends the latest message.
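To make the stateless/stateful distinction concrete, here is a minimal sketch of the two request shapes. The helper functions and payload keys other than `previous_response_id` (which is the Responses API's actual chaining parameter) are illustrative assumptions, not SDK code:

```python
# Stateless (client-managed) vs stateful (server-managed) request shapes.
# In stateful mode the client sends only the newest input plus the previous
# response id; the server reconstructs the rest of the context.

def stateless_request(history, new_message):
    # client resends the full transcript every turn
    return {"input": history + [new_message]}

def stateful_request(previous_response_id, new_message):
    # client sends only the latest message; the server chains context
    # from the prior response via `previous_response_id`
    return {"input": [new_message], "previous_response_id": previous_response_id}

history = [
    {"role": "user", "content": "Hi, I'm Alice."},
    {"role": "assistant", "content": "Hello Alice!"},
]
msg = {"role": "user", "content": "What's my name?"}

print(len(stateless_request(history, msg)["input"]))       # 3
print(len(stateful_request("resp_abc123", msg)["input"]))  # 1
```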
Contributor:

is this basically agentcore memory?

Member Author:

I'm not sure how they have it implemented, but I think it is something more native to Mantle. Either way, how it works is opaque to the user. They don't need to care about how it is set up, just that it works.

Contributor:

Key takeaways

  • Amazon and OpenAI will co-create a Stateful Runtime Environment that allows AI agents to maintain context and access compute resources.
  • Developers can build production-scale AI applications without starting from scratch each time.
  • The environment represents the next generation of how frontier AI models will be used.

```python
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514",
    api="responses",
    compute_environment="agentcore",
)
```
Contributor:

Can you please expand on this functionality? Is the goal to have this run on lambda or agentcore or any compute? How does that work?

Member Author:

I would probably direct you toward https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html for more details on this specific feature. Unfortunately, I haven't had a chance to experiment yet with the compute environments.

```
└── responses.py  # new Responses API implementation
```

`BedrockModel` becomes a facade. The `api` parameter controls dispatch:
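A minimal sketch of what that facade dispatch could look like. The implementation class names and the `stream` signature here are assumptions for illustration, not the actual SDK internals:

```python
# Hypothetical facade dispatch: BedrockModel delegates to an API-specific
# implementation selected by the `api` parameter.

class _ConverseImpl:
    def stream(self, messages):
        return f"converse: {len(messages)} messages"

class _ResponsesImpl:
    def stream(self, messages):
        return f"responses: {len(messages)} messages"

class BedrockModel:
    _IMPLS = {"converse": _ConverseImpl, "responses": _ResponsesImpl}

    def __init__(self, api="converse", **config):
        # the facade holds no API logic itself; it only routes
        self._impl = self._IMPLS[api]()

    def stream(self, messages):
        return self._impl.stream(messages)

print(BedrockModel(api="responses").stream([{"role": "user", "content": "hi"}]))
# responses: 1 messages
```

Switching APIs is then a one-line config change rather than a different model class, which is the ergonomics argument made below.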
Member:

Given the decisions we made in TS, why go for a unified interface?

Member Author:

Ergonomics, in short. And apologies, but I think you missed the follow-up discussion. For TS, we decided to maintain the unified interface there as well; no more namespacing based on API. I'll note that this unified interface is better for our model-driven approach because you can switch APIs with a config as opposed to initializing a whole new Model instance.

Member:

> I'll note that this unified interface is better for our model driven approach because you can now switch APIs with a config as opposed to initializing a whole new Model instance.

I agree as far as the configs are similar; if we start having a situation where ProviderConfig1 looks very different from ProviderConfig2 then the idea falls apart

```
├── __init__.py   # exports BedrockModel, backward-compatible imports
├── base.py       # shared config, region resolution, boto session, facade logic
├── converse.py   # current Converse/ConverseStream (extracted from bedrock.py)
└── responses.py  # new Responses API implementation
```
Contributor:

do we need a separate provider though? I mean it's the same API spec, right?

Contributor:

why don't we just use responses, or extend that one?

Contributor:

nvm I got my answer couple paragraphs later 😅

Member Author:

It is not, and it's a different endpoint as well. There is the bedrock-runtime Converse API, which we support today but which doesn't offer the features listed above. This doc is about supporting the bedrock-mantle Responses API, which is a new endpoint and spec.

Member Author:

> why don't we just use responses, or extend that one?

To expand a bit more on this, I thought it would be weird to only allow this new bedrock endpoint to be hit by the OpenAI model provider. Optics wise, I think it is important to also support it through the BedrockModel. Of course, we can iterate on that. We don't have to support it right away.

Contributor:

But I think of it as the Responses API now. The endpoint should probably just be a config with a default; there are many implementations.


### Messages

When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
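The described lifecycle can be sketched as follows. The `Agent` internals here are invented purely to illustrate the clearing behavior, not the SDK's actual structure:

```python
# Minimal sketch of per-invocation message clearing in stateful mode.

class Agent:
    def __init__(self, stateful):
        self.messages = []
        # a conversation ID in model_state is the signal for stateful mode
        self.model_state = {"conversation_id": "default"} if stateful else {}

    def __call__(self, prompt):
        if self.model_state.get("conversation_id"):
            self.messages = []  # server owns history; start the invocation clean
        self.messages.append({"role": "user", "content": prompt})
        # ...event loop appends assistant/tool messages as usual...
        self.messages.append({"role": "assistant", "content": "ok"})

agent = Agent(stateful=True)
agent("first question")
agent("second question")
print(len(agent.messages))  # 2: only the latest invocation's messages remain
```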
Member (@zastrowm, Mar 26, 2026):

> When model_state contains a conversation ID,

This seems very api specific. Could this instead be an option on the model? ala Model.is_stateful?

Member:

I see below you're introducing conversation_id as a top-level concept, which makes a bit more sense

Member Author:

Oh yeah the actual check we can make a bit more explicit. I think something like Model.is_stateful would be better.


- The server owns conversation history in stateful mode, so clearing locally avoids confusion about what the model sees and prevents unbounded memory growth
- `MessageAddedEvent` hooks still fire for each message during the invocation
- Session managers persist messages as they happen via hooks
Contributor:

Session management currently hydrates the agent on init. When using this stateful approach, would people be guided to not use session management? I think this would replace the need for it

Member Author:

It wouldn't entirely replace the need for it. Customers have custom state data they session manage. Interrupt state is session managed as well. The conversation ids proposed here would be session managed as well. So there is a need for it. The one big difference is messages. You wouldn't hydrate the messages array in the Agent when resuming a stateful model session.

Comment on lines +144 to +146
```python
agent("What was my last charge?", conversation_id="billing")

agent("Track my order", conversation_id="orders")
```
Contributor:

Are these isolated entirely? Is it safe to use two different 'users' with the same agent but with different conversation_ids?

Member Author:

I don't see why we couldn't. There will of course be some quirks to this that aren't called out in this document. For example, if you interrupt an agent instance, we should be careful about resuming on the same conversation.


### Conversations

The Responses implementation maps user-defined conversation IDs to server-generated response IDs in `model_state`. Users work with their own meaningful IDs and never need to manage server-generated ones. By default, all invocations use a `"default"` conversation. Users who need multiple conversations pass their own `conversation_id` on invoke:
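A sketch of that mapping. The `model_state` key names and helper functions here are assumptions inferred from the doc's description, not the actual implementation:

```python
# Hypothetical conversation-ID mapping kept in model_state: user-defined keys
# map to server-generated response IDs.

model_state = {"conversations": {}}

def resolve(conversation_id="default"):
    # look up the server-generated previous response id, if any
    return model_state["conversations"].get(conversation_id)

def record(response_id, conversation_id="default"):
    # after each response, remember the server id under the user's own key
    model_state["conversations"][conversation_id] = response_id

record("resp_abc123", "billing")
record("resp_def456", "orders")
print(resolve("billing"))  # resp_abc123
print(resolve())           # None: the "default" conversation hasn't started yet
```

The user only ever touches `"billing"` and `"orders"`; the `resp_...` IDs stay internal.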
Member:

Why is conversation_id on invoke and not at the agent level?

Given that model_state is per agent, it feels odd that this isn't an agent-level

Member Author:

See example below. The proposal is to support switching conversations on the same agent instance when running a stateful model.


### Messages

When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.
Contributor:

does that mean conversation is fully managed by the server? what happens if I want to change my model provider? (idk how common that use case is)

or what would snapshots look like for evals? do we depend on traces explicitly?

Member Author:

You would have to be careful about changing your model provider in this scenario. We can put in protections to help avoid common footguns.

Regarding snapshots for evals, I would have to consult with @afarntrog and @poshinchen on this. We can expose methods to retrieve conversation history from the server if need be.

- If the user provides a different conversation manager, we emit a warning (not an exception)
- `ContextWindowOverflowException` is not retried client-side in stateful mode since the server handles context management

### Session Management
Member:

How does session_id play compared to conversation_id? Are they assumed to be the same?

- Request formatting and streaming event parsing are extracted into shared utilities used by both `bedrock/responses.py` and the existing `OpenAIResponsesModel`
- Provider-specific logic (auth, endpoint, client creation) stays in each provider

### Model State
Member:

Could models also store other state? I'm thinking/wondering - do we need a session-manager in this case or could we store it all in the backend provider?

Contributor:

^ example gemini thought signature. we have a basic implementation of it, should that also be model state?

- If the user provides a different conversation manager, we emit a warning (not an exception)
- `ContextWindowOverflowException` is not retried client-side in stateful mode since the server handles context management

### Session Management
Contributor:

I could use some more clarity on the different responsibilities between a session manager and the stateful server-side conversation management?


Each agent in a swarm or graph has its own independent `model_state` and conversation ID mapping. `model_state` is reset alongside `messages` and `state` in `reset_executor_state()`, following the existing reset pattern.

- When `model_state` is reset (no conversation ID), the first request sends the full message history (including prefilled messages and context summaries), starting a new server-side conversation
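A sketch of what extending `reset_executor_state()` to cover `model_state` could look like, following the doc's "existing reset pattern". The class and attribute names are assumptions for illustration:

```python
import copy

# Hypothetical multi-agent node whose executor state is reset between runs.
class SwarmNode:
    def __init__(self, initial_messages):
        self._initial_messages = initial_messages
        self.messages = copy.deepcopy(initial_messages)
        self.state = {}
        self.model_state = {"conversations": {"default": "resp_old"}}

    def reset_executor_state(self):
        # reset messages and state as today, plus model_state:
        self.messages = copy.deepcopy(self._initial_messages)
        self.state = {}
        # no conversation ID -> the next request sends the full message
        # history and starts a fresh server-side conversation
        self.model_state = {}

node = SwarmNode([{"role": "system", "content": "prefill"}])
node.reset_executor_state()
print(node.model_state)  # {}
```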
Contributor:

But don't we clear agent.messages? How can we reset, and what data do we reset to?


### Plugin Pattern

Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
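A self-contained sketch of the plugin idea. The hook registry, event type, and plugin names below are invented stand-ins so the example runs on its own; they are not the SDK's actual hook API:

```python
# Sketch: the model registers a hook that clears messages before each
# invocation, so the Agent needs no special-case stateful logic.

class BeforeInvocationEvent:
    def __init__(self, agent):
        self.agent = agent

class HookRegistry:
    def __init__(self):
        self._callbacks = {}
    def add_callback(self, event_type, fn):
        self._callbacks.setdefault(event_type, []).append(fn)
    def fire(self, event):
        for fn in self._callbacks.get(type(event), []):
            fn(event)

class StatefulModelPlugin:
    name = "strands:stateful-model"  # hypothetical plugin name

    def register_hooks(self, registry):
        registry.add_callback(BeforeInvocationEvent, self._clear_messages)

    def _clear_messages(self, event):
        # only clear when a server-side conversation exists
        if event.agent.model_state.get("conversation_id"):
            event.agent.messages.clear()

class FakeAgent:
    def __init__(self):
        self.messages = ["old message"]
        self.model_state = {"conversation_id": "default"}

registry = HookRegistry()
StatefulModelPlugin().register_hooks(registry)
agent = FakeAgent()
registry.fire(BeforeInvocationEvent(agent))
print(agent.messages)  # []
```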
Member:

I think I love & hate this at the same time 😆

What about the conversation_id in this case? Would that have to be initialized with the model?

- All conversation mappings survive agent restarts
- All session manager implementations (file, S3, DynamoDB, custom) get this automatically since `_internal_state` is already serialized
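To illustrate, here is a hypothetical serialized session record. The `_internal_state` key comes from the doc's note that it is already persisted; the inner shape is an assumption:

```python
import json

# Hypothetical session record: conversation mappings ride along inside
# _internal_state, so every session backend persists them for free.
session_record = {
    "agent_id": "support-agent",
    "_internal_state": {
        "model_state": {
            "conversations": {"default": "resp_abc123", "billing": "resp_def456"}
        }
    },
}

# round-trip through JSON, as a file/S3/DynamoDB backend would
restored = json.loads(json.dumps(session_record))

# on restart, conversation mappings come back without replaying any messages
print(restored["_internal_state"]["model_state"]["conversations"]["billing"])
# resp_def456
```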

### Multi-Agent
Contributor:

fyi, same considerations apply to agent as tool strands-agents/sdk-python#1932 cc @notowen333


## Questions

- **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
Member:

What's the use case here?

Contributor:

I think it's for long-running agents. If your agent is running for 50 minutes, you don't want to depend on a single stream being open for that amount of time.


### Messages

When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.

Would it make sense to add an explicit key in model_state, like api_type == "responses", so this casing is more explicit?


### Plugin Pattern

Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
Contributor:

I like this

## Questions

- **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
- **Mantle feature parity**: Which Converse features (guardrails, prompt caching) are NOT available through the Responses API?
Member:

I think this helps drive the Bedrock facade decision too


### Plugin Pattern

Rather than the Agent having special-case `if stateful:` logic, the model provider could extend `Plugin` and register hooks for its lifecycle behaviors:
Contributor:

The DevX is scaring me here a bit though. Would devs need to add to both the model and plugins? Or do we combine them behind the scenes (and assume any param can also be a plugin)?


## Questions

- **Background/async inference**: Should we support `background: true` (fire-and-forget with polling) in the initial release?
Contributor:

I'd make it a P1, because I consider streaming to be the default. If we can merge it, that's great; if not, I think streaming is a good start.

```
)
```

- The Converse path uses boto3; the Responses path uses the OpenAI Python SDK with SigV4 signing via a custom httpx transport that resolves credentials from the same boto session
Member:

What overlap is there between BedrockModel and a theoretical BedrockResponsesModel?


**Date**: 2026-03-26

## Overview
Contributor:

One high-level comment: I'd like to see more of the dev story / DevX. What do devs use today, and how will this model interact with that? Use cases, etc.

```python
# Responses API with compute environment
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514",
    api="responses",
```
Contributor:

How do I set whether I want local orchestration or cloud orchestration? Can't I use the Responses API with full conversation history, as is normally done with Strands?

Member Author:

Apologies, I accidentally left out a config. I meant to also demonstrate a stateful=True parameter to explicitly turn on stateful mode.

When `model_state` contains a conversation ID, the Agent clears `agent.messages` at the start of each top-level invocation. Within an invocation, messages are appended normally (the event loop needs them for tool execution). After the invocation, `agent.messages` contains only that invocation's messages.

```python
agent = Agent(model=BedrockModel(api="responses"))
```
Member Author (@pgrayy, Mar 26, 2026):

Apologies, meant to demonstrate an explicit param to turn on stateful mode:

```python
agent = Agent(model=BedrockModel(api="responses", stateful=True))
```

Comment on lines +195 to +196
```python
class BedrockModel(Model, Plugin):
    name = "strands:bedrock-model"
```
Member Author:

Apologies, meant to make this part of Model, not specifically BedrockModel. This way it applies to all implementations.

@mkmeral (Contributor) commented Mar 27, 2026

Also (just thought of this): I don't think this model would work with use_agent; it reuses the parent agent's model as the default, iirc. We might need to update it (either throw an error, or better yet, handle it similarly to state resets).

Member Author:

So now that I have started working on the implementation, I have identified a more appropriate path forward. You'll see it in the PR, but I wanted to give a heads up that I'll be utilizing a conversation manager instead of a model plugin to handle message clearing. I am also going to stream a new event type to capture the generated previous response id. Together with users enabling stateful mode on Model init, these will create a better devx in my opinion.

A few other notes: we are clearing after the invocation now instead of before, which helps avoid removing message prefills in graph and swarm. Also, we won't be passing model state into model stream; we just need to pass in the previous response id.
