[BUG] Guardrail input redaction not persisted to AgentCoreMemorySessionManager — unredacted offensive content replayed on subsequent turns

### Checks

- [x] I have updated to the latest minor and patch version of Strands
- [x] I have checked the documentation and this is not expected behavior
- [x] I have searched [./issues](./issues?q=) and there are no duplicates of my issue

### Strands Version

1.28.0

### Python Version

3.12.8

### Operating System

macOS 14.3

### Installation Method

pip

### Steps to Reproduce

1. Configure a Strands agent with Bedrock guardrails enabled (`guardrail_redact_input=True`, which is the default) and `AgentCoreMemorySessionManager` for session persistence.

2. Start a conversation with a legitimate message (e.g., "Suggest a metadata schema for our library").

3. Send a message that triggers the guardrail (e.g., sexually explicit content). The guardrail correctly blocks this and the agent returns the `blockedInputMessaging` text.

4. Send a completely legitimate follow-up message (e.g., "I'd like to have a catalog of books").

5. The guardrail blocks this legitimate message too, even though it contains no policy-violating content.

6. The conversation is now permanently stuck — every subsequent message is blocked by the guardrail.

**Minimal reproduction:**

```python
from strands import Agent
from strands.models.bedrock import BedrockModel
from bedrock_agentcore.memory.integrations.strands import (
    AgentCoreMemorySessionManager,
    AgentCoreMemoryConfig,
)

model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    guardrail_id="YOUR_GUARDRAIL_ID",
    guardrail_version="DRAFT",
    # guardrail_redact_input=True  # this is the default
)

session_manager = AgentCoreMemorySessionManager(
    agentcore_memory_config=AgentCoreMemoryConfig(
        memory_id="YOUR_MEMORY_ID",
        actor_id="test_user",
        session_id="test_session",
    ),
)

agent = Agent(model=model, session_manager=session_manager)

# Turn 1: legitimate message — works fine
agent("Hello, help me organize my assets")

# Turn 2: offensive message — correctly blocked by guardrail
agent("inappropriate content here")

# Turn 3: legitimate message — INCORRECTLY blocked
# because the original unredacted offensive content from Turn 2
# was persisted to AgentCore Memory and is replayed into context
agent("Actually, I'd like to organize my photos by date")
```

### Expected Behavior

When a guardrail blocks a message and `guardrail_redact_input=True` (the default):

1. The offensive user message should be redacted both **in-memory** and **in the persistent session store**.
2. On subsequent turns, the conversation history loaded from AgentCore Memory should contain only the redacted version (`"[User input redacted.]"`), not the original offensive text.
3. Follow-up legitimate messages should not be blocked by the guardrail.

### Actual Behavior

The redaction only happens **in-memory** but is **never persisted** to AgentCore Memory. Here's the chain of events:

1. When a guardrail intervenes, `Agent._event_stream_handler` calls:
   ```python
   self.messages[-1]["content"] = self._redact_user_content(...)
   if self._session_manager:
       self._session_manager.redact_latest_message(self.messages[-1], self)
   ```

2. `RepositorySessionManager.redact_latest_message` (line 81-93 in `repository_session_manager.py`) sets `latest_agent_message.redact_message = redact_message` and calls `self.session_repository.update_message(...)`.

3. **The problem**: `AgentCoreMemorySessionManager.update_message` (line 490-511 in the bedrock-agentcore package) is effectively a **no-op** — it only logs at DEBUG level:
   ```python
   def update_message(self, session_id, agent_id, session_message, **kwargs):
       # ...validation...
       logger.debug(
           "Message update requested for message: %s (AgentCore Memory doesn't support updates)",
           {session_message.message_id},
       )
   ```

4. On the next turn, `RepositorySessionManager.initialize` loads messages from AgentCore Memory via `list_messages` → `events_to_messages`. Since the redaction was never persisted, the **original unredacted offensive content** is loaded back into `agent.messages`.

5. Even with `guardrail_latest_message=True`, the full conversation history (including the unredacted offensive text) is sent to Bedrock as context.

6. Without `guardrail_latest_message=True` (the default), the guardrail evaluates ALL messages, and the unredacted offensive message causes every subsequent turn to be blocked.

### Additional Context

- The `update_message` no-op in `AgentCoreMemorySessionManager` is documented with the comment "AgentCore Memory doesn't support updates", which is technically true — AgentCore Memory is an append-only event store. However, the SDK's redaction mechanism relies on `update_message` actually persisting the change.
- The `SessionMessage.to_message()` method correctly returns `redact_message` if set, but since `update_message` is a no-op, the `redact_message` field is never stored, and when messages are reloaded from AgentCore Memory they have no `redact_message` set.
- Issue #1639 describes a related but different problem (wrong message being redacted when LTM context is present). This issue is about the redaction never being persisted regardless of which message is targeted.
- We confirmed this behavior with diagnostic hooks that log the full conversation history sent to the model — the original offensive text is visible in history on subsequent turns.

### Possible Solution

Since AgentCore Memory is append-only and doesn't support in-place updates, the redaction strategy needs to change. Some options:

1. **Deferred persistence**: Buffer the user message in the session manager and only persist it **after** the model responds. If a guardrail intervenes and redacts the message, persist the redacted version instead. We implemented this as a workaround (a `GuardrailSafeSessionManager` wrapper) and confirmed it resolves the issue.

2. **Pre-flight guardrail check**: Before appending the user message to the session, call the Bedrock `ApplyGuardrail` API to check the content. If it violates the policy, persist only the redacted version.

### Related Issues

- #1639 — guardrail_redact_input overrides LTM message instead of user message
- #1671 — Bedrock Guardrail False Positive on Tool Results

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Guardrail input redaction not persisted to AgentCoreMemorySessionManager — unredacted offensive content replayed on subsequent turns #2119

Checks

Strands Version

Python Version

Operating System

Installation Method

Steps to Reproduce

Expected Behavior

Actual Behavior

Additional Context

Possible Solution

Related Issues

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG] Guardrail input redaction not persisted to AgentCoreMemorySessionManager — unredacted offensive content replayed on subsequent turns #2119

Description

Checks

Strands Version

Python Version

Operating System

Installation Method

Steps to Reproduce

Expected Behavior

Actual Behavior

Additional Context

Possible Solution

Related Issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions