feat(memory): harden memory pipeline with sanitization and checks #28
leandrodamascena wants to merge 3 commits into aws-samples:main from
From the automated code review:
PR: 529 additions, 19 deletions across 10 files
Critical Issues (1 found)
[error-handling] Integrity mismatch warning logs insufficient context for security. The hash mismatch warning logs only repo and namespace. It omits the record ID, expected … logger.warn('Memory record content integrity check failed — using content anyway …
Important Issues (7 found)
… (<script><script>inner</script></script>), unclosed tags, and tags with > inside …
Suggestions (9 found)
Strengths
Recommended Action
Closes #26
The agent learns from past tasks and applies that knowledge to new ones. But before we expand the memory loop (review feedback, cross-task patterns), we need to trust what goes in and what comes out. Today nothing is validated: GitHub issue bodies, PR review comments, and memory records from AgentCore all go straight into the agent's system prompt without any filtering or verification.
This PR adds three things:
1. Content sanitization
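For orientation, here is a minimal TypeScript sketch of the kind of sanitizer this section describes, assuming a regex-based approach. The name sanitizeExternalContentSketch, the specific injection patterns, and the "[redacted]" marker are hypothetical illustrations, not the PR's actual sanitizeExternalContent() implementation. The order of the steps is the point: invisible characters are removed before tags, and tags before pattern matching, so neither a bidi override nor markup can split a phrase like "ignore previous instructions" to slip it past the filter.

```typescript
// Hypothetical sketch: names, patterns, and the "[redacted]" marker are illustrative,
// not the PR's actual sanitizeExternalContent() code.

// C0/C1 control characters, keeping tab (\x09), newline (\x0A) and carriage return (\x0D).
const CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/g;

// Bidi embeddings/overrides (U+202A-U+202E), isolates (U+2066-U+2069), and LRM/RLM/ALM marks.
const BIDI_CHARS = /[\u202A-\u202E\u2066-\u2069\u200E\u200F\u061C]/g;

// A complete tag: "<" + letter, "!" or "/", then plain chars or quoted attribute values
// (so a ">" inside quotes does not end the tag), then ">".
const TAG = /<[a-zA-Z!\/](?:[^<>"']|"[^"]*"|'[^']*')*>/g;

// An opened tag that never closes before the end of the input, e.g. "<img src=x ...".
const UNCLOSED_TAG_AT_END = /<[a-zA-Z!\/][^>]*$/;

// Illustrative, non-exhaustive injection phrasings.
const INJECTION_PATTERNS: RegExp[] = [
  /\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b/gi,
  // Role prefixes at the start of a line; [ \t]* avoids swallowing preceding newlines.
  /^[ \t]*(?:system|assistant)[ \t]*:/gim,
];

export function sanitizeExternalContentSketch(input: string): string {
  // 1. Remove invisible characters first so they cannot split a pattern below.
  let text = input.replace(CONTROL_CHARS, '').replace(BIDI_CHARS, '');

  // 2. Strip complete tags until nothing changes, so overlapping constructs
  //    such as "<scr<script>ipt>" do not survive a single pass.
  let previous = '';
  while (previous !== text) {
    previous = text;
    text = text.replace(TAG, '');
  }

  // 3. Drop a trailing unclosed tag.
  text = text.replace(UNCLOSED_TAG_AT_END, '');

  // 4. Neutralize injection phrasing, including phrasing that was hidden inside markup.
  for (const pattern of INJECTION_PATTERNS) {
    text = text.replace(pattern, '[redacted]');
  }
  return text;
}
```

A filter like this is defense in depth rather than a full HTML parser: it is deliberately lossy (a legitimate line starting with "System:" is redacted too), while text such as "a < b" passes through because a tag must start with a letter, "!" or "/".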
A sanitizeExternalContent() function (TS + Python mirror) that strips HTML tags, neutralizes prompt injection patterns (e.g. "SYSTEM: ignore previous instructions"), and removes control characters and Unicode bidi overrides. Applied on:
2. Source provenance
Every memory write now includes a source_type field so we can tell where a record came from:
- agent_episode (the agent wrote it after completing a task)
- agent_learning (the agent wrote repo knowledge it discovered)
- orchestrator_fallback (the orchestrator wrote a minimal episode because the agent didn't)

This matters when we start ingesting review feedback from external sources (#27). We need to know what's agent-generated vs. what came from outside.
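One way to model the new field in TypeScript, as a sketch: the three source_type values are the ones listed above, while SOURCE_TYPES, parseSourceType, ProvenancedRecord, and tagRecord are hypothetical names. Deriving the union type from a const array keeps the compile-time type and the runtime check in sync, and returning undefined lets a reader treat records without a source_type (for example, ones written before schema v3) as being of unknown origin.

```typescript
// Sketch only: the three values come from the PR description; every name below
// (SOURCE_TYPES, parseSourceType, ProvenancedRecord, tagRecord) is hypothetical.

// Single source of truth: the union type is derived from this runtime array.
export const SOURCE_TYPES = [
  'agent_episode',         // agent wrote it after completing a task
  'agent_learning',        // agent wrote repo knowledge it discovered
  'orchestrator_fallback', // orchestrator wrote a minimal episode because the agent didn't
] as const;

export type MemorySourceType = (typeof SOURCE_TYPES)[number];

// Write side: every record carries its provenance.
export interface ProvenancedRecord {
  content: string;
  source_type: MemorySourceType;
}

export function tagRecord(content: string, source_type: MemorySourceType): ProvenancedRecord {
  return { content, source_type };
}

// Read side: narrow an untrusted value read back from storage.
// Missing or unrecognized values come back as undefined ("unknown origin").
export function parseSourceType(value: unknown): MemorySourceType | undefined {
  return (SOURCE_TYPES as readonly unknown[]).includes(value)
    ? (value as MemorySourceType)
    : undefined;
}
```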
3. Content integrity
A SHA-256 hash of the record content is stored on write and verified on read. If someone or something tampers with a memory record, the orchestrator logs a warning. This is fail-open (the record is still used), because blocking on a hash mismatch would break the agent over a non-critical issue. Records written before this change (schema v2) skip verification.
Schema version bumped from 2 to 3.
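A sketch of the write/verify cycle using Node's built-in crypto module. The record shape (id, content_sha256, schema_version), the function names, hashing the content as UTF-8, and the injectable warn callback are assumptions for illustration, not the PR's schema. It follows the behavior described above: hash on write, verify on read, fail open on mismatch, and skip records older than schema v3. It also puts the record ID and both hashes in the warning, which is the context the automated review flagged as missing.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical record shape; field names are illustrative.
export interface StoredMemory {
  id: string;
  content: string;
  schema_version: number;
  content_sha256?: string; // absent on schema v2 records
}

const CURRENT_SCHEMA_VERSION = 3;

// Hex-encoded SHA-256 of the UTF-8 content.
export function contentHash(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// On write: store the hash alongside the content.
export function prepareForWrite(id: string, content: string): StoredMemory {
  return {
    id,
    content,
    schema_version: CURRENT_SCHEMA_VERSION,
    content_sha256: contentHash(content),
  };
}

// On read: verify, but fail open. The content is returned either way.
export function readVerified(
  record: StoredMemory,
  warn: (msg: string, ctx: Record<string, unknown>) => void = console.warn,
): string {
  if (record.schema_version < CURRENT_SCHEMA_VERSION) {
    return record.content; // pre-v3 record: no hash to verify against
  }
  const actual = contentHash(record.content);
  // A v3 record with a missing hash also counts as a mismatch.
  if (actual !== record.content_sha256) {
    warn('Memory record content integrity check failed; using content anyway', {
      recordId: record.id,
      expected: record.content_sha256,
      actual,
    });
  }
  return record.content;
}
```

Injecting warn keeps the sketch testable; in the orchestrator it would be the existing logger.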
How I tested
Unit tests:
… mise //cdk:test), including 25 new tests across sanitization.test.ts, memory.test.ts, and context-hydration.test.ts
End-to-end:
Files
- cdk/src/handlers/shared/sanitization.ts
- cdk/src/handlers/shared/memory.ts
- cdk/src/handlers/shared/context-hydration.ts
- agent/src/prompt_builder.py
- agent/src/memory.py