
feat(memory): harden memory pipeline with sanitization and checks#28

Open
leandrodamascena wants to merge 3 commits into aws-samples:main from leandrodamascena:feat/26-harden-memory-inputs

Conversation

@leandrodamascena

Closes #26

The agent learns from past tasks and applies that knowledge to new ones. But before we expand the memory loop (review feedback, cross-task patterns), we need to trust what goes in and what comes out. Today nothing is validated: GitHub issue bodies, PR review comments, and memory records from AgentCore all go straight into the agent's system prompt without any filtering or verification.

This PR adds three things:

1. Content sanitization

A sanitizeExternalContent() function (TypeScript, with a Python mirror) that strips HTML tags, neutralizes prompt injection patterns (e.g., "SYSTEM: ignore previous instructions"), and removes control characters and Unicode bidi overrides. It is applied to:

  • Memory reads (before injecting repo knowledge and past episodes into the prompt)
  • GitHub issue/PR content (title, body, comments, review threads)
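
The combined pass might look roughly like this Python sketch. The function name, pattern lists, and `[SANITIZED]` marker here are illustrative stand-ins, not the actual internals of sanitization.ts or prompt_builder.py:

```python
import re

# Tags named in the review findings (iframe/style/object/embed/form/input, script);
# the real DANGEROUS_TAGS regex may cover more.
TAG_RE = re.compile(r"</?(script|iframe|style|object|embed|form|input)[^>]*>", re.IGNORECASE)
INJECTION_RE = re.compile(
    r"(ignore (all )?previous instructions|disregard (the )?above|new instructions:)",
    re.IGNORECASE,
)
# C0/C1 control chars (keeping \n and \t) plus Unicode bidi overrides.
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u202a-\u202e\u2066-\u2069]")

def sanitize_external_content(text: str) -> str:
    """Strip dangerous tags, defang injection phrases, drop control/bidi chars."""
    if not isinstance(text, str):
        return ""  # coerce falsy/non-string inputs rather than passing them through
    text = TAG_RE.sub("", text)
    # Neutralize-not-block: prefix the phrase instead of deleting it.
    text = INJECTION_RE.sub(lambda m: f"[SANITIZED] {m.group(0)}", text)
    return CONTROL_RE.sub("", text)
```

Note the tags are stripped but their inner text is kept, matching the "neutralize, don't block" design described below.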

2. Source provenance

Every memory write now includes a source_type field so we can tell where a record came from:

  • agent_episode (agent wrote it after completing a task)
  • agent_learning (agent wrote repo knowledge it discovered)
  • orchestrator_fallback (orchestrator wrote a minimal episode because the agent didn't)

This matters when we start ingesting review feedback from external sources (#27). We need to know what's agent-generated vs what came from outside.

3. Content integrity

A SHA-256 hash is stored on write and verified on read. If someone or something tampers with a memory record, the orchestrator logs a warning. The check fails open (the record is still used) because blocking on a hash mismatch would break the agent over a non-critical issue. Records written before this change (schema v2) skip verification.

Schema version bumped from 2 to 3.
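
Taken together, sections 2 and 3 amount to a small write/read protocol. A hedged Python sketch of how the pieces could fit (helper names are illustrative; the real code lives in agent/src/memory.py and cdk/src/handlers/shared/memory.ts):

```python
import hashlib
import logging

logger = logging.getLogger(__name__)

def hash_content(text: str) -> str:
    # Explicit UTF-8 encoding: must match the TypeScript side, where
    # createHash('sha256').update(text) defaults to utf8.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_write_metadata(text: str, source_type: str) -> dict:
    # Hypothetical helper; per the PR, every write now carries
    # source_type, content_sha256, and schema_version "3".
    return {
        "source_type": source_type,  # agent_episode | agent_learning | orchestrator_fallback
        "content_sha256": hash_content(text),
        "schema_version": "3",
    }

def verify_on_read(text: str, metadata: dict) -> bool:
    """Fail-open integrity check: warn on mismatch, never block."""
    expected = metadata.get("content_sha256")
    if expected is None:
        return True  # schema v2 record written before hashing existed: skip
    if hash_content(text) != expected:
        logger.warning(
            "memory integrity mismatch",
            extra={"metric_type": "memory_integrity_mismatch"},
        )
        return False  # caller still uses the record (fail-open)
    return True
```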

How I tested

Unit tests:

  • CDK: 642 tests passing (mise //cdk:test), including 25 new tests across sanitization.test.ts, memory.test.ts, and context-hydration.test.ts
  • Python: 85 tests passing, including 11 new tests for sanitization, provenance, and hashing

End-to-end:

  • Deployed to my account, submitted a task
  • Agent completed, wrote memory with new metadata (source_type, content_sha256, schema_version 3)
  • No regressions in the orchestrator flow

Files

| File | What |
| --- | --- |
| cdk/src/handlers/shared/sanitization.ts | new; sanitization function |
| cdk/src/handlers/shared/memory.ts | sanitize on read; hash + provenance on write |
| cdk/src/handlers/shared/context-hydration.ts | sanitize GitHub issue/PR content |
| agent/src/prompt_builder.py | Python sanitization mirror |
| agent/src/memory.py | provenance + hash on writes; schema v3 |
| + 5 test files | 36 new tests total |

@leandrodamascena leandrodamascena requested a review from a team as a code owner April 13, 2026 15:22
@krokoko
Contributor

krokoko commented Apr 13, 2026

From the automated code review:

PR: 529 additions, 19 deletions across 10 files
Branch: feat/26-harden-memory-inputs → main


Critical Issues (1 found)

[error-handling] Integrity mismatch warning logs insufficient context for security investigation
cdk/src/handlers/shared/memory.ts — loadMemoryContext callers of verifyContentIntegrity

The hash mismatch warning logs only repo and namespace. It omits the record ID, expected
hash, computed hash, source_type, schema_version, content length, and any metric type.
Since the entire purpose of this feature is to detect tampering, the logging must be
sufficient for post-incident investigation. The project already uses metric_type for
operational tracking (see guardrail code), but the integrity check omits it.
Recommendation:

```ts
logger.warn('Memory record content integrity check failed — using content anyway (fail-open)', {
  repo,
  namespace: semanticNamespace,
  record_type: 'repo_knowledge', // or 'past_episode'
  expected_hash: record.metadata?.content_sha256?.stringValue ?? '(none)',
  actual_hash: hashContent(text),
  source_type: record.metadata?.source_type?.stringValue ?? '(unknown)',
  content_length: text.length,
  metric_type: 'memory_integrity_mismatch',
});
```


Important Issues (7 found)

  1. [error-handling] sanitizeExternalContent silently passes null/undefined through with a
    string return type
    cdk/src/handlers/shared/sanitization.ts:475 — The if (!text) return text guard accepts
    falsy non-string values. Call sites rely on this for || '(no description)' fallbacks, but
    any future call site omitting the fallback will silently pass unsanitized null into the
    agent prompt. Either widen the type signature or coerce falsy to ''.
  2. [error-handling] Python sanitize_memory_content passes non-string falsy values through
    silently
    agent/src/prompt_builder.py:107 — Same issue: if not text: return text passes None
    through, which produces "- None" literal in the system prompt via f-string interpolation.
  3. [error-handling] Python _log_error omits the error message and ignores
    memory_id/task_id parameters
    agent/src/memory.py:47-55 — The format string logs {type(err).name} but not {err}, so
    operators see "TypeError" without knowing what the error was. The memory_id and task_id
    parameters are accepted but never used.
  4. [comments] Stale schema version comment in Python memory.py
    agent/src/memory.py:20-22 — The comment only explains v1 and v2 but the value was bumped
    to "3". It should mention what v3 adds (source_type + content_sha256).
  5. [comments] Misleading "oldest entries dropped first" in loadMemoryContext JSDoc
    cdk/src/handlers/shared/memory.ts:99 — The code does not sort by age. For semantic
    search, entries are relevance-ranked. The comment should say "entries beyond the budget
    are dropped; knowledge is prioritized before episodes."
  6. [tests] Python sanitize_memory_content tests are minimal (5 tests) vs TypeScript (18
    tests)
    agent/tests/test_prompts.py — Missing: iframe/style/object/embed/form/input tag
    stripping, bidi character stripping, BOM handling, "disregard above/all" phrases, "new
    instructions:" phrase, combined attack vectors, None input handling. Since the Python
    side is the last defense before the LLM prompt, parity matters.
  7. [tests] DANGEROUS_TAGS regex: nested/malformed HTML tags not tested
    cdk/test/handlers/shared/sanitization.test.ts — Only well-formed
    <script>alert("xss")</script> is tested. Attackers craft nested same-name tags
    (<script><script>inner</script></script>), unclosed tags, and tags with > inside
    attribute values. These edge cases should be covered.
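
Findings 1 and 2 share one fix: coerce non-string falsy input to an empty string at the top of the sanitizer. A minimal Python sketch (the isinstance guard is the suggested change; the trailing strip merely stands in for the real sanitization passes):

```python
def sanitize_memory_content(text: object) -> str:
    # Guard suggested in findings 1-2: coerce None and other non-strings
    # to "" instead of returning them unchanged, so f-string interpolation
    # can never emit a literal "- None" into the system prompt.
    if not isinstance(text, str):
        return ""
    return text.strip()  # placeholder for the real sanitization passes
```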


Suggestions (9 found)

  1. [error-handling] No hash verification on the Python read path — Currently safe since
    reads flow through the orchestrator, but a comment in agent/src/memory.py should note
    that if direct reads are ever added, hash verification must be ported from memory.ts.
  2. [error-handling] No detection of corrupted v3 records — A record with schema_version:
    "3" but missing content_sha256 (corrupted write) passes verification silently. Consider
    logging this condition.
  3. [error-handling] diff_hunk, path, and diff_summary not sanitized — These are external
    inputs injected into the agent prompt. The choice to leave them unsanitized is defensible
    (code content could be corrupted by sanitization, and they're in markdown code blocks),
    but deserves an explicit comment or test documenting the decision.
  4. [tests] Hash mismatch test does not assert logger.warn was called — memory.test.ts:169
    is named "logs warning on integrity check failure" but only asserts content is returned,
    not that the warning was emitted.
  5. [tests] Some DANGEROUS_TAGS entries lack dedicated stripping tests — they appear
    in the DANGEROUS_TAGS regex but have no tests of their own.
  6. [tests] No cross-language SHA-256 parity test — TS createHash('sha256').update(text)
    defaults to UTF-8, matching Python's explicit .encode("utf-8"), but a test with a known
    input/expected-hash in both languages would guard against encoding drift.
  7. [comments] Python docstrings for write_task_episode and write_repo_learnings don't
    mention new metadata fields — source_type and content_sha256 were added but docstrings
    weren't updated.
  8. [comments] hashContent JSDoc should note UTF-8 encoding for cross-language
    compatibility — The implicit UTF-8 encoding is load-bearing for integrity checks.
    Suggestion: /** Compute SHA-256 hash of text content (UTF-8 encoded; must match
    agent/src/memory.py). */
  9. [comments] Pre-existing: context-hydration.ts lines 301 and 712 reference
    agent/entrypoint.py but the functions are actually in agent/src/context.py — Convenient
    to fix in this PR since the file is already modified.
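
Suggestion 6 is cheap to implement: pin one known input/digest pair in both test suites so encoding drift between Node's createHash('sha256').update(text) (utf8 by default) and Python's .encode("utf-8") fails loudly. A Python-side sketch, using the standard "abc" SHA-256 test vector (a mirrored TypeScript test would pin the same constant):

```python
import hashlib

KNOWN_INPUT = "abc"
# Well-known SHA-256 test vector for "abc".
KNOWN_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def test_sha256_parity() -> None:
    assert sha256_hex(KNOWN_INPUT) == KNOWN_SHA256
    # Non-ASCII input exercises the encoding path; determinism check only,
    # since the digest depends on the exact UTF-8 bytes produced.
    assert sha256_hex("héllo") == sha256_hex("héllo")

test_sha256_parity()
```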

Strengths

  • Consistent cross-language implementation: All 7 regex patterns are semantically
    identical between TypeScript and Python. SHA-256 hashing produces identical results
    cross-language (verified with both ASCII and non-ASCII strings).
  • Correct order of operations: Hash verification is done on raw text before sanitization
    is applied, matching the hash computed at write time.
  • Sound "neutralize-not-block" design: Replacing injection patterns with
    [SANITIZED_PREFIX] markers preserves visibility for legitimate prompt engineering
    discussion while structurally defanging attacks.
  • Clean backward compatibility: Schema v2 records without content_sha256 skip
    verification gracefully. Fail-open design is explicitly documented and justified.
  • Good separation of concerns: Sanitization extracted to its own module with a single
    exported function, making it independently testable and reusable.
  • Integration-level test coverage: The context-hydration tests verify sanitization
    through assembleUserPrompt and assemblePrIterationPrompt, catching cases where
    sanitization could be bypassed during prompt assembly.
  • Source provenance (source_type): Creates a useful audit trail distinguishing
    agent_episode, agent_learning, and orchestrator_fallback records for future
    investigation.

Recommended Action

  1. Fix the critical logging issue (finding 1) — this is the highest priority since it
    undermines the purpose of the integrity check feature
  2. Address the important issues (type safety of sanitization inputs, Python _log_error,
    stale comments)
  3. Expand Python test coverage to approach parity with TypeScript
  4. Consider the suggestions for test and comment improvements


Development

Successfully merging this pull request may close these issues.

Orchestrator/Memory: Harden memory inputs before accepting external content
