Problem
soul remember writes to memory storage with no deduplication, no contradiction detection, and no significance gating. Every call creates a new MemoryEntry regardless of whether the same fact (or a near-identical version) is already stored.
The dedup machinery already exists. MemoryManager.observe() runs the smart pipeline (sentiment, significance, fact extraction, reconcile_fact(), contradiction detection) and is exposed as the soul_observe MCP tool, but the CLI has no equivalent. Anything written from a script, a shell hook (e.g. soul-sync.sh), or a manual user invocation therefore goes through the blunt path and accumulates duplicates over time.
The naming amplifies the gap. "Remember" implies thoughtful incorporation, the way humans remember (selectively, with consolidation). The implementation is a dumb append. Users who read soul remember "X" reasonably assume it integrates X with what's already known. It doesn't.
Evidence
Blunt path (current soul remember):
src/soul_protocol/cli/main.py:1092 — CLI command
src/soul_protocol/runtime/soul.py:1383 — Soul.remember() calls self._memory.add() directly
src/soul_protocol/runtime/memory/manager.py:925 — add() writes to the tier store with no checks
Smart path (existing observe):
src/soul_protocol/runtime/memory/manager.py:944 — MemoryManager.observe()
- Line 1043 calls reconcile_fact() from dedup.py for each extracted fact
- Lines 1095-1109 run contradiction detection on stored facts
Dedup function ready to use:
src/soul_protocol/runtime/memory/dedup.py:101 — reconcile_fact(new_fact, existing_facts) returns ("CREATE" | "SKIP" | "MERGE", merge_target_id)
- Thresholds: >0.85 SKIP, 0.6-0.85 MERGE, <0.6 CREATE
- Uses Jaccard + containment coefficient on tokenized strings, no LLM call
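The threshold logic above can be sketched roughly as follows. This is an illustration only, not the code in dedup.py: the tokenizer, the way Jaccard and containment are combined (here, their mean), the treatment of scores exactly at 0.6 and 0.85, the dict-based fact store, and the names `similarity` and `reconcile_sketch` are all assumptions.

```python
import re

SKIP_THRESHOLD = 0.85   # above this, the new fact is treated as a duplicate
MERGE_THRESHOLD = 0.6   # from here up to SKIP_THRESHOLD, merge into the match


def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Mean of Jaccard and containment; the combination rule is an assumption."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    overlap = len(ta & tb)
    jaccard = overlap / len(ta | tb)
    containment = overlap / min(len(ta), len(tb))
    return (jaccard + containment) / 2


def reconcile_sketch(new_fact, existing_facts):
    """existing_facts maps memory ID -> text. Returns (action, target_id)."""
    best_id, best_score = None, 0.0
    for fact_id, text in existing_facts.items():
        score = similarity(new_fact, text)
        if score > best_score:
            best_id, best_score = fact_id, score
    if best_score > SKIP_THRESHOLD:
        return ("SKIP", best_id)
    if best_score >= MERGE_THRESHOLD:
        return ("MERGE", best_id)
    return ("CREATE", None)
```

Because everything is set arithmetic on tokens, the check stays cheap enough to run on every CLI write, with no LLM call involved.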
Proposed solution
Phase 1: add soul observe to the CLI (additive, non-breaking)
New command that takes a single fact-shaped string and routes through:
- reconcile_fact() against existing facts in the same tier
- Apply the SKIP / MERGE / CREATE decision
- Optional contradiction detection (default: on for semantic, off for episodic)
- Skip the LLM-based fact-extraction stage (input is already fact-shaped, no extraction needed)
- Skip sentiment + significance gating (caller asserted it's worth remembering)
Signature:
soul observe <path> "<fact>" \
  --type [episodic|semantic|procedural|social] \
  --importance 1-10 \
  --domain <name> \
  --emotion <tag> \
  --no-dedup              # opt-out for the rare blunt-write case
  --no-contradictions     # opt-out for contradiction detection
Output should report the action taken: CREATED / SKIPPED (with similarity score and existing memory ID) / MERGED (with old and new IDs).
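A rough sketch of how the flags and the reported action might fit together is shown here. It uses argparse from the standard library as a stand-in, since the issue does not say which CLI framework main.py uses. The default importance of 5, the `format_result` helper, and the exact output wording are assumptions; only the flag names and the semantic default come from this issue.

```python
import argparse

MEMORY_TYPES = ["episodic", "semantic", "procedural", "social"]


def build_parser():
    """Parser mirroring the proposed `soul observe` signature (stand-in only)."""
    parser = argparse.ArgumentParser(prog="soul observe")
    parser.add_argument("path")
    parser.add_argument("fact")
    parser.add_argument("--type", dest="memory_type",
                        choices=MEMORY_TYPES, default="semantic")
    # The default of 5 is an assumption; the issue only gives the 1-10 range.
    parser.add_argument("--importance", type=int, choices=range(1, 11),
                        default=5, metavar="1-10")
    parser.add_argument("--domain")
    parser.add_argument("--emotion")
    parser.add_argument("--no-dedup", action="store_true")
    parser.add_argument("--no-contradictions", action="store_true")
    return parser


def format_result(action, memory_id, score=None, merged_into=None):
    """Render the action taken; the output wording is hypothetical."""
    if action == "SKIP":
        return f"SKIPPED (similarity {score:.2f}, existing memory {memory_id})"
    if action == "MERGE":
        return f"MERGED ({memory_id} -> {merged_into})"
    return f"CREATED ({memory_id})"
```

Keeping the report on a single line makes it easy for scripts such as soul-sync.sh to log or grep the outcome.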
Phase 2: deprecate soul remember
- Add a DeprecationWarning when soul remember is invoked, pointing at soul observe and the --no-dedup flag for callers who want the current blunt behavior.
- Update README, docs, and examples to use soul observe.
- Update the soul-sync.sh hook in paw-workspace to use soul observe for semantic memories (with --no-dedup for episodic events that are unique by time).
- Keep soul remember working as an alias for at least one minor release.
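One thing to watch when adding the warning: Python's default warning filters hide DeprecationWarning unless it is triggered directly by code in __main__, so a CLI user may never see it. A minimal sketch that raises the warning and also prints a visible notice to stderr could look like this; the helper name `warn_remember_deprecated` and the message text are hypothetical.

```python
import sys
import warnings

DEPRECATION_MESSAGE = (
    "'soul remember' is deprecated; use 'soul observe' instead "
    "(pass --no-dedup to keep the current append-only behavior)."
)


def warn_remember_deprecated(stream=None):
    """Emit the deprecation through `warnings` and as a plain stderr notice."""
    stream = stream if stream is not None else sys.stderr
    # Visible to test suites and to users running with -W default.
    warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=2)
    # Visible to everyone else, including non-interactive hooks that log stderr.
    print(f"warning: {DEPRECATION_MESSAGE}", file=stream)
```

Calling this at the top of the existing remember command would leave its write behavior unchanged while the alias remains in place.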
Phase 3: remove soul remember (track in a separate issue, next major version)
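Acceptance criteria
- soul observe <path> "<text>" works and writes to semantic by default
- --no-dedup flag bypasses dedup and writes raw (preserves current remember behavior)
- soul remember emits a deprecation warning pointing at soul observe
- … --no-dedup opt-out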
Notes
Why not change remember behavior in place? A hidden behavior change would surprise existing callers, especially the soul-sync.sh hook that fires non-interactively at session end. Adding a new command plus a deprecation path is safer.
Why not require an Interaction object like the existing observe()? The Interaction shape (user_input + agent_output) doesn't map cleanly to CLI single-string input. The new CLI path skips fact extraction since the user already passes a fact, so it doesn't need the Interaction shape.
What about episodic dedup? Currently reconcile_fact only runs on semantic facts inside observe(). Episodic events are unique by time and shouldn't be deduped by default. The new CLI command should respect this: dedup defaults ON for semantic/procedural/social, OFF for episodic.
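The per-type defaults could be resolved with something like the sketch below. The function name `resolve_checks` is hypothetical. The issue only states contradiction-detection defaults for semantic (on) and episodic (off), so treating procedural and social as off is an assumption that would need confirming.

```python
# Dedup defaults from this issue: ON for these tiers, OFF for episodic.
DEDUP_DEFAULT_ON = {"semantic", "procedural", "social"}
# Assumption: contradiction detection defaults ON only for semantic.
CONTRADICTIONS_DEFAULT_ON = {"semantic"}


def resolve_checks(memory_type, no_dedup=False, no_contradictions=False):
    """Return (run_dedup, run_contradictions) for one CLI invocation."""
    run_dedup = memory_type in DEDUP_DEFAULT_ON and not no_dedup
    run_contradictions = (
        memory_type in CONTRADICTIONS_DEFAULT_ON and not no_contradictions
    )
    return run_dedup, run_contradictions
```

With these defaults an episodic write behaves like today's remember unless the caller asks otherwise, and the opt-out flags only ever turn checks off.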
Mentee project candidate. The work is well-scoped: dedup function already exists, clear acceptance criteria, no architectural decisions to make. Touches CLI plus runtime plus tests plus docs in modest amounts. Phase 1 alone is a clean LFDT mentorship task if scoped down to "add the command + tests" without the deprecation work.