feat(cli): add 'soul observe' command with dedup pipeline; deprecate 'soul remember' #231

@prakashUXtech

Problem

soul remember writes to memory storage with no deduplication, no contradiction detection, and no significance gating. Every call creates a new MemoryEntry regardless of whether the same fact (or a near-identical version) is already stored.

The dedup machinery already exists. MemoryManager.observe() runs the smart pipeline (sentiment, significance, fact extraction, reconcile_fact(), contradiction detection) and is exposed as the soul_observe MCP tool. But the CLI has no equivalent. So anything written from a script, a shell hook (e.g. soul-sync.sh), or a manual user invocation goes through the blunt path and accumulates duplicates over time.

The naming amplifies the gap. "Remember" implies thoughtful incorporation, the way humans remember (selectively, with consolidation). The implementation is a dumb append. Users who read soul remember "X" reasonably assume it integrates X with what's already known. It doesn't.

Evidence

Blunt path (current soul remember):

  • src/soul_protocol/cli/main.py:1092: CLI command
  • src/soul_protocol/runtime/soul.py:1383: Soul.remember() calls self._memory.add() directly
  • src/soul_protocol/runtime/memory/manager.py:925: add() writes to the tier store with no checks

Smart path (existing observe):

  • src/soul_protocol/runtime/memory/manager.py:944: MemoryManager.observe()
  • Line 1043 calls reconcile_fact() from dedup.py for each extracted fact
  • Lines 1095-1109 run contradiction detection on stored facts

Dedup function ready to use:

  • src/soul_protocol/runtime/memory/dedup.py:101: reconcile_fact(new_fact, existing_facts) returns ("CREATE" | "SKIP" | "MERGE", merge_target_id)
  • Thresholds: >0.85 SKIP, 0.6-0.85 MERGE, <0.6 CREATE
  • Uses Jaccard + containment coefficient on tokenized strings, no LLM call
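For readers unfamiliar with this style of dedup, here is a minimal sketch of token-overlap scoring with the thresholds listed above. It is not the code in dedup.py. The tokenizer, the choice to take the larger of the Jaccard and containment scores, the dict shape of existing_facts, and the name reconcile_sketch are all assumptions made for illustration.

```python
import re

SKIP_ABOVE = 0.85   # score > 0.85 -> SKIP
MERGE_FROM = 0.6    # 0.6 <= score <= 0.85 -> MERGE, below that -> CREATE


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Assumed combination: the larger of Jaccard and containment."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    overlap = len(ta & tb)
    jaccard = overlap / len(ta | tb)
    containment = overlap / min(len(ta), len(tb))
    return max(jaccard, containment)


def classify(score):
    """Map a similarity score to SKIP / MERGE / CREATE."""
    if score > SKIP_ABOVE:
        return "SKIP"
    if score >= MERGE_FROM:
        return "MERGE"
    return "CREATE"


def reconcile_sketch(new_fact, existing_facts):
    """existing_facts: dict of memory id -> fact text (hypothetical shape)."""
    best_id, best_score = None, 0.0
    for fact_id, text in existing_facts.items():
        score = similarity(new_fact, text)
        if score > best_score:
            best_id, best_score = fact_id, score
    action = classify(best_score)
    # Assumption: the target id is returned for SKIP as well as MERGE.
    return action, (best_id if action != "CREATE" else None)
```

Because no LLM call is involved, a check like this is cheap enough to run on every CLI write.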

Proposed solution

Phase 1: add soul observe to the CLI (additive, non-breaking)

New command that takes a single fact-shaped string and routes through:

  1. reconcile_fact() against existing facts in the same tier
  2. Apply the SKIP / MERGE / CREATE decision
  3. Optional contradiction detection (default: on for semantic, off for episodic)
  4. Skip the LLM-based fact-extraction stage (input is already fact-shaped, no extraction needed)
  5. Skip sentiment + significance gating (caller asserted it's worth remembering)
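Steps 1 and 2 can be sketched as a small function that takes the reconcile step as a parameter. This is an illustration under stated assumptions, not the proposed implementation: apply_decision is a hypothetical name, facts is a plain dict standing in for one tier's store, and a MERGE here simply replaces the older entry with the new text under a new ID, since the issue does not specify how merged text is built.

```python
import uuid


def apply_decision(facts, text, reconcile, dedup=True):
    """Write `text` into `facts` (dict of id -> text) for a single tier.

    `reconcile(text, facts)` must return (action, target_id), where action
    is "CREATE", "SKIP", or "MERGE". Returns (outcome, old_id, new_id).
    """
    # With dedup disabled, behave like the current blunt write.
    action, target_id = reconcile(text, facts) if dedup else ("CREATE", None)

    if action == "SKIP":
        # Nothing is written; report which existing entry matched.
        return "SKIPPED", target_id, None

    new_id = uuid.uuid4().hex
    if action == "MERGE":
        # Assumption: merging replaces the older entry with the new text.
        facts.pop(target_id, None)
        facts[new_id] = text
        return "MERGED", target_id, new_id

    facts[new_id] = text
    return "CREATED", None, new_id
```

Passing reconcile in as an argument keeps the sketch independent of dedup.py and makes each of the three paths easy to exercise in tests with a stub.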

Signature:

soul observe <path> "<fact>" \
  --type [episodic|semantic|procedural|social] \
  --importance 1-10 \
  --domain <name> \
  --emotion <tag> \
  --no-dedup            # opt-out for the rare blunt-write case
  --no-contradictions   # opt-out for contradiction detection

Output should report the action taken: CREATED / SKIPPED (with similarity score and existing memory ID) / MERGED (with old and new IDs).
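To make the signature and the output format concrete, here is a rough sketch of the argument surface using the standard-library argparse. The real CLI may be built on a different framework, so treat the parser wiring as an assumption. build_parser and format_outcome are hypothetical names, and the exact output strings are placeholders rather than a proposed format.

```python
import argparse

MEMORY_TYPES = ["episodic", "semantic", "procedural", "social"]


def build_parser():
    """Argument surface for the proposed command (illustrative only)."""
    parser = argparse.ArgumentParser(prog="soul observe")
    parser.add_argument("path", help="path to the soul file")
    parser.add_argument("fact", help="a single fact-shaped string")
    parser.add_argument("--type", choices=MEMORY_TYPES, default="semantic")
    parser.add_argument("--importance", type=int, choices=range(1, 11))
    parser.add_argument("--domain")
    parser.add_argument("--emotion")
    parser.add_argument("--no-dedup", action="store_true")
    parser.add_argument("--no-contradictions", action="store_true")
    return parser


def format_outcome(outcome, old_id=None, new_id=None, score=None):
    """Render the action taken; the wording here is a placeholder."""
    if outcome == "SKIPPED":
        return f"SKIPPED (similarity {score:.2f}, existing {old_id})"
    if outcome == "MERGED":
        return f"MERGED ({old_id} -> {new_id})"
    return f"CREATED ({new_id})"
```

For example, build_parser().parse_args(["my.soul", "User prefers dark mode"]) yields type "semantic" with dedup left on, which matches the default described in the acceptance criteria.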

Phase 2: deprecate soul remember

  1. Add a DeprecationWarning when soul remember is invoked, pointing at soul observe and the --no-dedup flag for callers who want the current blunt behavior.
  2. Update README, docs, and examples to use soul observe.
  3. Update the soul-sync.sh hook in paw-workspace to use soul observe for semantic memories (with --no-dedup for episodic events that are unique by time).
  4. Keep soul remember working as an alias for at least one minor release.
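A minimal sketch of step 1, using Python's standard warnings module. The function name and message text are placeholders, not wording agreed in this issue.

```python
import warnings

# Placeholder wording; the final message is up to the implementer.
DEPRECATION_MESSAGE = (
    "'soul remember' is deprecated; use 'soul observe' instead. "
    "Pass --no-dedup to keep the current append-only behavior."
)


def warn_remember_deprecated():
    """Emit the deprecation notice from the 'soul remember' entry point."""
    # stacklevel=2 attributes the warning to the caller of this helper.
    warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=2)
```

One caveat: Python's default warning filters hide DeprecationWarning unless it is triggered by code in __main__, so a CLI will probably also need to print the notice to stderr for users to see it.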

Phase 3: remove soul remember (track in a separate issue, next major version)

Acceptance criteria

  • soul observe <path> "<text>" works and writes to semantic by default
  • Calling it twice with similar text produces SKIP or MERGE (not two separate entries)
  • Contradiction detection runs and supersedes conflicting facts
  • --no-dedup flag bypasses dedup and writes raw (preserves current remember behavior)
  • Output panel reports the action taken (CREATED / SKIPPED / MERGED) with relevant IDs
  • soul remember emits a deprecation warning pointing at soul observe
  • Tests cover all three dedup paths and the --no-dedup opt-out
  • CHANGELOG updated, README examples updated

Notes

Why not change remember behavior in place? A hidden behavior change would surprise existing callers, especially the soul-sync.sh hook that fires non-interactively at session end. Adding a new command plus a deprecation path is safer.

Why not require an Interaction object like the existing observe()? The Interaction shape (user_input + agent_output) doesn't map cleanly to CLI single-string input. The new CLI path skips fact extraction since the user already passes a fact, so it doesn't need the Interaction shape.

What about episodic dedup? Currently reconcile_fact only runs on semantic facts inside observe(). Episodic events are unique by time and shouldn't be deduped by default. The new CLI command should respect this: dedup defaults ON for semantic/procedural/social, OFF for episodic.
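The per-type default can be expressed as a small lookup. This is a sketch only; DEDUP_DEFAULT and dedup_enabled are hypothetical names, and the table simply restates the defaults from the paragraph above.

```python
# Defaults as described above: ON for semantic/procedural/social, OFF for episodic.
DEDUP_DEFAULT = {
    "semantic": True,
    "procedural": True,
    "social": True,
    "episodic": False,
}


def dedup_enabled(memory_type, no_dedup=False):
    """Return whether dedup should run for this write."""
    if no_dedup:
        # An explicit --no-dedup always wins over the per-type default.
        return False
    return DEDUP_DEFAULT[memory_type]
```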

Mentee project candidate. The work is well-scoped: dedup function already exists, clear acceptance criteria, no architectural decisions to make. Touches CLI plus runtime plus tests plus docs in modest amounts. Phase 1 alone is a clean LFDT mentorship task if scoped down to "add the command + tests" without the deprecation work.

Labels: enhancement, good first issue, lfdt-mentorship (LFDT Decentralized Trust Mentorship voluntary track, mentee deliverables, issue #75)
