fix(memory): consolidation & recall correctness — live crisis graph, score clamp, reflection embedding, crash recovery by Marsu6996 · Pull Request #22 · CodeAbra/iai-personal-memory-engine

Marsu6996 · 2026-06-21T15:33:38Z

Summary

This is the consolidation- and recall-correctness half of #17 (the daemon
WAKE/idle CPU-storm half is the companion PR). After a healthy store accumulates
soft-deleted / deduped records, the sleep pipeline rebuilds its topology over the
tombstoned nodes too, so rich_club sits just under its crisis floor and
re-arms crisis_mode every cycle; meanwhile a degraded daemon serves recall as
flat 1.000 scores. Each commit fixes one root cause behind those symptoms, on
top of v1.1.4.

What changed (one logical change per commit)

- fix(capture): stop re-embedding seen turns, close the dedup race. Active
  sessions re-drain the whole transcript every turn, so the eager embed at the top
  of capture_turn re-ran the GIL-bound Rust matmul for every already-stored turn
  just to discard it on the idem-tag check. Embedding is now deferred behind a
  memoized _compute_embedding() that runs at most once and only for genuinely new
  turns; flush_record_buffer before the idem lookup under _CAPTURE_DEDUP_LOCK
  closes a check-then-insert race that produced live duplicate records.
- fix(recall): clamp the displayed score to [0,1], rank on unclamped
  sort_score. Multiplicative boosts (trigram×2, FTS×3, valence) push the
  internal score past 1.0, so a served hit could surface a "confidence" > 1 and a
  degraded state surfaced flat 1.000. The displayed score is clamped at
  serialization while ranking uses a new, unclamped MemoryHit.sort_score; the
  stale-downweight keeps the two in lock-step. Ordering is provably unchanged
  (regression test included).
- fix(consolidation): build crisis topology on the live graph; demote
  rich_club. The essential-variable tracker and crisis recluster rebuilt over
  ALL records including tombstoned ones, so communities / centrality / rich_club
  were computed on thousands of dead nodes (rich_club ~0.019, just under the 0.02
  floor → re-arms crisis on a healthy store). A shared build_live_graph()
  (tombstone-filtered nodes, live-only edges) feeds both paths, and
  rich_club_coefficient is demoted from a crisis trigger to a diagnostic-only
  signal (edge_density and community_count remain triggers). The runtime-graph
  tombstone guard is hardened to pd.isna() so a NaT/NA tombstoned_at on a
  reembedded datetime64 column reads as LIVE instead of collapsing the graph.
- fix(dmn): embed the reflection surface text instead of a zero placeholder.
  Daily reflections wrote an all-zero embedding that was never re-embedded, leaving
  them permanently unretrievable and feeding zero vectors into the scoring matmul.
  Now embeds literal_surface; on native failure falls back to a zero vector
  flagged embedding_pending=1 for deferred reembed.
- fix(sigma): bound small-worldness with a node-count ceiling. Adds
  SIGMA_N_CEIL (default 20000, env IAI_MCP_SIGMA_N_FLOOR / ..._CEIL) returning
  None above it so the unbounded random-reference compute can't spin a core.
- fix(watchdog): monitor a retried-but-wedged sleep cycle for staleness. The
  staleness check only monitored attempt == 1, so a retried-and-still-wedged
  cycle (attempt >= 2) was short-circuited and ignored. Gates on attempt < 1
  instead (excluding bool), so every genuine running attempt is monitored.
- feat(daemon): surface a repeated store-empty count failure as telemetry.
  The companion PR stops a transient count failure from parking the tick but left
  the condition invisible; this emits a best-effort store_empty_check_failed
  warning event (buffered, never raises).
- fix(hippo): decrypt literal_surface before reembedding pending rows. On
  an encrypted store, reembed_pending_rows embedded the raw iai:enc:v1:
  ciphertext, producing a garbage vector for every embedding_pending=1 row. Now
  decrypts via the existing _decrypt_record_field (a no-op on a plaintext store);
  a decrypt failure leaves the row pending for retry rather than poisoning it.
  Distinct from the v1.1.4 reembed fix (04e62e2): that repaired the migration
  path (migrate/_reembed_from_text.py::migrate_reembed_from_text); this fixes the
  runtime/daemon path (hippo/_db.py::reembed_pending_rows), which v1.1.4 does
  not touch, so the two are complementary, not redundant.
- fix(daemon): recover cleanly from a crash mid-SLEEP at boot. A daemon
  killed mid-SLEEP leaves lifecycle_state.json at SLEEP with
  sleep_cycle_progress=None (incoherent); resuming wedges the daemon so it never
  reaches the recluster that clears crisis. A pure _normalize_boot_lifecycle_state
  resets exactly that case to a clean WAKE and clears the stale crisis flags at boot.

Type of change

Affected areas

Testing

- pytest passes locally: full default gate
  (pytest -m "not perf and not slow and not live", 3538 tests), rebased on
  v1.1.4: 3514 passed, 33 skipped, 1 xfailed, 1 failed. The single failure
  (test_rendered_plist_contains_fd_floor) is pre-existing: it fails
  identically on a clean v1.1.4 checkout and is unrelated to this branch.
  (Some test_doctor_* rows are environment-flaky on the dev machine, where a large
  system subprocess output is decoded as strict UTF-8, and pass on a clean run.)
- ruff check src/ tests/: no new findings vs the v1.1.3 baseline on the
  touched files (the repo ships no ruff config).
- New tests added for changed behaviour: score-clamp order preservation,
  tombstone/live-graph filtering on NaT/NA columns, dmn embedding norm,
  sigma ceiling + env override, watchdog stale gate, encrypted-store reembed
  decrypt, and crash-mid-SLEEP boot normalization. (The capture drain/dedup
  commit currently relies on the existing capture tests + a live-store check; a
  dedicated unit test would strengthen it, and I am happy to add one here if you'd like.)

Benchmarks

The recall change is display-only and order-preserving: ranking moves to an
unclamped sort_score, the clamp touches only the serialized number, and the
regression test asserts identical ordering — so LongMemEval-S is unaffected by
construction. The dmn fix makes daily reflections retrievable again (previously
zero-vector), which can only help recall on reflection cues. A blind LongMemEval-S
run (python -m bench.longmemeval_blind --split S) can be attached on request; it
is not run by default to keep the blind split blind.

Bench command run: pending — blind LongMemEval-S queued before merge if desired
Before:
After:

Notes for reviewers

- Relates to #17 (intentionally not "Fixes": the companion WAKE/idle PR carries
  the closing keyword, since you flagged the CPU-storm half as the gate for closing
  #17). This PR addresses the consolidation/recall symptoms (crisis loop from a
  tombstone-polluted graph, flat 1.000 served recall) at their source.
- Stacked on the WAKE/idle PR. Several files build on it (capture.py uses
  is_drain_in_progress, the runtime-graph tombstone guard, the store-empty path),
  so this is best reviewed/merged after it. Rebasing onto main once that lands is
  a clean fast-forward.
- Complements the crisis auto-expiry (crisis_mode_since_ts, shipped v1.1.3)
  rather than duplicating it: the 72 h timeout clears a coherent crisis; this
  removes the false trigger so crisis is not (re-)armed on a healthy store in the
  first place.
- Rebased on v1.1.4. Our files are disjoint from v1.1.4's changes (reembed
  migration + analytics), so the rebase was conflict-free; the full gate above was
  run on the v1.1.4-based branch.
- rich_club_coefficient is no longer a crisis trigger (kept as a diagnostic
  event field is_crisis_trigger). New env knobs: IAI_MCP_SIGMA_N_FLOOR /
  IAI_MCP_SIGMA_N_CEIL. New MemoryHit.sort_score field defaults to None
  (backward compatible: callers fall back to score).

…og status probes

The sleep/consolidation pipeline defers whenever _interrupt_check reports recent activity. Two independent signals wrongly marked the daemon "active" on nearly every tick, so it never completed a cycle, never hibernated, and the wake-hook re-ran every 30s: a sustained ~200% CPU churn on any long-lived deployment.

1. _interrupt_check returned True whenever mcp_socket.active_connections > 0. Long-lived MCP clients hold their socket open permanently -> always True.
2. Even after removing (1), last_activity_ts was refreshed for EVERY inbound socket line, including the watchdog's own {"type": "status"} liveness probe sent every 7-30s (daemon/_watchdog.py::_probe_status_roundtrip). So the 30s-activity window never elapsed.

Fix: _interrupt_check keys off last_activity_ts recency only, and SocketServer refreshes last_activity_ts only for dispatched JSON-RPC method calls (real recall/capture traffic), never for control-plane messages. A busy burst still defers consolidation; a 30s lull now lets the cycle finish and the daemon hibernate.

Adds tests/test_socket_activity_tracking.py locking in that a status probe does not count as activity while a real method call does.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
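The activity rule described in this commit can be sketched as follows. This is a minimal illustration, not the project's SocketServer: the class name `ActivityTracker`, its methods, and the assumption that a JSON-RPC call is recognisable by a `"method"` key are all hypothetical.

```python
import json
import time

ACTIVITY_WINDOW_S = 30.0  # the 30s activity window mentioned in the commit


class ActivityTracker:
    """Hypothetical stand-in for the socket server's activity bookkeeping."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.last_activity_ts = None

    def on_line(self, raw: str) -> None:
        msg = json.loads(raw)
        # Only JSON-RPC method calls count as activity; control-plane messages
        # such as {"type": "status"} never refresh the timestamp.
        if isinstance(msg, dict) and "method" in msg:
            self.last_activity_ts = self._clock()

    def interrupt_check(self) -> bool:
        # Keys off recency only, never off the number of open connections.
        if self.last_activity_ts is None:
            return False
        return (self._clock() - self.last_activity_ts) < ACTIVITY_WINDOW_S
```

Injecting the clock keeps the check testable without sleeping for 30 seconds.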

…ute storm

At WAKE several background subsystems (boot preload, sigma identity audit, foraging weak-bridge detection, hippea cascade) each call build_runtime_graph concurrently in their own asyncio.to_thread workers. On a cache miss each one independently ran the full, GIL-bound community detection (mosaic). Three+ at once contended for the GIL, starved the asyncio event loop, and the liveness watchdog's socket probe timed out -> SIGKILL -> launchd relaunch -> loop.

Wrap build_runtime_graph in a single-flight gate keyed on the cache key: the first caller (leader) computes and saves the on-disk cache; concurrent callers (followers) wait on an Event and then reload the freshly-saved cache via the existing cheap path. No mutable MemoryGraph is shared between callers (each rebuilds its own shell + single-slot sync hook), and recall is independent of the community assignment, so a slightly-stale shared result is harmless.

Followers re-contend in a bounded loop rather than recomputing unconditionally: if the leader fails before saving, the cache key shifts mid-burst, or the wait times out, the woken followers loop back and exactly one becomes the next leader while the rest wait again. This degrades those edge cases to sequential single-flight (one compute at a time) instead of an N-way concurrent re-storm.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
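A single-flight gate of the kind described here might look roughly like the sketch below. It is an assumption-laden illustration: `single_flight_build`, `try_load`, `compute_and_save`, and the timeout and round limits are hypothetical names and values, and the real gate also handles a cache key that shifts mid-burst, which this sketch does not model.

```python
import threading

_gate_lock = threading.Lock()
_inflight = {}  # cache key -> Event that the current leader sets when done


def single_flight_build(key, try_load, compute_and_save, wait_s=5.0, max_rounds=8):
    """Allow at most one compute_and_save(key) at a time; others wait and reload."""
    for _ in range(max_rounds):
        cached = try_load(key)
        if cached is not None:
            return cached  # cheap path: cache hit
        with _gate_lock:
            event = _inflight.get(key)
            is_leader = event is None
            if is_leader:
                event = _inflight[key] = threading.Event()
        if is_leader:
            try:
                # Re-check: a previous leader may have saved just before we won.
                cached = try_load(key)
                return cached if cached is not None else compute_and_save(key)
            finally:
                with _gate_lock:
                    _inflight.pop(key, None)
                event.set()  # wake followers even when the compute raised
        # Follower: wait, then loop back to reload or re-contend for leadership.
        event.wait(wait_s)
    raise TimeoutError(f"single-flight gave up on {key!r}")
```

If the leader raises, the followers wake, miss the cache, and exactly one of them takes over as the next leader, so work stays sequential rather than fanning out again.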

The cache key buckets on records//WINDOW and edges//WINDOW and try_load requires an exact match. With WINDOW=10 a normal day of capture (+~150 records, +~1300 edges) crossed ~130 buckets, so the on-disk graph cache MISSED on essentially every WAKE and the full community detection was recomputed each time. Edges churn fastest, so they are the binding term. WINDOW=250 keeps the cache valid across a normal day, so the common WAKE is now a cheap cache HIT. The independent age/dirty fuse in consult_overlay (25h / dirty>50) remains the real freshness backstop, and the single-flight gate makes the rare genuine miss harmless. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

_boot_preload called build_runtime_graph (which already persists the cache, with the full node_payload, on a miss) and then called save(..., node_payload= None, ...) again, overwriting the good cache with a payload-less one. That forced a pandas re-read of every record on the next cache hit. Just warm the cache via build_runtime_graph and drop the redundant second save. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The daemon only enters WAKE at boot if ~/.iai-mcp/wake.signal exists, but nothing ever created it — WakeHandler only consumed it. The CLI start/install path (and the operator's capture hook) brought the daemon up with a plain launchctl kickstart, so it re-read its persisted HIBERNATION state and hibernate-exited within a tick, closing the socket before it ever served recall. Add WakeHandler.signal_wake() (symmetric to consume_wake_signal) and create the signal before the kickstart in daemon install/start, so the booting daemon transitions HIBERNATION -> WAKE and serves its socket. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
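The symmetric signal/consume pair could be sketched like this. Only the method names `signal_wake` and `consume_wake_signal` come from the commit message; the constructor, the file-touch mechanism, and the return values are assumptions for illustration.

```python
from pathlib import Path


class WakeHandler:
    """Hypothetical file-based wake signal (path supplied by the caller)."""

    def __init__(self, signal_path):
        self.signal_path = Path(signal_path)

    def signal_wake(self) -> None:
        # Create the marker before the daemon is kick-started.
        self.signal_path.parent.mkdir(parents=True, exist_ok=True)
        self.signal_path.touch()

    def consume_wake_signal(self) -> bool:
        # Returns True exactly once per signal; a missing file means "no wake".
        try:
            self.signal_path.unlink()
        except FileNotFoundError:
            return False
        return True
```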

build_runtime_graph added every record (and every edge) to the graph, including soft-deleted / deduped / erased records (tombstoned_at IS NOT NULL). That polluted communities, centrality, rich_club and the sigma topology audit with dead nodes, and -- worse -- desynced the node count from store.active_records_count() (the payload-cache validity anchor), so after any tombstoning (e.g. migrate --dedupe-episodic) the cache was permanently invalid and every WAKE did a full rebuild on an over-large graph. Skip tombstoned rows in the node loop (matching active_records_count: tombstoned_at IS NULL), skip edges whose endpoints are not live nodes (add_edge does setdefault on both endpoints, so it would re-create dead nodes), and drop the cached assignment/rich_club when the live node set changed so they are recomputed on the fresh graph instead of referencing dead nodes. On a real store this took the graph from 9733 nodes to 3612, rich_club from 974 to 362, and restored payload-cache hits across builds.
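The filtering rule (live nodes only, and no edge that would re-create a dead endpoint) can be illustrated with plain dictionaries. This is a sketch under stated assumptions: `live_adjacency` is a hypothetical helper, records are plain dicts, and `None` stands in for a NULL tombstoned_at.

```python
def live_adjacency(records, edges):
    """Build an undirected adjacency map over non-tombstoned records only."""
    live = {r["id"] for r in records if r.get("tombstoned_at") is None}
    adjacency = {node: set() for node in live}
    for src, dst in edges:
        # Skip edges touching a dead endpoint, so no dead node is re-created.
        if src in live and dst in live:
            adjacency[src].add(dst)
            adjacency[dst].add(src)
    return adjacency
```

Because the node set is derived from the same predicate as the active-record count, the two stay in step after any tombstoning.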

_store_is_empty() caught (OSError, ValueError, KeyError, RuntimeError) and returned True. All Hippo store errors (HippoIntegrityError, HippoLockHeldError, ConsolidationPendingError, HippoDecryptError) subclass RuntimeError, and count_rows() raises HippoIntegrityError when the shared sqlite connection is left in an error state by a concurrent heavy reader. Returning True there parks the whole lifecycle tick (no idle-check, no drain) on a store that actually has records. Treat the unknown case as NOT empty so the tick proceeds; a truly empty store just does a little harmless no-op work.
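A small sketch of the changed decision follows. The exception class is a local stand-in for the store error named above, and `store_is_empty` with its `count_rows` callable is a hypothetical shape, not the daemon's actual signature.

```python
class HippoIntegrityError(RuntimeError):
    """Local stand-in for the store error, which subclasses RuntimeError."""


def store_is_empty(count_rows) -> bool:
    try:
        return count_rows() == 0
    except (OSError, ValueError, KeyError, RuntimeError):
        # Unknown is treated as NOT empty so the lifecycle tick proceeds.
        return False
```

Before the fix the `except` branch returned True, which is what parked the tick on a populated store.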

The field was only ever set (on the empty_store/paused skip paths), never cleared, so a single early skip (e.g. a first-tick count race at boot) left a healthy, ticking, draining daemon permanently reporting skip=empty_store in .daemon-state.json — misleading observability that reads as a parked lifecycle.

The lifecycle idle countdown only refreshed `_last_active_monotonic` when the Node wrapper heartbeat file was fresh (`HeartbeatScanner.is_active`). The wrappers dir can be empty (heartbeat stale) while the daemon is still draining a continuously-fed deferred-capture backlog. In that state the idle timer grew unconditionally and the FSM forced itself to SLEEP after 30 min even though drain threads were still writing to the store. Entering the SLEEP pipeline escalates to an EXCLUSIVE store lock, so this contended with the in-flight drain; and because crisis re-arming only runs in SLEEP, an oscillating/never-settling daemon could silently stop re-arming crisis detection.

Fold two more activity signals into the idle countdown, alongside the wrapper heartbeat:

- in-flight drain state: `capture.is_drain_in_progress()`, a thread-safe depth counter set by `drain_deferred_captures` / `drain_active_live_captures` for their whole duration;
- recent real RPC traffic: `mcp_socket.last_activity_ts` (already used by the sleep-pipeline interrupt check, now also by the countdown).

The decision is centralized in a pure, unit-testable helper `_idle_countdown_decision`. A genuinely idle daemon still advances to DROWSY/SLEEP exactly as before, so crisis re-arming keeps running; only an actively-working daemon is held awake. Explicit FORCE_SLEEP/user-sleep requests are unaffected.

Add tests asserting the daemon does NOT advance toward SLEEP while a drain is in progress (or RPC is recent), that a truly idle daemon still sleeps, and that the in-progress flag is set across the production drain wrappers and released on exception.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
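The two pieces described here, a depth counter held for the duration of a drain and a pure decision helper, might be sketched as below. The signatures, the `drain_guard` name, and the 30-second RPC window are assumptions; only `is_drain_in_progress` and the idea of a pure helper come from the commit message.

```python
import threading
from contextlib import contextmanager
from typing import Optional

_depth = 0
_depth_lock = threading.Lock()


@contextmanager
def drain_guard():
    """Mark a drain as in progress for its whole duration, even on exception."""
    global _depth
    with _depth_lock:
        _depth += 1
    try:
        yield
    finally:
        with _depth_lock:
            _depth -= 1


def is_drain_in_progress() -> bool:
    with _depth_lock:
        return _depth > 0


def idle_countdown_decision(*, heartbeat_active: bool, drain_in_progress: bool,
                            last_rpc_ts: Optional[float], now: float,
                            rpc_window_s: float = 30.0) -> bool:
    """Return True when the daemon counts as active and the idle timer resets."""
    rpc_recent = last_rpc_ts is not None and (now - last_rpc_ts) < rpc_window_s
    return heartbeat_active or drain_in_progress or rpc_recent
```

Keeping the decision free of I/O is what makes it straightforward to unit-test each signal in isolation.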

Active sessions re-drain the entire transcript every turn, so the eager embed at the top of capture_turn re-ran the expensive GIL-bound Rust matmul for every already-stored turn just to discard it on the idem-tag dedup check -- a steady CPU drain proportional to transcript length. Defer embedding behind a memoized _compute_embedding() closure that runs at most once and only when a turn is actually new, and flush the record buffer before the idem lookup under _CAPTURE_DEDUP_LOCK so a just-inserted but unflushed turn is visible to the SQLite-backed dedup -- closing a check-then-insert race that produced live duplicate records. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
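The shape of the fix, a lazy memoized embed plus a flush before the dedup lookup, can be sketched with a toy store. `ToyStore`, its methods, and this `capture_turn` signature are hypothetical; only `_compute_embedding` and `_CAPTURE_DEDUP_LOCK` are names taken from the commit message.

```python
import threading

_CAPTURE_DEDUP_LOCK = threading.Lock()


class ToyStore:
    """Hypothetical store: buffered inserts are invisible to lookups until flushed."""

    def __init__(self):
        self.buffer = []
        self.rows = {}

    def insert(self, tag, record):
        self.buffer.append((tag, record))

    def flush_record_buffer(self):
        for tag, record in self.buffer:
            self.rows.setdefault(tag, record)
        self.buffer.clear()

    def has_tag(self, tag):
        return tag in self.rows


def capture_turn(store, tag, text, embed):
    memo = []

    def _compute_embedding():
        # Runs the expensive embed at most once, and only when first needed.
        if not memo:
            memo.append(embed(text))
        return memo[0]

    with _CAPTURE_DEDUP_LOCK:
        store.flush_record_buffer()  # make just-inserted turns visible
        if store.has_tag(tag):
            return False  # already stored: no embedding was computed
        record = {"text": text, "embedding": _compute_embedding()}
        record["dim"] = len(_compute_embedding())  # second call hits the memo
        store.insert(tag, record)
        return True
```

Without the flush, the second capture of the same tag would miss the buffered row and insert a duplicate.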

…core Multiplicative boosts (trigram*2, FTS*3, valence) can push the internal score past 1.0, so a served recall hit could surface a "confidence" > 1, and a degraded daemon state surfaced flat 1.000 scores. Clamp the *displayed* score to [0,1] at serialization while ranking on a new, unclamped MemoryHit.sort_score so ordering is provably unchanged; the stale-downweight keeps sort_score in lock-step with score. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
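As a rough illustration of why ordering cannot change, the sketch below ranks on the unclamped value and clamps only while serializing. The dataclass fields mirror the names in the commit message, but the `serialize` function and the fallback in `rank_key` are assumptions about how callers might use them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryHit:
    record_id: str
    score: float                        # internal score, may exceed 1.0 after boosts
    sort_score: Optional[float] = None  # unclamped ranking key


def rank_key(hit: MemoryHit) -> float:
    # Callers fall back to score when sort_score is not set.
    return hit.sort_score if hit.sort_score is not None else hit.score


def serialize(hits):
    ranked = sorted(hits, key=rank_key, reverse=True)
    # The clamp touches only the number that is shown, never the ordering.
    return [
        {"record_id": h.record_id, "score": min(1.0, max(0.0, h.score))}
        for h in ranked
    ]
```

Two hits boosted to 2.4 and 1.3 both display as 1.0 but keep their relative order.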

…ich_club The essential-variable tracker and crisis recluster rebuilt their graph over ALL records -- including tombstoned (soft-deleted / deduped) ones -- so communities, centrality and rich_club were computed on thousands of dead nodes. On a real store rich_club sat at ~0.019, just under the 0.02 floor, re-arming crisis_mode every sleep cycle on a healthy store. Add a shared build_live_graph() helper (tombstone-filtered nodes, live-only edges) used by both paths, and demote rich_club_coefficient from a crisis *trigger* to a diagnostic-only signal (edge_density and community_count remain the triggers). Harden the runtime-graph tombstone guard to pd.isna() so a NaT/NA tombstoned_at on a reembedded datetime64 column is read as LIVE instead of collapsing the graph to empty. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…lder The daily-reflection step wrote an all-zero embedding placeholder that was never re-embedded, leaving reflections permanently unretrievable and feeding zero vectors into the scoring matmul. Embed literal_surface directly; on a native embedder failure fall back to a zero vector flagged embedding_pending=1 for deferred reembed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
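The embed-with-fallback behaviour might be sketched as follows. The function name, the `embed` callable, and the explicit `dim` argument are hypothetical; the `embedding_pending` flag is the one named in the commit message.

```python
def embed_reflection(literal_surface, embed, dim):
    """Embed the reflection text; on failure, flag the row for deferred reembed."""
    try:
        return {"embedding": list(embed(literal_surface)), "embedding_pending": 0}
    except Exception:
        # Zero vector is only a placeholder; the flag makes a later reembed pick it up.
        return {"embedding": [0.0] * dim, "embedding_pending": 1}
```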

compute_sigma runs an unbounded random-graph reference that can spin a core on a large graph. Add SIGMA_N_CEIL (default 20000, env IAI_MCP_SIGMA_N_FLOOR / IAI_MCP_SIGMA_N_CEIL) and return None above it instead of computing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
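A ceiling guard with an environment override could look like the sketch below. The helper names are hypothetical, and only the ceiling is modelled because the floor's default is not stated here; the env variable name and the 20000 default come from the commit message.

```python
import os


def _env_int(name: str, default: int) -> int:
    raw = os.environ.get(name)
    try:
        return int(raw) if raw is not None else default
    except ValueError:
        return default  # ignore a malformed override


def bounded_sigma(n_nodes: int, compute_sigma):
    """Return compute_sigma(), or None when the graph exceeds the node ceiling."""
    ceil = _env_int("IAI_MCP_SIGMA_N_CEIL", 20000)
    if n_nodes > ceil:
        return None  # skip the unbounded random-reference compute
    return compute_sigma()
```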

The sleep-cycle staleness check only monitored attempt == 1, so a cycle that had already been retried (attempt >= 2) and was still wedged -- exactly the case the watchdog must catch -- was short-circuited and ignored. Gate on attempt < 1 instead so every genuine running attempt is monitored, excluding bool (isinstance(True, int) is True). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
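One reading of this gate is that the staleness check is skipped only when attempt is below 1 or is not a genuine integer. The sketch below follows that reading, which is an assumption; `is_monitored_attempt` is a hypothetical name.

```python
def is_monitored_attempt(attempt) -> bool:
    """True when the staleness check should run for this attempt value."""
    # bool is a subclass of int, so True would otherwise pass as attempt 1.
    if isinstance(attempt, bool) or not isinstance(attempt, int):
        return False
    return attempt >= 1
```

Under this reading both the first attempt and every retry are monitored.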

e8f3deb stopped a transient count failure from parking the lifecycle tick but left the condition invisible (log.debug only). Emit a best-effort store_empty_check_failed warning event -- buffered and never raising, so it is safe even when the store connection is the thing failing -- so a sqlite left-in-error-state failure surfaces to the operator. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

reembed_pending_rows fed the raw stored literal_surface to embedder.embed(); on an encrypted store that is iai:enc:v1: ciphertext, so every embedding_pending=1 row re-embedded by this path got an embedding of the ciphertext (garbage). Decrypt via _decrypt_record_field first (no-op on a plaintext store); a decrypt failure leaves the row pending for retry rather than poisoning it with a garbage vector.
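The decrypt-then-embed order can be sketched with injected callables. This is not the project's `reembed_pending_rows`: the row shape, the `DecryptError` class, and the `decrypt` / `embed` parameters are assumptions made for illustration.

```python
class DecryptError(RuntimeError):
    """Local stand-in for the store's decrypt failure."""


def reembed_pending(rows, decrypt, embed):
    """Re-embed rows flagged embedding_pending=1, decrypting the text first."""
    for row in rows:
        if not row.get("embedding_pending"):
            continue
        try:
            text = decrypt(row["literal_surface"])  # no-op on plaintext
        except DecryptError:
            continue  # leave the row pending for a later retry
        row["embedding"] = embed(text)
        row["embedding_pending"] = 0
    return rows
```

A failed decrypt leaves the row untouched, so it is never overwritten with a vector of the ciphertext.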

A daemon killed mid-SLEEP leaves lifecycle_state.json at current_state=SLEEP with sleep_cycle_progress=None (incoherent: a real in-flight cycle carries a progress dict). Resuming it wedged the daemon -- it never advanced the sleep pipeline, never reached the recluster that clears crisis, and recall stayed degraded (SLEEP + crisis both degrade recall). Normalize that one case to a clean WAKE at boot (dropping the stale crisis flag) via _normalize_boot_lifecycle_state; a real degeneration re-arms crisis on the next complete sleep cycle.
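A pure normalizer for that one incoherent case might be sketched as below. The keys `current_state` and `sleep_cycle_progress` appear in the commit message; storing `crisis_mode` and `crisis_mode_since_ts` in the same dict is an assumption, as is the function signature.

```python
def normalize_boot_lifecycle_state(state: dict) -> dict:
    """Reset SLEEP-without-progress to a clean WAKE; leave every other state alone."""
    incoherent = (
        state.get("current_state") == "SLEEP"
        and state.get("sleep_cycle_progress") is None
    )
    if not incoherent:
        return state
    fixed = dict(state)  # pure: never mutate the caller's dict
    fixed["current_state"] = "WAKE"
    fixed["crisis_mode"] = False          # assumed key for the stale crisis flag
    fixed["crisis_mode_since_ts"] = None  # assumed to live alongside it
    return fixed
```

A SLEEP state that carries a progress dict is a real in-flight cycle and passes through unchanged.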

The final recall ranking sorted hits by score alone with a stable sort, so equal-scoring hits kept their arrival order. Two code paths that compute the same logical score via different float summation orders (notably an empty profile_state falling back to the medium scale) could therefore emit byte-different orderings, flaking test_empty_profile_state_falls_back_to_medium_scale on CI. Tie-break on str(record_id) — the same idiom already used elsewhere in this module — so equal-scoring hits resolve deterministically. Behaviour for distinctly-scored hits is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
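The tie-break can be shown with a two-part sort key. The dict-shaped hits and the `rank` function are hypothetical; the `str(record_id)` idiom is the one named in the commit message.

```python
def rank(hits):
    """Sort by score descending, then by str(record_id) for a deterministic tie-break."""
    return sorted(hits, key=lambda h: (-h["score"], str(h["record_id"])))
```

With this key, equal-scoring hits come out in the same order regardless of their arrival order.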

CodeAbra

Thanks — this carries the consolidation/recall-correctness root causes for the crisis loop: tombstoned-record exclusion from the runtime graph, crisis topology on the live graph, score clamp, reflection-embedding + crash recovery. Reviewed the diff, security-clean, CI green. Merging with credit to you.

Marsu6996 and others added 20 commits June 21, 2026 17:02

test(daemon): guard tick-flag observability and the skip-reason reset

d03e024

Covers 2cffb35: a successful tick clears a stale last_tick_skipped_reason, plus the paused-skip event/persistence and the no-run_rem_cycle routing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

test(recall): guard displayed score clamp to [0,1] with order preserved

5bd3f47

(cherry picked from commit 38c632a)

Marsu6996 force-pushed the fix/consolidation-recall-correctness branch from 2631e96 to 642a8e8 June 21, 2026 18:23

CodeAbra approved these changes Jun 22, 2026

CodeAbra merged commit 10401b3 into CodeAbra:main Jun 22, 2026
2 checks passed

CodeAbra mentioned this pull request Jun 22, 2026

Sleep daemon stuck in crisis_mode: cycle loops on a single step (never completes) while sleep-cycle --force works; served recall degrades to flat 1.000 / schema records #17

Closed

CodeAbra merged 21 commits into CodeAbra:main from Marsu6996:fix/consolidation-recall-correctness
