fix: consolidation retry storm, idle curation frequency, and session memory leak#473

Merged
BYK merged 1 commit into main from fix/consolidation-loop-memory-leak
May 27, 2026

Conversation

Owner

@BYK BYK commented May 27, 2026

Summary

Fixes three user-reported issues: excessive Sonnet API usage, 5GB RAM consumption, and knowledge consolidation stuck in an infinite retry loop. All three are interconnected — the consolidation loop was a major driver of the Sonnet overhead.

Changes

Consolidation retry storm (Issue 3 + Issue 1)

  • Consolidation cooldown (idle.ts): tracks per-project {attemptedAt, entryCount}. When consolidation runs but produces no changes (LLM correctly concludes all entries are unique), enters a 1-hour cooldown. Cooldown clears when entry count changes (curation adds/removes entries). Previously retried every 30-60s indefinitely — 15-30 wasted Sonnet calls per 30-minute idle period.
  • Stronger consolidation prompt (prompt.ts): added a "FORCED EVICTION" step — when merging/trimming isn't enough, the LLM MUST delete least-valuable entries to reach the target. The user prompt now states "must remove at least N entries."
  • Curation creation gate (curator.ts): when entry count is at or above maxEntries, curation runs with skipCreate: true, preventing the ratchet effect where entries grow monotonically.
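
The cooldown described above can be sketched roughly as follows. This is a minimal illustration, not the code in idle.ts: only the `{attemptedAt, entryCount}` shape, the 1-hour window, and the "clear when entry count changes" rule come from this PR. The names `lastNoopAttempt`, `inCooldown`, and `recordAttempt` are hypothetical.

```typescript
// Hypothetical sketch of a per-project consolidation cooldown.
interface ConsolidationAttempt {
  attemptedAt: number; // epoch milliseconds of the last no-op attempt
  entryCount: number; // entry count observed at that attempt
}

const COOLDOWN_MS = 60 * 60 * 1000; // 1 hour, per the PR description

const lastNoopAttempt = new Map<string, ConsolidationAttempt>();

/** Returns true when consolidation should be skipped for this project. */
function inCooldown(projectId: string, entryCount: number, now: number): boolean {
  const last = lastNoopAttempt.get(projectId);
  if (!last) return false;
  // A changed entry count means curation added or removed entries,
  // so the earlier "nothing to merge" verdict no longer applies.
  if (last.entryCount !== entryCount) {
    lastNoopAttempt.delete(projectId);
    return false;
  }
  // The cooldown window has elapsed: allow one more attempt.
  if (now - last.attemptedAt >= COOLDOWN_MS) {
    lastNoopAttempt.delete(projectId);
    return false;
  }
  return true;
}

/** Records the outcome of a consolidation run. */
function recordAttempt(
  projectId: string,
  entryCount: number,
  changed: boolean,
  now: number,
): void {
  if (changed) {
    lastNoopAttempt.delete(projectId);
  } else {
    lastNoopAttempt.set(projectId, { attemptedAt: now, entryCount });
  }
}
```

Keying the cooldown on the entry count rather than on time alone is what lets curation re-arm consolidation immediately, instead of waiting out the full hour.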

Excessive Sonnet API usage (Issue 1)

  • Cost-aware idle curation (idle.ts): the idle path was using raw afterTurns=3 while the inline path uses afterTurns * curationMultiplier (=6 for Sonnet, =9 for Opus). Idle curation was firing 2x more often than intended for Sonnet-class models.
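
As a rough sketch of the threshold arithmetic: the multipliers 2 (Sonnet) and 3 (Opus) follow from the figures quoted above (3 → 6 and 3 → 9). The function names, the `ModelClass` type, and the default multiplier of 1 for other models are assumptions made for illustration.

```typescript
// Hypothetical sketch: one threshold shared by the inline and idle paths.
type ModelClass = "sonnet" | "opus" | "other";

function curationMultiplier(model: ModelClass): number {
  switch (model) {
    case "sonnet":
      return 2; // afterTurns=3 -> 6
    case "opus":
      return 3; // afterTurns=3 -> 9
    default:
      return 1; // assumption: cheaper models keep the raw threshold
  }
}

function curationThreshold(afterTurns: number, model: ModelClass): number {
  return afterTurns * curationMultiplier(model);
}

/** True once enough turns have accumulated to justify a curation call. */
function shouldCurate(
  turnsSinceCuration: number,
  afterTurns: number,
  model: ModelClass,
): boolean {
  return turnsSinceCuration >= curationThreshold(afterTurns, model);
}
```

With the raw `afterTurns=3`, a Sonnet session would curate at turn 3. With the multiplier applied it waits until turn 6, which is the 2x difference this fix removes.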

Session memory leak (Issue 2)

  • Session eviction (idle.ts, pipeline.ts, gradient.ts, index.ts): sessions idle > 1 hour are evicted from all in-memory Maps. Persists final cost/gradient state to SQLite before cleanup. Cleans up: gradient state, curation tracker, cost tracking, auth, billing prefix, warmup auth, and pipeline satellite Maps (headerSessionIndex, ltmSessionCache, ltmPinnedText, stableLtmCache, cwdWarned). New evictSession() exported from core for clean single-session gradient cleanup.
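
The eviction sweep can be pictured like this. It is a simplified sketch under stated assumptions: the `SessionStores` shape, the `persist` callback (standing in for the SQLite write), and `evictIdleSessions` are hypothetical and do not reproduce the actual `evictSession()` signature.

```typescript
// Hypothetical sketch of an idle-session sweep over in-memory Maps.
const SESSION_IDLE_MS = 60 * 60 * 1000; // sessions idle > 1 hour are evicted

interface SessionStores {
  lastActivity: Map<string, number>; // sessionId -> last activity (epoch ms)
  satellites: Array<Map<string, unknown>>; // every per-session Map to clean
  persist: (sessionId: string) => void; // stand-in for the SQLite write
}

/** Evicts idle sessions and returns the ids that were removed. */
function evictIdleSessions(stores: SessionStores, now: number): string[] {
  const evicted: string[] = [];
  for (const [sessionId, lastSeen] of stores.lastActivity) {
    if (now - lastSeen <= SESSION_IDLE_MS) continue;
    // Persist first so the final state survives the cleanup.
    stores.persist(sessionId);
    for (const store of stores.satellites) store.delete(sessionId);
    // Deleting the current key while iterating a Map is safe in JavaScript.
    stores.lastActivity.delete(sessionId);
    evicted.push(sessionId);
  }
  return evicted;
}
```

Routing every per-session Map through one sweep is the point of the design: a Map that is missed keeps growing, which is how the satellite Maps listed above contributed to the leak.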

…memory leak

- Add 1-hour cooldown for failed consolidation attempts (stops wasting
  Sonnet calls when LLM correctly concludes all entries are unique)
- Apply cost-aware curation multiplier in idle path (was using raw
  afterTurns=3 instead of afterTurns*2=6 for Sonnet — 2x too frequent)
- Strengthen consolidation prompt with forced-eviction fallback (LLM
  must now reduce to target count, not just try)
- Gate curation entry creation at maxEntries limit (prevents ratchet
  effect where entries grow monotonically)
- Add session eviction after 1 hour idle (frees gradient state, recall
  store, LTM caches, cost tracking, auth — all persisted to SQLite)
- Export evictSession() from core for clean single-session cleanup
@BYK BYK self-assigned this May 27, 2026
@BYK BYK merged commit 48cc25b into main May 27, 2026
7 checks passed
@BYK BYK deleted the fix/consolidation-loop-memory-leak branch May 27, 2026 10:45
BYK added a commit that referenced this pull request May 27, 2026
## Summary

Follow-up to #473. Onur's logs revealed **1143 knowledge entries** — far
beyond what the single-pass consolidation can handle. The previous
consolidation sent all entries in one prompt, but with 1143 entries
(~343K tokens of input) this overflows the context window, and the 4096
output token budget can only express ~80-100 delete ops (vs the ~1118
needed).

## Context from Onur's logs

```
entry count 1143 exceeds maxEntries 25 — running consolidation
entry count 1143 exceeds maxEntries 25 — running consolidation
entry count 1143 exceeds maxEntries 25 — running consolidation
...repeating every ~60s...

cost-tracker: worker overhead=$1140.8017 (distillation-only=$2.5246)
```

The retry storm (#473) burned $1,138 in consolidation calls that could
never succeed due to the token budget constraint.
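
The figures above can be cross-checked with a few lines of arithmetic. All inputs are taken from this description and the quoted log line. The per-entry and per-op token averages are derived values, not measurements.

```typescript
// Back-of-the-envelope check of the numbers quoted in this description.
const entryCount = 1143;
const maxEntries = 25;
const inputTokens = 343000; // "~343K tokens of input"
const outputBudget = 4096; // output token budget per call

const deletesNeeded = entryCount - maxEntries; // 1118
const tokensPerEntry = Math.round(inputTokens / entryCount); // about 300

// "~80-100 delete ops" in 4096 tokens implies roughly 41-51 tokens per op.
const tokensPerOpLow = outputBudget / 100;
const tokensPerOpHigh = outputBudget / 80;

// Even at the optimistic end, one response covers under a tenth of the work.
const bestCaseCoverage = 100 / deletesNeeded;

// Overhead not attributable to distillation, from the cost-tracker log line.
const workerOverhead = 1140.8017;
const distillationOnly = 2.5246;
const nonDistillationCost = workerOverhead - distillationOnly; // about 1138.28
```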

## Changes

Adds batched consolidation mode in `curator.ts`:

- When entries ≤ 50: unchanged — sends all entries in a single prompt
- When entries > 50 (batched mode): takes a batch of the **lowest-confidence**
entries (the tail of the confidence-sorted list from `forProject()`) as
candidates for deletion. Each pass targets removing ~25 entries (half
the batch).
- The idle scheduler's cooldown (from #473) clears when entry count
changes, automatically triggering the next batch on the following idle
tick.
- Converges to `maxEntries` over multiple passes: 1143 → 1118 → 1093 →
... → 25

For Onur's case: ~45 passes × 1 Sonnet call each ≈ $2-3 total to clean
up 1143 entries, spread across idle periods, versus the previous behavior
of infinite retries that never made progress.
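
A minimal sketch of the batching logic and the pass count it implies. The 50-entry single-pass limit and the ~25 removals per pass come from the description above. The `Entry` shape, the function names, and the default batch size of 50 (inferred from "half the batch") are assumptions, and the input list is assumed to be sorted by confidence, highest first.

```typescript
// Hypothetical sketch of candidate selection for batched consolidation.
const SINGLE_PASS_LIMIT = 50; // at or below this, send everything at once
const REMOVALS_PER_PASS = 25; // "~25 entries (half the batch)"

interface Entry {
  id: string;
  confidence: number;
}

/** Picks the entries to show the LLM; input is sorted by confidence, descending. */
function selectCandidates(entries: Entry[], batchSize: number = 50): Entry[] {
  if (entries.length <= SINGLE_PASS_LIMIT) return entries;
  // The tail of a descending list holds the lowest-confidence entries.
  return entries.slice(-batchSize);
}

/** Estimates how many passes are needed to reach maxEntries. */
function passesToConverge(
  entryCount: number,
  maxEntries: number,
  perPass: number = REMOVALS_PER_PASS,
): number {
  return Math.max(0, Math.ceil((entryCount - maxEntries) / perPass));
}
```

For 1143 entries and `maxEntries` of 25, `passesToConverge` gives 45, in line with the ~45 passes estimated above. Because each pass changes the entry count, the cooldown from #473 clears by itself and the next idle tick picks up the next batch.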