Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .lore.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,9 +145,15 @@
<!-- lore:019e5ac1-fea2-71ff-b900-b50443b120ca -->
* **Always include recoverMissingObjects() coverage for new tables in migration reviews**: When reviewing PRs that add new DB tables via migrations, always verify \`recoverMissingObjects()\` in \`db.ts\` covers every new table with \`CREATE TABLE IF NOT EXISTS\`. If omitted: partial migration leaves tables missing permanently — subsequent runs see the version already set and skip the migration. Flag as CRITICAL. Also: \`SCHEMA\_VERSION\` constant (stuck at 16) is dead code — \`migrate()\` uses \`MIGRATIONS.length\`. Always \`git pull --rebase\` before starting migration work to avoid version number collisions.

<!-- lore:019e68c5-9ecd-759b-8ef6-3509c1d4a8db -->
* **Always prompt user to create a PR after pushing a branch**: After successfully pushing a branch to a remote, the assistant should prompt the user to create a pull request, typically by surfacing the PR creation URL. This pattern appears consistently across sessions: once a \`git push\` completes and the branch is tracking the remote, the assistant should proactively remind or prompt the user to open a PR (e.g., by sharing the GitHub PR creation link). Do not wait for the user to ask — surface the PR creation URL immediately after a successful push.

<!-- lore:019e2168-2fa4-77bd-a557-9d6dbcb40d81 -->
* **Prefer WASM backend over native onnxruntime-node for compiled binaries**: WASM backend for Bun \`--compile\` binaries with transformers.js: \`binaryExternalsPlugin\` in esbuild redirects \`onnxruntime-node\` → \`onnxruntime-web\` via \`onResolve\` (static imports only — does NOT redirect dynamic \`import()\` calls) and patches transformers.js CDN fallback via \`onLoad\` to read \`wasmPaths\` from \`globalThis.\_\_LORE\_VENDOR\_WASM\_PATHS\_\_\` (object form \`{ mjs, wasm }\` with exact hashed \`$bunfs\` filenames — directory strings fail). WASM files embedded as Bun \`{ type: 'file' }\` assets. For npm/CJS builds, \`onnxruntime-node\` stays external. WASM is ~2x faster on batches than native. Importing \`onnxruntime-web\` explicitly alongside the redirect creates two ort instances — 'cannot register backend cpu using priority 10' error.

<!-- lore:019e68cd-651a-7a1f-86b0-43d36249e915 -->
* **Prefers recall over assuming you don't have the information**: prefer recall over assuming you don't have the information.

<!-- lore:019e5b1a-0620-7016-b6e6-c5bdc392252e -->
* **Prefers update over create**: prefer update over create,

Expand Down
15 changes: 15 additions & 0 deletions packages/core/src/curator.ts
Original file line number Diff line number Diff line change
Expand Up @@ -478,9 +478,24 @@ async function runInner(input: {
if (!responseText) return { created: 0, updated: 0, deleted: 0, entitiesCreated: 0, relationsCreated: 0 };

const response = parseResponse(responseText);

// Gate entry creation when at or above maxEntries to prevent the ratchet
// effect: curation creates entries → count exceeds limit → consolidation
// can't reduce (all unique) → curation creates more → count grows forever.
// When at the limit, only allow update/delete ops. Creates are allowed
// again once consolidation (or manual deletion) brings count below limit.
const currentEntries = ltm.forProject(input.projectPath, false);
const atLimit = currentEntries.length >= cfg.curator.maxEntries;
if (atLimit && response.ops.some((op) => op.op === "create")) {
log.info(
`curation: skipping creates (${currentEntries.length} entries >= maxEntries ${cfg.curator.maxEntries})`,
);
}

const result = applyOps(response.ops, {
projectPath: input.projectPath,
sessionID: input.sessionID,
skipCreate: atLimit,
detectedEntities: response.entities,
detectedRelations: response.relations,
});
Expand Down
13 changes: 13 additions & 0 deletions packages/core/src/gradient.ts
Original file line number Diff line number Diff line change
Expand Up @@ -574,6 +574,19 @@ export function setForceMinLayer(layer: SafetyLayer, sessionID?: string) {
}
}

/**
* Evict a single session's in-memory state. Called when a session has been
* idle long enough that keeping its caches resident is wasteful. All
* important state (gradient calibration, force-min-layer) is already
* persisted to SQLite and will be reloaded if the session resumes.
*
* Does NOT reset global calibration — only frees session-specific caches
* (prefix cache, raw window cache, distillation snapshot, etc.).
*/
export function evictSession(sessionID: string): void {
sessionStates.delete(sessionID);
}

// For testing only — reset all calibration and force-escalation state
export function resetCalibration(sessionID?: string) {
calibratedOverhead = null;
Expand Down
1 change: 1 addition & 0 deletions packages/core/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ export {
getLastTurnAt,
consumeCameOutOfIdle,
saveGradientState,
evictSession,
// Test-only — exposed at the barrel so host-package tests can simulate idle
// gaps without sleeping. Not part of the public API.
setLastTurnAtForTest,
Expand Down
22 changes: 14 additions & 8 deletions packages/core/src/prompt.ts
Original file line number Diff line number Diff line change
Expand Up @@ -511,9 +511,9 @@ IMPORTANT:
*/
export const CONSOLIDATION_SYSTEM = `You are a long-term memory curator performing a consolidation pass. The knowledge base has grown too large and needs to be trimmed.

Your goal: reduce the entry count to the target maximum while preserving the most valuable knowledge.
Your goal: reduce the entry count to AT MOST the target maximum while preserving the most valuable knowledge. You MUST produce enough ops to reach the target — returning an empty array is not acceptable.

CONSOLIDATION RULES:
CONSOLIDATION STRATEGY (apply in order):
1. MERGE related entries — if multiple entries describe the same system, module, or concept
from different angles (e.g. several bug fixes in the same component), merge them into
ONE concise entry. Use an "update" op for the surviving entry and "delete" ops for the rest.
Expand All @@ -524,10 +524,15 @@ CONSOLIDATION RULES:
- Entries whose knowledge is fully subsumed by another entry
- Entries about one-off incidents with no recurring applicability
- General advice available in any documentation
4. PRESERVE:
- Entries describing non-obvious design decisions specific to this codebase
- Entries about recurring traps that a developer would hit again
- Entries that capture a hard-won gotcha with a concrete fix
4. FORCED EVICTION — if steps 1–3 are insufficient to reach the target, you MUST delete
the least valuable remaining entries until the count reaches the target. Rank entries by
recurring impact: entries about rare edge cases or narrow contexts are lower value than
entries about broadly applicable patterns or frequently encountered gotchas.

PRESERVE (highest priority — delete these last):
- Entries describing non-obvious design decisions specific to this codebase
- Entries about recurring traps that a developer would hit again
- Entries that capture a hard-won gotcha with a concrete fix

OUTPUT: A JSON array of "update" and "delete" ops only. No "create" ops — you are not
extracting new knowledge, only consolidating existing knowledge.
Expand All @@ -550,11 +555,12 @@ export function consolidationUser(input: {
const listed = input.entries
.map((e) => `- [${e.id}] (${e.category}) ${e.title}: ${e.content}`)
.join("\n");
return `Current knowledge entries (${count} total, target max: ${input.targetMax}):
const excess = count - input.targetMax;
return `Current knowledge entries (${count} total, target max: ${input.targetMax}, must remove at least ${excess}):

${listed}

Produce update/delete ops to reduce entry count to at most ${input.targetMax}. Prioritize merging related entries and trimming verbose ones over outright deletion.`;
Produce update/delete ops to reduce entry count to at most ${input.targetMax}. Prioritize merging related entries and trimming verbose ones, but if that is insufficient, delete the least valuable entries. You MUST remove at least ${excess} entries.`;
}

// Format distillations for injection into the message context.
Expand Down
159 changes: 143 additions & 16 deletions packages/gateway/src/idle.ts
Original file line number Diff line number Diff line change
Expand Up @@ -28,14 +28,16 @@ import {
saveGradientState,
getConsecutiveBusts,
effectiveMetaThreshold,
evictSession as evictGradientSession,
} from "@loreai/core";
import type { LLMClient } from "@loreai/core";
import type { GatewayConfig } from "./config";
import type { SessionState } from "./translate/types";
import { getWorkerModel } from "./worker-model";
import { getWorkerModel, getModelEntrySync } from "./worker-model";
import {
isCircuitBreakerTripped,
isWarmupAuthDisabled,
clearWarmupAuthDisabled,
resolveProfile,
blendedHistogramForSession,
shouldWarm,
Expand All @@ -47,12 +49,39 @@ import {
} from "./cache-warmer";
import * as Sentry from "@sentry/bun";
import { runBackground } from "./background-limiter";
import { isAuthStale, resolveAuth } from "./auth";
import { isAuthStale, resolveAuth, deleteSessionAuth, clearAuthStale } from "./auth";
import { emitWarmupMetric, emitSessionCostMetrics, emitCurationMetrics } from "./sentry";
import { getSessionCosts, totalWorkerCost } from "./cost-tracker";
import { getSessionCosts, totalWorkerCost, deleteSessionCosts } from "./cost-tracker";
import { deleteBillingPrefix } from "./cch";

const POLL_INTERVAL_MS = 30_000;

/**
* Cooldown tracking for knowledge consolidation.
*
* When consolidation runs but fails to reduce entries below maxEntries
* (e.g. all entries are genuinely unique), we record the attempt so the
* idle scheduler doesn't retry every 30s — which wastes Sonnet calls.
*
* Keyed by projectPath. Cleared when entry count changes (new curation
* creates/deletes entries) so consolidation retries with fresh data.
*/
const consolidationCooldown = new Map<
string,
{ attemptedAt: number; entryCount: number }
>();

/** 1 hour cooldown before retrying consolidation with the same entry count. */
const CONSOLIDATION_COOLDOWN_MS = 60 * 60 * 1000;

/**
* Evict sessions that have been idle for longer than this threshold.
* All important state is already persisted to SQLite — eviction only frees
* in-memory caches (prefix cache, distillation snapshot, recall store, etc.).
* If the session resumes, state is reloaded from DB on the next request.
*/
const SESSION_EVICTION_MS = 60 * 60 * 1000; // 1 hour

// ---------------------------------------------------------------------------
// startIdleScheduler
// ---------------------------------------------------------------------------
Expand All @@ -70,6 +99,8 @@ export function startIdleScheduler(
config: GatewayConfig,
sessions: Map<string, SessionState>,
doIdleWork: (sessionID: string, state: SessionState) => Promise<void>,
/** Optional callback to clean up pipeline-level satellite Maps when a session is evicted. */
onEvict?: (sessionID: string) => void,
): () => void {
const inProgress = new Set<string>();
const warmupInProgress = new Set<string>();
Expand Down Expand Up @@ -103,6 +134,66 @@ export function startIdleScheduler(
.finally(() => inProgress.delete(sessionID));
}

// --- Session eviction — free memory for sessions idle > 1 hour ---
// All important state is persisted to SQLite; eviction only releases
// in-memory caches (gradient state, prefix cache, recall store, LTM
// caches, cost tracking, auth credentials). If a session resumes,
// getOrCreateSession() in pipeline.ts reloads persisted state from DB.
for (const [sessionID, state] of sessions) {
if (inProgress.has(sessionID)) continue; // don't evict during active idle work
if (warmupInProgress.has(sessionID)) continue;
if (now - state.lastRequestTime < SESSION_EVICTION_MS) continue;
// Don't evict sessions still executing tools — they're active
if (state.lastStopReason === "tool_use") continue;

log.info(`evicting idle session ${sessionID.slice(0, 16)} (idle ${Math.round((now - state.lastRequestTime) / 60_000)}m)`);

// Persist final cost snapshot before eviction
try {
const costs = getSessionCosts(sessionID);
if (costs && costs.conversation.turns > 0) {
saveSessionCosts(sessionID, {
conversationCost: costs.conversation.cost,
workerCost: totalWorkerCost(costs),
conversationTurns: costs.conversation.turns,
cacheReadTokens: costs.conversation.cacheReadTokens,
cacheWriteTokens: costs.conversation.cacheWriteTokens,
warmupSavings: costs.counterfactual.warmupSavings,
warmupHits: costs.counterfactual.warmupHits,
ttlSavings: costs.counterfactual.ttlSavings,
ttlHits: costs.counterfactual.ttlHits,
batchSavings: costs.batchSavings,
avoidedCompactions: costs.counterfactual.avoidedCompactions,
avoidedCompactionCost: costs.counterfactual.avoidedCompactionCost,
});
}
} catch (e) {
log.warn(`session eviction: cost persistence failed for ${sessionID.slice(0, 16)}:`, e);
}

// Persist gradient state before eviction
try {
saveGradientState(sessionID);
} catch (e) {
log.warn(`session eviction: gradient persistence failed for ${sessionID.slice(0, 16)}:`, e);
}

// Clean up all per-session in-memory state across modules
evictGradientSession(sessionID);
curator.resetCurationTracker(sessionID);
deleteSessionCosts(sessionID);
deleteSessionAuth(sessionID);
clearAuthStale(sessionID);
deleteBillingPrefix(sessionID);
clearWarmupAuthDisabled(sessionID);

// Clean up pipeline-level satellite Maps via callback
onEvict?.(sessionID);

// Remove from the main sessions map last
sessions.delete(sessionID);
}

// --- Cache warming (separate from idle work — fires before TTL expiry) ---
if (isCircuitBreakerTripped()) return;

Expand Down Expand Up @@ -269,10 +360,15 @@ export function buildIdleWorkHandler(
log.error("idle distillation error:", e);
}

// 2. Curation
// 2. Curation — cost-aware frequency: on expensive worker models, curate
// less often (same multiplier as the inline path in pipeline.ts).
if (cfg.knowledge.enabled && cfg.curator.onIdle) {
try {
if (state.turnsSinceCuration >= cfg.curator.afterTurns) {
const workerModelID = model?.modelID ?? "unknown";
const modelInputCost = getModelEntrySync(workerModelID).cost?.input ?? 3;
const curationMultiplier = modelInputCost >= 5 ? 3 : modelInputCost >= 1 ? 2 : 1;
const effectiveAfterTurns = cfg.curator.afterTurns * curationMultiplier;
if (state.turnsSinceCuration >= effectiveAfterTurns) {
const result = await Sentry.startSpan(
{ name: "lore.curator", op: "lore.curation", attributes: { trigger: "idle" } },
() => curator.run({ llm, projectPath, sessionID, model }),
Expand All @@ -284,28 +380,59 @@ export function buildIdleWorkHandler(
`idle curation: ${result.created} created, ${result.updated} updated, ${result.deleted} deleted`,
);
emitCurationMetrics({ ...result, trigger: "idle" });
// Entry count changed — clear consolidation cooldown so it
// retries with fresh data on the next idle tick.
consolidationCooldown.delete(projectPath);
}
}
} catch (e) {
log.error("idle curation error:", e);
}
}

// 3. Consolidation — runs after curation so new entries are counted
// 3. Consolidation — runs after curation so new entries are counted.
// Cooldown: skip if we already attempted consolidation for this project
// with the same entry count within the last hour — avoids wasting
// Sonnet calls when the LLM correctly concludes all entries are unique.
if (cfg.knowledge.enabled) {
try {
const entries = ltm.forProject(projectPath, false);
if (entries.length > cfg.curator.maxEntries) {
log.info(
`entry count ${entries.length} exceeds maxEntries ${cfg.curator.maxEntries} — running consolidation`,
);
const result = await Sentry.startSpan(
{ name: "lore.consolidation", op: "lore.curation", attributes: { trigger: "consolidation" } },
() => curator.consolidate({ llm, projectPath, sessionID, model }),
);
if (result.updated > 0 || result.deleted > 0) {
log.info(`consolidation: ${result.updated} updated, ${result.deleted} deleted`);
emitCurationMetrics({ created: 0, ...result, trigger: "consolidation" });
const cooldown = consolidationCooldown.get(projectPath);
const now = Date.now();
if (
cooldown &&
cooldown.entryCount === entries.length &&
now - cooldown.attemptedAt < CONSOLIDATION_COOLDOWN_MS
) {
// Cooldown active — skip to avoid wasting Sonnet calls on
// repeated consolidation attempts that produce no changes.
} else {
log.info(
`entry count ${entries.length} exceeds maxEntries ${cfg.curator.maxEntries} — running consolidation`,
);
const beforeCount = entries.length;
const result = await Sentry.startSpan(
{ name: "lore.consolidation", op: "lore.curation", attributes: { trigger: "consolidation" } },
() => curator.consolidate({ llm, projectPath, sessionID, model }),
);
if (result.updated > 0 || result.deleted > 0) {
log.info(`consolidation: ${result.updated} updated, ${result.deleted} deleted`);
emitCurationMetrics({ created: 0, ...result, trigger: "consolidation" });
// Consolidation made progress — clear cooldown so it can retry
consolidationCooldown.delete(projectPath);
} else {
// Consolidation produced no changes — enter cooldown to prevent
// retry storm (the LLM thinks all entries are unique).
consolidationCooldown.set(projectPath, {
attemptedAt: Date.now(),
entryCount: beforeCount,
});
log.info(
`consolidation produced no changes — cooldown active for 1h ` +
`(${beforeCount} entries in ${projectPath})`,
);
}
}
}
} catch (e) {
Expand Down
13 changes: 12 additions & 1 deletion packages/gateway/src/pipeline.ts
Original file line number Diff line number Diff line change
Expand Up @@ -778,7 +778,18 @@ async function initIfNeeded(projectPath: string, config: GatewayConfig, gitRemot
if (config && !stopIdleScheduler) {
const llm = getLLMClient(config);
const idleHandler = buildIdleWorkHandler(llm);
stopIdleScheduler = startIdleScheduler(config, sessions, idleHandler);
stopIdleScheduler = startIdleScheduler(config, sessions, idleHandler, (sessionID) => {
// Clean up pipeline-level satellite Maps on session eviction.
// The headerSessionIndex entries are keyed by header values pointing
// TO this sessionID — remove them too.
for (const [key, sid] of headerSessionIndex) {
if (sid === sessionID) headerSessionIndex.delete(key);
}
ltmSessionCache.delete(sessionID);
ltmPinnedText.delete(sessionID);
stableLtmCache.delete(sessionID);
cwdWarned.delete(sessionID);
});
}

log.info(`gateway pipeline initialized: ${projectPath}`);
Expand Down
Loading
Loading