All
src/paths are relative topackages/plugin/(the published npm package). File locations live inSTRUCTURE.md; this document explains how the pieces fit and — above all — the invariants that keep the Anthropic prompt cache stable. When in doubt about transform behavior, read "Transform pass mechanics" below before touching code.
Magic Context is an @opencode-ai/plugin (entry src/index.ts) that rewrites the message array and system prompt on every LLM call to keep a long session inside the context window without losing history. Core tenets:
- Thin adapters, real logic separated. OpenCode-facing handlers live in
src/plugin/; feature logic insrc/hooks/magic-context/(runtime),src/features/magic-context/(services),src/tools/(agent tools). - Durable SQLite state, never ephemeral — if storage is unavailable the plugin fails closed rather than silently letting the prompt grow past the provider limit. DB at
~/.local/share/cortexkit/magic-context/context.db, shared cross-harness (OpenCode + Pi); session-scoped tables carry aharnessdiscriminator, project-scoped tables (memories, git commits) are shared. - Replay-everything for cache stability. Every persistent message mutation (reasoning clearing, structural-noise / placeholder / image / merged-assistant stripping, caveman compression, synthetic-todowrite, drop placeholders) is re-applied deterministically on EVERY transform pass — including defer passes — so the wire bytes stay byte-identical and the provider prompt cache survives.
- Hidden subagents (
historian,historian-editor,dreamer,sidekick) do the heavy LLM work out of band; the transform itself does no LLM calls. - Runtime SQLite backend (
src/shared/sqlite.ts):bun:sqliteunder Bun,node:sqlite(DatabaseSync) under Node (Pi) and Electron (Desktop). The non-Bun branch adds a savepoint-awaretransaction()shim andreadonly→readOnlymapping; otherwise identical. No native module, no prebuild. - Pi parity:
packages/pi-plugin/mirrors OpenCode semantics, importing shared core from@magic-context/core. Intentional divergences are tracked inpackages/pi-plugin/PARITY.md.
- Bootstrap (
src/index.ts): load config, register hidden agents + hooks + tools, start RPC server, dream-timer, auto-update checker; detect conflicting plugins (DCP / OMO / OpenCode auto-compaction) and disable the runtime if any is active. - Adapters (
src/plugin/): hook wrappers, tool registry, RPC handlers, dream-timer lifecycle, per-session hook construction. - Runtime (
src/hooks/magic-context/): the transform pipeline, postprocess phase, event/command handlers, system-prompt injection, compartment runners, decay rendering, strip-and-replay, nudges, m[0]/m[1] injection. - Feature services (
src/features/magic-context/): storage, scheduler, tagger, memory, dreamer, sidekick, key-files, git-commit + message FTS indexes, unified search, overflow detection, migrations. - Tools (
src/tools/):ctx_reduce,ctx_expand,ctx_note,ctx_memory,ctx_search. - Config + shared (
src/config/,src/shared/): Zod config (deep-merge raw JSONC before validation; invalid leaves fall back to defaults with warnings, never disable the plugin), logger, data paths, SQLite selector, harness id, RPC transport, conflict detector, tag-transcript primitive (shared with Pi). - TUI (
src/tui/): sidebar +/ctx-status//ctx-recompdialogs, RPC-backed; shipped as raw TS via the./tuiexport (not bundled intodist/index.js). - CLI (
packages/cli/, separate@cortexkit/magic-contextpackage):npxsetup / doctor / migrate wizard.
This is the heart of the system and the part most easily gotten wrong. A "transform pass" is one invocation of experimental.chat.messages.transform (src/hooks/magic-context/transform.ts), wrapped defensively in src/plugin/messages-transform.ts (transient SQLITE_BUSY → return messages unmodified so the prompt loop always proceeds). OpenCode fires it once per LLM round-trip (per step within a turn).
- Resolve usage + scheduler decision (
executevsdefer). - Emergency overflow recovery if ≥95%.
- Compartment trigger check (off the in-memory
args.messagestail — noopencode.dbread steady-state); fire the historian async if eligible. - Prepare compartment injection (decide m[0]/m[1] materialization).
- Tag messages; replay dropped-status, caveman, reasoning, placeholder, image strips.
- Compartment phase: inject the
<session-history>(m[0]/m[1]) intomessage[0]. - Postprocess (
transform-postprocess-phase.ts): the mutation gates — pending-op drain, heuristic cleanup, nudges, synthetic-todowrite, auto-search.
- SOFT+ (defer /
cache_hit): nothing new. m[0] AND m[1] replay byte-identical; the entiresystem + m[0] + m[1]prefix stays cached. Only the conversation tail moves (wherectx_reduce/age drops land, themselves replayed deterministically). The steady state — most passes are this. - SOFT (cache-busting): m[1] re-renders (new compartments / memories / user-profile surface as deltas) while m[0] stays byte-identical.
system + m[0]stays cached; the cache busts at the m[1] breakpoint. Driven by an execute pass,/ctx-flush, or a deferred-history drain. - HARD (m[0] fold):
mustMaterializefires → m[0] re-materializes, folding m[1] into the new decayed baseline and resetting m[1] to a placeholder. The whole prefix rebuilds — but "for free" because the provider cache key was already dead (see HARD triggers in "m[0]/m[1] cache layout"). Decay re-tiering happens ONLY on a HARD fold — a SOFT pass must never re-tier (that would change m[0] bytes).
Pending-op drain and heuristic cleanup are each gated by the same shape in transform-postprocess-phase.ts:
shouldApplyPendingOps / shouldRunHeuristics =
(execute || materializationRequested || forceMaterialization || m0HardFoldThisPass) // BUST clause: is this pass busting anyway?
&& (!compartmentRunning || emergencyBypassCompartmentGate) // VETO clause: is the historian mid-run?
- BUST clause — only mutate (drop tools, run heuristics) on a pass that is already busting the prefix, so the mutation rides that one bust instead of causing its own.
m0HardFoldThisPassis the fold-exec signal (an advisorymustMaterializecall earlier in postprocess). - VETO clause —
compartmentRunning— block mutation while the historian is summarizing the tail, so we don't change the bytes it's reading mid-run. Bypassed byemergencyBypassCompartmentGate. emergencyBypassCompartmentGatebypasses the veto whenforceMaterialization(≥85%) ORm0HardFoldThisPass— i.e. a hard fold drains pending ops + runs heuristics even while the historian runs, because the prefix is busting regardless (see "drain into the known bust" invariant). This is safe per the disjoint-DB model below; Pi already does this (context-handler.ts).
- A HARD bust means the prefix is already gone → drain EVERYTHING into it. Never "defer" a hard bust. This pass IS the fold; there is no later fold to wait for. Deferring the drain only produces a second, avoidable bust ~one turn later. (The
compartmentRunningveto must therefore yield to a hard fold — the fold-exec bypass.) - A defer (SOFT+) pass must replay byte-identical. Any first-application of a strip/drop on a defer pass changes tail bytes and busts the whole prefix after it. Watermark-gated strips (placeholders, images, stale-
ctx_reduce) use a frozen-id replay pattern: detect-and-freeze the affected ids only on cache-busting passes, replay the frozen set on every pass. There is exactly ONE drop placeholder string,[dropped §N§], a pure function of tag id — never re-derive bytes from mutated content (that caused repeated cache catastrophes). - Deferred work rides the next bust cycle; it never forces its own. Historian publishes, compaction-marker moves, and queued drops accumulate while m[1] replays frozen, and materialize together on the next genuine bust (execute / hard fold / flush). A historian publish does NOT bust the cache — between busts every pass is
cache_hit. - Boundary execution defers mid-turn.
executedecisions becomedeferwhile the latest assistant turn is mid-tool-use (CAS-flagdeferred_execute_state), so we don't rewrite bytes while a turn is still accumulating tool calls. Bypasses: ≥85% force, explicit-bust, subagent. Drained re-peek-and-clear at end of postprocess.
Mutating while the historian runs is safe because the two databases are disjoint on the read/write side:
- The historian reads raw OpenCode messages from
opencode.db(read-only) for its chunk. - Drops + heuristics mutate
context.db(tags/pending_ops) and the in-memory outgoing wire only. - The historian's in-flight snapshot is validated by
computeRawRangeFingerprint, which hashes raw content only (ids, part types, content lengths) — never tag/drop state — so a concurrent drop can't invalidate it. - Its post-publish
queueDropsForCompartmentalizedMessagesis idempotent against already-dropped tags.
The compacted history renders into TWO synthetic user-role message slots at the head, so the large stable prefix survives steady-state work. inject-compartments.ts (renderM0 / renderM1 / materializeM0 / mustMaterialize), mirrored in inject-compartments-pi.ts. Both slots prepend with synthetic: true parts so they don't count toward OpenCode's title-generation gate.
- m[0] — cumulative baseline (frozen, like
system[0]). Holds<project-docs>(rootARCHITECTURE.md+STRUCTURE.md), baseline<user-profile>, and the decay-rendered compartment history as of the last materialization. Does NOT change on routine turns. - m[1] — volatile delta. Holds everything added since the last m[0] materialization:
<key-files>, new user-profile additions, new memories (via themaxMemoryIdwatermark),<memory-updates>supersede deltas, and the newest compartments at full tier. Renders a minimal placeholder when empty (never fully empty — Anthropic cache-breakpoint structure).
mustMaterialize (HARD fold) triggers — organized around the bust taxonomy so the trigger list and the m[0]/m[1] contract can never silently disagree:
- Provider-side cache eviction (the cache is already dead, so folding is free): model/provider change (
cachedM0ModelKey), system-prompt-hash change (cachedM0SystemHash), idle > TTL (cacheExpired, self-consuming vialastResponseTime > cachedM0MaterializedAt). - Genuine m[0] content change (baseline bytes differ): first render,
cached_m1_missing,project_memory_epochchange (dashboard / external mutation), pending m[0] mutations (max_mutation_id— structural compartment delete/merge/recomp), upgrade-state change. - Deliberately NOT triggers (these are m[1] deltas — triggering would bust m[0] on routine background work and defeat the design): new compartment sequence,
project_user_profile_version,maxMemoryId, project-docs-hash change (docs edits fold in on the next natural hard bust, never on their own), and tool-set-hash change (process-global, false positives). - Pressure backstop refold: on a cache-busting pass, if no natural HARD bust has arrived but m[1] has grown large — gated by the m[1]/m[0] size ratio (with a small-m[0] floor) OR an absolute m[1] token cap (~20% of history budget) OR a large memory-mutation count.
applyMarkersToStateupdates ALLstate.cachedM0*fields post-materialize (guards against an infinite re-materialize loop)./ctx-flushis SOFT (drives m[1] refresh + heuristics, not an m[0] fold).
Memory mutations route through m[1], not the epoch. In-session ctx_memory mutations do NOT bump project_memory_epoch: additive writes surface via the maxMemoryId watermark; non-additive (update/archive/merge) record a memory_mutation_log row rendered as a <memory-updates> delta. Both reconcile into m[0] on the next natural hard bust. The epoch is bumped only by dashboard mutations and /ctx-session-upgrade migration (an external editor can't otherwise signal a running session).
The long-history pipeline. Tiered compartments + deterministic decay renderer (replaced the v1 flat-compartment + LLM-compressor model).
- Trigger (
compartment-trigger.ts): threshold-relative pressure (context_limit × execute_threshold × 5%, clamped 5k–50k), commit clusters, and TC-chunked unsummarized-tail size (≥ triggerBudget × 3), while protecting the live tail. Runs off the in-memory tail (zeroopencode.dbreads steady-state); hands the resolved boundary snapshot to the runner so the historian sees exactly what the fire decision saw. - Produce (
compartment-runner-incremental.ts): runs the historian subagent on the raw chunk above the last compartment boundary with a bounded prompt (no full state dump) — 4 rotating seed compartments + the last 6 persisted compartments + the project-memory block for fact dedup. Emits each compartment with 4 paraphrase tiers (p1verbose →p4anchor-only), animportance(decay-rate semantics), anepisode_type, a<facts>block in the 5-category taxonomy, and an<events>block. - Parse + validate:
parseCompartmentOutput+validateHistorianOutput(contiguous, non-overlapping ranges, correctunprocessed_from). - Discard-last boundary healing: if the historian consumed to the chunk edge with weak lookahead, the last (lookahead-free) compartment is not persisted; the next run re-reads it at the head with full lookahead. Guarded by progress (
k≥2) and emergency-disabled. - Store: publish transaction appends compartments with tier columns. Promotable facts promote to project memory (exact-dedup);
user_observationsstored only whendreamer.user_memories.enabled(privacy gate). Events →compartment_events. Compartment-chunk embeddings generated on publish (memory-gated). Publish defers a compaction-marker move (see Subsystems) and signals a deferred history refresh — it does NOT force a bust. - Render (decay):
decay-render.ts(shared OpenCode + Pi) picks one tier per compartment viadecay-curve.ts: half-lifeH = H50·2^((I−50)/D)/max(p,0.10)(H50=24,D=25), log-cost tier boundaries[0.201,0.729,1.322,2.587], budget pressureponce per pass. Older / lower-importance / higher-pressure compartments demote oldest-first; past the archive boundary they render P4/self-close or drop. Self-tunes as the context window changes — no LLM call. Legacy (pre-v2) rows render P3 (if they carry aU:line) else P4. - Recomp / upgrade:
/ctx-recomprebuilds compartment structure from raw history (emits NO facts — preserves curated memories)./ctx-session-upgraderuns full recomp + a once-per-project 9→5-category memory migration (activeonly,permanentuntouched, bumps the epoch).
protected-tail-boundary.ts decides, per pass, which prefix of the raw tail is eligible for the historian and which suffix stays protected — from true-raw token sizes (not user-turn counts), so sparse-user-turn sessions can't deadlock the historian (#132). Boundary anchors at lastCompartmentEnd + 1; token target N capped at 0.40 × usable (ABS_CAP 96k); a live-prompt floor keeps it from crossing the newest meaningful user message on routine (<80%) passes. Open tool arcs (a tool invocation with no result in the window) only hold the boundary back when recent (≥ the size-walk start = the live window); a stale/interrupted open arc older than that is compactable — otherwise one dead running tool call at the eligible-head edge would freeze the historian indefinitely. The trigger/runner share a content-stable range fingerprint for cross-view staleness validation.
- Memories (
memory/storage-memory.ts): project-scoped durable knowledge in the 5-category taxonomy (PROJECT_RULES / ARCHITECTURE / CONSTRAINTS / CONFIG_VALUES / NAMING), with FTS + vector side tables.ctx_memoryexposes write/archive/update/merge/list;listis dreamer-only; primary agents may only mutate their own project's memories (workspace-shared categories aside). - Unified search (
search.ts): one query embedding dispatched across memories, raw message history (FTS viamessage-index.ts), indexed git commits, and compartment-chunk embeddings. Hard-filters memories already visible in<session-history>and raw-message hits newer than the last compartment boundary (already in context). - Embeddings: vectors stored as plain SQLite BLOBs, scanned in-memory via
Float32Arraycosine (sqlite-vec rejected —bun:sqlitecan't load extensions + write-amplification). Provider resolved per-project; a substitution guard rejects a served model that doesn't match the requested one. Compartment-chunk embedding is on-demand via/ctx-embed(auto-drains the active session once per process; resilient retry with circuit-break; chunk-window config folds into chunk identity so it doesn't invalidate memory/commit vectors). - Workspaces (
workspaces/workspace_members): a project belongs to at most one workspace; member sessions read the union of members' memories (repo-attributed), gated per-category byshare_categories. Acached_m0_workspace_fingerprint(sorted identity+epoch+categories hash) detects membership/policy changes for a single hard fold on change.
Background maintenance, one worker at a time. Eligible projects detected on message.updated (12h cooldown), enqueued by a process-wide 15-min timer, serialized by a DB lease (TTL + renewal). Each task spawns one child session (dreamer/runner.ts); model resolved through the fallback chain. Tasks: consolidate, verify, archive-stale, improve, maintain-docs, plus optional user-memory review / smart-note evaluation / key-file identification (config-gated). A circuit breaker aborts after 3 consecutive identical failures. Queue rows are project-scoped (a host only dequeues projects it has loaded).
- Synthetic-todowrite:
tool.execute.aftercaptures todo state tolast_todo_state(pure DB write). On a cache-busting pass, postprocess injects a synthetictool_use/tool_resultpair (call_id =mc_synthetic_todo_<sha256(state)[:16]}) into the latest assistant message, AFTER tagging so it's never dropped. Defer passes rebuild from the persistedstate_jsonfor byte-identity. - ctx_reduce nudges: Channel 1 appends a
<system-reminder>to tool outputs intool.execute.after(persisted to OpenCode's DB → replays for free). Channel 2 delivers a one-shot synthetic-user ceiling nudge at step boundaries via the live-server client (promptAsyncwithsynthetic: true). Both gate onctx_reduceactually being in the session's tool allow-list. Trigger math: severity over the working window,reclaimable ≥ usable/3, protected tags excluded. - Tiered emergency drop (≥85%): target-headroom eviction down to
fixedFloor + 0.30 × (ceiling − fixedFloor), tools oldest-first across tiers (T3 misc → T2 edit/search → T1 navigation), newest-20% recency reserve on T1/T2.floorTags(full active set, for floor accounting) vstags(droppable candidates).last_emergency_input_sampleis the idempotence latch. The newest-20 dropped tool calls keep a[dropped §N§]skeleton (thetool_usesurvives, output replaced) so provider tool-pairing holds; older drops are fully removed. - Compaction markers: inject an OpenCode-compatible compaction boundary so
filterCompactedstops at the historian's last compartment, shrinking the transform-input array. The marker move is deferred from historian publish into the next materializing pass (one bust covers both the<session-history>rebuild and the boundary advance); CAS-guarded, restart-safe. - Content stripping (
strip-content.ts,caveman.ts,sentinel.ts): stateless strip functions + deterministic in-place sentinel replacement + persisted watermarks. Provider-aware: empty-content sentinels only stay empty for providers that accept them (modelAcceptsEmptyContent); others get a[dropped]placeholder (e.g. Copilot/Bedrock break tool adjacency on empty parts — #135). - Message / git-commit indexes: FTS5 raw-message index maintained outside the search hot path (async reconciliation + live
message.updatedevents); HEAD-only non-merge git-commit corpus populated by the dream timer. - System-prompt injection (
system-prompt-hash.ts): injects only the Magic Context guidance text + a frozenToday's date:line (per-session sticky, updated only on cache-busting passes). Adjunct blocks (<project-docs>/<user-profile>/<key-files>) are NOT here — they moved into m[0]/m[1] so the system prompt stays maximally cache-stable. Skipped entirely for OpenCode's internaltitle/summary/compactionagents and for hidden child sessions (detected by themagic-context-title prefix). - TUI ↔ server RPC: localhost server on an ephemeral port (published to
session_meta); the TUI plugin reads all data via RPC (no direct SQLite, avoids lock contention).
storage-db.ts creates the schema and runs versioned migrations (migrations.ts, currently v1–v36). LATEST_SUPPORTED_VERSION is a schema fence — it MUST be bumped with every new migration (a unit test asserts it equals the highest migration), and a stale value makes the DB refuse to open after the migration applies. ensureColumn() + healAllNullColumns() backfill upgraded DBs even if a migration row is lost. New session-scoped tables must be added to clearSession(). A bulletproof MAGIC_CONTEXT_TEST_DATA_DIR guard keeps the test suite off the live DB (running bun test once migrated a live DB and fail-closed running binaries). SQLite binds must use SPREAD positional args, never the array form (bun:sqlite binds a lone array positionally; node:sqlite reads it as named params and throws).
Three effective modes; the heavier features (historian, nudges, adjunct injection) are gated, while tag/drop plumbing stays on everywhere.
| Feature | Primary + ctx_reduce_enabled: true |
Primary + ctx_reduce_enabled: false |
Subagents |
|---|---|---|---|
| Tag DB records | ✓ | ✓ | ✓ |
§N§ prefix injection + ctx_reduce tool |
✓ | ✗ | ✓ (if ctx_reduce available) |
| Historian / compartments / decay / m[0]m[1] | ✓ | ✓ | ✗ |
| Channel 1 nudge | ✓ | ✗ | ✓ |
| Channel 2 nudge | ✓ | ✗ | ✓ |
| Synthetic-todowrite / auto-search | ✓ | ✓ | ✗ |
| Heuristic tool drops at execute | ✓ once/turn | ✓ once/turn | ✓ every execute pass |
| 85% force-materialize / 95% block | ✓ | ✓ | ✗ (overflow path only) |
| Caveman text compression | ✗ | opt-in | ✗ |
Subagents run heuristic drops on every execute pass (no once-per-turn guard) because a long subagent run is effectively one parent turn and would otherwise starve; they have no provider-cache reuse to protect.
Fail closed when storage is unavailable (better to disable than silently overflow the prompt). Fail open in per-turn handlers (log and skip). Wrap the outer transform so transient SQLITE_BUSY/SQLITE_LOCKED never crash the prompt loop (#23). overflow-detection.ts parses provider context-overflow errors (Anthropic / OpenAI / Copilot) and persists the detected limit so later passes use the lower value. Subagent model fallback (model-suggestion-retry.ts) iterates the chain on retryable failures; abort/timeout/context-overflow short-circuit. Hidden agents carry a steps/maxSteps cap and are aborted via session.abort on timeout so a weak local model can't loop forever (#154).
Each tags row is one taggable source-content unit (message, file, or tool). message/file tags key on (session_id, message_id) (synthetic content id). tool tags key on a COMPOSITE (session_id, callID, tool_owner_message_id) — because OpenCode reuses a callID counter per assistant turn, so the same read:32 recurs across turns; including the owning assistant message id gives each invocation its own row (migration v10). Owner derivation: invocation parts own themselves; result parts pop a FIFO of unpaired invocations; a result whose invocation was compacted away falls back to the nearest prior persisted owner. The same composite keying mirrors in the drop queue and heuristic cleanup so dropped keys match what the tagger persisted. Per-tag token counts (token_count / input_token_count / reasoning_token_count) are computed once on tag insert and summed for sidebar / boundary / nudge math (off the hot path).