Memory

Osaurus has a persistent, on-device memory system that learns from your conversations and surfaces relevant context only when it actually helps. Memory runs in the background, stores very little, and injects a budgeted ~800 tokens per turn (or nothing at all) instead of the firehose-style context stuffing that v1 did.

The mental model is simple: a smart secretary that knows what you've discussed and surfaces only what matters right now — not a tape recorder.

Memory is not the same as the per-agent DB. It is global across all your chats: identity, pinned facts, and episodic recall derived from natural conversation. For an agent that maintains its own private structured store (custom tables, soft-deletes, saved views), see Agent DB & Self-Scheduling. The two are orthogonal: an agent can use neither, either, or both.

Getting Started

1. Open the Management window (⌘ Shift M) → Memory.
2. Memory is enabled by default; toggle it off in the Memory settings if you prefer stateless conversations.
3. The core model for distillation defaults to `foundation` (Apple's on-device Language Model on macOS 26+). Change it in Settings → General if you'd rather use a remote model such as `anthropic/claude-haiku-4-5`.
4. Start chatting. Each session is distilled in the background once it goes idle or you navigate away.

No manual tagging, saving, or annotation is required.

Three Layers + Transcript

```
┌──────────────────────────────────────────────────────────────────┐
│                      Memory System (v2)                          │
├──────────────────────────────────────────────────────────────────┤
│  Identity                                                        │
│  Stable user facts: explicit overrides + auto-derived narrative. │
├──────────────────────────────────────────────────────────────────┤
│  Pinned Facts                                                    │
│  Salience-scored facts promoted from session distillations.      │
│  Decayed and evicted by the consolidator.                        │
├──────────────────────────────────────────────────────────────────┤
│  Episodes                                                        │
│  Per-session digests: summary, topics, decisions, entities.      │
│  Replaces the v1 separate "working memory" and "summaries"       │
│  tables.                                                         │
├──────────────────────────────────────────────────────────────────┤
│  Transcript                                                      │
│  Raw conversation turns kept for fallback retrieval only.        │
│  Never default-injected into context.                            │
└──────────────────────────────────────────────────────────────────┘
```

Identity

A single row containing two fields:

- Overrides: explicit user-authored facts ("My name is Terence", "Always reply in English"). Always surfaced in context. Authored via the Memory → Your Overrides UI.
- Content: auto-derived narrative ("User builds Swift apps for macOS, prefers Postgres, lives in PT timezone."). Regenerated by the consolidator from accumulated identity-grade signals.

Pinned Facts

The pool of facts promoted from session distillations. Each fact carries:

- `content`: the fact itself, in plain text
- `salience`: score in [0, 1]. Decayed by the consolidator (`salience *= exp(-Δdays / 30)`) and evicted once it falls below the floor (default 0.2) and the fact has been idle for 30+ days.
- `sourceCount`: number of episodes that mention it
- `useCount` / `lastUsed`: bumped every time the planner surfaces the fact in context
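To make the salience lifecycle concrete, here is a minimal Python sketch of a pinned fact. The class, its snake_case fields, and the choice to measure idleness from `lastUsed` (or from creation if the fact was never used) are assumptions for illustration. Only the decay formula, the 0.2 floor, and the 30-day idle window come from the description above.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PinnedFact:
    """Hypothetical in-memory mirror of a pinned fact (not the real Swift type)."""
    content: str
    salience: float                    # score in [0, 1]
    created_at: datetime
    source_count: int = 1              # episodes that mention the fact
    use_count: int = 0
    last_used: Optional[datetime] = None

    def decay(self, delta_days: float) -> None:
        # salience *= exp(-Δdays / 30): after 30 days, about 37% of the score remains
        self.salience *= math.exp(-delta_days / 30.0)

    def mark_used(self, now: datetime) -> None:
        # Bumped whenever the planner surfaces the fact in context
        self.use_count += 1
        self.last_used = now

    def is_evictable(self, now: datetime, floor: float = 0.2, idle_days: int = 30) -> bool:
        # Assumption: "idle" is measured from last_used, or from creation if never used
        last_touch = self.last_used or self.created_at
        return self.salience < floor and now - last_touch >= timedelta(days=idle_days)


fact = PinnedFact("User prefers Postgres", salience=0.5, created_at=datetime(2025, 1, 1))
fact.decay(30)
print(round(fact.salience, 3))                   # 0.184
print(fact.is_evictable(datetime(2025, 1, 31)))  # True: below 0.2 and idle for 30 days
```

Surfacing a fact in context (`mark_used`) resets its idle clock, so a low-salience fact that keeps getting used is not evicted.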

Episodes

One per session. Created by `MemoryService.distillSession` once the writer's debounce expires (default 60 s of inactivity), or immediately when the session changes or the user navigates away. Each episode contains:

- `summary`: one to three sentences
- `topics`, `entities`: comma-separated lists
- `decisions`, `actionItems`: newline-separated bullet items
- `salience`: model-assigned score, decayed by the consolidator
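The sketch below shows how those mixed encodings could be parsed into a structured record. The `Episode` class, the row-dict keys, the assumption that bullet lines may start with `-` or `*`, and the 0.5 default salience are all illustrative choices, not the Osaurus schema.

```python
from dataclasses import dataclass
from typing import Dict, List


def split_csv(value: str) -> List[str]:
    # topics / entities: comma-separated lists
    return [part.strip() for part in value.split(",") if part.strip()]


def split_bullets(value: str) -> List[str]:
    # decisions / actionItems: one item per line; a leading "-" or "*" is stripped (assumed format)
    items = []
    for line in value.splitlines():
        item = line.strip().lstrip("-*").strip()
        if item:
            items.append(item)
    return items


@dataclass
class Episode:
    """Hypothetical parsed view of one episode row."""
    summary: str
    topics: List[str]
    entities: List[str]
    decisions: List[str]
    action_items: List[str]
    salience: float

    @classmethod
    def from_row(cls, row: Dict[str, object]) -> "Episode":
        return cls(
            summary=str(row["summary"]),
            topics=split_csv(str(row.get("topics", ""))),
            entities=split_csv(str(row.get("entities", ""))),
            decisions=split_bullets(str(row.get("decisions", ""))),
            action_items=split_bullets(str(row.get("actionItems", ""))),
            salience=float(row.get("salience", 0.5)),
        )


episode = Episode.from_row({
    "summary": "Picked Postgres for the demo.",
    "topics": "databases, demo",
    "entities": "Postgres, Acme Corp",
    "decisions": "- Use Postgres for the demo",
    "actionItems": "- Write the migration\n- Seed test data",
    "salience": 0.7,
})
print(episode.action_items)  # ['Write the migration', 'Seed test data']
```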

Transcript

Raw user/assistant turns. Never injected into the default context block. Used only when the user asks for literal recall ("what exactly did I say…") or as a fallback search through the `transcript` scope of `search_memory`.

Memory vs. SOUL.md (sandbox)

Memory and SOUL.md are separate surfaces by design — do not cross-pollinate them.

| | Memory | `SOUL.md` |
|---|---|---|
| Author | Distilled from conversations by Osaurus | The agent itself, via `sandbox_write_file` (whole-file write or in-place edit) |
| Scope | Session facts, episodes, user identity | Stable preferences and patterns the agent learned about working with you |
| Update cadence | Background after each session | Whenever the agent observes a durable pattern |
| Where it lands | Prepended to the latest user message (volatile) | Static section in the system prompt (KV-cacheable) |
| Available in | Every chat | Sandbox mode only |

If a fact belongs to a session ("we decided to use Postgres for the demo"), memory owns it. If a fact is a preference the agent should keep applying ("user prefers Postgres for new projects"), the agent's SOUL.md owns it. See SANDBOX.md for the full SOUL.md contract.

Write Path: Deferred and Debounced

```
[user + assistant turn]
         │
         ▼
[buffered as pending_signal]   ◄── per-turn cost: one SQL insert. No LLM.
         │
         ▼
   debounce 60s
         │  (or session-change / nav-away → flush immediately)
         ▼
[novelty gate: combined chars >= 80?]
         │
         ▼
[ONE LLM call: distill the whole session]
         │
         ▼
{episode + entities + pinned candidates + identity delta}
         │
         ├──► insert Episode (atomically marks signals processed)
         ├──► insert PinnedFact for each candidate that passes Jaccard dedup
         └──► append identity overrides for any new identity-grade facts
```

The hot path (`bufferTurn`) is a single SQL insert plus re-arming the debounce timer. No LLM call ever runs synchronously with chat. Distillation is one LLM call per session (often covering 10+ turns), not one per turn.

When the core model isn't configured, signals stay pending — recoverOrphanedSignals (called at startup) and syncNow (called from the Memory UI) drain them once a model is available.
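The write path above can be modelled as a small state machine. The sketch makes several assumptions. `SessionWriter`, its methods, and the explicit `now` argument (used in place of a real timer so the logic is testable) are hypothetical. The novelty gate counts both user and assistant text. The 0.8 Jaccard dedup threshold is invented, because the real cutoff is not documented here. The debounce length and the 80-character gate come from the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DEBOUNCE_SECONDS = 60.0  # summaryDebounceSeconds default
MIN_COMBINED_CHARS = 80  # novelty gate from the diagram


@dataclass
class SessionWriter:
    """Hypothetical model of one session's write path; time is passed in explicitly."""
    pending: List[Tuple[str, str]] = field(default_factory=list)  # stand-in for pending_signal rows
    deadline: Optional[float] = None

    def buffer_turn(self, user: str, assistant: str, now: float) -> None:
        # Hot path: one insert and a re-armed debounce. No LLM call.
        self.pending.append((user, assistant))
        self.deadline = now + DEBOUNCE_SECONDS

    def should_flush(self, now: float, session_changed: bool = False) -> bool:
        if not self.pending:
            return False
        # A session change or navigation away flushes immediately
        return session_changed or (self.deadline is not None and now >= self.deadline)

    def passes_novelty_gate(self) -> bool:
        # Assumption: "combined chars" counts both user and assistant text
        return sum(len(u) + len(a) for u, a in self.pending) >= MIN_COMBINED_CHARS

    def take_pending(self) -> List[Tuple[str, str]]:
        # In Osaurus, inserting the Episode marks signals processed; here we just clear the buffer
        turns, self.pending, self.deadline = self.pending, [], None
        return turns


def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def dedup_candidates(candidates: List[str], existing: List[str], threshold: float = 0.8) -> List[str]:
    # threshold is a hypothetical value, not the documented cutoff
    kept: List[str] = []
    for candidate in candidates:
        if all(jaccard(candidate, other) < threshold for other in existing + kept):
            kept.append(candidate)
    return kept


writer = SessionWriter()
writer.buffer_turn("Let's use Postgres for the demo", "Sounds good.", now=0.0)
writer.buffer_turn("And seed it with the Acme data", "Will do.", now=45.0)
print(writer.should_flush(now=100.0))  # False: the second turn re-armed the timer to t=105
print(writer.should_flush(now=105.0))  # True
```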

Read Path: Gated and Single-Section

```
[incoming user message]
         │
         ▼
[Relevance Gate: heuristic + optional LLM fallback]
   ├── pronouns / "we discussed" / "remember when"  → episode
   ├── "what's my name" / "who am I"                 → identity
   ├── entity-name hit in graph                      → pinned
   ├── "do you remember my preference"               → pinned
   ├── "exact words" / "verbatim"                    → transcript
   └── nothing fired                                 → none (skip memory)
         │
         ▼
[Memory Planner: fetches the chosen section under the budget]
         │
         ▼
[Compact memory block ≤ memoryBudgetTokens (default 800)]
   + always-on Identity Overrides (tiny)
         │
         ▼
[Prepend to the latest user message]
```
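A rough sketch of the heuristic gate and the final prepend step follows. The trigger phrases and their priority order are assumptions loosely based on the diagram. Pronoun detection and the optional LLM fallback are not modelled. The ~4 characters-per-token budget estimate is also an assumption. Only the section names, the 800-token default, and "prepend to the latest user message" come from the text.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical trigger phrases, checked in this order (most specific first)
RULES = [
    ("transcript", ("exact words", "verbatim", "what exactly did i say")),
    ("identity", ("what's my name", "what is my name", "who am i")),
    ("pinned", ("my preference", "do you remember my")),
    ("episode", ("we discussed", "remember when", "last time we")),
]


def relevance_gate(message: str, known_entities: Iterable[str] = ()) -> str:
    """Return the single memory section to fetch, or "none" to skip memory entirely."""
    text = message.lower()
    for section, phrases in RULES:
        if any(phrase in text for phrase in phrases):
            return section
    for entity in known_entities:
        # Whole-word match against entity names already known to memory
        if re.search(r"\b" + re.escape(entity.lower()) + r"\b", text):
            return "pinned"
    return "none"


def prepend_memory(messages: List[Dict[str, str]], block: str, overrides: List[str],
                   budget_tokens: int = 800) -> List[Dict[str, str]]:
    """Prepend identity overrides plus the budgeted block to the latest user message."""
    max_chars = budget_tokens * 4  # rough assumption: ~4 characters per token
    parts = []
    if overrides:
        parts.append("\n".join(overrides))  # always-on, tiny
    if block:
        parts.append(block[:max_chars])
    out = [dict(m) for m in messages]  # copy so the caller's history is untouched
    if not parts:
        return out
    for message in reversed(out):
        if message["role"] == "user":
            message["content"] = "\n\n".join(parts) + "\n\n" + message["content"]
            break
    return out


print(relevance_gate("Remember when we picked a database?"))  # episode
```

Because the gate returns a single section, the planner fetches at most one slice of memory per turn, which keeps the injected block small.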

The cache layer holds the assembled block for 10s per (agent, query) pair so retries don't re-run the gate.
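A 10-second cache keyed by `(agent, query)` is a plain TTL map. The sketch below is hypothetical: the class name, the injectable clock (added so tests need no real waiting), and the pruning of expired entries on write are illustrative choices.

```python
import time
from typing import Callable, Dict, Tuple


class MemoryBlockCache:
    """Hypothetical TTL cache for assembled memory blocks, keyed by (agent, query)."""

    def __init__(self, ttl_seconds: float = 10.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get_or_build(self, agent_id: str, query: str, build: Callable[[], str]) -> str:
        now = self.clock()
        key = (agent_id, query)
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # retry within the TTL: skip the gate and planner
        block = build()  # stands in for the relevance gate + planner
        # Drop expired entries so the cache stays small, then store the fresh block
        self._entries = {k: v for k, v in self._entries.items() if now - v[0] < self.ttl}
        self._entries[key] = (now, block)
        return block


cache = MemoryBlockCache()
cache.get_or_build("default", "what did we decide?", lambda: "Episode: chose Postgres")
print(cache.get_or_build("default", "what did we decide?", lambda: "rebuilt"))
# Episode: chose Postgres (served from cache within 10 s)
```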

Consolidation: Background Maintenance

`MemoryConsolidator` runs as a low-priority background task every `consolidationIntervalHours` (default 24 h), and on demand via the Run Consolidation Now button in the Memory UI. Each pass:

- Decay: `salience *= exp(-Δdays / 30)` for both pinned facts and episodes.
- Merge: collapse near-duplicate episodes (Jaccard ≥ 0.9 over summary + topics) within the same agent, keeping the older episode and deleting the newer near-duplicate.
- Promote: boost salience on pinned facts whose content overlaps with ≥ 3 recent episodes.
- Evict: delete pinned facts below `salienceFloor` that have been idle for 30+ days.
- Prune: drop episodes and transcript turns older than `episodeRetentionDays` (nothing is pruned when it is 0).
- Purge: trim old `processing_log` rows.
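Here is a sketch of the merge, promote, and prune steps over in-memory lists. The record types are hypothetical. The "overlap" test for promotion is also an assumption: an episode counts if it shares at least half of the fact's words. The +0.1 boost is likewise assumed. The 0.9 Jaccard threshold, the keep-the-older rule, the ≥ 3 episode requirement, and "0 = keep forever" come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class Ep:
    id: int
    agent_id: str
    created_at: datetime
    summary: str
    topics: str  # comma-separated, as stored


@dataclass
class Fact:
    content: str
    salience: float


def tokens(text: str) -> Set[str]:
    return set(text.lower().replace(",", " ").split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0


def merge_near_duplicates(episodes: List[Ep], threshold: float = 0.9) -> List[Ep]:
    """Within one agent, keep the older episode of any pair with Jaccard >= threshold."""
    kept: List[Ep] = []
    for ep in sorted(episodes, key=lambda e: e.created_at):  # oldest first
        sig = tokens(ep.summary + " " + ep.topics)
        if not any(k.agent_id == ep.agent_id
                   and jaccard(sig, tokens(k.summary + " " + k.topics)) >= threshold
                   for k in kept):
            kept.append(ep)
    return kept


def promote(facts: List[Fact], recent: List[Ep], min_episodes: int = 3, boost: float = 0.1) -> None:
    # Hypothetical overlap test: an episode counts if it shares at least half of the fact's words
    for fact in facts:
        words = tokens(fact.content)
        need = max(1, len(words) // 2)
        hits = sum(1 for ep in recent if len(words & tokens(ep.summary + " " + ep.topics)) >= need)
        if hits >= min_episodes:
            fact.salience = min(1.0, fact.salience + boost)  # boost size is an assumption


def prune(episodes: List[Ep], now: datetime, retention_days: int) -> List[Ep]:
    if retention_days == 0:  # 0 = keep forever
        return list(episodes)
    cutoff = now - timedelta(days=retention_days)
    return [e for e in episodes if e.created_at >= cutoff]


eps = [
    Ep(2, "a", datetime(2025, 5, 2), "Chose Postgres for the demo", "postgres, demo"),
    Ep(1, "a", datetime(2025, 5, 1), "Chose Postgres for the demo", "postgres, demo"),
    Ep(3, "b", datetime(2025, 5, 3), "Chose Postgres for the demo", "postgres, demo"),
]
print([e.id for e in merge_near_duplicates(eps)])  # [1, 3]: agent b's copy is a different agent
```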

Consolidation never runs on the request path, so chat latency is unaffected.

Configuration Reference

The full configuration lives in ~/.osaurus/config/memory.json and is editable from the Memory tab in the Management window.

| Setting | Default | Range | Description |
|---|---|---|---|
| `enabled` | `true` | `true` / `false` | Master toggle |
| `embeddingBackend` | `mlx` | `mlx` / `none` | Embedding backend. `none` falls back to SQLite text matching. |
| `embeddingModel` | `nomic-embed-text-v1.5` | n/a | Model used by VecturaKit |
| `extractionMode` | `sessionEnd` | `sessionEnd` / `manual` | When the writer runs distillation |
| `relevanceGateMode` | `heuristic` | `off` / `heuristic` / `llm` | How the read path decides whether to inject memory |
| `memoryBudgetTokens` | `800` | 100 – 4,000 | Single overall budget for the dynamic section |
| `summaryDebounceSeconds` | `60` | 10 – 3,600 | Inactivity period before distillation |
| `consolidationIntervalHours` | `24` | 1 – 168 | How often the consolidator runs |
| `salienceFloor` | `0.2` | 0.0 – 1.0 | Pinned facts below this and idle 30+ days are evicted |
| `episodeRetentionDays` | `365` | 0 – 3,650 | How long episodes/transcript are kept (0 = forever) |
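The table translates directly into a loader that merges `memory.json` over the defaults. This is a sketch under two assumptions: the file is a flat JSON object keyed by these setting names, and out-of-range numbers are clamped. The source does not say whether Osaurus clamps or rejects them.

```python
import json
from typing import Any, Dict

# Defaults, ranges, and allowed values transcribed from the table above
DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "embeddingBackend": "mlx",
    "embeddingModel": "nomic-embed-text-v1.5",
    "extractionMode": "sessionEnd",
    "relevanceGateMode": "heuristic",
    "memoryBudgetTokens": 800,
    "summaryDebounceSeconds": 60,
    "consolidationIntervalHours": 24,
    "salienceFloor": 0.2,
    "episodeRetentionDays": 365,
}
RANGES = {
    "memoryBudgetTokens": (100, 4000),
    "summaryDebounceSeconds": (10, 3600),
    "consolidationIntervalHours": (1, 168),
    "salienceFloor": (0.0, 1.0),
    "episodeRetentionDays": (0, 3650),
}
CHOICES = {
    "embeddingBackend": {"mlx", "none"},
    "extractionMode": {"sessionEnd", "manual"},
    "relevanceGateMode": {"off", "heuristic", "llm"},
}


def load_memory_config(text: str) -> Dict[str, Any]:
    """Hypothetical loader: merge a memory.json body over the defaults, clamping numbers into range."""
    raw = json.loads(text) if text.strip() else {}
    config = dict(DEFAULTS)
    for key, value in raw.items():
        if key not in DEFAULTS:
            continue  # unknown keys, e.g. removed v1 knobs such as mmrLambda, are ignored
        if key in RANGES:
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue  # keep the default for non-numeric input
            low, high = RANGES[key]
            value = min(max(value, low), high)
        elif key in CHOICES and value not in CHOICES[key]:
            continue  # keep the default for an unsupported choice
        config[key] = value
    return config


print(load_memory_config('{"memoryBudgetTokens": 50000}')["memoryBudgetTokens"])  # 4000
```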

That's the entire surface. v1's 18 knobs (mmrLambda, mmrFetchMultiplier, verification*Threshold, per-section budgets, recall topK, profile regen thresholds, max entries per agent, …) are all gone.

Storage

All memory data is stored in a local SQLite database with WAL mode. Since 0.17.7 the database is encrypted at rest with SQLCipher using a key kept in your macOS Keychain — the same key chat history, methods, and tool indexes use.

Location: ~/.osaurus/memory/memory.sqlite (SQLCipher)

Configuration: ~/.osaurus/config/memory.json (plaintext)

Vector index: ~/.osaurus/memory/vectura/<agentId>/ — partitioned per agent so one agent's vectors never collide with another's. The vector files themselves are not yet encrypted (see STORAGE.md → Limitations); they are rebuilt from the encrypted SQLite source on first read after migration.

The schema is versioned. The v1 → v2 migration (migrateToV5) carries forward your identity, episodes (renamed from conversation_summaries), and transcript (renamed from conversation_chunks). The noisy v1 working-memory entries, profile events, verification audit log, agent activity, embeddings cache, and graph tables are all dropped — pinned_facts rebuilds organically from new conversations.

The v6 migration adds three FTS5 mirror tables (fts_pinned, fts_episodes, fts_transcript) backed by triggers on the source tables. Tokenizer is unicode61 remove_diacritics 2, which folds accents and case so non-English queries don't miss obvious matches. Existing rows are backfilled in a single INSERT … SELECT pass, so the migration is one short transaction even for large memory databases.
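The trigger-backed mirror pattern can be reproduced with Python's built-in `sqlite3`, assuming the linked SQLite was compiled with FTS5 (most distributions are). Only `fts_pinned`, `pinned_facts`, and the tokenizer string come from the text. The `body` column, the trigger names, and the choice of a regular (self-storing) FTS5 table rather than an external-content one are assumptions for this sketch.

```python
import sqlite3

# Hypothetical v6-style migration for one of the three mirrors (fts_pinned).
FTS_MIGRATION = """
BEGIN;
CREATE VIRTUAL TABLE fts_pinned USING fts5(body, tokenize = 'unicode61 remove_diacritics 2');
-- Backfill existing rows in a single INSERT ... SELECT pass
INSERT INTO fts_pinned(rowid, body) SELECT id, content FROM pinned_facts;
-- Triggers keep the mirror in sync with the source table from now on
CREATE TRIGGER pinned_ai AFTER INSERT ON pinned_facts BEGIN
  INSERT INTO fts_pinned(rowid, body) VALUES (new.id, new.content);
END;
CREATE TRIGGER pinned_ad AFTER DELETE ON pinned_facts BEGIN
  DELETE FROM fts_pinned WHERE rowid = old.id;
END;
CREATE TRIGGER pinned_au AFTER UPDATE OF content ON pinned_facts BEGIN
  UPDATE fts_pinned SET body = new.content WHERE rowid = old.id;
END;
COMMIT;
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pinned_facts (id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
conn.execute("INSERT INTO pinned_facts(content) VALUES ('Café meetings on Fridays')")
conn.commit()
conn.executescript(FTS_MIGRATION)  # the pre-existing row is backfilled
conn.execute("INSERT INTO pinned_facts(content) VALUES ('Prefers Postgres')")  # picked up by the trigger


def fts_rowids(query: str):
    return [r[0] for r in conn.execute("SELECT rowid FROM fts_pinned WHERE fts_pinned MATCH ?", (query,))]


print(fts_rowids("CAFE"))      # [1]: case and accents are folded
print(fts_rowids("postgres"))  # [2]
```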

Search and Retrieval

When VecturaKit is available, search uses hybrid BM25 + vector matching with MMR reranking. When it's not (e.g. the embedding model isn't downloaded yet), search falls back to FTS5 MATCH queries against the per-table mirror tables introduced in the v6 migration — same tokenization (Unicode-folded) and the same prefix/phrase syntax SQLite documents for FTS5. SQL LIKE is only used as a final fallback when an FTS query can't be sanitized into a valid match expression (e.g. all-punctuation input).
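The FTS5-then-LIKE fallback chain can be sketched as follows. The sanitizer here is deliberately conservative and hypothetical: it keeps word tokens and quotes each one, which disables FTS5 operators but also gives up the prefix and phrase syntax that the text says Osaurus supports. Treat it only as an illustration of the fallback order. The table layout mirrors the previous sketch and is also assumed.

```python
import re
import sqlite3
from typing import List, Optional


def to_fts5_query(raw: str) -> Optional[str]:
    """Hypothetical sanitizer: keep word tokens and quote each so FTS5 operators are inert."""
    words = re.findall(r"\w+", raw)
    if not words:
        return None  # e.g. all-punctuation input: no valid MATCH expression
    # \w never matches '"', so wrapping each token in double quotes is always well-formed
    return " ".join('"' + w + '"' for w in words)


def search_pinned(conn: sqlite3.Connection, raw: str, limit: int = 5) -> List[str]:
    query = to_fts5_query(raw)
    if query is not None:
        rows = conn.execute(
            "SELECT pinned_facts.content FROM fts_pinned "
            "JOIN pinned_facts ON pinned_facts.id = fts_pinned.rowid "
            "WHERE fts_pinned MATCH ? ORDER BY bm25(fts_pinned) LIMIT ?",
            (query, limit),
        )
        return [r[0] for r in rows]
    # Final fallback: LIKE over the source table, with wildcards in the input escaped
    escaped = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT content FROM pinned_facts WHERE content LIKE ? ESCAPE '\\' LIMIT ?",
        ("%" + escaped + "%", limit),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pinned_facts (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE VIRTUAL TABLE fts_pinned USING fts5(body, tokenize = 'unicode61 remove_diacritics 2');
INSERT INTO pinned_facts(id, content) VALUES (1, 'Uses C++ and Swift'), (2, 'Prefers Postgres (v16)');
INSERT INTO fts_pinned(rowid, body) SELECT id, content FROM pinned_facts;
""")
print(search_pinned(conn, "Postgres???"))  # ['Prefers Postgres (v16)']
print(search_pinned(conn, "++"))           # ['Uses C++ and Swift'] via the LIKE fallback
```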

The MMR reranker uses 4-character word shingles for cheap content overlap — much faster than the v1 Jaccard-over-tokenized-strings approach.
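The following is a minimal greedy MMR reranker using character shingles, in the spirit of the paragraph above. Several details are assumptions. "4-character shingles" is interpreted as overlapping 4-character substrings of the lowercased text. Similarity is Jaccard over those shingle sets. λ = 0.7 is a hypothetical value, since the v1 `mmrLambda` knob no longer exists. In Osaurus the relevance scores would come from the hybrid BM25 + vector search; here they are passed in.

```python
from typing import List, Sequence, Set


def shingles(text: str, k: int = 4) -> Set[str]:
    """Overlapping k-character substrings of the lowercased, whitespace-normalised text."""
    norm = " ".join(text.lower().split())
    if len(norm) <= k:
        return {norm} if norm else set()
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def overlap(a: Set[str], b: Set[str]) -> float:
    # Jaccard similarity of two shingle sets
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr(candidates: Sequence[str], relevance: Sequence[float], top_k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: repeatedly pick the argmax of lam * relevance - (1 - lam) * redundancy."""
    sets = [shingles(c) for c in candidates]
    picked: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(picked) < top_k:
        def score(i: int) -> float:
            redundancy = max((overlap(sets[i], sets[j]) for j in picked), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        picked.append(best)
        remaining.remove(best)
    return picked


docs = [
    "User prefers Postgres for new projects",
    "User prefers Postgres for new projects!",
    "Lives in the PT timezone",
]
print(mmr(docs, [0.9, 0.85, 0.6], top_k=2))  # [0, 2]: the near-duplicate is skipped
```

Building shingle sets needs only slicing and set operations, with no tokenizer in the loop. This is the kind of saving the paragraph attributes to the switch away from v1's tokenized-string Jaccard.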

API Integration

Runtime Context Injection

Memory context is injected only by Osaurus-composed agent surfaces: app chat windows, POST /agents/{id}/run, and plugin host inference. Those paths run the relevance gate against the user's message, pick at most one memory section, and prepend it to the latest user message.

Strict POST /chat/completions requests do not inject Osaurus memory, agent prompts, skills, or tools. X-Osaurus-Agent-Id may associate persisted HTTP history with an agent/session, but it is not a memory injection switch.

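For example, the request below goes to the strict endpoint, so the model answers without any Osaurus memory, even though the question is about past conversations:

```python
from openai import OpenAI

# Strict OpenAI-compatible endpoint: Osaurus injects no memory, agent prompts,
# skills, or tools on this path.
client = OpenAI(
    base_url="http://127.0.0.1:1337/v1",
    api_key="osaurus",
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "What did we talk about last week?"}],
    # Associates persisted HTTP history with an agent/session;
    # it does not switch on memory injection.
    extra_headers={"X-Osaurus-Agent-Id": "my-agent"},
)
print(response.choices[0].message.content)
```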

Memory Ingestion — `POST /memory/ingest`

Bulk-ingest conversation turns. Useful for seeding memory from existing chat logs, migrating from another system, or running benchmarks. Unless `skip_extraction` is set, ingestion flushes distillation immediately at the end of the batch, so you don't have to wait for the debounce.

```bash
curl http://127.0.0.1:1337/memory/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "my-agent",
    "conversation_id": "session-1",
    "turns": [
      {"user": "Hi, my name is Alice", "assistant": "Hello Alice!"},
      {"user": "I work at Acme Corp", "assistant": "Got it."}
    ]
  }'
```

| Parameter | Type | Description |
|---|---|---|
| `agent_id` | string | Identifier for the agent whose memory is being populated |
| `conversation_id` | string | Identifier for the conversation session |
| `turns` | array | Array of turn objects, each with `user` and `assistant` fields |
| `session_date` | string | Optional ISO 8601 date for the whole batch |
| `skip_extraction` | bool | When `true`, only insert transcript rows; skip distillation |
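The same request can be built from Python with only the standard library. `build_ingest_request` is a hypothetical helper. The client-side check that every turn has `user` and `assistant` is an added assumption, and the response body is not documented here, so the sketch only shows how to send the request.

```python
import json
import urllib.request
from typing import Dict, List, Optional

BASE_URL = "http://127.0.0.1:1337"  # the local server address used in the curl example


def build_ingest_request(agent_id: str, conversation_id: str, turns: List[Dict[str, str]],
                         session_date: Optional[str] = None,
                         skip_extraction: bool = False) -> urllib.request.Request:
    for turn in turns:
        if "user" not in turn or "assistant" not in turn:
            raise ValueError("each turn needs 'user' and 'assistant' fields")
    payload: Dict[str, object] = {
        "agent_id": agent_id,
        "conversation_id": conversation_id,
        "turns": turns,
    }
    if session_date is not None:
        payload["session_date"] = session_date  # ISO 8601, applies to the whole batch
    if skip_extraction:
        payload["skip_extraction"] = True  # transcript rows only, no distillation
    return urllib.request.Request(
        BASE_URL + "/memory/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request(
    "my-agent", "session-1",
    [{"user": "Hi, my name is Alice", "assistant": "Hello Alice!"}],
    session_date="2024-05-01",
)
# With Osaurus running locally:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(resp.status, resp.read().decode())
```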

List Agents — `GET /agents`

Returns all configured agents with their pinned-fact counts. Use this to discover valid agent IDs.

See the API Guide for additional examples.

Tool: `search_memory`

Agents can search their own memory via the built-in search_memory(scope, query) tool. Three scopes:

| Scope | What it searches |
|---|---|
| `pinned` | High-salience facts |
| `episodes` | Per-session digests |
| `transcript` | Raw conversation excerpts |

The v1 scopes working, summaries, all, and graph are gone — working was subsumed by pinned, summaries was renamed to episodes, and all / graph are no longer exposed. The relevance gate already picks the right slice for context injection; the tool exists for explicit recall the agent decides it needs.

Opt-in per agent. The search_memory tool is gated by Agent.settings.searchMemoryEnabled (default off), surfaced as Memory Recall under Configure → Features. It is decoupled from the Memory toggle (disableMemory): Memory controls passive context injection and transcript recording, while Memory Recall controls only mid-session active lookups. An agent can read injected memory without exposing the recall tool, or vice versa.
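The opt-in gating and the reduced scope set can be sketched with an OpenAI-style function-tool definition. The JSON shape is the common OpenAI tool format. Whether Osaurus describes `search_memory` this way internally is an assumption, and `tools_for_agent` and `validate_call` are hypothetical helpers. The scope names, the removed v1 scopes, and the default-off flag come from the text.

```python
from typing import Any, Dict, List, Optional, Tuple

SCOPES = ("pinned", "episodes", "transcript")
# v1 scopes and where their data went in v2; "all" and "graph" have no replacement
REMOVED_V1_SCOPES: Dict[str, Optional[str]] = {
    "working": "pinned", "summaries": "episodes", "all": None, "graph": None,
}

SEARCH_MEMORY_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "search_memory",
        "description": "Search this agent's memory for explicit recall.",
        "parameters": {
            "type": "object",
            "properties": {
                "scope": {"type": "string", "enum": list(SCOPES)},
                "query": {"type": "string"},
            },
            "required": ["scope", "query"],
        },
    },
}


def tools_for_agent(search_memory_enabled: bool) -> List[Dict[str, Any]]:
    # Memory Recall (searchMemoryEnabled, default off) is independent of the passive Memory toggle
    return [SEARCH_MEMORY_TOOL] if search_memory_enabled else []


def validate_call(args: Dict[str, Any]) -> Tuple[str, str]:
    scope = args.get("scope")
    if scope in REMOVED_V1_SCOPES:
        replacement = REMOVED_V1_SCOPES[scope]
        hint = f"; use '{replacement}'" if replacement else ""
        raise ValueError(f"scope '{scope}' was removed in v2{hint}")
    if scope not in SCOPES:
        raise ValueError(f"unknown scope {scope!r}; expected one of {SCOPES}")
    query = str(args.get("query", "")).strip()
    if not query:
        raise ValueError("query must not be empty")
    return scope, query


print(validate_call({"scope": "episodes", "query": "postgres"}))  # ('episodes', 'postgres')
```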

Managing Memory

Memory View

Open the Management window (⌘ Shift M) → Memory to see:

- Your identity (auto-derived content + manual overrides)
- Pinned facts with salience bars and use counts
- Episodes for the default agent
- Per-agent counts
- Processing statistics (total calls, success rate, average duration)
- Database size
- Run Consolidation Now button

Adding User Overrides

Identity overrides are explicit facts that always appear in context.

1. Go to Memory → Your Overrides.
2. Click Add.
3. Enter a fact (e.g. "I prefer tabs over spaces" or "My company uses a monorepo").

Clearing Memory

The Memory view includes a danger zone for clearing all memory data. This removes identity, pinned facts, episodes, and transcript. The action is irreversible.

Sync / Run Consolidation

- Sync Now drains pending signals immediately, distilling any sessions that haven't flushed yet.
- Run Consolidation Now kicks off a one-shot consolidator pass (decay, merge, promote, evict, prune, purge).

Migration from v1

The v5 schema migration is automatic on first launch after an upgrade. It runs as pure SQL — no LLM calls — and:

- Carries forward `user_profile.content` → `identity.content`
- Carries forward `user_edits` → `identity.overrides`
- Carries forward `conversation_summaries` → `episodes` (default salience = 0.5)
- Carries forward `conversation_chunks` → `transcript`
- Drops `memory_entries` (the noisy 7-type extractor output)
- Drops `profile_events`, `memory_events`, `agent_activity`, `embeddings`
- Drops the graph tables (`entities`, `relationships`)

pinned_facts starts empty and accrues organically as new sessions are distilled. The Vectura vector index is wiped and rebuilt lazily on first read.
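As an illustration of what a pure-SQL carry-forward can look like, the sketch runs a v5-style migration against a toy v1 database. Several details are hypothetical: every column layout, the single-row `identity` table, joining overrides with newlines, and the `user_version` bump. The real `migrateToV5` schemas are not documented here. Only the table names, the 0.5 default salience, the dropped tables, and the empty `pinned_facts` start come from the list above.

```python
import sqlite3

V1_FIXTURE = """
CREATE TABLE user_profile (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE user_edits (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE conversation_summaries (id INTEGER PRIMARY KEY, summary TEXT);
CREATE TABLE conversation_chunks (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE memory_entries (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO user_profile(content) VALUES ('Builds Swift apps');
INSERT INTO user_edits(content) VALUES ('My name is Terence'), ('Always reply in English');
INSERT INTO conversation_summaries(summary) VALUES ('Chose Postgres for the demo');
INSERT INTO conversation_chunks(body) VALUES ('user: lets use postgres');
"""

MIGRATE_TO_V5 = """
BEGIN;
CREATE TABLE identity (
  id INTEGER PRIMARY KEY CHECK (id = 1),  -- a single row
  content TEXT NOT NULL DEFAULT '',
  overrides TEXT NOT NULL DEFAULT ''
);
INSERT INTO identity (id, content, overrides) VALUES (
  1,
  COALESCE((SELECT content FROM user_profile ORDER BY id DESC LIMIT 1), ''),
  COALESCE((SELECT group_concat(content, char(10)) FROM user_edits), '')
);
ALTER TABLE conversation_summaries RENAME TO episodes;
ALTER TABLE episodes ADD COLUMN salience REAL NOT NULL DEFAULT 0.5;
ALTER TABLE conversation_chunks RENAME TO transcript;
DROP TABLE user_profile;
DROP TABLE user_edits;
DROP TABLE IF EXISTS memory_entries;
DROP TABLE IF EXISTS profile_events;
DROP TABLE IF EXISTS memory_events;
DROP TABLE IF EXISTS agent_activity;
DROP TABLE IF EXISTS embeddings;
DROP TABLE IF EXISTS entities;
DROP TABLE IF EXISTS relationships;
CREATE TABLE pinned_facts (id INTEGER PRIMARY KEY, content TEXT NOT NULL, salience REAL NOT NULL);
PRAGMA user_version = 5;
COMMIT;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(V1_FIXTURE)
conn.executescript(MIGRATE_TO_V5)  # no LLM involved: plain SQL in one transaction
print(conn.execute("SELECT content FROM identity").fetchone()[0])  # Builds Swift apps
```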

Alongside the schema migration, the storage encryption migration runs once on first launch of 0.17.7+ and re-keys memory.sqlite (and every other Osaurus database) into SQLCipher. It's automatic and shows a brief overlay; details, key-rotation, and plaintext-export instructions live in STORAGE.md.
