Osaurus has a persistent, on-device memory system that learns from your conversations and surfaces relevant context only when it actually helps. Memory runs in the background, stores very little, and injects up to ~800 tokens (or none at all) per turn, instead of the firehose-style context stuffing that v1 did.
The mental model is simple: a smart secretary that knows what you've discussed and surfaces only what matters right now — not a tape recorder.
Not the same as per-agent DB. Memory is global across all your chats — it's identity, pinned facts, and episodic recall derived from natural conversation. For an agent that maintains its own private structured store (custom tables, soft-deletes, saved views), see Agent DB & Self-Scheduling. The two are orthogonal: an agent can use neither, either, or both.
- Open the Management window (⌘ Shift M) → Memory
- Memory is enabled by default; toggle it off in the Memory settings if you prefer stateless conversations
- The core model for distillation defaults to foundation (Apple's on-device Language Model on macOS 26+); change it in Settings → General if you'd rather use a remote model like anthropic/claude-haiku-4-5
- Start chatting; sessions are distilled in the background once they end
No manual tagging, saving, or annotation is required.
┌──────────────────────────────────────────────────────────────────┐
│ Memory System (v2) │
├──────────────────────────────────────────────────────────────────┤
│ Identity │
│ Stable user facts: explicit overrides + auto-derived narrative. │
├──────────────────────────────────────────────────────────────────┤
│ Pinned Facts │
│ Salience-scored facts promoted from session distillations. │
│ Decayed and evicted by the consolidator. │
├──────────────────────────────────────────────────────────────────┤
│ Episodes │
│ Per-session digests: summary, topics, decisions, entities. │
│ Replaces the v1 separate "working memory" and "summaries" │
│ tables. │
├──────────────────────────────────────────────────────────────────┤
│ Transcript │
│ Raw conversation turns kept for fallback retrieval only. │
│ Never default-injected into context. │
└──────────────────────────────────────────────────────────────────┘
A single row containing two fields:
- Overrides — explicit user-authored facts ("My name is Terence", "Always reply in English"). Always surfaced in context. Authored via the Memory → Your Overrides UI.
- Content — auto-derived narrative ("User builds Swift apps for macOS, prefers Postgres, lives in PT timezone."). Regenerated by the consolidator from accumulated identity-grade signals.
The promotable pool. Each fact carries:
- content: the fact itself, in plain text
- salience: score in [0, 1]. Decayed weekly (exp(-Δdays / 30)). Evicted below the floor (default 0.2) once idle for 30+ days.
- sourceCount: number of episodes that mention it
- useCount / lastUsed: bumped every time the planner surfaces the fact in context
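The decay and eviction rule for pinned facts can be sketched in a few lines. This is a minimal illustration rather than the Osaurus implementation: the `PinnedFact` shape, the `days_since_last_used` field, and the function names are assumptions; only the formula, the 0.2 floor, and the 30-day idle threshold come from the description above.

```python
import math
from dataclasses import dataclass

DECAY_DAYS = 30.0      # time constant from exp(-Δdays / 30)
SALIENCE_FLOOR = 0.2   # documented default floor
IDLE_DAYS = 30         # documented idle threshold before eviction


@dataclass
class PinnedFact:  # hypothetical shape, for illustration only
    content: str
    salience: float
    days_since_last_used: float


def decayed(salience: float, delta_days: float) -> float:
    """Apply salience *= exp(-delta_days / 30)."""
    return salience * math.exp(-delta_days / DECAY_DAYS)


def should_evict(fact: PinnedFact, floor: float = SALIENCE_FLOOR) -> bool:
    """Evict only when the fact is below the floor AND idle for 30+ days."""
    return fact.salience < floor and fact.days_since_last_used >= IDLE_DAYS
```

Under this formula, a fact that starts at salience 1.0 and is never boosted crosses the 0.2 floor after roughly 48 days (30 · ln 5).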
One per session. Created by MemoryService.distillSession once the writer's debounce expires (default 60s of inactivity) or when the user navigates away. Each episode contains:
- summary: one to three sentences
- topics, entities: comma-separated lists
- decisions, actionItems: newline-separated bullet items
- salience: model-assigned score, decayed by the consolidator
Raw user/assistant turns. Never injected into the default context block. Used only when the user asks for literal recall ("what did I exactly say...") or as fallback search via the transcript scope of search_memory.
Memory and SOUL.md are separate surfaces by design — do not cross-pollinate them.
| | Memory | SOUL.md |
|---|---|---|
| Author | Distilled from conversations by Osaurus | The agent itself, via sandbox_write_file (whole-file write or in-place edit) |
| Scope | Session facts, episodes, user identity | Stable preferences and patterns the agent learned about working with you |
| Update cadence | Background after each session | Whenever the agent observes a durable pattern |
| Where it lands | Prepended to the latest user message (volatile) | Static section in the system prompt (KV-cacheable) |
| Available in | Every chat | Sandbox mode only |
If a fact belongs to a session ("we decided to use Postgres for the demo"), memory owns it. If a fact is a preference the agent should keep applying ("user prefers Postgres for new projects"), the agent's SOUL.md owns it. See SANDBOX.md for the full SOUL.md contract.
[user + assistant turn]
│
▼
[buffered as pending_signal] ◄── per-turn cost: one SQL insert. No LLM.
│
▼
debounce 60s
│ (or session-change / nav-away → flush immediately)
▼
[novelty gate: combined chars >= 80?]
│
▼
[ONE LLM call: distill the whole session]
│
▼
{episode + entities + pinned candidates + identity delta}
│
├──► insert Episode (atomically marks signals processed)
├──► insert PinnedFact for each candidate that passes Jaccard dedup
└──► append identity overrides for any new identity-grade facts
The hot path (bufferTurn) is a single SQL insert and a debounce arm. No LLM call ever runs synchronously with chat. Distillation is one LLM call per session (often covering 10+ turns), not one per turn.
When the core model isn't configured, signals stay pending — recoverOrphanedSignals (called at startup) and syncNow (called from the Memory UI) drain them once a model is available.
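To make the debounce and novelty gate concrete, here is a small sketch of the decision that precedes the single distillation call. The `SessionBuffer` class and its method names are hypothetical stand-ins for the pending-signal rows; only the 60-second debounce, the immediate flush on session change, and the 80-character threshold are taken from the flow above.

```python
from dataclasses import dataclass, field

DEBOUNCE_SECONDS = 60    # documented default inactivity window
NOVELTY_MIN_CHARS = 80   # from the diagram: combined chars >= 80


@dataclass
class SessionBuffer:  # hypothetical stand-in for buffered pending signals
    turns: list = field(default_factory=list)
    last_turn_at: float = 0.0

    def buffer_turn(self, user: str, assistant: str, now: float) -> None:
        """Hot path: record the turn and re-arm the debounce. No LLM call."""
        self.turns.append((user, assistant))
        self.last_turn_at = now

    def ready_to_distill(self, now: float, flush: bool = False) -> bool:
        """True once the debounce expired (or a flush was forced) and the
        buffered text clears the novelty gate."""
        if not self.turns:
            return False
        debounced = flush or (now - self.last_turn_at) >= DEBOUNCE_SECONDS
        chars = sum(len(u) + len(a) for u, a in self.turns)
        return debounced and chars >= NOVELTY_MIN_CHARS
```

A forced flush skips the wait but, as in the diagram, still passes through the novelty gate, so a two-word exchange never triggers a distillation call.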
[incoming user message]
│
▼
[Relevance Gate: heuristic + optional LLM fallback]
├── pronouns / "we discussed" / "remember when" → episode
├── "what's my name" / "who am I" → identity
├── entity-name hit in graph → pinned
├── "do you remember my preference" → pinned
├── "exact words" / "verbatim" → transcript
└── nothing fired → none (skip memory)
│
▼
[Memory Planner: fetches the chosen section under the budget]
│
▼
[Compact memory block ≤ memoryBudgetTokens (default 800)]
+ always-on Identity Overrides (tiny)
│
▼
[Prepend to the latest user message]
The cache layer holds the assembled block for 10s per (agent, query) pair so retries don't re-run the gate.
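The heuristic branch of the relevance gate can be pictured as an ordered list of pattern checks that returns at most one section. The sketch below is illustrative only: the regular expressions, their ordering, and the `known_entities` parameter are assumptions, and the optional LLM fallback is omitted.

```python
import re

# Illustrative patterns; the real heuristics and their priority are not documented.
RULES = [
    ("transcript", re.compile(r"\b(exact words|verbatim)\b", re.I)),
    ("identity", re.compile(r"\b(what'?s my name|who am i)\b", re.I)),
    ("pinned", re.compile(r"\bdo you remember my preference\b", re.I)),
    ("episode", re.compile(r"\b(we discussed|remember when)\b", re.I)),
]


def relevance_gate(message: str, known_entities=frozenset()) -> str:
    """Return the single memory section to fetch, or "none" to skip memory."""
    for section, pattern in RULES:
        if pattern.search(message):
            return section
    # Entity-name hit: any known entity appearing as a word in the message.
    words = set(re.findall(r"\w+", message.lower()))
    if words & {e.lower() for e in known_entities}:
        return "pinned"
    return "none"
```

For example, `relevance_gate("Write a haiku")` returns `"none"`, so the planner is skipped and only the always-on identity overrides accompany the message.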
MemoryConsolidator runs on a low-priority background task every consolidationIntervalHours (default 24h) and on the explicit Run Consolidation Now button in the Memory UI. Each pass:
- Decay: salience *= exp(-Δdays / 30) for both pinned facts and episodes.
- Merge: collapse near-duplicate episodes (Jaccard ≥ 0.9 over summary+topics) within the same agent. Keeps the older episode, deletes the newer near-dup.
- Promote: boost salience on pinned facts whose content overlaps with ≥ 3 recent episodes.
- Evict: delete pinned facts below salienceFloor that have been idle for 30+ days.
- Prune: drop episodes and transcript turns older than episodeRetentionDays.
- Purge: trim old processing_log rows.
Consolidation never runs on the request path, so chat latency is unaffected.
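The merge step is the least obvious of the passes, so here is a minimal sketch of it. It assumes the input is one agent's episodes, that Jaccard is computed over lowercased word sets, and that episodes look like the hypothetical `Episode` class below; only the 0.9 threshold, the summary+topics input, and the keep-the-older rule come from the list above.

```python
import re
from dataclasses import dataclass

MERGE_THRESHOLD = 0.9  # Jaccard cutoff stated for the merge step


@dataclass
class Episode:  # hypothetical shape, for illustration only
    id: int
    created_at: float  # epoch seconds
    summary: str
    topics: str


def token_set(ep: Episode) -> set:
    """Word set over summary + topics; the real tokenizer is an assumption."""
    return set(re.findall(r"\w+", f"{ep.summary} {ep.topics}".lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_near_duplicates(episodes: list) -> list:
    """Walk episodes oldest-first and drop any that nearly duplicate a kept one."""
    kept = []
    for ep in sorted(episodes, key=lambda e: e.created_at):
        if all(jaccard(token_set(ep), token_set(k)) < MERGE_THRESHOLD for k in kept):
            kept.append(ep)
    return kept
```

Processing oldest-first is what makes the older episode the survivor: a newer near-duplicate is compared against what is already kept and discarded.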
The full configuration lives in ~/.osaurus/config/memory.json and is editable from the Memory tab in the Management window.
| Setting | Default | Range | Description |
|---|---|---|---|
| enabled | true | true / false | Master toggle |
| embeddingBackend | mlx | mlx / none | Embedding backend. none falls back to SQLite text matching. |
| embeddingModel | nomic-embed-text-v1.5 | - | Model used by VecturaKit |
| extractionMode | sessionEnd | sessionEnd / manual | When the writer runs distillation |
| relevanceGateMode | heuristic | off / heuristic / llm | How the read path decides whether to inject memory |
| memoryBudgetTokens | 800 | 100 – 4,000 | Single overall budget for the dynamic section |
| summaryDebounceSeconds | 60 | 10 – 3,600 | Inactivity period before distillation |
| consolidationIntervalHours | 24 | 1 – 168 | How often the consolidator runs |
| salienceFloor | 0.2 | 0.0 – 1.0 | Pinned facts below this and idle 30+ days are evicted |
| episodeRetentionDays | 365 | 0 – 3,650 | How long episodes/transcript are kept (0 = forever) |
That's the entire surface. v1's 18 knobs (mmrLambda, mmrFetchMultiplier, verification*Threshold, per-section budgets, recall topK, profile regen thresholds, max entries per agent, …) are all gone.
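If you edit the file by hand, a quick check against the documented defaults and ranges can catch typos before launch. The sketch below assumes memory.json is a flat JSON object keyed by the setting names in the table; that layout, the `check_config` helper, and the way Osaurus itself treats out-of-range values are not confirmed by this document.

```python
import json

# Defaults and ranges copied from the settings table.
DEFAULTS = {
    "enabled": True,
    "embeddingBackend": "mlx",
    "embeddingModel": "nomic-embed-text-v1.5",
    "extractionMode": "sessionEnd",
    "relevanceGateMode": "heuristic",
    "memoryBudgetTokens": 800,
    "summaryDebounceSeconds": 60,
    "consolidationIntervalHours": 24,
    "salienceFloor": 0.2,
    "episodeRetentionDays": 365,
}
NUMERIC_RANGES = {
    "memoryBudgetTokens": (100, 4000),
    "summaryDebounceSeconds": (10, 3600),
    "consolidationIntervalHours": (1, 168),
    "salienceFloor": (0.0, 1.0),
    "episodeRetentionDays": (0, 3650),
}
CHOICES = {
    "embeddingBackend": {"mlx", "none"},
    "extractionMode": {"sessionEnd", "manual"},
    "relevanceGateMode": {"off", "heuristic", "llm"},
}


def check_config(text: str) -> list:
    """Return a list of problems found in a memory.json-style document."""
    merged = {**DEFAULTS, **json.loads(text)}
    problems = [f"unknown key: {k}" for k in merged if k not in DEFAULTS]
    for key, (lo, hi) in NUMERIC_RANGES.items():
        if not lo <= merged[key] <= hi:
            problems.append(f"{key}={merged[key]} outside {lo}-{hi}")
    for key, allowed in CHOICES.items():
        if merged[key] not in allowed:
            problems.append(f"{key}={merged[key]!r} not in {sorted(allowed)}")
    return problems
```

Calling `check_config` on the contents of `~/.osaurus/config/memory.json` returns an empty list when every value sits inside its documented range.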
All memory data is stored in a local SQLite database with WAL mode. Since 0.17.7 the database is encrypted at rest with SQLCipher using a key kept in your macOS Keychain — the same key chat history, methods, and tool indexes use.
Location: ~/.osaurus/memory/memory.sqlite (SQLCipher)
Configuration: ~/.osaurus/config/memory.json (plaintext)
Vector index: ~/.osaurus/memory/vectura/<agentId>/ — partitioned per agent so one agent's vectors never collide with another's. The vector files themselves are not yet encrypted (see STORAGE.md → Limitations); they are rebuilt from the encrypted SQLite source on first read after migration.
The schema is versioned. The v1 → v2 migration (migrateToV5) carries forward your identity, episodes (renamed from conversation_summaries), and transcript (renamed from conversation_chunks). The noisy v1 working-memory entries, profile events, verification audit log, agent activity, embeddings cache, and graph tables are all dropped — pinned_facts rebuilds organically from new conversations.
The v6 migration adds three FTS5 mirror tables (fts_pinned, fts_episodes, fts_transcript) backed by triggers on the source tables. Tokenizer is unicode61 remove_diacritics 2, which folds accents and case so non-English queries don't miss obvious matches. Existing rows are backfilled in a single INSERT … SELECT pass, so the migration is one short transaction even for large memory databases.
When VecturaKit is available, search uses hybrid BM25 + vector matching with MMR reranking. When it's not (e.g. the embedding model isn't downloaded yet), search falls back to FTS5 MATCH queries against the per-table mirror tables introduced in the v6 migration — same tokenization (Unicode-folded) and the same prefix/phrase syntax SQLite documents for FTS5. SQL LIKE is only used as a final fallback when an FTS query can't be sanitized into a valid match expression (e.g. all-punctuation input).
The MMR reranker uses 4-character word shingles for cheap content overlap — much faster than the v1 Jaccard-over-tokenized-strings approach.
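As a rough picture of how shingle overlap feeds MMR, the sketch below greedily picks results that are relevant but not redundant with what was already selected. Reading "4-character word shingles" as character 4-grams is an interpretation, and the `lam` weight, the function names, and the scoring formula are assumptions made for illustration.

```python
def shingles(text: str, k: int = 4) -> set:
    """Character k-grams over lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    if len(s) <= k:
        return {s} if s else set()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two shingle sets."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mmr(candidates: list, top_k: int, lam: float = 0.7) -> list:
    """candidates: (text, relevance) pairs. lam trades relevance for diversity."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < top_k:
        def score(item):
            text, relevance = item
            # Penalize similarity to the closest already-selected result.
            redundancy = max((overlap(text, s) for s, _ in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]
```

With two near-identical facts and one unrelated fact as candidates, a `top_k` of 2 keeps the best copy of the duplicated fact and the unrelated one, even when the duplicate scores higher on relevance alone.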
Memory context is injected only by Osaurus-composed agent surfaces: app chat windows, POST /agents/{id}/run, and plugin host inference. Those paths run the relevance gate against the user's message, pick at most one memory section, and prepend it to the latest user message.
Strict POST /chat/completions requests do not inject Osaurus memory, agent prompts, skills, or tools. X-Osaurus-Agent-Id may associate persisted HTTP history with an agent/session, but it is not a memory injection switch.
```python
from openai import OpenAI

# Point the OpenAI client at the local Osaurus server.
client = OpenAI(
    base_url="http://127.0.0.1:1337/v1",
    api_key="osaurus",
)

# A strict /chat/completions call: Osaurus does not inject memory here.
response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "What did we talk about last week?"}],
)
```

Bulk-ingest conversation turns. Useful for seeding memory from existing chat logs, migrating from another system, or running benchmarks. Ingestion always flushes distillation immediately at the end of the batch, so you don't have to wait for the debounce.
```bash
curl http://127.0.0.1:1337/memory/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "my-agent",
    "conversation_id": "session-1",
    "turns": [
      {"user": "Hi, my name is Alice", "assistant": "Hello Alice!"},
      {"user": "I work at Acme Corp", "assistant": "Got it."}
    ]
  }'
```

| Parameter | Type | Description |
|---|---|---|
| agent_id | string | Identifier for the agent whose memory is being populated |
| conversation_id | string | Identifier for the conversation session |
| turns | array | Array of turn objects, each with user and assistant fields |
| session_date | string | Optional ISO 8601 date for the whole batch |
| skip_extraction | bool | When true, only insert transcript rows; skip distillation |
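When seeding from existing chat logs, it can be easier to build the request body in code than by hand. The helper below is a sketch using only the Python standard library: `build_ingest_request` is a hypothetical name, the parameters mirror the table above, and the response format is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request

INGEST_URL = "http://127.0.0.1:1337/memory/ingest"  # same endpoint as the curl example


def build_ingest_request(agent_id, conversation_id, turns,
                         session_date=None, skip_extraction=False):
    """Build (but do not send) a POST request for /memory/ingest.

    turns is an iterable of (user_text, assistant_text) pairs.
    """
    body = {
        "agent_id": agent_id,
        "conversation_id": conversation_id,
        "turns": [{"user": u, "assistant": a} for u, a in turns],
    }
    # Optional fields are only included when set.
    if session_date is not None:
        body["session_date"] = session_date
    if skip_extraction:
        body["skip_extraction"] = True
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it to a running Osaurus server.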
Returns all configured agents with their pinned-fact counts. Use this to discover valid agent IDs.
See the API Guide for additional examples.
Agents can search their own memory via the built-in search_memory(scope, query) tool. Three scopes:
| Scope | What it searches |
|---|---|
| pinned | High-salience facts |
| episodes | Per-session digests |
| transcript | Raw conversation excerpts |
The v1 scopes working, summaries, all, and graph are gone — working was subsumed by pinned, summaries was renamed to episodes, and all / graph are no longer exposed. The relevance gate already picks the right slice for context injection; the tool exists for explicit recall the agent decides it needs.
Opt-in per agent. The search_memory tool is gated by Agent.settings.searchMemoryEnabled (default off), surfaced as Memory Recall under Configure → Features. It is decoupled from the Memory toggle (disableMemory): Memory controls passive context injection and transcript recording, while Memory Recall controls only mid-session active lookups. An agent can read injected memory without exposing the recall tool, or vice versa.
Open the Management window (⌘ Shift M) → Memory to see:
- Your identity (auto-derived content + manual overrides)
- Pinned facts with salience bars and use counts
- Episodes for the default agent
- Per-agent counts
- Processing statistics (total calls, success rate, average duration)
- Database size
- Run Consolidation Now button
Identity overrides are explicit facts that always appear in context.
- Go to Memory → Your Overrides
- Click Add
- Enter a fact (e.g. "I prefer tabs over spaces" or "My company uses a monorepo")
The Memory view includes a danger zone for clearing all memory data. This removes identity, pinned facts, episodes, and transcript. The action is irreversible.
- Sync Now drains pending signals immediately, distilling any sessions that haven't yet flushed.
- Run Consolidation Now kicks off a one-shot pass of the consolidator (decay, merge, promote, evict, prune).
The v5 schema migration is automatic on first launch after an upgrade. It runs as pure SQL — no LLM calls — and:
- Carries forward user_profile.content → identity.content
- Carries forward user_edits → identity.overrides
- Carries forward conversation_summaries → episodes (default salience = 0.5)
- Carries forward conversation_chunks → transcript
- Drops memory_entries (the noisy 7-type extractor output)
- Drops profile_events, memory_events, agent_activity, embeddings
- Drops the graph tables (entities, relationships)
pinned_facts starts empty and accrues organically as new sessions are distilled. The Vectura vector index is wiped and rebuilt lazily on first read.
Alongside the schema migration, the storage encryption migration runs once on first launch of 0.17.7+ and re-keys memory.sqlite (and every other Osaurus database) into SQLCipher. It's automatic and shows a brief overlay; details, key-rotation, and plaintext-export instructions live in STORAGE.md.