Common questions about MARM MCP, memory behavior, transports, supported clients, and local deployment.
MARM Systems is a persistent memory layer for AI agents. The MCP server gives Claude, Codex, Gemini, Qwen, VS Code, Cursor, and other MCP-compatible clients a shared way to store, recall, organize, and reuse project context across sessions.
| Component | Description | Best For |
|---|---|---|
| MARM MCP Server | Persistent memory server with 9 focused MCP tools | AI agents, IDEs, local workflows, shared team memory |
| MARM Protocol | Runtime guidance delivered automatically by the MCP server | Keeping agents aligned on what to store, recall, and trust |
| MARM Dashboard | Local browser UI for viewing memory and server health | Inspection, cleanup, and quick status checks |
| Feature | Built-in AI Memory | MARM Systems |
|---|---|---|
| Control | Limited and platform-defined | User-owned SQLite database |
| Portability | Usually platform-locked | Works across MCP-compatible clients |
| Recall | Often opaque | Explicit hybrid recall and structured logs |
| Sharing | Hard to move between tools | Multiple agents can use the same memory store |
| Trust model | Memory behavior varies by provider | Retrieved memory is context, not higher-priority instruction |
MARM uses hybrid recall rather than keyword matching alone. Semantic embeddings find related memories when the wording differs, and FTS keyword/BM25 search improves exact recall for commands, config keys, filenames, and error text.
MARM is strongest for developers, researchers, power users, and teams doing long-running work where context continuity matters. It is less useful for quick one-off questions where a normal chat is enough.
MARM does not enforce a small fixed memory limit. It stores data in a local SQLite database under ~/.marm/, with semantic embeddings and an FTS index for recall. Practical limits depend on disk space, database size, and how much old context you keep searchable.
Use the README quick start for the shortest path, then use the install docs when you need deeper setup details:
- README.md - quick start and client connection examples
- docs/INSTALL-DOCKER.md - Docker HTTP and Docker STDIO
- docs/INSTALL-WINDOWS.md - Windows local install
- docs/INSTALL-LINUX.md - Linux local install
- docs/INSTALL-PLATFORMS.md - Claude, Codex, Gemini, Qwen, VS Code, Cursor, and Grok notes
MARM has been tested with Claude Code, Codex, Gemini CLI, Qwen CLI, VS Code MCP, and Cursor MCP. Any client that supports standard MCP HTTP or STDIO transports should be able to connect with the right command or config.
| Transport | Best For | Key Requirement |
|---|---|---|
| HTTP | Shared memory server, multiple agents, IDE/client reuse | Use an API key when exposed through Docker or 0.0.0.0 |
| STDIO | Private local agent connection | No network port or API key required |
HTTP is the better fit when several agents or tools should share one memory database. STDIO is the simpler local option when one client launches MARM directly.
Docker HTTP mode should use MARM_API_KEY because the server is listening through a container network bridge. Docker STDIO mode does not need a key because it communicates over local process stdin/stdout, not a network port.
For HTTP mode, use the MARM Dashboard status panel or run curl http://localhost:8001/health. For STDIO mode, confirm your MCP client lists the MARM tools and can call a simple recall or log command.
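The HTTP check above can also be scripted, for example in a startup script that waits for the server. The following is a minimal sketch using only the Python standard library. It assumes that a healthy server answers the /health endpoint with an HTTP 2xx status; the response body format is not described here, so the sketch ignores it. The function name `marm_is_healthy` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def marm_is_healthy(base_url: str = "http://localhost:8001", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with an HTTP 2xx status.

    Assumption: a 2xx status means "healthy". The body is not inspected.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling `marm_is_healthy()` with no arguments checks the default local address used in the curl command above; pass a different `base_url` if your server listens elsewhere.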
MARM currently exposes 9 focused MCP tools:
| Category | Tools | Description |
|---|---|---|
| Memory Intelligence | marm_smart_recall, marm_context_log | Hybrid recall and intelligent memory storage |
| Logging | marm_log_session, marm_log_entry, marm_log_show | Session-based conversation/project logs |
| Notebook | marm_notebook | Reusable instructions and knowledge with action="add", "use", "show", "status", or "clear" |
| Delete | marm_delete | Delete log sessions, log entries, or notebook entries |
| Summary | marm_summary | Generate concise context summaries |
| Maintenance | marm_compaction | Agent-assisted memory compaction with action="status", "candidates", "review", "stage", "apply", or "discard" |
No. Session startup, protocol delivery, and documentation loading are now automatic. The server injects the protocol on the first successful MCP tool call for each session scope, then keeps docs indexed with hash-based caching so unchanged docs are not repeatedly duplicated.
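To illustrate the idea behind hash-based caching, here is a minimal sketch of how a content hash can decide whether a document needs re-indexing. It is not MARM's actual code: the class name `DocIndexCache`, the in-memory dictionary, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class DocIndexCache:
    """Tracks a content hash per document so unchanged docs are skipped."""

    def __init__(self) -> None:
        # Maps a document path to the SHA-256 hex digest of its last indexed content.
        self._hashes: dict = {}

    def needs_reindex(self, path: str, content: str) -> bool:
        """Return True and record the new hash if the content changed, else False."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._hashes.get(path) == digest:
            return False  # Same bytes as last time: nothing to re-index.
        self._hashes[path] = digest
        return True
```

With this pattern, loading the same document twice triggers indexing only once, and an edit to the document changes its digest and triggers a fresh index.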
Use HTTP mode so one MARM server coordinates shared database access. The write queue is enabled by default. Start shared servers with --swarm for 200 RPM, --swarm-max for 600 RPM, or --trusted to disable rate limiting on a private trusted deployment.
Run one MARM HTTP process per SQLite database. Multi-process Uvicorn/Gunicorn workers are not supported yet because the write queue, scheduler, protocol delivery, and some active session state are process-local. Swarm presets increase safe concurrency inside one process; true multi-worker HTTP scaling is future work.
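The reason a process-local write queue rules out multiple workers can be seen in a small sketch of the general single-writer pattern. This is an illustration only, not MARM's implementation: the class `SingleWriter` and the `memories` table are hypothetical. All writes go through one in-memory queue and one SQLite connection, which serializes them inside the process; a second process would have its own queue and could not take part in that ordering.

```python
import queue
import sqlite3
import threading


class SingleWriter:
    """Serializes all writes through one thread and one SQLite connection."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The connection is created and used only on the writer thread.
        conn = sqlite3.connect(self._db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, body TEXT)"
        )
        conn.commit()
        while True:
            item = self._queue.get()
            if item is None:  # Sentinel: stop the writer.
                break
            body, done = item
            try:
                conn.execute("INSERT INTO memories (body) VALUES (?)", (body,))
                conn.commit()
            finally:
                done.set()  # Always release the waiting caller.
        conn.close()

    def write(self, body: str) -> None:
        """Queue one write and block until the writer thread has processed it."""
        done = threading.Event()
        self._queue.put((body, done))
        done.wait()

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
```

Many threads can call `write()` concurrently, yet the database only ever sees one writer. That guarantee holds per process, which is why one HTTP process per database is the supported layout.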
Yes. Use HTTP mode for shared access. Multiple agents can read and write to the same SQLite database through one MARM server process. Avoid running many separate STDIO containers against the same SQLite file at the same time, because SQLite lock contention can occur under concurrent writes.
No. In HTTP mode, MARM runs as a server and multiple clients can connect to it. In STDIO mode, each client usually launches its own private MARM process.
Your AI client can still run, but MARM memory tools will be unavailable until the server reconnects or the STDIO process restarts.
MARM uses hybrid recall. Embeddings find memories by meaning, FTS keyword/BM25 search handles exact terms, and a conservative temporal weighting step gives newer memories a modest boost when scores are otherwise close. A search for "authentication error" can surface memories about login failures, access denial, token setup, or user verification even when those exact words are not repeated, while a search for something like COMPACTION_TRIGGER_COUNT or a Docker command can hit the exact stored text reliably.
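The three signals described above can be combined in many ways. The sketch below shows one simple possibility, a weighted sum plus a small decaying recency bonus. It is not MARM's scoring formula: the weights, the 30-day half-life, the assumption that both lane scores are already normalized to the range 0 to 1, and all names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    memory_id: str
    semantic: float   # Embedding similarity, assumed normalized to [0, 1].
    keyword: float    # Keyword/BM25 relevance, assumed normalized to [0, 1].
    age_days: float   # Time since the memory was stored.


def hybrid_score(
    c: Candidate,
    w_semantic: float = 0.6,        # Hypothetical weight.
    w_keyword: float = 0.4,         # Hypothetical weight.
    recency_bonus: float = 0.05,    # Maximum boost, kept small on purpose.
    half_life_days: float = 30.0,   # Boost halves every 30 days.
) -> float:
    """Weighted sum of both lanes plus a modest, exponentially decaying boost."""
    base = w_semantic * c.semantic + w_keyword * c.keyword
    boost = recency_bonus * 0.5 ** (c.age_days / half_life_days)
    return base + boost


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates ordered from highest to lowest hybrid score."""
    return sorted(candidates, key=hybrid_score, reverse=True)
```

Because the recency bonus is capped at a small value, it only reorders candidates whose combined scores are already close; an old memory with a strong exact-term match still outranks a newer, weaker one.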
When you store memory through marm_context_log, MARM classifies content into broad context types such as code, project, book/research, or general. This helps later recall and summaries stay organized without requiring users to tag every write manually.
Both. marm_smart_recall searches one session by default and can search across all sessions with search_all=True.
When the semantic embedding lane reaches its configured scan cap, responses include recall_scan_truncated=true and recall_scan_limit so agents know that part of recall was bounded. Exact-term FTS recall still runs alongside it.
If you need less context back from each hit, marm_smart_recall also supports detail=1/2/3 so agents can default to short previews and only request full memory bodies when needed.
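On the agent side, these options might be used as in the following sketch. It makes several assumptions that should be checked against the actual tool schema: `call_tool` is a hypothetical callable wrapping whatever MCP client you use, the argument name `query` is a guess, the tool result is treated as a plain dictionary, and `detail=1` is assumed to be the shortest preview with `detail=3` the full body. Only `marm_smart_recall`, `detail`, `recall_scan_truncated`, and `recall_scan_limit` come from the text above.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical signature: (tool_name, arguments) -> result dictionary.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def smart_recall(
    call_tool: ToolCaller, query: str, full: bool = False
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Call marm_smart_recall and report whether the semantic scan was capped."""
    # Assumption: detail=1 is the shortest preview and detail=3 the full body.
    arguments = {"query": query, "detail": 3 if full else 1}
    response = call_tool("marm_smart_recall", arguments)

    warning = None
    if response.get("recall_scan_truncated"):
        limit = response.get("recall_scan_limit")
        warning = f"Semantic scan was capped at {limit}; some matches may be missing."
    return response, warning
```

An agent could start with the default preview call and repeat it with `full=True` only for queries where the previews look relevant, surfacing the warning whenever it is not `None`.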
Create a new session for a distinct project, topic, or workstream. Continue an existing session when the new work depends on the same decisions, constraints, or context.
Be selective. Log decisions, solutions, insights, requirements, constraints, and important discoveries. Avoid filling memory with low-value transcript noise.
Use consistent session names, include project or workstream names, and rely on cross-session search for broad recall. For shared agent workflows, prefer HTTP mode so one server coordinates writes.
MARM has optional memory-maintenance layers. CONSOLIDATION_ENABLED=1 enables write-time exact duplicate and semantic near-duplicate handling. COMPACTION_ENABLED=1 enables background candidate detection; when candidates are ready, MARM asks the connected agent to use marm_compaction to stage, review, apply, or discard summaries. Source memory IDs stay attached for traceability.
For normal use, wait for MARM to surface compaction candidates. For heavy shared-memory workflows, review staged summaries periodically so old duplicate clusters do not add recall noise.
Yes. Back up the ~/.marm/ directory to preserve your database and related local MARM state.
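A backup can be as simple as archiving that directory. The sketch below uses only the Python standard library; the function name `backup_marm_dir` and the default destination `~/marm-backups` are hypothetical. As a general precaution with SQLite files, it is safer to stop the server (or make sure nothing is writing) before copying, since a file copied in the middle of a write may be inconsistent.

```python
import shutil
import time
from pathlib import Path


def backup_marm_dir(
    source: Path = Path.home() / ".marm",
    dest_dir: Path = Path.home() / "marm-backups",  # Hypothetical destination.
) -> Path:
    """Archive the MARM data directory into a timestamped .tar.gz file."""
    source = Path(source).expanduser()
    dest_dir = Path(dest_dir).expanduser()
    if not source.is_dir():
        raise FileNotFoundError(f"MARM directory not found: {source}")
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = shutil.make_archive(
        str(dest_dir / f"marm-{stamp}"),  # Archive path without extension.
        "gztar",
        root_dir=str(source.parent),      # Archive paths are relative to this.
        base_dir=source.name,             # Only this directory is included.
    )
    return Path(archive)
```

To restore, stop MARM, extract the archive into the parent of the data directory, and start the server again.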
No. Retrieved memories, notebook entries, logs, and tool outputs are treated as context only. They must not override higher-priority instructions, request secrets, bypass tool policies, or change the agent's safety rules.