
docs: RFC 001 — storage backend plugin specification#743

Open
igorls wants to merge 5 commits into develop from docs/rfc-storage-backend-plugin-spec

Conversation


@igorls igorls commented Apr 12, 2026

Summary

  • Drafts a formal contract for MemPalace storage backends so third parties can ship pip install mempalace-<name> packages that drop into core without patches.
  • Resolves the open design decisions deliberately deferred by the MemPalace backend seam (#413), driven by the six in-flight backend PRs each implementing the interface differently.
  • Sets the stage for MemPalace as a long-lived daemon managing many palaces, where different palaces may route to different backends.

Tracking issue: #737 (see discussion and follow-up comments for the design rationale).

Key decisions in the draft

  • Distribution: entry-point group mempalace.backends; pip install is sufficient to add a backend.
  • Return types: typed QueryResult / GetResult dataclasses replace Chroma's dict shape from day one. ChromaCollection gets a thin adapter.
  • Daemon-first model: PalaceRef(id, local_path?, namespace?) replaces palace_path: str. Backend instances are long-lived and multi-palace; thread-safe across palaces.
  • Where-clause contract: required subset $eq, $ne, $in, $nin, $and, $or, $contains; unknown operators MUST raise UnsupportedFilterError (silent drop forbidden).
  • Embedder decoupling: mandatory injection; backends persist model_name and raise EmbedderIdentityMismatchError on swap (not just dimension).
  • Capability tokens: supports_* naming, free-form strings, extensible by third parties.
  • Conformance: shared AbstractBackendContractSuite pytest mixin + optional parametrized run of the core suite across all backends under MEMPALACE_TEST_ALL_BACKENDS=1.
  • Benchmark honesty: optional maintenance_state() / run_maintenance(kind); published numbers must cover three phases (post-load, post-native-maintenance, post-explicit-maintenance).
  • Migration: backend-neutral mempalace migrate + mempalace verify operating through BaseCollection only — no per-backend migration code.
  • Sync: capability flag + separate subsystem; local deployments never load it.
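To make the return-type and filter-contract decisions concrete, a rough sketch (the field sets and the helper function here are illustrative, not the spec's normative definitions):

```python
from dataclasses import dataclass, field

# Illustrative shapes only; the RFC defines the exact fields.
@dataclass
class GetResult:
    ids: list[str]
    documents: list[str]
    metadatas: list[dict]

@dataclass
class QueryResult(GetResult):
    distances: list[float] = field(default_factory=list)

class UnsupportedFilterError(ValueError):
    """Raised for where-clause operators outside the required subset."""

REQUIRED_OPS = {"$eq", "$ne", "$in", "$nin", "$and", "$or", "$contains"}

def validate_where(where: dict) -> None:
    # Silent dropping of unknown operators is forbidden by the spec.
    for key, value in where.items():
        if key.startswith("$") and key not in REQUIRED_OPS:
            raise UnsupportedFilterError(f"unsupported operator: {key}")
        if isinstance(value, dict):
            validate_where(value)
        elif key in {"$and", "$or"} and isinstance(value, list):
            for clause in value:
                validate_where(clause)
```
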

Impact on in-flight PRs

Each of #574, #643, #665, #697, #700, #381 is called out in §11 with the specific alignment work required — #574 is closest to the final shape, #697's collection_prefix concern dissolves into PalaceRef.namespace, #700 and #381 need the canonical UUID5 namespace.

Gating cleanup (not in this PR)

Seven files still import chromadb directly (repair.py, dedup.py, cli.py ×2, mcp_server.py, migrate.py, plus an instructions doc). Combined with the dict-to-typed-result migration, this needs its own PR landing before the spec can be enforced. Called out in §10.

Test plan

  • Backend authors (@skuznetsov @dekoza @RobertoGEMartin @cschnatz @Anush008) review for show-stoppers against their PRs
  • Agreement on the three open questions in §12 (per-collection changes_since filter, per-palace capability query, run_maintenance return shape)
  • Canonical NAMESPACE_MEMPALACE UUID assigned (§7.4)
  • Once approved, follow-up PR to land the seam cleanup + typed-result migration (§10)
  • Follow-up PR to land AbstractBackendContractSuite + entry-point discovery + PalaceRouter + PalaceRef

Formalizes the BaseCollection/BaseBackend contract introduced as a seam
in #413 into an interchangeability spec that third-party backends can
build to. Driven by six in-flight backend PRs (#574, #643, #665, #697,
#700, #381) each implementing the interface differently.

Key decisions captured: entry-point distribution, typed QueryResult/
GetResult replacing Chroma dict shape, daemon-first multi-palace model
via PalaceRef, required where-clause subset (incl. $contains),
mandatory embedder injection with model-identity validation, capability
tokens, shared pytest conformance suite, and a backend-neutral
migrate/verify CLI.
Copilot AI review requested due to automatic review settings April 12, 2026 23:27
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces RFC 001, a draft specification for MemPalace storage backend plugins. It formalizes a stable contract so third-party backends can be distributed as mempalace-<name> Python packages and loaded into MemPalace via entry-point discovery, enabling multi-backend / multi-palace (daemon-first) deployments.

Changes:

  • Adds a full draft spec defining the BaseCollection / backend lifecycle contract, including typed QueryResult / GetResult shapes.
  • Specifies backend discovery/selection (entry points + registry), configuration shape, and capability-token conventions.
  • Defines required filter dialect behavior, migration/verification expectations, and a shared backend conformance test suite concept.


igorls added 2 commits April 12, 2026 20:35
Copilot review flagged back-references in §1.4 and §6 that still used
the pre-skuznetsov-rename names (`$contains_fast`, `sync_capable`,
`change_feed`). Updated to the `supports_*` prefix used in the §2.1
capability table.
Incorporates review feedback from skuznetsov (Postgres, #665) and
dekoza (Lance, #574) on issue #737:

- §1.5: split 'accepts embeddings=' (signature compliance) from
  'persists embeddings as-is' (correctness). Adds
  supports_embeddings_passthrough capability; the former is universal,
  the latter is required to label a migration lossless.
- §1.5: model identity check becomes a three-state machine
  (known_match / known_mismatch / unknown) so legacy palaces without
  recorded identity don't hard-fail on upgrade.
- §1.4: makes explicit that supports_contains_fast is the ONLY
  performance floor the spec promises; without it callers MUST assume
  O(n). $contains is a correctness requirement, not a performance one.
- §3.3: clarifies auto-detect is an upgrade-compat path only, never
  the selection mechanism for new palaces.
- §8.2: migrate CLI refuses to run against a target lacking
  supports_embeddings_passthrough unless --accept-re-embed is passed;
  migration record now captures lossless status and model identities.
@dekoza dekoza left a comment
Reviewed from the LanceDB (#574) perspective. No show-stoppers — our implementation is already close to the spec's target shape. Notes below.

§1.1 — kwargs-only signatures

Our backends/lance.py already uses kwargs-only on add/upsert, and our BaseCollection ABC in backends/base.py does the same (work done in #575, easily backported). query/get/delete currently use **kwargs catch-all — those need explicit parameter lists to match the spec.

§1.1 — query_embeddings and where_document

Our query() doesn't accept query_embeddings and neither query() nor get() accepts where_document. Both are straightforward for LanceDB:

  • query_embeddings: skip the embed step, search the vector directly
  • where_document: same $contains path (Tantivy FTS or LIKE fallback)

§1.1 — update() method

Our BaseCollection and LanceCollection already have an update() method (fetches existing records, merges changes, re-upserts). The spec's BaseCollection does not include update(), though supports_update exists as a capability token. Is update() intentionally excluded from the required interface, or an oversight? If it stays out, we'd keep it as a LanceDB-specific extension — but it seems generally useful.

§1.3 — Typed results

Our return shapes already match QueryResult/GetResult field structure. Migration is wrapping dict construction in dataclass constructors. No concern.

§1.5 — Three-state model identity

We persist embedding_model per-record in metadata_json but not at collection level. Spec wants collection-level persistence + check on open. LanceDB supports table-level metadata via Arrow schema metadata — clean fit. The unknown state for legacy palaces is the right call; hard-failing on upgrade would break existing users.

§2.3 — BaseBackend class

We already have LanceBackend in backends/lance.py with a get_collection() factory method (from #575, easily backported to #574). It currently takes palace_path: str — adapting to PalaceRef is straightforward. It does not yet subclass a BaseBackend ABC (spec §2.3), but the method shape is close.

§3.3 — Auto-detection

Our backends/__init__.py already has detect_backend() sniffing for .lance directories. Adapting to a BaseBackend.detect(path) -> bool classmethod is trivial. Agree with the spec's framing that auto-detection is a migration compatibility path, not a selection mechanism.

§6 — Sync compatibility note

Our sync implementation uses a _raw=True flag on upsert() to skip sync metadata injection during sync apply. The spec's BaseCollection.upsert() has no such parameter, which is correct — sync concerns should not leak into the collection contract. This means we need to refactor sync injection out of LanceCollection into a wrapper layer, which aligns with §6's "sync is a separate subsystem" stance. Wanted to flag this since other backends implementing sync will face the same design question.

§10 — Cleanup gating

+1 on landing the Chroma direct-import cleanup before backend PRs rebase. The seven files importing chromadb directly would break any pure-Lance deployment. This cleanup benefits all backend authors equally.

§11 — Alignment effort for #574

Assessment is accurate. We already have backends/ with BaseCollection ABC, LanceBackend, and kwargs-only signatures (work from #575, easily backported). Remaining work: explicit param lists on query/get/delete (instead of **kwargs), typed results, $contains/where_document, collection-level model identity, PalaceRef adoption, conformance test subclass, sync injection refactor, and package extraction to mempalace-lance. All small-to-medium. No architectural conflicts.

§12 — Open questions

  1. changes_since with collection filter — yes, useful. Our sync tracks changes per-table; filtering by collection name is natural.
  2. Per-palace capabilities — worth having. supports_contains_fast depending on whether an FTS index exists is a real case for us.
  3. run_maintenance return shape — structured return preferred. LanceDB compaction can report fragment count before/after and bytes reclaimed, useful for operator dashboards and benchmark reporting.

@skuznetsov
Contributor

Reviewed the actual RFC diff. From the Postgres / pg_sorted_heap side, I do not see a blocker. The direction is compatible with #665: injected embedder, explicit capability tokens, lossless-vs-reembed migration split, and maintenance-state benchmark hooks all map cleanly.

Three small spec-clarity points I would consider before backend authors start rebasing:

  1. Goal wording vs §8.2 migration semantics

The Goals section says mempalace migrate moves palaces between backends “without data loss”. §8.2 is more precise: migration is lossless only when the source supports embedding export, the target supports embedding passthrough, and model identity is compatible. I would mirror that nuance in the Goals section so the RFC does not overpromise. Suggested framing: “supports lossless migration when source/target capabilities allow it, and explicit re-embedding otherwise.”

  2. server_embedder needs an explicit identity/dimension contract

§1.5 says backends MUST NOT hardcode models and must validate model identity/dimension, but §2.1 / §5 allow server_embedder backends to ignore the injected embedder. That can be valid, but the RFC should say what identity gets recorded and validated in that mode. For example: a server-embedder backend should expose its effective model_name and dimension, and the same dimension/model identity rules should apply to that effective embedder. Otherwise server_embedder becomes an implicit escape hatch from the safety rules.

  3. Maintenance kinds need discovery

§7.3 says benchmarks should run explicit run_maintenance if the backend advertises maintenance kinds, but I do not see where those kinds are advertised. A small hook would make the benchmark harness deterministic, e.g. maintenance_kinds: frozenset[str] or maintenance_state()["supported_kinds"]. Without that, harnesses will need backend-specific knowledge to decide whether "analyze", "compact", or "reindex" exists.

None of these change the shape of the RFC. They are mostly guardrails to keep the spec precise once implementations begin targeting it.

Contributor

@jphein jphein left a comment

Reviewed from the perspective of a 134K-drawer production deployment on the Chroma backend, with experience evaluating LanceDB (via Karta's embedded LanceDB + SQLite architecture).

No show-stoppers. This is the right time to formalize the backend contract — the six-way divergence is already causing friction. Notes below.

§1.3 — Typed results (migration cost for existing forks/consumers)

This is a breaking change for anyone with code touching mcp_server.py or searcher.py that assumes dict returns. My fork has ~20 changes across those files — all dict-shaped. The migration path is clear (wrap in dataclass constructors, same as dekoza noted), but the blast radius is worth calling out: it's not just backend authors who need to update, it's everyone downstream who consumes query results. A compatibility shim (.to_dict() on the result types) during the transition would make the upgrade gentler for plugin authors and forks.

§1.5 — Three-state model identity is exactly right

My palace has 134K drawers with no recorded embedder identity. Hard-failing on upgrade would brick every pre-v1 palace. The unknown → warn → record-on-next-write path is the correct design. One detail: the spec says the identity is recorded "on the next successful write, reindex, or migration." For read-heavy deployments that rarely write new drawers, this means the palace could stay in unknown state indefinitely. Consider also recording on explicit mempalace verify or mempalace status (read-only operations that already touch the collection) so operators can resolve it without forcing a write.

§2.5 — Concurrency guarantee matches real usage

The spec says backends can assume single-thread access per collection, with core serializing per palace. This matches how the MCP server actually works today — good call. My fork added threading.Lock to the graph cache (#661) because build_graph() was the one place concurrent access was possible. The per-palace serialization guarantee would have made that unnecessary. Worth documenting this guarantee prominently so backend authors don't over-synchronize.

§7.3 — Benchmark methodology is the most underrated section

At 134K drawers on Chroma, I've hit the HNSW pathology firsthand — the graph's on-disk size and query latency behave very differently before and after compaction, and after external writes that invalidate the in-memory index (the exact problem my stale HNSW mtime detection fix addresses in #663). The three-phase benchmark requirement (post-bulk-load, post-background-maintenance, post-explicit-maintenance) would have caught the performance cliff I found empirically. Strong +1 on maintenance_state() being part of the published numbers.

§8 — Migration at scale

At 134K drawers, re-embedding is not a minor cost — it's hours of compute depending on the model. The lossless vs re-embed distinction and the --accept-re-embed explicit opt-in are the right design. One request: mempalace migrate should report progress (rows migrated / total, ETA) for large palaces. A 134K-drawer migration that runs silently for hours will get killed by impatient operators.

§10 — Cleanup prerequisite will affect fork contributors

My fork's mcp_server.py imports chromadb directly for the BLOB seq_id repair (#664) and mtime-based cache invalidation (#663). Both are deeply Chroma-specific. The cleanup PR routing all callers through BaseCollection is the right gating decision, but it will require Chroma-specific fixups to move behind the backend boundary. Backend-specific maintenance operations (like BLOB repair) might need a BaseBackend.repair() or similar escape hatch — otherwise they end up as out-of-band scripts that bypass the abstraction.

§12 — Open questions

  1. Per-palace capabilities — yes. supports_contains_fast depending on whether an FTS index exists is a real case. I'd add: supports_mtime_detection or similar for backends where external-write detection is possible (Chroma via SQLite mtime) vs not (server-mode backends where the filesystem isn't local). My #663 fix is Chroma-specific precisely because mtime detection only makes sense for local file-backed stores.

Addresses the actual spec defects flagged in #743 review, ignoring
operator-UX asks that are not plugin-contract concerns.

- Goal #3: 'without data loss' → mirrors §8.2's capability-conditional
  lossless-vs-reembed framing. No more overpromise.
- §1.5: `server_embedder` is no longer an implicit escape hatch from
  identity/dimension rules. Such backends MUST expose an effective
  identity via `effective_embedder_identity()` and are bound by the
  same three-state check.
- §7.3: adds `maintenance_kinds: ClassVar[frozenset[str]]` advertisement
  mechanism. `run_maintenance(kind)` must raise
  UnsupportedMaintenanceKindError for unadvertised kinds. Benchmark
  harness reads this set rather than guessing kind names. Reserves
  `analyze`/`compact`/`reindex` as well-defined names.
- §1.2: adds `update()` as optional method with a default get+merge+
  upsert implementation. §2.1: `supports_update` redefined to gate
  atomic single-round-trip semantics (not mere capability), since the
  default impl already supports partial updates.

Operator asks explicitly NOT adopted (diplomatic shims, not contract
defects): `.to_dict()` compat on typed results, migration progress
reporting, `BaseBackend.repair()` separate from `run_maintenance`,
per-palace capability variance, identity recording on read-only ops.
@igorls
Collaborator Author

igorls commented Apr 13, 2026

Thanks @skuznetsov @dekoza @jphein — substantive reviews on all sides. Commit 922aa99 closes the four spec defects the reviews surfaced. A number of reasonable operator asks are explicitly not being adopted; explaining both below, so the boundary is clear for anyone implementing against this.

Adopted — spec defects

1. Goals wording (skuznetsov) — Goal 3 now mirrors §8.2 instead of overpromising "without data loss." The spec delivers lossless migration when capabilities allow, re-embedding otherwise. Both are explicit; neither is hidden.

2. server_embedder identity contract (skuznetsov) — a stated safety invariant with an unspecified escape hatch is a real correctness hole. Closed:

A backend advertising server_embedder (§2.1) provides its own embedder and MAY ignore the embedder= kwarg passed to get_collection. That does not exempt it from the dimension and identity rules above. Such backends MUST expose an effective model_name: str and dimension: int via BaseCollection.effective_embedder_identity(), persist that effective identity on first write, and validate it on open under the same three-state rules.

server_embedder now documents where the embedding happens; it never suspends the safety contract. A backend that can't report its effective embedder identity doesn't qualify for the capability.

3. maintenance_kinds discovery (skuznetsov) — §7.3 referenced something the spec never defined. Fixed with a class-level maintenance_kinds: ClassVar[frozenset[str]]. The spec reserves "analyze", "compact", "reindex" as well-defined names; backends may add their own. run_maintenance(kind) MUST raise UnsupportedMaintenanceKindError for unadvertised kinds. The benchmark harness reads this set rather than guessing names.
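Sketched, the advertisement-plus-guard mechanism looks roughly like this (class and error names follow the commit; the `_do_maintenance` hook and the subclass are illustrative):

```python
from typing import ClassVar

class UnsupportedMaintenanceKindError(ValueError):
    pass

class BaseBackend:
    # Spec reserves "analyze", "compact", "reindex" as well-defined
    # names; backends may extend the set with their own kinds.
    maintenance_kinds: ClassVar[frozenset[str]] = frozenset()

    def run_maintenance(self, kind: str) -> None:
        # MUST raise for any kind not advertised by the backend, so
        # the benchmark harness can read the set instead of guessing.
        if kind not in self.maintenance_kinds:
            raise UnsupportedMaintenanceKindError(kind)
        self._do_maintenance(kind)

    def _do_maintenance(self, kind: str) -> None:  # hypothetical hook
        raise NotImplementedError

class SketchBackend(BaseBackend):
    maintenance_kinds = frozenset({"compact", "reindex"})

    def _do_maintenance(self, kind: str) -> None:
        pass  # e.g. compaction / index rebuild
```
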

4. update() method / supports_update mismatch (dekoza) — a capability token that gates nothing is incoherent. update() is now an optional method in §1.2 with a default implementation (get → merge → upsert). supports_update is redefined to gate atomic, single-round-trip semantics — not the mere ability to do partial updates, which the default already provides. Backends with native primitives (Postgres UPDATE, Lance merge_insert) override and advertise; others inherit the default.
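The default implementation would look roughly like this (method signatures and result shapes are assumptions; the spec defines the exact API):

```python
class CollectionUpdateMixin:
    """Sketch of the default get -> merge -> upsert update()."""

    def update(self, *, ids, documents=None, metadatas=None):
        existing = self.get(ids=ids)  # assumed GetResult-like object
        merged_docs, merged_metas = [], []
        for i, _id in enumerate(ids):
            doc = (documents[i]
                   if documents and documents[i] is not None
                   else existing.documents[i])
            meta = dict(existing.metadatas[i])
            if metadatas and metadatas[i]:
                meta.update(metadatas[i])  # partial metadata merge
            merged_docs.append(doc)
            merged_metas.append(meta)
        # Not atomic: two round trips. Backends with a native primitive
        # (Postgres UPDATE, Lance merge_insert) override this and
        # advertise supports_update for single-round-trip semantics.
        self.upsert(ids=list(ids), documents=merged_docs,
                    metadatas=merged_metas)
```
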

Not adopted — operator comfort, not plugin-contract concerns

These are all reasonable asks, and I want to be explicit that they're not dismissed because they're wrong — they're not adopted because they don't belong in this spec. A well-engineered plugin contract has a small, stable surface. Layering operator-UX conveniences into the core interface makes it worse for every future backend author.

  • .to_dict() compat shim on typed results (jphein) — directly contradicts §1.3's design intent of dropping the Chroma dict shape. Adding an escape hatch re-legitimizes the thing the spec removes. Fork migration pain is what minor version bumps are for; callers move to attribute access.

  • Migration progress reporting (jphein) — CLI UX concern. Belongs in the mempalace migrate implementation, not in the backend contract. A backend doesn't care whether the CLI prints a progress bar. The CLI PR will implement it; the spec says nothing about it.

  • BaseBackend.repair(kind) escape hatch (jphein) — redundant with run_maintenance(kind). "Repair" is a maintenance kind, not a separate concept. Keeping them distinct would fragment the same mechanism. Chroma's BLOB seq_id repair becomes run_maintenance("repair") or a Chroma-specific reserved kind.

  • Per-palace capability variance (dekoza, jphein) — this muddies the contract. Capabilities advertise what a backend can do. Making them palace-dependent turns every capability check into "maybe — depends." That pushes state tracking onto every caller for a small gain. A backend that builds indexes lazily either guarantees the index exists when needed (and advertises the capability) or doesn't. Static contract.

  • Identity recording on read-only ops (jphein) — complicates the state machine for an operator-UX concern. "My read-heavy palace stays unknown forever" is solved by mempalace palace set-embedder --model NAME (already in §1.5), not by making the state transition rules asymmetric.

  • supports_mtime_detection (jphein) — almost certainly redundant with sync change-feed contract once the sync subsystem is specified. Don't add capabilities speculatively.

Net

Four spec edits, small and targeted. The contract is tighter for it. The rejected items are good ideas for the implementations — CLI progress, fork-friendly shims, richer operator tooling — they just don't live in the plugin contract.

#743 is ready for another look.

@skuznetsov
Contributor

Thanks, this addresses my Postgres / pg_sorted_heap concerns. The revised goal wording, server_embedder identity contract, and maintenance_kinds discovery hook all look aligned with what #665 needs. I’m good with this RFC direction from the Postgres backend side and will review the implementation PRs when they land.

#757 landed mtime/inode cache invalidation and mempalace_reconnect
in mcp_server._get_client(). Both are Chroma-specific (stat of
chroma.sqlite3). They should migrate into ChromaBackend.get_collection
and ChromaBackend.close_palace during the §10 cleanup so the freshness
contract lives inside the backend, not in the caller.
@igorls igorls added area/install pip/uv/pipx/plugin install and packaging documentation Improvements or additions to documentation storage labels Apr 14, 2026
igorls added a commit that referenced this pull request Apr 14, 2026
Prerequisite for RFC 001 (plugin spec, #743). Removes every direct
`import chromadb` outside the ChromaDB backend itself so the core
modules depend only on the backend abstraction layer.

Extends ChromaBackend with make_client, get_or_create_collection,
delete_collection, create_collection, and backend_version. Adds
update() to the BaseCollection contract. Non-backend callers
(mcp_server, dedup, repair, migrate, cli) now go through the
abstraction; tests patch ChromaBackend instead of chromadb.

With this landed, the RFC 001 spec can be enforced and PalaceStore
(#643) can ship as a plugin without touching core modules.
@skuznetsov
Contributor

Quick coordination question from the PostgreSQL / pg_sorted_heap backend side:

Is RFC 001 now stable enough to use as the target contract for reworking #665, or would you prefer that implementation PRs wait until #743 is merged and the §10 backend cleanup follow-up starts/lands?

From my side, the expected #665 rewrite would target the RFC shape directly:

  • injected/recorded embedder identity instead of backend-owned embedding globals
  • typed QueryResult / GetResult
  • BaseBackend / BaseCollection + capability tokens
  • PalaceRef
  • maintenance hooks (maintenance_kinds, maintenance_state, run_maintenance)
  • conformance test coverage
  • conservative docs positioning for pgvector vs optional pg_sorted_heap

I’m asking to avoid rebasing the current conflicting branch into an intermediate shape if the intended path is now RFC-first.

@cschnatz

Read the full spec. From the multi-tenant hosted side (#697), this looks solid.

Works for us

  • §4.4 PalaceRef.namespace absorbs our collection_prefix cleanly. We're already building shared team palaces on top of this — personal vaults map to PalaceRef(namespace="tenant_<uuid>"), team vaults to PalaceRef(namespace="team_<uuid>"), and search fans out across both. The namespace abstraction is exactly what we need.
  • §2.5 Concurrency model matches our sidecar design.
  • §3.1 Entry points mean we can ship our ChromaHttpBackend as a proper plugin instead of vendoring.

Questions

1. Namespace isolation — security boundary or naming convention?

For us, namespace is a tenant isolation boundary. When a user searches across personal + team vaults, the sidecar resolves which namespaces to query based on a gateway-authorized team_ids list. A team namespace that isn't in that list gets rejected. §4.4 says "the backend uses it as given" — should backends also enforce namespace isolation, or is that purely the caller's job?

2. PalaceRef.id uniqueness scope

Unique globally or per backend instance?

3. Typed results (§1.3) — hard cutover?

Is there a transition period, or do all callers need to migrate from Chroma dicts in one go? We have 30 MCP tool functions consuming dict shapes.

4. §7.4 NAMESPACE_MEMPALACE

Worth assigning now — the placeholder blocks Qdrant PRs.

5. §10 mcp_server caching

We hit a related bug (collection cache bleed between tenants). Moving caching into ChromaBackend.get_collection() would fix this structurally.

#697 alignment

Our collection_prefix dissolves into PalaceRef.namespace as described in §11. Ready to rebase once spec + cleanup PR land. Happy to be an early conformance suite adopter.
