
fix: batch remaining unbounded ChromaDB reads missed by #66 (#146)

Open

SoundMindsAI wants to merge 2 commits into MemPalace:develop from SoundMindsAI:fix/batch-chromadb-reads

Conversation

@SoundMindsAI

Summary

Closes #40.

#66 fixed unbounded col.get() calls in cli.py and layers.py. This PR
fixes the same class of bug in the remaining call sites that #66 did not cover.
There is zero overlap — this PR does not touch cli.py or layers.py.

See #132 for prior discussion.

What this fixes

  • mcp_server.py: 5 unbounded col.get() calls (tool_status, tool_list_wings,
    tool_list_rooms, tool_get_taxonomy, tool_diary_read)
  • miner.py: status() had a hardcoded limit=10000 that silently dropped
    wings and rooms filed after the first 10k drawers
  • config.py: a palace_path containing ~ was passed unexpanded to ChromaDB,
    causing "palace not found" errors
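The tilde bug is easy to reproduce: a path read from a config file or environment variable is a literal string, so ~ is never expanded unless the code does it explicitly. A minimal sketch of the fix (resolve_palace_path is an illustrative name, not the actual function in config.py):

```python
import os

def resolve_palace_path(raw: str) -> str:
    """Expand ~ and normalize before handing the path to ChromaDB."""
    return os.path.abspath(os.path.expanduser(raw))

# A literal "~" from a config file now resolves under the user's home
# directory instead of being passed through verbatim:
path = resolve_palace_path("~/.mempalace/palace")
```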

Also adds chromadb_utils.get_all(), a centralized batched-read utility so future
call sites don't need to reimplement pagination inline.
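For reference, a batched reader in this spirit can be sketched as follows (illustrative only; the signature of the actual chromadb_utils.get_all() in this PR may differ):

```python
def get_all(col, batch_size=5000, include=("metadatas",), where=None):
    """Fetch every record from a ChromaDB collection in bounded batches.

    Sketch of the pagination idea: repeatedly call col.get() with an
    explicit limit/offset until a short (or empty) batch signals the end.
    """
    results = {"ids": []}
    for key in include:
        results[key] = []
    offset = 0
    while True:
        batch = col.get(
            limit=batch_size, offset=offset, include=list(include), where=where
        )
        ids = batch["ids"]
        if not ids:
            break
        results["ids"].extend(ids)
        for key in include:
            if batch.get(key) is not None:
                results[key].extend(batch[key])
        offset += len(ids)
        if len(ids) < batch_size:
            break
    return results
```

Each round trip asks for at most batch_size records, so no single query grows with palace size.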

Changes

File                          Change
mempalace/chromadb_utils.py   New: batched get_all() utility
mempalace/mcp_server.py       5 col.get() calls → get_all()
mempalace/miner.py            limit=10000 → get_all()
mempalace/config.py           os.path.expanduser() on palace_path
README.md                     Add chromadb_utils.py to file reference table
mempalace/README.md           Add chromadb_utils.py to module table

Test plan

Tests for this PR's changes:

  • test_chromadb_utils.py — 7 tests: all records, docs+meta, where filter, empty collection, batching, no duplicates, filtered pagination
  • test_mcp_server_reads.py — 9 tests: status, list_wings, list_rooms, taxonomy, diary read (all entries + empty)
  • test_miner_status.py — 2 tests: accurate drawer count, multiple wings reported
  • test_config.py — 2 new tests: tilde expansion from file config and env var

Regression tests for #66's batched reads (no existing coverage):

  • test_layers.py — 2 tests: Layer1 pulls from all rooms, wing filter excludes other wings

  • Full suite: 49 passed, ruff clean

@RobertoGEMartin

Field report: 109,574 drawers on macOS ARM64

We're running mempalace v3.0.0 as the cross-session memory layer for a large technical project (86K drawers in one wing alone, 109K total). Here's what we hit and how we patched it — hopefully useful context for this PR.

The bug

tool_status, tool_list_wings, and tool_get_taxonomy all call col.get(include=["metadatas"]) without a limit. With 109K drawers, ChromaDB raises too many SQL variables (SQLite's SQLITE_MAX_VARIABLE_NUMBER limit, typically 999). The except Exception: pass in the MCP server swallowed the error silently, making list_wings and status return empty results.

PR #137's limit=10000 cap helps for smaller palaces but still misses 99K of our 109K drawers — wing counts were wrong and some wings disappeared entirely.
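The failure mechanism is easy to reproduce with plain sqlite3: a metadata fetch for N ids compiles to an IN (...) clause with N bound variables, which fails once N exceeds SQLITE_MAX_VARIABLE_NUMBER. Chunking below that limit sidesteps the error. A standalone sketch (not ChromaDB's actual internal query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drawers (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO drawers VALUES (?)", ((i,) for i in range(5000)))

ids = list(range(5000))
CHUNK = 500  # safely under the classic 999-variable ceiling

rows = []
for i in range(0, len(ids), CHUNK):
    part = ids[i:i + CHUNK]
    placeholders = ",".join("?" * len(part))
    rows += conn.execute(
        f"SELECT id FROM drawers WHERE id IN ({placeholders})", part
    ).fetchall()

# Every drawer accounted for, and no query ever binds more than 500 variables.
assert len(rows) == 5000
```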

Our workaround

We patched mcp_server.py to bypass col.get() entirely for metadata-only operations, using direct SQLite queries against ChromaDB's backing store:

import sqlite3

def _query_wing_room_counts():
    """Count drawers per (wing, room) with direct SQL against ChromaDB's
    SQLite backing store, sidestepping the SQL-variable limit entirely.

    Note: embedding_metadata stores one row per metadata key, so wing and
    room live on separate rows joined by embedding id. The schema is
    ChromaDB-internal and may change between versions.
    """
    conn = sqlite3.connect(_sqlite_path())  # _sqlite_path(): our helper that locates chroma.sqlite3
    rows = conn.execute("""
        SELECT w.string_value AS wing,
               r.string_value AS room,
               COUNT(*)       AS cnt
        FROM embedding_metadata AS w
        JOIN embedding_metadata AS r
          ON r.id = w.id AND r.key = 'room'
        WHERE w.key = 'wing'
        GROUP BY wing, room
    """).fetchall()
    conn.close()
    return rows

Memory use is bounded by the number of (wing, room) groups rather than the drawer count, the query returns in ~200ms for 109K drawers, and it never hits the SQLite variable limit.

Why batched get_all() is better than our hack

Our SQL approach works but it's fragile — it depends on ChromaDB's internal schema which could change between versions. Your chromadb_utils.get_all() with proper batching is the right long-term fix because it uses the public API.

Suggestion

For the batching implementation, consider that col.get(offset=N, limit=5000) can still be slow on very large collections because ChromaDB doesn't have efficient offset pagination. If you see performance issues at >50K drawers, a cursor-based approach using ID ranges might be worth exploring.
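One way to sketch that idea (get_all_by_id_chunks and the 500-id chunk size are illustrative, not part of this PR; it assumes an ids-only fetch avoids the expensive metadata path):

```python
def get_all_by_id_chunks(col, chunk_size=500):
    """Fetch records in fixed-size id chunks instead of offset pagination.

    Keeping chunk_size below SQLite's variable limit (999 on older builds)
    keeps each per-chunk id lookup safe, and avoids repeated large-offset
    scans on big collections.
    """
    # ChromaDB returns ids even with include=[], so this first call is
    # comparatively light.
    all_ids = col.get(include=[])["ids"]
    out = {"ids": [], "metadatas": []}
    for i in range(0, len(all_ids), chunk_size):
        batch = col.get(ids=all_ids[i:i + chunk_size], include=["metadatas"])
        out["ids"].extend(batch["ids"])
        out["metadatas"].extend(batch["metadatas"])
    return out
```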

Our setup

  • mempalace 3.0.0 (PyPI), macOS Sequoia, Apple Silicon
  • ChromaDB with all-MiniLM-L6-v2 embeddings
  • 109,574 drawers across 8 wings
  • Palace path: ~/.mempalace/palace/ (also hit the expanduser() bug you fixed here ✅)

Happy to test a pre-release if you want validation at this scale.

SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from fea9789 to 0f39f7d on April 8, 2026
@SoundMindsAI
Author

@RobertoGEMartin Thanks for the detailed field report — this is exactly the kind of real-world validation we needed. 109K drawers across 8 wings is a serious stress test.

Good to hear that get_all() via the public API lines up with what you found. Your direct-SQL workaround was clever but agreed it's fragile against ChromaDB schema changes.

Your point about offset-based pagination getting slow at >50K drawers is well taken. We'll track that as a follow-up optimization — cursor-based ID ranges would be the right approach there. For now the batching fixes correctness (no more silent truncation), and we can iterate on performance once we have numbers.

Would you be willing to test this branch against your 109K-drawer palace? That would give us confidence before merging. You can install directly from the branch:

pip install git+https://github.com/SoundMindsAI/mempalace.git@fix/batch-chromadb-reads

Specifically interested in:

  • Do tool_status, list_wings, and get_taxonomy return correct counts for all 109K drawers?
  • Any noticeable latency on those calls vs your direct-SQL approach?

Appreciate the offer to help validate.

@SoundMindsAI
Author

Rebased onto latest main — merge conflicts resolved, all 546 tests passing, linting clean. Ready for review whenever you are.

SoundMindsAI force-pushed the fix/batch-chromadb-reads branch 2 times, most recently from b5c7f6d to 5d6b706 on April 9, 2026

@web3guru888 web3guru888 left a comment


🔧 Review of #146: fix: batch remaining unbounded ChromaDB reads missed by #66

Scope: +636/−21 · 12 file(s) · touches core

  • README.md (modified: +1/−0)
  • mempalace/README.md (modified: +1/−0)
  • mempalace/chromadb_utils.py (added: +55/−0)
  • mempalace/config.py (modified: +2/−2)
  • ⚠️ mempalace/mcp_server.py (modified: +13/−14)
  • mempalace/miner.py (modified: +2/−1)
  • tests/benchmarks/test_layers_bench.py (modified: +3/−3)
  • tests/test_chromadb_utils.py (added: +132/−0)
  • tests/test_config.py (modified: +19/−0)
  • tests/test_layers.py (modified: +93/−1)
  • tests/test_mcp_server_reads.py (added: +222/−0)
  • tests/test_miner_status.py (added: +93/−0)

Technical Analysis

  • 🔌 MCP server dispatch changes — verify JSON-RPC compliance and backward compatibility
  • 🪟 Windows compatibility — verify path handling works cross-platform
  • 🧠 Retrieval scoring changes — verify backward compatibility with existing Synapse configs

Issues

  • ⚠️ Touches mempalace/mcp_server.py — Core MCP server — maintainer guards this closely

Suggestions

  • Magic number(s) 1200, 2026 — consider extracting to named constant(s)

Strengths

  • ✅ Includes test coverage

🟡 Needs attention — touches guarded files and has items to address.


🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis

bensig changed the base branch from main to develop on April 11, 2026
igorls added the bug (Something isn't working), area/search (Search and retrieval), and storage labels on Apr 14, 2026
SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from 5d6b706 to ecd63f7 on April 14, 2026
@SoundMindsAI
Author

Hey! 👋 Just rebased this branch onto the latest develop to resolve the merge conflicts. Here's a quick rundown of how I handled the resolution:

mcp_server.py: develop had already independently added batched pagination via _fetch_all_metadata() plus a TTL-cached wrapper (_get_cached_metadata()). Since that is a superset of what get_all() provides for those call sites, I kept develop's approach for the 4 read tools (status, list_wings, list_rooms, taxonomy). The get_all import is still needed, though: tool_diary_read (which wasn't in a conflict zone) uses it to fetch both documents and metadatas, which _fetch_all_metadata doesn't support.

miner.py — Kept develop's full .palace imports (closets, mine_lock, etc.) and added the get_all import. The status() function uses get_all() as originally intended, replacing develop's col.get(limit=total) which could still silently truncate on very large palaces.

test_mcp_server_reads.py — Fixed a small test isolation issue that surfaced after the rebase: _patch_config now resets _metadata_cache on entry so the TTL cache doesn't leak stale data between tests.

Everything is green — ruff clean and all 908 tests passing. Would appreciate a review when you get a chance!

SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from ecd63f7 to f74eb90 on April 15, 2026
SoundMindsAI and others added 2 commits April 15, 2026 07:05
col.get() without an explicit limit applies ChromaDB's small internal
default, silently dropping drawers from status, Layer1, MCP read tools,
miner status, and diary reads.  Replace all unbounded col.get() calls
with a new get_all() helper that paginates in safe batches.

Also fix tilde expansion in config.py palace_path resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from f74eb90 to e9508a0 on April 15, 2026

Labels

area/search (Search and retrieval) · bug (Something isn't working) · storage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

status command shows max 10,000 drawers (hardcoded limit) + palace_path not expanded

4 participants