
fix: batch remaining unbounded ChromaDB reads missed by #66 (#146)

Open

SoundMindsAI wants to merge 2 commits into MemPalace:develop from SoundMindsAI:fix/batch-chromadb-reads

Conversation

@SoundMindsAI

Summary

Closes #40.

#66 fixed unbounded col.get() calls in cli.py and layers.py. This PR
fixes the same class of bug in the remaining call sites that #66 did not cover.
There is zero overlap — this PR does not touch cli.py or layers.py.

See #132 for prior discussion.

What this fixes

  • mcp_server.py: 5 unbounded col.get() calls (tool_status, tool_list_wings,
    tool_list_rooms, tool_get_taxonomy, tool_diary_read)
  • miner.py: status() had a hardcoded limit=10000 that silently dropped
    wings and rooms filed after the first 10k drawers
  • config.py: a palace_path containing ~ was passed unexpanded to ChromaDB,
    causing "palace not found" errors
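The tilde bug is easy to reproduce: a path read from a config file or environment variable is a literal string, so ~ is never expanded unless the code does it explicitly. A minimal sketch of the fix (resolve_palace_path is an illustrative name, not the actual function in config.py):

```python
import os

def resolve_palace_path(raw: str) -> str:
    """Expand ~ and normalize before handing the path to ChromaDB."""
    return os.path.abspath(os.path.expanduser(raw))

# A literal "~" from a config file now resolves under the user's home
# directory instead of being passed through verbatim:
path = resolve_palace_path("~/.mempalace/palace")
```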

Also adds chromadb_utils.get_all(), a centralized batched-read utility so future
call sites don't need to reimplement pagination inline.
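For reference, a batched reader in this spirit can be sketched as follows (illustrative only; the signature of the actual chromadb_utils.get_all() in this PR may differ):

```python
def get_all(col, batch_size=5000, include=("metadatas",), where=None):
    """Fetch every record from a ChromaDB collection in bounded batches.

    Sketch of the pagination idea: repeatedly call col.get() with an
    explicit limit/offset until a short (or empty) batch signals the end.
    """
    results = {"ids": []}
    for key in include:
        results[key] = []
    offset = 0
    while True:
        batch = col.get(
            limit=batch_size, offset=offset, include=list(include), where=where
        )
        ids = batch["ids"]
        if not ids:
            break
        results["ids"].extend(ids)
        for key in include:
            if batch.get(key) is not None:
                results[key].extend(batch[key])
        offset += len(ids)
        if len(ids) < batch_size:
            break
    return results
```

Each round trip asks for at most batch_size records, so no single query grows with palace size.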

Changes

File                          Change
mempalace/chromadb_utils.py   New: batched get_all() utility
mempalace/mcp_server.py       5 col.get() calls → get_all()
mempalace/miner.py            limit=10000 → get_all()
mempalace/config.py           os.path.expanduser() on palace_path
README.md                     Add chromadb_utils.py to file reference table
mempalace/README.md           Add chromadb_utils.py to module table

Test plan

Tests for this PR's changes:

  • test_chromadb_utils.py — 7 tests: all records, docs+meta, where filter, empty collection, batching, no duplicates, filtered pagination
  • test_mcp_server_reads.py — 9 tests: status, list_wings, list_rooms, taxonomy, diary read (all entries + empty)
  • test_miner_status.py — 2 tests: accurate drawer count, multiple wings reported
  • test_config.py — 2 new tests: tilde expansion from file config and env var

Regression tests for #66's batched reads (no existing coverage):

  • test_layers.py — 2 tests: Layer1 pulls from all rooms, wing filter excludes other wings

  • Full suite: 49 passed, ruff clean

@RobertoGEMartin

Field report: 109,574 drawers on macOS ARM64

We're running mempalace v3.0.0 as the cross-session memory layer for a large technical project (86K drawers in one wing alone, 109K total). Here's what we hit and how we patched it — hopefully useful context for this PR.

The bug

tool_status, tool_list_wings, and tool_get_taxonomy all call col.get(include=["metadatas"]) without a limit. With 109K drawers, ChromaDB raises too many SQL variables (SQLite's SQLITE_MAX_VARIABLE_NUMBER limit, typically 999). The except Exception: pass in the MCP server swallowed the error silently, making list_wings and status return empty results.

PR #137's limit=10000 cap helps for smaller palaces but still misses 99K of our 109K drawers — wing counts were wrong and some wings disappeared entirely.
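The failure mechanism is easy to reproduce with plain sqlite3: a metadata fetch for N ids compiles to an IN (...) clause with N bound variables, which fails once N exceeds SQLITE_MAX_VARIABLE_NUMBER. Chunking below that limit sidesteps the error. A standalone sketch (not ChromaDB's actual internal query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drawers (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO drawers VALUES (?)", ((i,) for i in range(5000)))

ids = list(range(5000))
CHUNK = 500  # safely under the classic 999-variable ceiling

rows = []
for i in range(0, len(ids), CHUNK):
    part = ids[i:i + CHUNK]
    placeholders = ",".join("?" * len(part))
    rows += conn.execute(
        f"SELECT id FROM drawers WHERE id IN ({placeholders})", part
    ).fetchall()

# Every drawer accounted for, and no query ever binds more than 500 variables.
assert len(rows) == 5000
```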

Our workaround

We patched mcp_server.py to bypass col.get() entirely for metadata-only operations, using direct SQLite queries against ChromaDB's backing store:

import sqlite3

def _query_wing_room_counts():
    """Count drawers per (wing, room) with direct SQL against ChromaDB's
    SQLite backing store, sidestepping the SQL-variable limit entirely.

    Note: embedding_metadata stores one row per metadata key, so wing and
    room live on separate rows joined by embedding id. The schema is
    ChromaDB-internal and may change between versions.
    """
    conn = sqlite3.connect(_sqlite_path())  # _sqlite_path(): our helper that locates chroma.sqlite3
    rows = conn.execute("""
        SELECT w.string_value AS wing,
               r.string_value AS room,
               COUNT(*)       AS cnt
        FROM embedding_metadata AS w
        JOIN embedding_metadata AS r
          ON r.id = w.id AND r.key = 'room'
        WHERE w.key = 'wing'
        GROUP BY wing, room
    """).fetchall()
    conn.close()
    return rows

Memory use is bounded by the number of (wing, room) groups rather than the drawer count, the query returns in ~200ms for 109K drawers, and it never hits the SQLite variable limit.

Why batched get_all() is better than our hack

Our SQL approach works but it's fragile — it depends on ChromaDB's internal schema which could change between versions. Your chromadb_utils.get_all() with proper batching is the right long-term fix because it uses the public API.

Suggestion

For the batching implementation, consider that col.get(offset=N, limit=5000) can still be slow on very large collections because ChromaDB doesn't have efficient offset pagination. If you see performance issues at >50K drawers, a cursor-based approach using ID ranges might be worth exploring.
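One way to sketch that idea (get_all_by_id_chunks and the 500-id chunk size are illustrative, not part of this PR; it assumes an ids-only fetch avoids the expensive metadata path):

```python
def get_all_by_id_chunks(col, chunk_size=500):
    """Fetch records in fixed-size id chunks instead of offset pagination.

    Keeping chunk_size below SQLite's variable limit (999 on older builds)
    keeps each per-chunk id lookup safe, and avoids repeated large-offset
    scans on big collections.
    """
    # ChromaDB returns ids even with include=[], so this first call is
    # comparatively light.
    all_ids = col.get(include=[])["ids"]
    out = {"ids": [], "metadatas": []}
    for i in range(0, len(all_ids), chunk_size):
        batch = col.get(ids=all_ids[i:i + chunk_size], include=["metadatas"])
        out["ids"].extend(batch["ids"])
        out["metadatas"].extend(batch["metadatas"])
    return out
```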

Our setup

  • mempalace 3.0.0 (PyPI), macOS Sequoia, Apple Silicon
  • ChromaDB with all-MiniLM-L6-v2 embeddings
  • 109,574 drawers across 8 wings
  • Palace path: ~/.mempalace/palace/ (also hit the expanduser() bug you fixed here ✅)

Happy to test a pre-release if you want validation at this scale.

SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from fea9789 to 0f39f7d on April 8, 2026
@SoundMindsAI
Author

@RobertoGEMartin Thanks for the detailed field report — this is exactly the kind of real-world validation we needed. 109K drawers across 8 wings is a serious stress test.

Good to hear that get_all() via the public API lines up with what you found. Your direct-SQL workaround was clever but agreed it's fragile against ChromaDB schema changes.

Your point about offset-based pagination getting slow at >50K drawers is well taken. We'll track that as a follow-up optimization — cursor-based ID ranges would be the right approach there. For now the batching fixes correctness (no more silent truncation), and we can iterate on performance once we have numbers.

Would you be willing to test this branch against your 109K-drawer palace? That would give us confidence before merging. You can install directly from the branch:

pip install git+https://github.com/SoundMindsAI/mempalace.git@fix/batch-chromadb-reads

Specifically interested in:

  • Do tool_status, list_wings, and get_taxonomy return correct counts for all 109K drawers?
  • Any noticeable latency on those calls vs your direct-SQL approach?

Appreciate the offer to help validate.

@SoundMindsAI
Author

Rebased onto latest main — merge conflicts resolved, all 546 tests passing, linting clean. Ready for review whenever you are.

SoundMindsAI force-pushed the fix/batch-chromadb-reads branch 2 times, most recently from b5c7f6d to 5d6b706 on April 9, 2026

@web3guru888 web3guru888 left a comment


🔧 Review of #146: fix: batch remaining unbounded ChromaDB reads missed by #66

Scope: +636/−21 · 12 file(s) · touches core

  • README.md (modified: +1/−0)
  • mempalace/README.md (modified: +1/−0)
  • mempalace/chromadb_utils.py (added: +55/−0)
  • mempalace/config.py (modified: +2/−2)
  • ⚠️ mempalace/mcp_server.py (modified: +13/−14)
  • mempalace/miner.py (modified: +2/−1)
  • tests/benchmarks/test_layers_bench.py (modified: +3/−3)
  • tests/test_chromadb_utils.py (added: +132/−0)
  • tests/test_config.py (modified: +19/−0)
  • tests/test_layers.py (modified: +93/−1)
  • tests/test_mcp_server_reads.py (added: +222/−0)
  • tests/test_miner_status.py (added: +93/−0)

Technical Analysis

  • 🔌 MCP server dispatch changes — verify JSON-RPC compliance and backward compatibility
  • 🪟 Windows compatibility — verify path handling works cross-platform
  • 🧠 Retrieval scoring changes — verify backward compatibility with existing Synapse configs

Issues

  • ⚠️ Touches mempalace/mcp_server.py — Core MCP server — maintainer guards this closely

Suggestions

  • Magic number(s) 1200, 2026 — consider extracting to named constant(s)

Strengths

  • ✅ Includes test coverage

🟡 Needs attention — touches guarded files and has items to address.


🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis

bensig changed the base branch from main to develop on April 11, 2026
igorls added the bug (Something isn't working), area/search (Search and retrieval), and storage labels on Apr 14, 2026
SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from 5d6b706 to ecd63f7 on April 14, 2026
@SoundMindsAI
Author

Hey! 👋 Just rebased this branch onto the latest develop to resolve the merge conflicts. Here's a quick rundown of how I handled the resolution:

mcp_server.py: develop had already independently added batched pagination via _fetch_all_metadata() plus a TTL-cached wrapper (_get_cached_metadata()). Since that is a superset of what get_all() provides for those call sites, I kept develop's approach for the 4 read tools (status, list_wings, list_rooms, taxonomy). The get_all import is still needed, though: tool_diary_read (which wasn't in a conflict zone) uses it to fetch both documents and metadatas, which _fetch_all_metadata doesn't support.

miner.py — Kept develop's full .palace imports (closets, mine_lock, etc.) and added the get_all import. The status() function uses get_all() as originally intended, replacing develop's col.get(limit=total) which could still silently truncate on very large palaces.

test_mcp_server_reads.py — Fixed a small test isolation issue that surfaced after the rebase: _patch_config now resets _metadata_cache on entry so the TTL cache doesn't leak stale data between tests.

Everything is green — ruff clean and all 908 tests passing. Would appreciate a review when you get a chance!

SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from ecd63f7 to f74eb90 on April 15, 2026
SoundMindsAI and others added 2 commits April 15, 2026 07:05
col.get() without an explicit limit applies ChromaDB's small internal
default, silently dropping drawers from status, Layer1, MCP read tools,
miner status, and diary reads.  Replace all unbounded col.get() calls
with a new get_all() helper that paginates in safe batches.

Also fix tilde expansion in config.py palace_path resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SoundMindsAI force-pushed the fix/batch-chromadb-reads branch from f74eb90 to e9508a0 on April 15, 2026

Labels

area/search (Search and retrieval) · bug (Something isn't working) · storage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

status command shows max 10,000 drawers (hardcoded limit) + palace_path not expanded

4 participants